Matches in SemOpenAlex for { <https://semopenalex.org/work/W3123364362> ?p ?o ?g. }
Showing items 1 to 91 of 91, with 100 items per page.
- W3123364362 endingPage "1131" @default.
- W3123364362 startingPage "1131" @default.
- W3123364362 abstract "Autonomous learning of robotic skills is more natural and more practical than hand-engineering them, analogous to the way human individuals learn. Policy gradient methods are a class of reinforcement learning techniques with great potential for robot skill learning. However, policy gradient methods require a large number of online interactions between the robot and the environment to learn a good policy, which lowers the efficiency of the learning process and raises the likelihood of damage to both the robot and the environment. In this paper, we propose a two-phase (imitation phase and practice phase) framework for efficient learning of robot walking skills that attends to both the quality of the learned skill and sample efficiency. Training starts with the first stage, the imitation phase, in which the parameters of the policy network are updated in a supervised learning manner. The training set for the policy network consists of experienced trajectories output by an iterative linear Gaussian controller; this paper also refers to these trajectories as near-optimal experiences. In the second stage, the practice phase, the experiences for policy network learning are collected directly from online interactions, and the policy network parameters are updated with model-free reinforcement learning. The experiences from both stages are stored in a weighted replay buffer and ordered according to the experience scoring algorithm proposed in this paper. The proposed framework is tested on a biped robot walking task in a MATLAB simulation environment. The results show that the sample efficiency of the proposed framework is much higher than that of ordinary policy gradient algorithms; the proposed algorithm achieved the highest cumulative reward, and the robot autonomously learned better walking skills. In addition, the weighted replay buffer can serve as a general module for other model-free reinforcement learning algorithms. Our framework provides a new way to combine model-based and model-free reinforcement learning to efficiently update the policy network parameters during robot skill learning." @default. (A hedged code sketch of the weighted replay buffer described in this abstract follows the listing below.)
- W3123364362 created "2021-02-01" @default.
- W3123364362 creator A5002792306 @default.
- W3123364362 creator A5013134577 @default.
- W3123364362 creator A5041330231 @default.
- W3123364362 creator A5045236703 @default.
- W3123364362 date "2021-01-26" @default.
- W3123364362 modified "2023-09-25" @default.
- W3123364362 title "Efficient Robot Skills Learning with Weighted Near-Optimal Experiences Policy Optimization" @default.
- W3123364362 cites W1945123189 @default.
- W3123364362 cites W1966936262 @default.
- W3123364362 cites W1975230295 @default.
- W3123364362 cites W1977655452 @default.
- W3123364362 cites W2020871213 @default.
- W3123364362 cites W2025915818 @default.
- W3123364362 cites W2041176007 @default.
- W3123364362 cites W2041376653 @default.
- W3123364362 cites W2052578957 @default.
- W3123364362 cites W2166302491 @default.
- W3123364362 cites W2739330054 @default.
- W3123364362 cites W2901112449 @default.
- W3123364362 cites W3008492644 @default.
- W3123364362 cites W3017212538 @default.
- W3123364362 doi "https://doi.org/10.3390/app11031131" @default.
- W3123364362 hasPublicationYear "2021" @default.
- W3123364362 type Work @default.
- W3123364362 sameAs 3123364362 @default.
- W3123364362 citedByCount "1" @default.
- W3123364362 countsByYear W31233643622022 @default.
- W3123364362 crossrefType "journal-article" @default.
- W3123364362 hasAuthorship W3123364362A5002792306 @default.
- W3123364362 hasAuthorship W3123364362A5013134577 @default.
- W3123364362 hasAuthorship W3123364362A5041330231 @default.
- W3123364362 hasAuthorship W3123364362A5045236703 @default.
- W3123364362 hasBestOaLocation W31233643621 @default.
- W3123364362 hasConcept C111919701 @default.
- W3123364362 hasConcept C117619785 @default.
- W3123364362 hasConcept C119857082 @default.
- W3123364362 hasConcept C126388530 @default.
- W3123364362 hasConcept C154945302 @default.
- W3123364362 hasConcept C15744967 @default.
- W3123364362 hasConcept C177264268 @default.
- W3123364362 hasConcept C188116033 @default.
- W3123364362 hasConcept C188888258 @default.
- W3123364362 hasConcept C199360897 @default.
- W3123364362 hasConcept C19966478 @default.
- W3123364362 hasConcept C2775924081 @default.
- W3123364362 hasConcept C41008148 @default.
- W3123364362 hasConcept C77805123 @default.
- W3123364362 hasConcept C90509273 @default.
- W3123364362 hasConcept C97541855 @default.
- W3123364362 hasConcept C98045186 @default.
- W3123364362 hasConceptScore W3123364362C111919701 @default.
- W3123364362 hasConceptScore W3123364362C117619785 @default.
- W3123364362 hasConceptScore W3123364362C119857082 @default.
- W3123364362 hasConceptScore W3123364362C126388530 @default.
- W3123364362 hasConceptScore W3123364362C154945302 @default.
- W3123364362 hasConceptScore W3123364362C15744967 @default.
- W3123364362 hasConceptScore W3123364362C177264268 @default.
- W3123364362 hasConceptScore W3123364362C188116033 @default.
- W3123364362 hasConceptScore W3123364362C188888258 @default.
- W3123364362 hasConceptScore W3123364362C199360897 @default.
- W3123364362 hasConceptScore W3123364362C19966478 @default.
- W3123364362 hasConceptScore W3123364362C2775924081 @default.
- W3123364362 hasConceptScore W3123364362C41008148 @default.
- W3123364362 hasConceptScore W3123364362C77805123 @default.
- W3123364362 hasConceptScore W3123364362C90509273 @default.
- W3123364362 hasConceptScore W3123364362C97541855 @default.
- W3123364362 hasConceptScore W3123364362C98045186 @default.
- W3123364362 hasIssue "3" @default.
- W3123364362 hasLocation W31233643621 @default.
- W3123364362 hasLocation W31233643622 @default.
- W3123364362 hasOpenAccess W3123364362 @default.
- W3123364362 hasPrimaryLocation W31233643621 @default.
- W3123364362 hasRelatedWork W1534851618 @default.
- W3123364362 hasRelatedWork W1542094515 @default.
- W3123364362 hasRelatedWork W2134151045 @default.
- W3123364362 hasRelatedWork W2383495909 @default.
- W3123364362 hasRelatedWork W3022038857 @default.
- W3123364362 hasRelatedWork W3102644508 @default.
- W3123364362 hasRelatedWork W3124696433 @default.
- W3123364362 hasRelatedWork W3167597551 @default.
- W3123364362 hasRelatedWork W4287822492 @default.
- W3123364362 hasRelatedWork W4309675293 @default.
- W3123364362 hasVolume "11" @default.
- W3123364362 isParatext "false" @default.
- W3123364362 isRetracted "false" @default.
- W3123364362 magId "3123364362" @default.
- W3123364362 workType "article" @default.
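The abstract above describes a weighted replay buffer whose entries come from two sources (near-optimal trajectories from an iterative linear Gaussian controller during the imitation phase, then online rollouts during the practice phase) and are ordered by an experience scoring algorithm. The paper's exact scoring rule and sampling scheme are not given in this record, so the sketch below is only a minimal illustration under stated assumptions: the class `WeightedReplayBuffer`, the use of trajectory return as the score, the score-weighted sampling, and the `"imitation"`/`"practice"` phase labels are all assumptions, not the authors' implementation.

```python
import heapq
import itertools
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(order=True)
class ScoredExperience:
    # Assumption: the score is the trajectory's cumulative reward (return).
    score: float
    seq: int = field(compare=True)  # insertion counter, breaks score ties
    trajectory: List[Tuple] = field(compare=False, default_factory=list)
    phase: str = field(compare=False, default="practice")  # "imitation" or "practice"


class WeightedReplayBuffer:
    """Keeps the top-`capacity` trajectories, ordered by their score."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self._heap: List[ScoredExperience] = []  # min-heap keyed on score
        self._counter = itertools.count()

    def add(self, trajectory, phase: str, score: float) -> None:
        item = ScoredExperience(score, next(self._counter), trajectory, phase)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        else:
            # Replace the lowest-scoring stored experience if the new one is better.
            heapq.heappushpop(self._heap, item)

    def sample(self, batch_size: int) -> List[ScoredExperience]:
        # Score-weighted sampling with replacement (one simple choice; the
        # paper's experience scoring algorithm may weight experiences differently).
        weights = [max(e.score, 1e-6) for e in self._heap]
        k = min(batch_size, len(self._heap))
        return random.choices(self._heap, weights=weights, k=k)


if __name__ == "__main__":
    buffer = WeightedReplayBuffer(capacity=500)
    # Imitation phase: trajectories produced by the iterative linear Gaussian controller.
    buffer.add(trajectory=[("s0", "a0", 1.0)], phase="imitation", score=42.0)
    # Practice phase: trajectories collected online by the current policy network.
    buffer.add(trajectory=[("s0", "a1", 0.5)], phase="practice", score=17.5)
    batch = buffer.sample(batch_size=32)
    print(len(batch), batch[0].phase)
```

In this sketch a bounded min-heap keeps only the highest-scoring trajectories regardless of which phase produced them, and sampling favors high-score experiences, which matches the abstract's claim that the buffer can act as a general module for model-free reinforcement learning algorithms; the actual ordering and scoring details should be taken from the paper itself (doi.org/10.3390/app11031131).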