Matches in SemOpenAlex for { <https://semopenalex.org/work/W4383109304> ?p ?o ?g. }
Showing items 1 to 84 of 84 with 100 items per page.
- W4383109304 abstract "In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-SPG) algorithm to deal with the challenges of sparse rewards in continuous control problems. Sparse rewards are common in continuous control robotics tasks such as manipulation and navigation and make the learning problem hard due to the non-trivial estimation of value functions over the state space. This demands either reward shaping or expert demonstrations for the sparse reward environment. However, obtaining high-quality demonstrations is quite expensive and sometimes even impossible. We propose a heavy-tailed policy parametrization along with a modified momentum-based policy gradient tracking scheme (HT-SPG) to induce a stable exploratory behavior in the algorithm. The proposed algorithm does not require access to expert demonstrations. We test the performance of HT-SPG on various benchmark tasks of continuous control with sparse rewards such as 1D Mario, Pathological Mountain Car, Sparse Pendulum in OpenAI Gym, and Sparse MuJoCo environments (Hopper-v2, Half-Cheetah, Walker-2D). We show consistent performance improvement across all tasks in terms of high average cumulative reward without requiring access to expert demonstrations. We further demonstrate that a navigation policy trained using HT-SPG can be easily transferred into a Clearpath Husky robot to perform real-world navigation tasks." @default.
- W4383109304 created "2023-07-05" @default.
- W4383109304 creator A5004194238 @default.
- W4383109304 creator A5025896653 @default.
- W4383109304 creator A5039563144 @default.
- W4383109304 creator A5052013511 @default.
- W4383109304 creator A5080472165 @default.
- W4383109304 creator A5083140373 @default.
- W4383109304 creator A5086188394 @default.
- W4383109304 date "2023-05-29" @default.
- W4383109304 modified "2023-09-28" @default.
- W4383109304 title "Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policy Optimization" @default.
- W4383109304 cites W1548439805 @default.
- W4383109304 cites W2198041288 @default.
- W4383109304 cites W2604382266 @default.
- W4383109304 cites W2788862220 @default.
- W4383109304 cites W2938421504 @default.
- W4383109304 cites W2963099939 @default.
- W4383109304 cites W2963523627 @default.
- W4383109304 cites W2964319688 @default.
- W4383109304 cites W2999862950 @default.
- W4383109304 cites W3103182070 @default.
- W4383109304 cites W3103362336 @default.
- W4383109304 cites W3109546547 @default.
- W4383109304 cites W3122232370 @default.
- W4383109304 cites W3130717831 @default.
- W4383109304 cites W3131482536 @default.
- W4383109304 cites W3158799570 @default.
- W4383109304 cites W3204691825 @default.
- W4383109304 cites W4249775755 @default.
- W4383109304 doi "https://doi.org/10.1109/icra48891.2023.10161186" @default.
- W4383109304 hasPublicationYear "2023" @default.
- W4383109304 type Work @default.
- W4383109304 citedByCount "0" @default.
- W4383109304 crossrefType "proceedings-article" @default.
- W4383109304 hasAuthorship W4383109304A5004194238 @default.
- W4383109304 hasAuthorship W4383109304A5025896653 @default.
- W4383109304 hasAuthorship W4383109304A5039563144 @default.
- W4383109304 hasAuthorship W4383109304A5052013511 @default.
- W4383109304 hasAuthorship W4383109304A5080472165 @default.
- W4383109304 hasAuthorship W4383109304A5083140373 @default.
- W4383109304 hasAuthorship W4383109304A5086188394 @default.
- W4383109304 hasConcept C105795698 @default.
- W4383109304 hasConcept C119857082 @default.
- W4383109304 hasConcept C13280743 @default.
- W4383109304 hasConcept C154945302 @default.
- W4383109304 hasConcept C185798385 @default.
- W4383109304 hasConcept C205649164 @default.
- W4383109304 hasConcept C2775924081 @default.
- W4383109304 hasConcept C33923547 @default.
- W4383109304 hasConcept C34413123 @default.
- W4383109304 hasConcept C41008148 @default.
- W4383109304 hasConcept C72434380 @default.
- W4383109304 hasConcept C90509273 @default.
- W4383109304 hasConcept C97541855 @default.
- W4383109304 hasConceptScore W4383109304C105795698 @default.
- W4383109304 hasConceptScore W4383109304C119857082 @default.
- W4383109304 hasConceptScore W4383109304C13280743 @default.
- W4383109304 hasConceptScore W4383109304C154945302 @default.
- W4383109304 hasConceptScore W4383109304C185798385 @default.
- W4383109304 hasConceptScore W4383109304C205649164 @default.
- W4383109304 hasConceptScore W4383109304C2775924081 @default.
- W4383109304 hasConceptScore W4383109304C33923547 @default.
- W4383109304 hasConceptScore W4383109304C34413123 @default.
- W4383109304 hasConceptScore W4383109304C41008148 @default.
- W4383109304 hasConceptScore W4383109304C72434380 @default.
- W4383109304 hasConceptScore W4383109304C90509273 @default.
- W4383109304 hasConceptScore W4383109304C97541855 @default.
- W4383109304 hasLocation W43831093041 @default.
- W4383109304 hasOpenAccess W4383109304 @default.
- W4383109304 hasPrimaryLocation W43831093041 @default.
- W4383109304 hasRelatedWork W1974960581 @default.
- W4383109304 hasRelatedWork W2002867377 @default.
- W4383109304 hasRelatedWork W2065963568 @default.
- W4383109304 hasRelatedWork W2197326744 @default.
- W4383109304 hasRelatedWork W2540676782 @default.
- W4383109304 hasRelatedWork W2907045084 @default.
- W4383109304 hasRelatedWork W3022038857 @default.
- W4383109304 hasRelatedWork W3211352205 @default.
- W4383109304 hasRelatedWork W4306666666 @default.
- W4383109304 hasRelatedWork W4319083788 @default.
- W4383109304 isParatext "false" @default.
- W4383109304 isRetracted "false" @default.
- W4383109304 workType "article" @default.
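The listing above can be read mechanically. The following is a minimal sketch, assuming every item has the shape `- subject predicate object @graph.` seen in this listing, with the object either a double-quoted literal or a bare identifier. The names `parse_line` and `group_by_predicate` are hypothetical helpers for illustration and are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# One listing item: "- <subject> <predicate> <object> @<graph>."
# Assumption: the object is a double-quoted literal or a bare token.
LINE_RE = re.compile(
    r'^-\s+(?P<s>\S+)\s+(?P<p>\S+)\s+(?P<o>".*"|\S+)\s+@(?P<g>\S+?)\.$'
)


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    obj = m.group("o")
    # Strip the surrounding quotes from literal values.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return (m.group("s"), m.group("p"), obj, m.group("g"))


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that do not parse."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        '- W4383109304 created "2023-07-05" @default.',
        "- W4383109304 cites W1548439805 @default.",
        "- W4383109304 cites W2198041288 @default.",
        "- W4383109304 type Work @default.",
    ]
    for predicate, values in group_by_predicate(sample).items():
        print(predicate, values)
```

Grouping by predicate makes it easy to count, for example, how many `cites` or `hasRelatedWork` entries a work has. Header lines such as the "Showing items" line do not match the pattern and are skipped.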