Matches in SemOpenAlex for { <https://semopenalex.org/work/W3093348818> ?p ?o ?g. }
- W3093348818 abstract "A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance. Although the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm mitigates the overestimation issue, it can lead to a large underestimation bias. In this paper, we propose to use the Boltzmann softmax operator for value function estimation in continuous control. We first theoretically analyze the softmax operator in continuous action space. Then, we uncover an important property of the softmax operator in actor-critic algorithms, i.e., it helps to smooth the optimization landscape, which sheds new light on the benefits of the operator. We also design two new algorithms, Softmax Deep Deterministic Policy Gradients (SD2) and Softmax Deep Double Deterministic Policy Gradients (SD3), by building the softmax operator upon single and double estimators, which can effectively improve the overestimation and underestimation bias. We conduct extensive experiments on challenging continuous control tasks, and results show that SD3 outperforms state-of-the-art methods." @default.
- W3093348818 created "2020-10-22" @default.
- W3093348818 creator A5016589992 @default.
- W3093348818 creator A5082905458 @default.
- W3093348818 creator A5090888372 @default.
- W3093348818 date "2020-10-19" @default.
- W3093348818 modified "2023-09-24" @default.
- W3093348818 title "Softmax Deep Double Deterministic Policy Gradients" @default.
- W3093348818 cites W1626977535 @default.
- W3093348818 cites W1850240193 @default.
- W3093348818 cites W2121863487 @default.
- W3093348818 cites W2145339207 @default.
- W3093348818 cites W2146989110 @default.
- W3093348818 cites W2155968351 @default.
- W3093348818 cites W2158782408 @default.
- W3093348818 cites W2165131254 @default.
- W3093348818 cites W2165150801 @default.
- W3093348818 cites W2173248099 @default.
- W3093348818 cites W2257979135 @default.
- W3093348818 cites W2596758708 @default.
- W3093348818 cites W2616944917 @default.
- W3093348818 cites W2781726626 @default.
- W3093348818 cites W2786928559 @default.
- W3093348818 cites W2787938642 @default.
- W3093348818 cites W2789824229 @default.
- W3093348818 cites W2791924686 @default.
- W3093348818 cites W2798705390 @default.
- W3093348818 cites W2949561945 @default.
- W3093348818 cites W2949608212 @default.
- W3093348818 cites W2950471160 @default.
- W3093348818 cites W2951589367 @default.
- W3093348818 cites W2951799422 @default.
- W3093348818 cites W2962821147 @default.
- W3093348818 cites W2963169817 @default.
- W3093348818 cites W2964337555 @default.
- W3093348818 cites W2970961171 @default.
- W3093348818 cites W3006670279 @default.
- W3093348818 cites W3011120880 @default.
- W3093348818 cites W3035064526 @default.
- W3093348818 cites W51508254 @default.
- W3093348818 cites W3089091950 @default.
- W3093348818 hasPublicationYear "2020" @default.
- W3093348818 type Work @default.
- W3093348818 sameAs 3093348818 @default.
- W3093348818 citedByCount "2" @default.
- W3093348818 countsByYear W30933488182021 @default.
- W3093348818 countsByYear W30933488182022 @default.
- W3093348818 crossrefType "posted-content" @default.
- W3093348818 hasAuthorship W3093348818A5016589992 @default.
- W3093348818 hasAuthorship W3093348818A5082905458 @default.
- W3093348818 hasAuthorship W3093348818A5090888372 @default.
- W3093348818 hasConcept C104317684 @default.
- W3093348818 hasConcept C105795698 @default.
- W3093348818 hasConcept C108583219 @default.
- W3093348818 hasConcept C11413529 @default.
- W3093348818 hasConcept C126255220 @default.
- W3093348818 hasConcept C154945302 @default.
- W3093348818 hasConcept C158448853 @default.
- W3093348818 hasConcept C17020691 @default.
- W3093348818 hasConcept C185429906 @default.
- W3093348818 hasConcept C185592680 @default.
- W3093348818 hasConcept C188441871 @default.
- W3093348818 hasConcept C33923547 @default.
- W3093348818 hasConcept C41008148 @default.
- W3093348818 hasConcept C55493867 @default.
- W3093348818 hasConcept C86339819 @default.
- W3093348818 hasConcept C97541855 @default.
- W3093348818 hasConceptScore W3093348818C104317684 @default.
- W3093348818 hasConceptScore W3093348818C105795698 @default.
- W3093348818 hasConceptScore W3093348818C108583219 @default.
- W3093348818 hasConceptScore W3093348818C11413529 @default.
- W3093348818 hasConceptScore W3093348818C126255220 @default.
- W3093348818 hasConceptScore W3093348818C154945302 @default.
- W3093348818 hasConceptScore W3093348818C158448853 @default.
- W3093348818 hasConceptScore W3093348818C17020691 @default.
- W3093348818 hasConceptScore W3093348818C185429906 @default.
- W3093348818 hasConceptScore W3093348818C185592680 @default.
- W3093348818 hasConceptScore W3093348818C188441871 @default.
- W3093348818 hasConceptScore W3093348818C33923547 @default.
- W3093348818 hasConceptScore W3093348818C41008148 @default.
- W3093348818 hasConceptScore W3093348818C55493867 @default.
- W3093348818 hasConceptScore W3093348818C86339819 @default.
- W3093348818 hasConceptScore W3093348818C97541855 @default.
- W3093348818 hasLocation W30933488181 @default.
- W3093348818 hasOpenAccess W3093348818 @default.
- W3093348818 hasPrimaryLocation W30933488181 @default.
- W3093348818 hasRelatedWork W1550698229 @default.
- W3093348818 hasRelatedWork W2618911538 @default.
- W3093348818 hasRelatedWork W2773456610 @default.
- W3093348818 hasRelatedWork W2786303200 @default.
- W3093348818 hasRelatedWork W2896073909 @default.
- W3093348818 hasRelatedWork W2921114252 @default.
- W3093348818 hasRelatedWork W2952573015 @default.
- W3093348818 hasRelatedWork W2963169817 @default.
- W3093348818 hasRelatedWork W2964337555 @default.
- W3093348818 hasRelatedWork W3000642679 @default.
- W3093348818 hasRelatedWork W3012857547 @default.
- W3093348818 hasRelatedWork W3036162308 @default.
- W3093348818 hasRelatedWork W3037812429 @default.
- W3093348818 hasRelatedWork W3104180471 @default.
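
The abstract in this record centers on the Boltzmann softmax operator for value estimation. As a rough illustration only, the sketch below computes that operator over a finite list of action values: a weighted average of the values, with weights proportional to exp(beta * Q). The paper works in continuous action space, and the abstract does not say how SD2 or SD3 evaluate the operator there, so the finite-list form, the function name `boltzmann_softmax`, and the parameter name `beta` are all assumptions made for this example rather than the authors' implementation.

```python
import math


def boltzmann_softmax(q_values, beta):
    """Boltzmann softmax of a finite list of action values (illustrative).

    Hypothetical helper, not taken from the paper: it returns
    sum_i w_i * q_i with w_i proportional to exp(beta * q_i).
    Assumes beta >= 0 and a non-empty list.
    """
    # Shift by the maximum before exponentiating for numerical stability;
    # the shift cancels when the weights are normalized.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    # beta = 0 weights every action equally, giving the plain mean.
    print(boltzmann_softmax(q, 0.0))
    # A large beta concentrates the weight on the largest value.
    print(boltzmann_softmax(q, 50.0))
```

For non-negative `beta` the result lies between the mean and the maximum of the values, which is why the operator is usually described as sitting between an average and a hard max.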