Matches in SemOpenAlex for { <https://semopenalex.org/work/W3126217842> ?p ?o ?g. }
- W3126217842 abstract "In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. This comes after a series of breakthroughs to address function approximation errors, which previously led to poor performance. These insights encourage the use of pessimistic value updates. However, this discourages exploration and runs counter to theoretical support for the efficacy of optimism in the face of uncertainty. So which approach is best? In this work, we show that the optimal degree of optimism can vary both across tasks and over the course of learning. Inspired by this insight, we introduce a novel deep actor-critic algorithm, Dynamic Optimistic and Pessimistic Estimation (DOPE) to switch between optimistic and pessimistic value learning online by formulating the selection as a multi-arm bandit problem. We show in a series of challenging continuous control tasks that DOPE outperforms existing state-of-the-art methods, which rely on a fixed degree of optimism. Since our changes are simple to implement, we believe these insights can be extended to a number of off-policy algorithms." @default.
- W3126217842 created "2021-02-15" @default.
- W3126217842 creator A5006761546 @default.
- W3126217842 creator A5034132308 @default.
- W3126217842 creator A5054854070 @default.
- W3126217842 creator A5083828420 @default.
- W3126217842 date "2021-02-07" @default.
- W3126217842 modified "2023-09-27" @default.
- W3126217842 title "Deep Reinforcement Learning with Dynamic Optimism." @default.
- W3126217842 cites W1505937442 @default.
- W3126217842 cites W1570963478 @default.
- W3126217842 cites W1583155004 @default.
- W3126217842 cites W1625390266 @default.
- W3126217842 cites W1850488217 @default.
- W3126217842 cites W2115211925 @default.
- W3126217842 cites W2121863487 @default.
- W3126217842 cites W2141559645 @default.
- W3126217842 cites W2145339207 @default.
- W3126217842 cites W2173248099 @default.
- W3126217842 cites W2257979135 @default.
- W3126217842 cites W2787938642 @default.
- W3126217842 cites W2963423916 @default.
- W3126217842 cites W2963757175 @default.
- W3126217842 cites W2963767098 @default.
- W3126217842 cites W2963938771 @default.
- W3126217842 cites W2964054583 @default.
- W3126217842 cites W2964163363 @default.
- W3126217842 cites W2970190219 @default.
- W3126217842 cites W2970961171 @default.
- W3126217842 cites W2995102855 @default.
- W3126217842 cites W3004082694 @default.
- W3126217842 cites W3013618273 @default.
- W3126217842 cites W3034973310 @default.
- W3126217842 cites W3035273634 @default.
- W3126217842 cites W3035403520 @default.
- W3126217842 cites W3035880215 @default.
- W3126217842 cites W3043013488 @default.
- W3126217842 cites W3046395471 @default.
- W3126217842 cites W3118955349 @default.
- W3126217842 cites W3125616589 @default.
- W3126217842 cites W51508254 @default.
- W3126217842 cites W2770298516 @default.
- W3126217842 cites W2951873965 @default.
- W3126217842 hasPublicationYear "2021" @default.
- W3126217842 type Work @default.
- W3126217842 sameAs 3126217842 @default.
- W3126217842 citedByCount "1" @default.
- W3126217842 countsByYear W31262178422021 @default.
- W3126217842 crossrefType "posted-content" @default.
- W3126217842 hasAuthorship W3126217842A5006761546 @default.
- W3126217842 hasAuthorship W3126217842A5034132308 @default.
- W3126217842 hasAuthorship W3126217842A5054854070 @default.
- W3126217842 hasAuthorship W3126217842A5083828420 @default.
- W3126217842 hasConcept C111472728 @default.
- W3126217842 hasConcept C119857082 @default.
- W3126217842 hasConcept C126255220 @default.
- W3126217842 hasConcept C138885662 @default.
- W3126217842 hasConcept C14036430 @default.
- W3126217842 hasConcept C144024400 @default.
- W3126217842 hasConcept C14646407 @default.
- W3126217842 hasConcept C154945302 @default.
- W3126217842 hasConcept C15744967 @default.
- W3126217842 hasConcept C196340769 @default.
- W3126217842 hasConcept C204017024 @default.
- W3126217842 hasConcept C2775924081 @default.
- W3126217842 hasConcept C2776291640 @default.
- W3126217842 hasConcept C2779304628 @default.
- W3126217842 hasConcept C33923547 @default.
- W3126217842 hasConcept C36289849 @default.
- W3126217842 hasConcept C41008148 @default.
- W3126217842 hasConcept C77805123 @default.
- W3126217842 hasConcept C78458016 @default.
- W3126217842 hasConcept C86803240 @default.
- W3126217842 hasConcept C97541855 @default.
- W3126217842 hasConcept C9992130 @default.
- W3126217842 hasConceptScore W3126217842C111472728 @default.
- W3126217842 hasConceptScore W3126217842C119857082 @default.
- W3126217842 hasConceptScore W3126217842C126255220 @default.
- W3126217842 hasConceptScore W3126217842C138885662 @default.
- W3126217842 hasConceptScore W3126217842C14036430 @default.
- W3126217842 hasConceptScore W3126217842C144024400 @default.
- W3126217842 hasConceptScore W3126217842C14646407 @default.
- W3126217842 hasConceptScore W3126217842C154945302 @default.
- W3126217842 hasConceptScore W3126217842C15744967 @default.
- W3126217842 hasConceptScore W3126217842C196340769 @default.
- W3126217842 hasConceptScore W3126217842C204017024 @default.
- W3126217842 hasConceptScore W3126217842C2775924081 @default.
- W3126217842 hasConceptScore W3126217842C2776291640 @default.
- W3126217842 hasConceptScore W3126217842C2779304628 @default.
- W3126217842 hasConceptScore W3126217842C33923547 @default.
- W3126217842 hasConceptScore W3126217842C36289849 @default.
- W3126217842 hasConceptScore W3126217842C41008148 @default.
- W3126217842 hasConceptScore W3126217842C77805123 @default.
- W3126217842 hasConceptScore W3126217842C78458016 @default.
- W3126217842 hasConceptScore W3126217842C86803240 @default.
- W3126217842 hasConceptScore W3126217842C97541855 @default.
- W3126217842 hasConceptScore W3126217842C9992130 @default.
- W3126217842 hasLocation W31262178421 @default.
- W3126217842 hasOpenAccess W3126217842 @default.
- W3126217842 hasPrimaryLocation W31262178421 @default.