Matches in SemOpenAlex for { <https://semopenalex.org/work/W3170872007> ?p ?o ?g. }
- W3170872007 endingPage "7663" @default.
- W3170872007 startingPage "7655" @default.
- W3170872007 abstract "How to obtain good value estimation is a critical problem in Reinforcement Learning (RL). Current value estimation methods in continuous control, such as DDPG and TD3, suffer from unnecessary over- or underestimation. In this paper, we explore the potential of double actors, which has been neglected for a long time, for better value estimation in the continuous setting. First, we interestingly find that double actors improve the exploration ability of the agent. Next, we uncover the bias alleviation property of double actors in handling overestimation with a single critic and underestimation with double critics, respectively. Finally, to mitigate the potentially pessimistic value estimate in double critics, we propose to regularize the critics under the double actors architecture. Together, we present the Double Actors Regularized Critics (DARC) algorithm. Extensive experiments on challenging continuous control benchmarks, MuJoCo and PyBullet, show that DARC significantly outperforms current baselines with higher average return and better sample efficiency." @default.
- W3170872007 created "2021-06-22" @default.
- W3170872007 creator A5013716631 @default.
- W3170872007 creator A5027912723 @default.
- W3170872007 creator A5067453635 @default.
- W3170872007 creator A5075993453 @default.
- W3170872007 date "2022-06-28" @default.
- W3170872007 modified "2023-09-30" @default.
- W3170872007 title "Efficient Continuous Control with Double Actors and Regularized Critics" @default.
- W3170872007 cites W1547105496 @default.
- W3170872007 cites W1646707810 @default.
- W3170872007 cites W192920577 @default.
- W3170872007 cites W1981025032 @default.
- W3170872007 cites W2080889016 @default.
- W3170872007 cites W2082261506 @default.
- W3170872007 cites W2121863487 @default.
- W3170872007 cites W2123845705 @default.
- W3170872007 cites W2136064843 @default.
- W3170872007 cites W2145339207 @default.
- W3170872007 cites W2155968351 @default.
- W3170872007 cites W2156737235 @default.
- W3170872007 cites W2158782408 @default.
- W3170872007 cites W2165150801 @default.
- W3170872007 cites W2173248099 @default.
- W3170872007 cites W2596758708 @default.
- W3170872007 cites W2740912559 @default.
- W3170872007 cites W2781726626 @default.
- W3170872007 cites W2786928559 @default.
- W3170872007 cites W2787938642 @default.
- W3170872007 cites W2789824229 @default.
- W3170872007 cites W2798705390 @default.
- W3170872007 cites W2904246096 @default.
- W3170872007 cites W2942034515 @default.
- W3170872007 cites W2946723315 @default.
- W3170872007 cites W2963423916 @default.
- W3170872007 cites W2970961171 @default.
- W3170872007 cites W2979605429 @default.
- W3170872007 cites W2980125442 @default.
- W3170872007 cites W2991355586 @default.
- W3170872007 cites W2993587506 @default.
- W3170872007 cites W3011120880 @default.
- W3170872007 cites W3020832238 @default.
- W3170872007 cites W3021619395 @default.
- W3170872007 cites W3034379033 @default.
- W3170872007 cites W3093348818 @default.
- W3170872007 cites W3111464135 @default.
- W3170872007 cites W51508254 @default.
- W3170872007 doi "https://doi.org/10.1609/aaai.v36i7.20732" @default.
- W3170872007 hasPublicationYear "2022" @default.
- W3170872007 type Work @default.
- W3170872007 sameAs 3170872007 @default.
- W3170872007 citedByCount "7" @default.
- W3170872007 countsByYear W31708720072021 @default.
- W3170872007 countsByYear W31708720072022 @default.
- W3170872007 countsByYear W31708720072023 @default.
- W3170872007 crossrefType "journal-article" @default.
- W3170872007 hasAuthorship W3170872007A5013716631 @default.
- W3170872007 hasAuthorship W3170872007A5027912723 @default.
- W3170872007 hasAuthorship W3170872007A5067453635 @default.
- W3170872007 hasAuthorship W3170872007A5075993453 @default.
- W3170872007 hasBestOaLocation W31708720071 @default.
- W3170872007 hasConcept C111472728 @default.
- W3170872007 hasConcept C119599485 @default.
- W3170872007 hasConcept C119857082 @default.
- W3170872007 hasConcept C126255220 @default.
- W3170872007 hasConcept C127413603 @default.
- W3170872007 hasConcept C138885662 @default.
- W3170872007 hasConcept C148043351 @default.
- W3170872007 hasConcept C154945302 @default.
- W3170872007 hasConcept C159985019 @default.
- W3170872007 hasConcept C162324750 @default.
- W3170872007 hasConcept C17744445 @default.
- W3170872007 hasConcept C187736073 @default.
- W3170872007 hasConcept C189950617 @default.
- W3170872007 hasConcept C192562407 @default.
- W3170872007 hasConcept C195094911 @default.
- W3170872007 hasConcept C199539241 @default.
- W3170872007 hasConcept C2775924081 @default.
- W3170872007 hasConcept C2776291640 @default.
- W3170872007 hasConcept C2778488155 @default.
- W3170872007 hasConcept C3018216069 @default.
- W3170872007 hasConcept C3019074581 @default.
- W3170872007 hasConcept C33923547 @default.
- W3170872007 hasConcept C41008148 @default.
- W3170872007 hasConcept C47446073 @default.
- W3170872007 hasConcept C96250715 @default.
- W3170872007 hasConcept C97541855 @default.
- W3170872007 hasConcept C9992130 @default.
- W3170872007 hasConceptScore W3170872007C111472728 @default.
- W3170872007 hasConceptScore W3170872007C119599485 @default.
- W3170872007 hasConceptScore W3170872007C119857082 @default.
- W3170872007 hasConceptScore W3170872007C126255220 @default.
- W3170872007 hasConceptScore W3170872007C127413603 @default.
- W3170872007 hasConceptScore W3170872007C138885662 @default.
- W3170872007 hasConceptScore W3170872007C148043351 @default.
- W3170872007 hasConceptScore W3170872007C154945302 @default.
- W3170872007 hasConceptScore W3170872007C159985019 @default.
- W3170872007 hasConceptScore W3170872007C162324750 @default.