Matches in SemOpenAlex for { <https://semopenalex.org/work/W2914362802> ?p ?o ?g. }
- W2914362802 abstract "We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model-free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Rényi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives rise to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Rényi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function-based mechanism for high-level action selection. Empirically, we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in terms of both sample efficiency and effectiveness." @default.
- W2914362802 created "2019-02-21" @default.
- W2914362802 creator A5040575168 @default.
- W2914362802 creator A5045067652 @default.
- W2914362802 date "2019-02-14" @default.
- W2914362802 modified "2023-09-23" @default.
- W2914362802 title "Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning" @default.
- W2914362802 cites W1771410628 @default.
- W2914362802 cites W2004778468 @default.
- W2914362802 cites W2012587148 @default.
- W2914362802 cites W2017054435 @default.
- W2914362802 cites W2043806097 @default.
- W2914362802 cites W2094387729 @default.
- W2914362802 cites W2107544712 @default.
- W2914362802 cites W2109246257 @default.
- W2914362802 cites W2121863487 @default.
- W2914362802 cites W2145339207 @default.
- W2914362802 cites W2158782408 @default.
- W2914362802 cites W2173248099 @default.
- W2914362802 cites W2290354866 @default.
- W2914362802 cites W2460299708 @default.
- W2914362802 cites W2554120691 @default.
- W2914362802 cites W2575705757 @default.
- W2914362802 cites W2593044849 @default.
- W2914362802 cites W2623431351 @default.
- W2914362802 cites W2736601468 @default.
- W2914362802 cites W2749928749 @default.
- W2914362802 cites W2754517384 @default.
- W2914362802 cites W2767313115 @default.
- W2914362802 cites W2774354230 @default.
- W2914362802 cites W2778821583 @default.
- W2914362802 cites W2798189842 @default.
- W2914362802 cites W2798273187 @default.
- W2914362802 cites W2807908072 @default.
- W2914362802 cites W2891797170 @default.
- W2914362802 cites W2897013919 @default.
- W2914362802 cites W2949561945 @default.
- W2914362802 cites W2949608212 @default.
- W2914362802 cites W2962777832 @default.
- W2914362802 cites W2962902376 @default.
- W2914362802 cites W2963313316 @default.
- W2914362802 cites W2963407617 @default.
- W2914362802 cites W2963576857 @default.
- W2914362802 cites W2963674921 @default.
- W2914362802 cites W2963849886 @default.
- W2914362802 cites W2963938771 @default.
- W2914362802 cites W2964043796 @default.
- W2914362802 cites W64088143 @default.
- W2914362802 hasPublicationYear "2019" @default.
- W2914362802 type Work @default.
- W2914362802 sameAs 2914362802 @default.
- W2914362802 citedByCount "4" @default.
- W2914362802 countsByYear W29143628022020 @default.
- W2914362802 countsByYear W29143628022021 @default.
- W2914362802 crossrefType "posted-content" @default.
- W2914362802 hasAuthorship W2914362802A5040575168 @default.
- W2914362802 hasAuthorship W2914362802A5045067652 @default.
- W2914362802 hasConcept C106301342 @default.
- W2914362802 hasConcept C117521176 @default.
- W2914362802 hasConcept C119857082 @default.
- W2914362802 hasConcept C121332964 @default.
- W2914362802 hasConcept C126255220 @default.
- W2914362802 hasConcept C13280743 @default.
- W2914362802 hasConcept C14646407 @default.
- W2914362802 hasConcept C153180895 @default.
- W2914362802 hasConcept C154945302 @default.
- W2914362802 hasConcept C166109690 @default.
- W2914362802 hasConcept C169760540 @default.
- W2914362802 hasConcept C185798385 @default.
- W2914362802 hasConcept C205649164 @default.
- W2914362802 hasConcept C26760741 @default.
- W2914362802 hasConcept C33923547 @default.
- W2914362802 hasConcept C41008148 @default.
- W2914362802 hasConcept C62520636 @default.
- W2914362802 hasConcept C86803240 @default.
- W2914362802 hasConcept C9679016 @default.
- W2914362802 hasConcept C97541855 @default.
- W2914362802 hasConceptScore W2914362802C106301342 @default.
- W2914362802 hasConceptScore W2914362802C117521176 @default.
- W2914362802 hasConceptScore W2914362802C119857082 @default.
- W2914362802 hasConceptScore W2914362802C121332964 @default.
- W2914362802 hasConceptScore W2914362802C126255220 @default.
- W2914362802 hasConceptScore W2914362802C13280743 @default.
- W2914362802 hasConceptScore W2914362802C14646407 @default.
- W2914362802 hasConceptScore W2914362802C153180895 @default.
- W2914362802 hasConceptScore W2914362802C154945302 @default.
- W2914362802 hasConceptScore W2914362802C166109690 @default.
- W2914362802 hasConceptScore W2914362802C169760540 @default.
- W2914362802 hasConceptScore W2914362802C185798385 @default.
- W2914362802 hasConceptScore W2914362802C205649164 @default.
- W2914362802 hasConceptScore W2914362802C26760741 @default.
- W2914362802 hasConceptScore W2914362802C33923547 @default.
- W2914362802 hasConceptScore W2914362802C41008148 @default.
- W2914362802 hasConceptScore W2914362802C62520636 @default.
- W2914362802 hasConceptScore W2914362802C86803240 @default.
- W2914362802 hasConceptScore W2914362802C9679016 @default.
- W2914362802 hasConceptScore W2914362802C97541855 @default.
- W2914362802 hasOpenAccess W2914362802 @default.
- W2914362802 hasRelatedWork W2156347136 @default.
- W2914362802 hasRelatedWork W2781726626 @default.
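
The abstract above says that Tsallis entropy and Rényi entropy generalize Shannon entropy. The following is a minimal sketch of that relationship for a discrete probability vector, using the standard textbook definitions of the three measures. It is not the paper's implementation (the paper works with policies over continuous control tasks), and the function names and the entropic index `q` are illustrative assumptions, not identifiers from the source.

```python
import math


def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * ln(p_i), in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    The limit q -> 1 is the Shannon entropy, so q == 1 is handled explicitly.
    """
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def renyi_entropy(p, q):
    """Renyi entropy H_q(p) = ln(sum_i p_i^q) / (1 - q).

    The limit q -> 1 is again the Shannon entropy.
    """
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(p)
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)


if __name__ == "__main__":
    # Hypothetical action distribution, chosen only for illustration.
    probs = [0.5, 0.25, 0.25]
    print("Shannon:", shannon_entropy(probs))
    for q in (0.5, 1.0001, 2.0):
        print(f"q={q}: Tsallis={tsallis_entropy(probs, q)}, "
              f"Renyi={renyi_entropy(probs, q)}")
```

Running the script with `q` close to 1 prints Tsallis and Rényi values that nearly coincide with the Shannon value, which is the sense in which both measures generalize it. For other values of `q` the two measures weight low-probability outcomes differently.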