Matches in SemOpenAlex for { <https://semopenalex.org/work/W3201854402> ?p ?o ?g. }
- W3201854402 abstract "Distributional reinforcement learning (RL) is a class of state-of-the-art algorithms that estimate the whole distribution of the total return rather than only its expectation. Despite the remarkable performance of distributional RL, a theoretical understanding of its advantages over expectation-based RL remains elusive. In this paper, we interpret distributional RL as entropy-regularized maximum likelihood estimation in the neural Z-fitted iteration framework, and establish the connection of the resulting risk-aware regularization with maximum entropy RL. In addition, we shed light on the stability-promoting distributional loss with desirable smoothness properties in distributional RL, which can yield stable optimization and guaranteed generalization. We also analyze the acceleration behavior while optimizing distributional RL algorithms and show that an appropriate approximation to the true target distribution can speed up the convergence. From the perspective of representation, we find that distributional RL encourages state representations from the same action class, as classified by the policy, to form tighter clusters. Finally, we propose a class of Sinkhorn distributional RL algorithms that interpolate between the Wasserstein distance and maximum mean discrepancy (MMD). Experiments on a suite of Atari games reveal the competitive performance of our algorithm relative to existing state-of-the-art distributional RL algorithms." @default.
- W3201854402 created "2021-10-11" @default.
- W3201854402 creator A5006962593 @default.
- W3201854402 creator A5008659449 @default.
- W3201854402 creator A5018151914 @default.
- W3201854402 creator A5020873211 @default.
- W3201854402 creator A5027724935 @default.
- W3201854402 creator A5028346074 @default.
- W3201854402 creator A5062334200 @default.
- W3201854402 creator A5068048142 @default.
- W3201854402 creator A5079305186 @default.
- W3201854402 date "2021-10-07" @default.
- W3201854402 modified "2023-09-27" @default.
- W3201854402 title "Towards Understanding Distributional Reinforcement Learning: Regularization, Optimization, Acceleration and Sinkhorn Algorithm." @default.
- W3201854402 cites W1821462560 @default.
- W3201854402 cites W1993411524 @default.
- W3201854402 cites W2121863487 @default.
- W3201854402 cites W2145339207 @default.
- W3201854402 cites W2312355711 @default.
- W3201854402 cites W2594103415 @default.
- W3201854402 cites W2619903301 @default.
- W3201854402 cites W2736601468 @default.
- W3201854402 cites W2739748921 @default.
- W3201854402 cites W2904246096 @default.
- W3201854402 cites W2905224739 @default.
- W3201854402 cites W2915401840 @default.
- W3201854402 cites W2946364835 @default.
- W3201854402 cites W2948210185 @default.
- W3201854402 cites W2962878825 @default.
- W3201854402 cites W2963417849 @default.
- W3201854402 cites W2963423916 @default.
- W3201854402 cites W2963506485 @default.
- W3201854402 cites W2963610502 @default.
- W3201854402 cites W2963612511 @default.
- W3201854402 cites W2963794891 @default.
- W3201854402 cites W2970036354 @default.
- W3201854402 cites W3030320826 @default.
- W3201854402 cites W3035148960 @default.
- W3201854402 cites W3036585823 @default.
- W3201854402 cites W3045080532 @default.
- W3201854402 cites W3101710896 @default.
- W3201854402 cites W3151336074 @default.
- W3201854402 cites W3157052873 @default.
- W3201854402 cites W3179631121 @default.
- W3201854402 cites W3198990431 @default.
- W3201854402 cites W2917022709 @default.
- W3201854402 hasPublicationYear "2021" @default.
- W3201854402 type Work @default.
- W3201854402 sameAs 3201854402 @default.
- W3201854402 citedByCount "0" @default.
- W3201854402 crossrefType "posted-content" @default.
- W3201854402 hasAuthorship W3201854402A5006962593 @default.
- W3201854402 hasAuthorship W3201854402A5008659449 @default.
- W3201854402 hasAuthorship W3201854402A5018151914 @default.
- W3201854402 hasAuthorship W3201854402A5020873211 @default.
- W3201854402 hasAuthorship W3201854402A5027724935 @default.
- W3201854402 hasAuthorship W3201854402A5028346074 @default.
- W3201854402 hasAuthorship W3201854402A5062334200 @default.
- W3201854402 hasAuthorship W3201854402A5068048142 @default.
- W3201854402 hasAuthorship W3201854402A5079305186 @default.
- W3201854402 hasConcept C11413529 @default.
- W3201854402 hasConcept C126255220 @default.
- W3201854402 hasConcept C154945302 @default.
- W3201854402 hasConcept C167981619 @default.
- W3201854402 hasConcept C2776135515 @default.
- W3201854402 hasConcept C33923547 @default.
- W3201854402 hasConcept C41008148 @default.
- W3201854402 hasConcept C9679016 @default.
- W3201854402 hasConcept C97541855 @default.
- W3201854402 hasConceptScore W3201854402C11413529 @default.
- W3201854402 hasConceptScore W3201854402C126255220 @default.
- W3201854402 hasConceptScore W3201854402C154945302 @default.
- W3201854402 hasConceptScore W3201854402C167981619 @default.
- W3201854402 hasConceptScore W3201854402C2776135515 @default.
- W3201854402 hasConceptScore W3201854402C33923547 @default.
- W3201854402 hasConceptScore W3201854402C41008148 @default.
- W3201854402 hasConceptScore W3201854402C9679016 @default.
- W3201854402 hasConceptScore W3201854402C97541855 @default.
- W3201854402 hasLocation W32018544021 @default.
- W3201854402 hasOpenAccess W3201854402 @default.
- W3201854402 hasPrimaryLocation W32018544021 @default.
- W3201854402 hasRelatedWork W2460675832 @default.
- W3201854402 hasRelatedWork W2911884060 @default.
- W3201854402 hasRelatedWork W2949694312 @default.
- W3201854402 hasRelatedWork W2953083372 @default.
- W3201854402 hasRelatedWork W2953633867 @default.
- W3201854402 hasRelatedWork W2965922694 @default.
- W3201854402 hasRelatedWork W2970036354 @default.
- W3201854402 hasRelatedWork W3013875875 @default.
- W3201854402 hasRelatedWork W3014137283 @default.
- W3201854402 hasRelatedWork W3045080532 @default.
- W3201854402 hasRelatedWork W3049102318 @default.
- W3201854402 hasRelatedWork W3101180011 @default.
- W3201854402 hasRelatedWork W3101487584 @default.
- W3201854402 hasRelatedWork W3101995310 @default.
- W3201854402 hasRelatedWork W3164559980 @default.
- W3201854402 hasRelatedWork W3174436986 @default.
- W3201854402 hasRelatedWork W3193494395 @default.
- W3201854402 hasRelatedWork W3198990431 @default.
- W3201854402 hasRelatedWork W3202086712 @default.