Matches in SemOpenAlex for { <https://semopenalex.org/work/W2996001434> ?p ?o ?g. }
- W2996001434 abstract "Entropy regularization is used to improve optimization performance in reinforcement learning tasks. A common form of regularization is to maximize policy entropy, which avoids premature convergence and leads to more stochastic policies for exploration through the action space. However, this does not ensure exploration in the state space. In this work, we instead consider the distribution of discounted weighting of states, and propose to maximize the entropy of a lower bound approximation to the weighting of a state, based on a latent space state representation. We propose entropy regularization based on the marginal state distribution, to encourage the policy to have a more uniform distribution over the state space for exploration. Our approach based on the marginal state distribution achieves superior state space coverage on complex gridworld domains, which translates into empirical gains in sparse reward 3D maze navigation and continuous control domains compared to entropy regularization with stochastic policies." @default.
- W2996001434 created "2019-12-26" @default.
- W2996001434 creator A5024581585 @default.
- W2996001434 creator A5065836447 @default.
- W2996001434 creator A5079926596 @default.
- W2996001434 date "2019-12-11" @default.
- W2996001434 modified "2023-09-27" @default.
- W2996001434 title "Marginalized State Distribution Entropy Regularization in Policy Optimization" @default.
- W2996001434 cites W1191599655 @default.
- W2996001434 cites W1777239053 @default.
- W2996001434 cites W1959608418 @default.
- W2996001434 cites W2119717200 @default.
- W2996001434 cites W2155027007 @default.
- W2996001434 cites W2158782408 @default.
- W2996001434 cites W2561776174 @default.
- W2996001434 cites W2609650878 @default.
- W2996001434 cites W2751973545 @default.
- W2996001434 cites W2754517384 @default.
- W2996001434 cites W2854803484 @default.
- W2996001434 cites W2902298341 @default.
- W2996001434 cites W2905822515 @default.
- W2996001434 cites W2922007426 @default.
- W2996001434 cites W2962821147 @default.
- W2996001434 cites W2962902376 @default.
- W2996001434 cites W2963160877 @default.
- W2996001434 cites W2963276097 @default.
- W2996001434 cites W2963438456 @default.
- W2996001434 cites W2963864421 @default.
- W2996001434 cites W2963923407 @default.
- W2996001434 cites W2964009285 @default.
- W2996001434 cites W2964043796 @default.
- W2996001434 cites W2964097858 @default.
- W2996001434 cites W2997289589 @default.
- W2996001434 cites W567721252 @default.
- W2996001434 hasPublicationYear "2019" @default.
- W2996001434 type Work @default.
- W2996001434 sameAs 2996001434 @default.
- W2996001434 citedByCount "5" @default.
- W2996001434 countsByYear W29960014342020 @default.
- W2996001434 countsByYear W29960014342021 @default.
- W2996001434 countsByYear W29960014342022 @default.
- W2996001434 crossrefType "posted-content" @default.
- W2996001434 hasAuthorship W2996001434A5024581585 @default.
- W2996001434 hasAuthorship W2996001434A5065836447 @default.
- W2996001434 hasAuthorship W2996001434A5079926596 @default.
- W2996001434 hasConcept C101721835 @default.
- W2996001434 hasConcept C105795698 @default.
- W2996001434 hasConcept C106301342 @default.
- W2996001434 hasConcept C121332964 @default.
- W2996001434 hasConcept C122123141 @default.
- W2996001434 hasConcept C124551494 @default.
- W2996001434 hasConcept C126255220 @default.
- W2996001434 hasConcept C154945302 @default.
- W2996001434 hasConcept C165216359 @default.
- W2996001434 hasConcept C171752962 @default.
- W2996001434 hasConcept C183115368 @default.
- W2996001434 hasConcept C24890656 @default.
- W2996001434 hasConcept C2776135515 @default.
- W2996001434 hasConcept C33923547 @default.
- W2996001434 hasConcept C41008148 @default.
- W2996001434 hasConcept C60507348 @default.
- W2996001434 hasConcept C62520636 @default.
- W2996001434 hasConcept C9679016 @default.
- W2996001434 hasConceptScore W2996001434C101721835 @default.
- W2996001434 hasConceptScore W2996001434C105795698 @default.
- W2996001434 hasConceptScore W2996001434C106301342 @default.
- W2996001434 hasConceptScore W2996001434C121332964 @default.
- W2996001434 hasConceptScore W2996001434C122123141 @default.
- W2996001434 hasConceptScore W2996001434C124551494 @default.
- W2996001434 hasConceptScore W2996001434C126255220 @default.
- W2996001434 hasConceptScore W2996001434C154945302 @default.
- W2996001434 hasConceptScore W2996001434C165216359 @default.
- W2996001434 hasConceptScore W2996001434C171752962 @default.
- W2996001434 hasConceptScore W2996001434C183115368 @default.
- W2996001434 hasConceptScore W2996001434C24890656 @default.
- W2996001434 hasConceptScore W2996001434C2776135515 @default.
- W2996001434 hasConceptScore W2996001434C33923547 @default.
- W2996001434 hasConceptScore W2996001434C41008148 @default.
- W2996001434 hasConceptScore W2996001434C60507348 @default.
- W2996001434 hasConceptScore W2996001434C62520636 @default.
- W2996001434 hasConceptScore W2996001434C9679016 @default.
- W2996001434 hasLocation W29960014341 @default.
- W2996001434 hasOpenAccess W2996001434 @default.
- W2996001434 hasPrimaryLocation W29960014341 @default.
- W2996001434 hasRelatedWork W2015482176 @default.
- W2996001434 hasRelatedWork W2173248099 @default.
- W2996001434 hasRelatedWork W2606757878 @default.
- W2996001434 hasRelatedWork W2611835223 @default.
- W2996001434 hasRelatedWork W2786042995 @default.
- W2996001434 hasRelatedWork W2883895200 @default.
- W2996001434 hasRelatedWork W2950650380 @default.
- W2996001434 hasRelatedWork W2953326529 @default.
- W2996001434 hasRelatedWork W3036846812 @default.
- W2996001434 hasRelatedWork W3039845099 @default.
- W2996001434 hasRelatedWork W3041970508 @default.
- W2996001434 hasRelatedWork W3046626913 @default.
- W2996001434 hasRelatedWork W3080213971 @default.
- W2996001434 hasRelatedWork W3102709953 @default.
- W2996001434 hasRelatedWork W3120050142 @default.
- W2996001434 hasRelatedWork W3127035336 @default.