Matches in SemOpenAlex for { <https://semopenalex.org/work/W2963285565> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W2963285565 abstract "Policy optimization methods have shown great promise in solving complex reinforcement and imitation learning tasks. While model-free methods are broadly applicable, they often require many samples to optimize complex policies. Model-based methods greatly improve sample-efficiency but at the cost of poor generalization, requiring a carefully handcrafted model of the system dynamics for each task. Recently, hybrid methods have been successful in trading off applicability for improved sample-complexity. However, these have been limited to continuous action spaces. In this work, we present a new hybrid method based on an approximation of the dynamics as an expectation over the next state under the current policy. This relaxation allows us to derive a novel hybrid policy gradient estimator, combining score function and pathwise derivative estimators, that is applicable to discrete action spaces. We show significant gains in sample complexity, ranging between 1.7 and 25 times, when learning parameterized policies on Cart Pole, Acrobot, Mountain Car and Hand Mass. Our method is applicable to both discrete and continuous action spaces, when competing pathwise methods are limited to the latter." @default.
- W2963285565 created "2019-07-30" @default.
- W2963285565 creator A5086319422 @default.
- W2963285565 creator A5091179481 @default.
- W2963285565 date "2018-04-29" @default.
- W2963285565 modified "2023-10-16" @default.
- W2963285565 title "Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces" @default.
- W2963285565 doi "https://doi.org/10.1609/aaai.v32i1.11822" @default.
- W2963285565 hasPublicationYear "2018" @default.
- W2963285565 type Work @default.
- W2963285565 sameAs 2963285565 @default.
- W2963285565 citedByCount "2" @default.
- W2963285565 countsByYear W29632855652018 @default.
- W2963285565 crossrefType "journal-article" @default.
- W2963285565 hasAuthorship W2963285565A5086319422 @default.
- W2963285565 hasAuthorship W2963285565A5091179481 @default.
- W2963285565 hasBestOaLocation W29632855651 @default.
- W2963285565 hasConcept C105795698 @default.
- W2963285565 hasConcept C11413529 @default.
- W2963285565 hasConcept C119857082 @default.
- W2963285565 hasConcept C121332964 @default.
- W2963285565 hasConcept C126255220 @default.
- W2963285565 hasConcept C134306372 @default.
- W2963285565 hasConcept C14036430 @default.
- W2963285565 hasConcept C154945302 @default.
- W2963285565 hasConcept C162324750 @default.
- W2963285565 hasConcept C165464430 @default.
- W2963285565 hasConcept C177148314 @default.
- W2963285565 hasConcept C185429906 @default.
- W2963285565 hasConcept C185592680 @default.
- W2963285565 hasConcept C187736073 @default.
- W2963285565 hasConcept C190669063 @default.
- W2963285565 hasConcept C198531522 @default.
- W2963285565 hasConcept C2780451532 @default.
- W2963285565 hasConcept C2780791683 @default.
- W2963285565 hasConcept C33923547 @default.
- W2963285565 hasConcept C41008148 @default.
- W2963285565 hasConcept C43617362 @default.
- W2963285565 hasConcept C62520636 @default.
- W2963285565 hasConcept C78458016 @default.
- W2963285565 hasConcept C86803240 @default.
- W2963285565 hasConcept C97541855 @default.
- W2963285565 hasConceptScore W2963285565C105795698 @default.
- W2963285565 hasConceptScore W2963285565C11413529 @default.
- W2963285565 hasConceptScore W2963285565C119857082 @default.
- W2963285565 hasConceptScore W2963285565C121332964 @default.
- W2963285565 hasConceptScore W2963285565C126255220 @default.
- W2963285565 hasConceptScore W2963285565C134306372 @default.
- W2963285565 hasConceptScore W2963285565C14036430 @default.
- W2963285565 hasConceptScore W2963285565C154945302 @default.
- W2963285565 hasConceptScore W2963285565C162324750 @default.
- W2963285565 hasConceptScore W2963285565C165464430 @default.
- W2963285565 hasConceptScore W2963285565C177148314 @default.
- W2963285565 hasConceptScore W2963285565C185429906 @default.
- W2963285565 hasConceptScore W2963285565C185592680 @default.
- W2963285565 hasConceptScore W2963285565C187736073 @default.
- W2963285565 hasConceptScore W2963285565C190669063 @default.
- W2963285565 hasConceptScore W2963285565C198531522 @default.
- W2963285565 hasConceptScore W2963285565C2780451532 @default.
- W2963285565 hasConceptScore W2963285565C2780791683 @default.
- W2963285565 hasConceptScore W2963285565C33923547 @default.
- W2963285565 hasConceptScore W2963285565C41008148 @default.
- W2963285565 hasConceptScore W2963285565C43617362 @default.
- W2963285565 hasConceptScore W2963285565C62520636 @default.
- W2963285565 hasConceptScore W2963285565C78458016 @default.
- W2963285565 hasConceptScore W2963285565C86803240 @default.
- W2963285565 hasConceptScore W2963285565C97541855 @default.
- W2963285565 hasIssue "1" @default.
- W2963285565 hasLocation W29632855651 @default.
- W2963285565 hasLocation W29632855652 @default.
- W2963285565 hasOpenAccess W2963285565 @default.
- W2963285565 hasPrimaryLocation W29632855651 @default.
- W2963285565 hasRelatedWork W2416943787 @default.
- W2963285565 hasRelatedWork W2734912394 @default.
- W2963285565 hasRelatedWork W2737943148 @default.
- W2963285565 hasRelatedWork W2951308022 @default.
- W2963285565 hasRelatedWork W3147214434 @default.
- W2963285565 hasRelatedWork W4211240529 @default.
- W2963285565 hasRelatedWork W4221153218 @default.
- W2963285565 hasRelatedWork W4288317198 @default.
- W2963285565 hasRelatedWork W4293469469 @default.
- W2963285565 hasRelatedWork W4302011254 @default.
- W2963285565 hasVolume "32" @default.
- W2963285565 isParatext "false" @default.
- W2963285565 isRetracted "false" @default.
- W2963285565 magId "2963285565" @default.
- W2963285565 workType "article" @default.