Matches in SemOpenAlex for { <https://semopenalex.org/work/W4221139949> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4221139949 abstract "Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial to face increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high variance estimates. More modern approaches exploit the reparametrization trick, which gives lower variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator - the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach comparable performance with methods based on the likelihood-ratio or reparametrization tricks, both in low and high-dimensional action spaces. With this work, we want to show that the Measure-Valued Derivative estimator can be a useful alternative to other policy gradient estimators." @default.
- W4221139949 created "2022-04-03" @default.
- W4221139949 creator A5065775738 @default.
- W4221139949 creator A5071367253 @default.
- W4221139949 date "2022-03-08" @default.
- W4221139949 modified "2023-10-17" @default.
- W4221139949 title "An Analysis of Measure-Valued Derivatives for Policy Gradients" @default.
- W4221139949 doi "https://doi.org/10.48550/arxiv.2203.03917" @default.
- W4221139949 hasPublicationYear "2022" @default.
- W4221139949 type Work @default.
- W4221139949 citedByCount "0" @default.
- W4221139949 crossrefType "posted-content" @default.
- W4221139949 hasAuthorship W4221139949A5065775738 @default.
- W4221139949 hasAuthorship W4221139949A5071367253 @default.
- W4221139949 hasBestOaLocation W42211399491 @default.
- W4221139949 hasConcept C105795698 @default.
- W4221139949 hasConcept C115680565 @default.
- W4221139949 hasConcept C121955636 @default.
- W4221139949 hasConcept C124101348 @default.
- W4221139949 hasConcept C126255220 @default.
- W4221139949 hasConcept C134306372 @default.
- W4221139949 hasConcept C14036430 @default.
- W4221139949 hasConcept C154945302 @default.
- W4221139949 hasConcept C162324750 @default.
- W4221139949 hasConcept C165646398 @default.
- W4221139949 hasConcept C185429906 @default.
- W4221139949 hasConcept C191393472 @default.
- W4221139949 hasConcept C196083921 @default.
- W4221139949 hasConcept C202615002 @default.
- W4221139949 hasConcept C2780009758 @default.
- W4221139949 hasConcept C28826006 @default.
- W4221139949 hasConcept C33923547 @default.
- W4221139949 hasConcept C41008148 @default.
- W4221139949 hasConcept C78458016 @default.
- W4221139949 hasConcept C86803240 @default.
- W4221139949 hasConcept C97541855 @default.
- W4221139949 hasConceptScore W4221139949C105795698 @default.
- W4221139949 hasConceptScore W4221139949C115680565 @default.
- W4221139949 hasConceptScore W4221139949C121955636 @default.
- W4221139949 hasConceptScore W4221139949C124101348 @default.
- W4221139949 hasConceptScore W4221139949C126255220 @default.
- W4221139949 hasConceptScore W4221139949C134306372 @default.
- W4221139949 hasConceptScore W4221139949C14036430 @default.
- W4221139949 hasConceptScore W4221139949C154945302 @default.
- W4221139949 hasConceptScore W4221139949C162324750 @default.
- W4221139949 hasConceptScore W4221139949C165646398 @default.
- W4221139949 hasConceptScore W4221139949C185429906 @default.
- W4221139949 hasConceptScore W4221139949C191393472 @default.
- W4221139949 hasConceptScore W4221139949C196083921 @default.
- W4221139949 hasConceptScore W4221139949C202615002 @default.
- W4221139949 hasConceptScore W4221139949C2780009758 @default.
- W4221139949 hasConceptScore W4221139949C28826006 @default.
- W4221139949 hasConceptScore W4221139949C33923547 @default.
- W4221139949 hasConceptScore W4221139949C41008148 @default.
- W4221139949 hasConceptScore W4221139949C78458016 @default.
- W4221139949 hasConceptScore W4221139949C86803240 @default.
- W4221139949 hasConceptScore W4221139949C97541855 @default.
- W4221139949 hasLocation W42211399491 @default.
- W4221139949 hasOpenAccess W4221139949 @default.
- W4221139949 hasPrimaryLocation W42211399491 @default.
- W4221139949 hasRelatedWork W1981948503 @default.
- W4221139949 hasRelatedWork W2133164226 @default.
- W4221139949 hasRelatedWork W2147031398 @default.
- W4221139949 hasRelatedWork W2162641328 @default.
- W4221139949 hasRelatedWork W2234859443 @default.
- W4221139949 hasRelatedWork W2962802563 @default.
- W4221139949 hasRelatedWork W3184091527 @default.
- W4221139949 hasRelatedWork W3200112695 @default.
- W4221139949 hasRelatedWork W4221139949 @default.
- W4221139949 hasRelatedWork W4287071387 @default.
- W4221139949 isParatext "false" @default.
- W4221139949 isRetracted "false" @default.
- W4221139949 workType "article" @default.
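
The abstract above contrasts three stochastic gradient estimators: the likelihood-ratio trick, the reparametrization trick, and the Measure-Valued Derivative. As a rough illustration only, and not the paper's actor-critic setup, the sketch below estimates d/dμ E[f(x)] for x ~ N(μ, σ²) with each of the three. The test function f(x) = x², the one-dimensional Gaussian, and all function names are assumptions chosen for the example. The measure-valued form relies on the standard decomposition of the Gaussian mean derivative into two shifted Rayleigh-distributed parts with constant 1/(σ√(2π)).

```python
import math
import random


def f(x):
    # Hypothetical test function; E[f(x)] = mu^2 + sigma^2, so the true gradient w.r.t. mu is 2*mu.
    return x * x


def df(x):
    # Derivative of f, needed only by the reparametrization estimator.
    return 2.0 * x


def score_function_grad(func, mu, sigma, n, rng):
    """Likelihood-ratio estimate: average of f(x) * d/dmu log N(x; mu, sigma^2)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        total += func(x) * (x - mu) / (sigma * sigma)
    return total / n


def reparam_grad(dfunc, mu, sigma, n, rng):
    """Reparametrization estimate: x = mu + sigma * eps, so d f(x)/dmu = f'(x)."""
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        total += dfunc(mu + sigma * eps)
    return total / n


def mvd_grad(func, mu, sigma, n, rng):
    """Measure-valued estimate for the Gaussian mean.

    Uses c * (E[f(mu + sigma*y)] - E[f(mu - sigma*y)]) with y ~ Rayleigh(1)
    and c = 1 / (sigma * sqrt(2*pi)). The same y is shared by both parts
    (a coupling choice made here to reduce variance). Needs no derivative of f.
    """
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for _ in range(n):
        # Inverse-CDF sample of a Rayleigh(1) variable; 1 - U lies in (0, 1].
        y = math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        total += func(mu + sigma * y) - func(mu - sigma * y)
    return c * total / n


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, n = 1.0, 0.5, 20000
    print("true gradient      ", 2.0 * mu)
    print("likelihood-ratio   ", score_function_grad(f, mu, sigma, n, rng))
    print("reparametrization  ", reparam_grad(df, mu, sigma, n, rng))
    print("measure-valued     ", mvd_grad(f, mu, sigma, n, rng))
```

Note that only `reparam_grad` requires the derivative of the test function, while the measure-valued estimator costs two function evaluations per sample; this mirrors the trade-off the abstract describes, under the simplifying assumptions stated above.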