Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287816077> ?p ?o ?g. }
Showing items 1 to 65 of 65 with 100 items per page.
- W4287816077 abstract "This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes the gradient of the value function into two factors: the score function and the Q-function. This paper presents four results: (i) an alternative policy gradient theorem using weak (measure-valued) derivatives instead of the score function is established; (ii) the stochastic gradient estimates thus derived are shown to be unbiased and to yield algorithms that converge almost surely to stationary points of the non-convex value function of the reinforcement learning problem; (iii) the sample complexity of the algorithm is derived and is shown to be $O(1/\sqrt{k})$; (iv) finally, the expected variance of the gradient estimates obtained using weak derivatives is shown to be lower than that of the estimates obtained using the popular score-function approach. Experiments on the OpenAI Gym pendulum environment show superior performance of the proposed algorithm." @default.
- W4287816077 created "2022-07-26" @default.
- W4287816077 creator A5025896653 @default.
- W4287816077 creator A5068804090 @default.
- W4287816077 creator A5083631658 @default.
- W4287816077 date "2020-04-09" @default.
- W4287816077 modified "2023-10-16" @default.
- W4287816077 title "Policy Gradient using Weak Derivatives for Reinforcement Learning" @default.
- W4287816077 doi "https://doi.org/10.48550/arxiv.2004.04843" @default.
- W4287816077 hasPublicationYear "2020" @default.
- W4287816077 type Work @default.
- W4287816077 citedByCount "0" @default.
- W4287816077 crossrefType "posted-content" @default.
- W4287816077 hasAuthorship W4287816077A5025896653 @default.
- W4287816077 hasAuthorship W4287816077A5068804090 @default.
- W4287816077 hasAuthorship W4287816077A5083631658 @default.
- W4287816077 hasBestOaLocation W42878160771 @default.
- W4287816077 hasConcept C105795698 @default.
- W4287816077 hasConcept C112680207 @default.
- W4287816077 hasConcept C115680565 @default.
- W4287816077 hasConcept C126255220 @default.
- W4287816077 hasConcept C14036430 @default.
- W4287816077 hasConcept C145446738 @default.
- W4287816077 hasConcept C14646407 @default.
- W4287816077 hasConcept C154945302 @default.
- W4287816077 hasConcept C2524010 @default.
- W4287816077 hasConcept C28826006 @default.
- W4287816077 hasConcept C33923547 @default.
- W4287816077 hasConcept C41008148 @default.
- W4287816077 hasConcept C65660741 @default.
- W4287816077 hasConcept C78458016 @default.
- W4287816077 hasConcept C86803240 @default.
- W4287816077 hasConcept C97541855 @default.
- W4287816077 hasConceptScore W4287816077C105795698 @default.
- W4287816077 hasConceptScore W4287816077C112680207 @default.
- W4287816077 hasConceptScore W4287816077C115680565 @default.
- W4287816077 hasConceptScore W4287816077C126255220 @default.
- W4287816077 hasConceptScore W4287816077C14036430 @default.
- W4287816077 hasConceptScore W4287816077C145446738 @default.
- W4287816077 hasConceptScore W4287816077C14646407 @default.
- W4287816077 hasConceptScore W4287816077C154945302 @default.
- W4287816077 hasConceptScore W4287816077C2524010 @default.
- W4287816077 hasConceptScore W4287816077C28826006 @default.
- W4287816077 hasConceptScore W4287816077C33923547 @default.
- W4287816077 hasConceptScore W4287816077C41008148 @default.
- W4287816077 hasConceptScore W4287816077C65660741 @default.
- W4287816077 hasConceptScore W4287816077C78458016 @default.
- W4287816077 hasConceptScore W4287816077C86803240 @default.
- W4287816077 hasConceptScore W4287816077C97541855 @default.
- W4287816077 hasLocation W42878160771 @default.
- W4287816077 hasOpenAccess W4287816077 @default.
- W4287816077 hasPrimaryLocation W42878160771 @default.
- W4287816077 hasRelatedWork W1621708194 @default.
- W4287816077 hasRelatedWork W2155027007 @default.
- W4287816077 hasRelatedWork W2770149067 @default.
- W4287816077 hasRelatedWork W2888238026 @default.
- W4287816077 hasRelatedWork W2918392679 @default.
- W4287816077 hasRelatedWork W2995081787 @default.
- W4287816077 hasRelatedWork W3015953126 @default.
- W4287816077 hasRelatedWork W3115368470 @default.
- W4287816077 hasRelatedWork W4287550122 @default.
- W4287816077 hasRelatedWork W4287816077 @default.
- W4287816077 isParatext "false" @default.
- W4287816077 isRetracted "false" @default.
- W4287816077 workType "article" @default.