Matches in SemOpenAlex for { <https://semopenalex.org/work/W3168496698> ?p ?o ?g. }
- W3168496698 endingPage "6010" @default.
- W3168496698 startingPage "5996" @default.
- W3168496698 abstract "Off-policy reinforcement learning (RL) holds the promise of better data efficiency as it allows sample reuse and potentially enables safe interaction with the environment. Current off-policy policy gradient methods suffer from either high bias or high variance, often delivering unreliable estimates. The price of inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited, and a very high sample cost hinders straightforward application. In this paper, we propose a nonparametric Bellman equation, which can be solved in closed form. The solution is differentiable w.r.t. the policy parameters and gives access to an estimation of the policy gradient. In this way, we avoid the high variance of importance sampling approaches, and the high bias of semi-gradient methods. We empirically analyze the quality of our gradient estimate against state-of-the-art methods, and show that it outperforms the baselines in terms of sample efficiency on classical control tasks." @default.
- W3168496698 created "2021-06-22" @default.
- W3168496698 creator A5051476305 @default.
- W3168496698 creator A5053233736 @default.
- W3168496698 creator A5071367253 @default.
- W3168496698 date "2022-10-01" @default.
- W3168496698 modified "2023-10-02" @default.
- W3168496698 title "Batch Reinforcement Learning With a Nonparametric Off-Policy Policy Gradient" @default.
- W3168496698 cites W1499669280 @default.
- W3168496698 cites W1646707810 @default.
- W3168496698 cites W1971713783 @default.
- W3168496698 cites W2046513829 @default.
- W3168496698 cites W2057032881 @default.
- W3168496698 cites W2118556122 @default.
- W3168496698 cites W2119717200 @default.
- W3168496698 cites W2145339207 @default.
- W3168496698 cites W2147632348 @default.
- W3168496698 cites W2156974606 @default.
- W3168496698 cites W2158782408 @default.
- W3168496698 cites W2165308133 @default.
- W3168496698 cites W2172968643 @default.
- W3168496698 cites W2749680651 @default.
- W3168496698 cites W3103182070 @default.
- W3168496698 cites W32403112 @default.
- W3168496698 cites W4245296547 @default.
- W3168496698 doi "https://doi.org/10.1109/tpami.2021.3088063" @default.
- W3168496698 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/34106848" @default.
- W3168496698 hasPublicationYear "2022" @default.
- W3168496698 type Work @default.
- W3168496698 sameAs 3168496698 @default.
- W3168496698 citedByCount "0" @default.
- W3168496698 crossrefType "journal-article" @default.
- W3168496698 hasAuthorship W3168496698A5051476305 @default.
- W3168496698 hasAuthorship W3168496698A5053233736 @default.
- W3168496698 hasAuthorship W3168496698A5071367253 @default.
- W3168496698 hasBestOaLocation W31684966982 @default.
- W3168496698 hasConcept C102366305 @default.
- W3168496698 hasConcept C119857082 @default.
- W3168496698 hasConcept C121955636 @default.
- W3168496698 hasConcept C126255220 @default.
- W3168496698 hasConcept C134306372 @default.
- W3168496698 hasConcept C149782125 @default.
- W3168496698 hasConcept C154945302 @default.
- W3168496698 hasConcept C162324750 @default.
- W3168496698 hasConcept C175444787 @default.
- W3168496698 hasConcept C185592680 @default.
- W3168496698 hasConcept C18903297 @default.
- W3168496698 hasConcept C196083921 @default.
- W3168496698 hasConcept C198531522 @default.
- W3168496698 hasConcept C202615002 @default.
- W3168496698 hasConcept C206588197 @default.
- W3168496698 hasConcept C2778869765 @default.
- W3168496698 hasConcept C33923547 @default.
- W3168496698 hasConcept C41008148 @default.
- W3168496698 hasConcept C43617362 @default.
- W3168496698 hasConcept C86803240 @default.
- W3168496698 hasConcept C97541855 @default.
- W3168496698 hasConceptScore W3168496698C102366305 @default.
- W3168496698 hasConceptScore W3168496698C119857082 @default.
- W3168496698 hasConceptScore W3168496698C121955636 @default.
- W3168496698 hasConceptScore W3168496698C126255220 @default.
- W3168496698 hasConceptScore W3168496698C134306372 @default.
- W3168496698 hasConceptScore W3168496698C149782125 @default.
- W3168496698 hasConceptScore W3168496698C154945302 @default.
- W3168496698 hasConceptScore W3168496698C162324750 @default.
- W3168496698 hasConceptScore W3168496698C175444787 @default.
- W3168496698 hasConceptScore W3168496698C185592680 @default.
- W3168496698 hasConceptScore W3168496698C18903297 @default.
- W3168496698 hasConceptScore W3168496698C196083921 @default.
- W3168496698 hasConceptScore W3168496698C198531522 @default.
- W3168496698 hasConceptScore W3168496698C202615002 @default.
- W3168496698 hasConceptScore W3168496698C206588197 @default.
- W3168496698 hasConceptScore W3168496698C2778869765 @default.
- W3168496698 hasConceptScore W3168496698C33923547 @default.
- W3168496698 hasConceptScore W3168496698C41008148 @default.
- W3168496698 hasConceptScore W3168496698C43617362 @default.
- W3168496698 hasConceptScore W3168496698C86803240 @default.
- W3168496698 hasConceptScore W3168496698C97541855 @default.
- W3168496698 hasFunder F4320320869 @default.
- W3168496698 hasFunder F4320332999 @default.
- W3168496698 hasIssue "10" @default.
- W3168496698 hasLocation W31684966981 @default.
- W3168496698 hasLocation W31684966982 @default.
- W3168496698 hasLocation W31684966983 @default.
- W3168496698 hasOpenAccess W3168496698 @default.
- W3168496698 hasPrimaryLocation W31684966981 @default.
- W3168496698 hasRelatedWork W2012670059 @default.
- W3168496698 hasRelatedWork W2053441693 @default.
- W3168496698 hasRelatedWork W2144476245 @default.
- W3168496698 hasRelatedWork W2189444131 @default.
- W3168496698 hasRelatedWork W3036553387 @default.
- W3168496698 hasRelatedWork W3122160357 @default.
- W3168496698 hasRelatedWork W4234864691 @default.
- W3168496698 hasRelatedWork W4287827094 @default.
- W3168496698 hasRelatedWork W4288027505 @default.
- W3168496698 hasRelatedWork W4319083788 @default.
- W3168496698 hasVolume "44" @default.
- W3168496698 isParatext "false" @default.
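
Each entry above has the shape `- subject predicate object @graph.`, where the object is either a double-quoted literal or a bare identifier. As a minimal sketch, assuming that layout holds for every line, the listing could be parsed into tuples as follows. The names `Quad`, `parse_line`, and the `is_literal` flag are hypothetical, introduced only for illustration, and are not part of SemOpenAlex.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One listing entry: subject, predicate, object, and graph name."""
    subject: str
    predicate: str
    obj: str
    graph: str
    is_literal: bool  # True when the object was written in double quotes


# Assumed line layout: - <subject> <predicate> <object> @<graph>.
# The object is either a double-quoted literal or a bare identifier.
_LINE = re.compile(
    r'^-\s+(?P<s>\S+)\s+(?P<p>\S+)\s+'
    r'(?:"(?P<lit>.*)"|(?P<ref>\S+))'
    r'\s+@(?P<g>\S+?)\.$'
)


def parse_line(line: str) -> Optional[Quad]:
    """Parse one listing line; return None for lines that do not match."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    lit = m.group("lit")
    if lit is not None:
        return Quad(m.group("s"), m.group("p"), lit, m.group("g"), True)
    return Quad(m.group("s"), m.group("p"), m.group("ref"), m.group("g"), False)
```

Lines that do not start with `- `, such as the header, return `None`, so a caller can filter them out before collecting the entries by predicate.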