Matches in SemOpenAlex for { <https://semopenalex.org/work/W2947150733> ?p ?o ?g. }
- W2947150733 abstract "Off-policy reinforcement learning aims to leverage experience collected from prior policies for sample-efficient learning. However, in practice, commonly used off-policy approximate dynamic programming methods based on Q-learning and actor-critic methods are highly sensitive to the data distribution, and can make only limited progress without collecting additional on-policy data. As a step towards more robust off-policy algorithms, we study the setting where the off-policy experience is fixed and there is no further interaction with the environment. We identify bootstrapping error as a key source of instability in current methods. Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator. We theoretically analyze bootstrapping error, and demonstrate how carefully constraining action selection in the backup can mitigate it. Based on our analysis, we propose a practical algorithm, bootstrapping error accumulation reduction (BEAR). We demonstrate that BEAR is able to learn robustly from different off-policy distributions, including random and suboptimal demonstrations, on a range of continuous control tasks." @default.
- W2947150733 created "2019-06-07" @default.
- W2947150733 creator A5026322200 @default.
- W2947150733 creator A5037720811 @default.
- W2947150733 creator A5047802575 @default.
- W2947150733 creator A5048032272 @default.
- W2947150733 date "2019-06-03" @default.
- W2947150733 modified "2023-09-27" @default.
- W2947150733 title "Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction" @default.
- W2947150733 cites W114364418 @default.
- W2947150733 cites W1575592356 @default.
- W2947150733 cites W1600046456 @default.
- W2947150733 cites W1730555343 @default.
- W2947150733 cites W1771410628 @default.
- W2947150733 cites W1889629917 @default.
- W2947150733 cites W2027184806 @default.
- W2947150733 cites W2108598243 @default.
- W2947150733 cites W2121863487 @default.
- W2947150733 cites W2128812357 @default.
- W2947150733 cites W2158782408 @default.
- W2947150733 cites W2166302491 @default.
- W2947150733 cites W2194775991 @default.
- W2947150733 cites W2212660284 @default.
- W2947150733 cites W2341535507 @default.
- W2947150733 cites W2781726626 @default.
- W2947150733 cites W2785379783 @default.
- W2947150733 cites W2799151646 @default.
- W2947150733 cites W2904789544 @default.
- W2947150733 cites W2911412474 @default.
- W2947150733 cites W2920306970 @default.
- W2947150733 cites W2949576341 @default.
- W2947150733 cites W2949608212 @default.
- W2947150733 cites W2953981431 @default.
- W2947150733 cites W2958416396 @default.
- W2947150733 cites W2963341956 @default.
- W2947150733 cites W2963363446 @default.
- W2947150733 cites W2963376229 @default.
- W2947150733 cites W2963453204 @default.
- W2947150733 cites W2963484919 @default.
- W2947150733 cites W2963674921 @default.
- W2947150733 cites W2963704132 @default.
- W2947150733 cites W2963923407 @default.
- W2947150733 cites W2963985863 @default.
- W2947150733 cites W3093206925 @default.
- W2947150733 cites W3148685027 @default.
- W2947150733 cites W779665318 @default.
- W2947150733 hasPublicationYear "2019" @default.
- W2947150733 type Work @default.
- W2947150733 sameAs 2947150733 @default.
- W2947150733 citedByCount "29" @default.
- W2947150733 countsByYear W29471507332019 @default.
- W2947150733 countsByYear W29471507332020 @default.
- W2947150733 countsByYear W29471507332021 @default.
- W2947150733 crossrefType "posted-content" @default.
- W2947150733 hasAuthorship W2947150733A5026322200 @default.
- W2947150733 hasAuthorship W2947150733A5037720811 @default.
- W2947150733 hasAuthorship W2947150733A5047802575 @default.
- W2947150733 hasAuthorship W2947150733A5048032272 @default.
- W2947150733 hasConcept C111335779 @default.
- W2947150733 hasConcept C119857082 @default.
- W2947150733 hasConcept C149782125 @default.
- W2947150733 hasConcept C153083717 @default.
- W2947150733 hasConcept C154945302 @default.
- W2947150733 hasConcept C166109690 @default.
- W2947150733 hasConcept C169760540 @default.
- W2947150733 hasConcept C207609745 @default.
- W2947150733 hasConcept C2524010 @default.
- W2947150733 hasConcept C26760741 @default.
- W2947150733 hasConcept C2780945871 @default.
- W2947150733 hasConcept C33923547 @default.
- W2947150733 hasConcept C41008148 @default.
- W2947150733 hasConcept C77088390 @default.
- W2947150733 hasConcept C86803240 @default.
- W2947150733 hasConcept C97541855 @default.
- W2947150733 hasConceptScore W2947150733C111335779 @default.
- W2947150733 hasConceptScore W2947150733C119857082 @default.
- W2947150733 hasConceptScore W2947150733C149782125 @default.
- W2947150733 hasConceptScore W2947150733C153083717 @default.
- W2947150733 hasConceptScore W2947150733C154945302 @default.
- W2947150733 hasConceptScore W2947150733C166109690 @default.
- W2947150733 hasConceptScore W2947150733C169760540 @default.
- W2947150733 hasConceptScore W2947150733C207609745 @default.
- W2947150733 hasConceptScore W2947150733C2524010 @default.
- W2947150733 hasConceptScore W2947150733C26760741 @default.
- W2947150733 hasConceptScore W2947150733C2780945871 @default.
- W2947150733 hasConceptScore W2947150733C33923547 @default.
- W2947150733 hasConceptScore W2947150733C41008148 @default.
- W2947150733 hasConceptScore W2947150733C77088390 @default.
- W2947150733 hasConceptScore W2947150733C86803240 @default.
- W2947150733 hasConceptScore W2947150733C97541855 @default.
- W2947150733 hasLocation W29471507331 @default.
- W2947150733 hasOpenAccess W2947150733 @default.
- W2947150733 hasPrimaryLocation W29471507331 @default.
- W2947150733 hasRelatedWork W192920577 @default.
- W2947150733 hasRelatedWork W2120346334 @default.
- W2947150733 hasRelatedWork W2121863487 @default.
- W2947150733 hasRelatedWork W2145339207 @default.
- W2947150733 hasRelatedWork W2158782408 @default.
- W2947150733 hasRelatedWork W2173248099 @default.
- W2947150733 hasRelatedWork W2736601468 @default.
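
The abstract above attributes instability to Bellman backups that bootstrap from actions outside the training data, and says that constraining action selection in the backup can mitigate it. The sketch below illustrates that idea in a small tabular setting only. It is not the BEAR algorithm from the paper, which targets continuous control; here the "constraint" is simply a maximum restricted to actions that appear in the dataset for the next state. The function name `q_iteration`, the `(s, a, r, s_next, done)` transition layout, and the use of `q_init` to stand in for arbitrary, never-corrected value estimates are all assumptions made for illustration.

```python
import numpy as np


def q_iteration(dataset, n_states, n_actions, constrain=True,
                gamma=0.9, lr=0.5, sweeps=200, q_init=0.0):
    """Tabular Q-learning on a fixed dataset (hypothetical helper).

    dataset: list of (s, a, r, s_next, done) tuples with integer states/actions.
    constrain: if True, the backup maximizes only over actions observed in the
        dataset at s_next; if False, it maximizes over all actions.
    q_init: initial value for every entry; entries never seen in the data are
        never updated, so they keep this value.
    """
    # Mark which (state, action) pairs the fixed dataset actually covers.
    seen = np.zeros((n_states, n_actions), dtype=bool)
    for s, a, _r, _s_next, _done in dataset:
        seen[s, a] = True

    q = np.full((n_states, n_actions), q_init, dtype=float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in dataset:
            if done:
                bootstrap = 0.0
            elif constrain:
                # Bootstrap only from in-data actions; if the next state has
                # none, fall back to 0 (an assumption of this sketch).
                allowed = q[s_next][seen[s_next]]
                bootstrap = allowed.max() if allowed.size else 0.0
            else:
                # Unconstrained backup: may pick an action the data never
                # covered, whose value was never corrected.
                bootstrap = q[s_next].max()
            q[s, a] += lr * (r + gamma * bootstrap - q[s, a])
    return q


if __name__ == "__main__":
    # Two states, two actions; action 1 never appears in the data.
    data = [(0, 0, 0.0, 1, False), (1, 0, 1.0, 1, True)]
    for flag in (True, False):
        q = q_iteration(data, n_states=2, n_actions=2,
                        constrain=flag, q_init=10.0)
        print("constrain =", flag, "Q[0, 0] =", round(q[0, 0], 3))
```

In this toy dataset the unconstrained backup keeps propagating the uncorrected initial value of the unseen action into `Q[0, 0]`, while the constrained backup ignores it. That is the accumulation effect the abstract describes, shown under the simplifying assumptions stated above.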