Matches in SemOpenAlex for { <https://semopenalex.org/work/W2892076218> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W2892076218 abstract "Though successful in high-dimensional domains, deep reinforcement learning exhibits high sample complexity and suffers from stability issues as reported by researchers and practitioners in the field. These problems hinder the application of such algorithms in real-world and safety-critical scenarios. In this paper, we take steps towards stable and efficient reinforcement learning by following a model-based approach that is known to reduce agent-environment interactions. Namely, our method augments deep Q-networks (DQNs) with model predictions for transitions, rewards, and termination flags. Having the model at hand, we then conduct a rigorous theoretical study of our algorithm and show, for the first time, convergence to a stationary point. En route, we provide a counter-example showing that 'vanilla' DQNs can diverge, confirming practitioners' and researchers' experiences. Our proof is novel in its own right and can be extended to other forms of deep reinforcement learning. In particular, we believe exploiting the relation between reinforcement (with deep function approximators) and online learning can serve as a recipe for future proofs in the domain. Finally, we validate our theoretical results in 20 games from the Atari benchmark. Our results show that following the proposed model-based learning approach not only ensures convergence but leads to a reduction in sample complexity and superior performance." @default.
- W2892076218 created "2018-09-27" @default.
- W2892076218 creator A5000651751 @default.
- W2892076218 creator A5005192247 @default.
- W2892076218 creator A5027297585 @default.
- W2892076218 creator A5062896285 @default.
- W2892076218 date "2018-09-06" @default.
- W2892076218 modified "2023-09-27" @default.
- W2892076218 title "Model-Based Stabilisation of Deep Reinforcement Learning." @default.
- W2892076218 hasPublicationYear "2018" @default.
- W2892076218 type Work @default.
- W2892076218 sameAs 2892076218 @default.
- W2892076218 citedByCount "0" @default.
- W2892076218 crossrefType "posted-content" @default.
- W2892076218 hasAuthorship W2892076218A5000651751 @default.
- W2892076218 hasAuthorship W2892076218A5005192247 @default.
- W2892076218 hasAuthorship W2892076218A5027297585 @default.
- W2892076218 hasAuthorship W2892076218A5062896285 @default.
- W2892076218 hasConcept C108583219 @default.
- W2892076218 hasConcept C108710211 @default.
- W2892076218 hasConcept C112972136 @default.
- W2892076218 hasConcept C119857082 @default.
- W2892076218 hasConcept C13280743 @default.
- W2892076218 hasConcept C134306372 @default.
- W2892076218 hasConcept C14036430 @default.
- W2892076218 hasConcept C154945302 @default.
- W2892076218 hasConcept C162324750 @default.
- W2892076218 hasConcept C185592680 @default.
- W2892076218 hasConcept C185798385 @default.
- W2892076218 hasConcept C198531522 @default.
- W2892076218 hasConcept C202444582 @default.
- W2892076218 hasConcept C205649164 @default.
- W2892076218 hasConcept C2524010 @default.
- W2892076218 hasConcept C2777303404 @default.
- W2892076218 hasConcept C28719098 @default.
- W2892076218 hasConcept C33923547 @default.
- W2892076218 hasConcept C36503486 @default.
- W2892076218 hasConcept C41008148 @default.
- W2892076218 hasConcept C43617362 @default.
- W2892076218 hasConcept C50522688 @default.
- W2892076218 hasConcept C78458016 @default.
- W2892076218 hasConcept C86803240 @default.
- W2892076218 hasConcept C9652623 @default.
- W2892076218 hasConcept C97541855 @default.
- W2892076218 hasConceptScore W2892076218C108583219 @default.
- W2892076218 hasConceptScore W2892076218C108710211 @default.
- W2892076218 hasConceptScore W2892076218C112972136 @default.
- W2892076218 hasConceptScore W2892076218C119857082 @default.
- W2892076218 hasConceptScore W2892076218C13280743 @default.
- W2892076218 hasConceptScore W2892076218C134306372 @default.
- W2892076218 hasConceptScore W2892076218C14036430 @default.
- W2892076218 hasConceptScore W2892076218C154945302 @default.
- W2892076218 hasConceptScore W2892076218C162324750 @default.
- W2892076218 hasConceptScore W2892076218C185592680 @default.
- W2892076218 hasConceptScore W2892076218C185798385 @default.
- W2892076218 hasConceptScore W2892076218C198531522 @default.
- W2892076218 hasConceptScore W2892076218C202444582 @default.
- W2892076218 hasConceptScore W2892076218C205649164 @default.
- W2892076218 hasConceptScore W2892076218C2524010 @default.
- W2892076218 hasConceptScore W2892076218C2777303404 @default.
- W2892076218 hasConceptScore W2892076218C28719098 @default.
- W2892076218 hasConceptScore W2892076218C33923547 @default.
- W2892076218 hasConceptScore W2892076218C36503486 @default.
- W2892076218 hasConceptScore W2892076218C41008148 @default.
- W2892076218 hasConceptScore W2892076218C43617362 @default.
- W2892076218 hasConceptScore W2892076218C50522688 @default.
- W2892076218 hasConceptScore W2892076218C78458016 @default.
- W2892076218 hasConceptScore W2892076218C86803240 @default.
- W2892076218 hasConceptScore W2892076218C9652623 @default.
- W2892076218 hasConceptScore W2892076218C97541855 @default.
- W2892076218 hasLocation W28920762181 @default.
- W2892076218 hasOpenAccess W2892076218 @default.
- W2892076218 hasPrimaryLocation W28920762181 @default.
- W2892076218 hasRelatedWork W1552148478 @default.
- W2892076218 hasRelatedWork W2159600763 @default.
- W2892076218 hasRelatedWork W2194966727 @default.
- W2892076218 hasRelatedWork W2542999299 @default.
- W2892076218 hasRelatedWork W2789824229 @default.
- W2892076218 hasRelatedWork W2910568379 @default.
- W2892076218 hasRelatedWork W2950471160 @default.
- W2892076218 hasRelatedWork W2952905979 @default.
- W2892076218 hasRelatedWork W2963704132 @default.
- W2892076218 hasRelatedWork W2979064149 @default.
- W2892076218 hasRelatedWork W2995290757 @default.
- W2892076218 hasRelatedWork W2995638039 @default.
- W2892076218 hasRelatedWork W3004986541 @default.
- W2892076218 hasRelatedWork W3096980216 @default.
- W2892076218 hasRelatedWork W3138923328 @default.
- W2892076218 hasRelatedWork W3152815381 @default.
- W2892076218 hasRelatedWork W3168260200 @default.
- W2892076218 hasRelatedWork W3177100477 @default.
- W2892076218 hasRelatedWork W3212021336 @default.
- W2892076218 hasRelatedWork W3106008061 @default.
- W2892076218 isParatext "false" @default.
- W2892076218 isRetracted "false" @default.
- W2892076218 magId "2892076218" @default.
- W2892076218 workType "article" @default.