Matches in SemOpenAlex for { <https://semopenalex.org/work/W2947861305> ?p ?o ?g. }
- W2947861305 abstract "Model-based reinforcement learning is an appealing framework for creating agents that learn, plan, and act in sequential environments. Model-based algorithms typically involve learning a transition model that takes a state and an action and outputs the next state, i.e., a one-step model. This model can be composed with itself to enable predicting multiple steps into the future, but one-step prediction errors can get magnified, leading to unacceptable inaccuracy. This compounding-error problem plagues planning and undermines model-based reinforcement learning. In this paper, we address the compounding-error problem by introducing a multi-step model that directly outputs the outcome of executing a sequence of actions. Novel theoretical and empirical results indicate that the multi-step model is more conducive to efficient value-function estimation, and it yields better action selection compared to the one-step model. These results make a strong case for using multi-step models in the context of model-based reinforcement learning." @default.
- W2947861305 created "2019-06-07" @default.
- W2947861305 creator A5009722403 @default.
- W2947861305 creator A5037667167 @default.
- W2947861305 creator A5060909278 @default.
- W2947861305 creator A5088218114 @default.
- W2947861305 date "2019-05-30" @default.
- W2947861305 modified "2023-09-23" @default.
- W2947861305 title "Combating the Compounding-Error Problem with a Multi-step Model." @default.
- W2947861305 cites W1491843047 @default.
- W2947861305 cites W1505937442 @default.
- W2947861305 cites W1514587017 @default.
- W2947861305 cites W1526654727 @default.
- W2947861305 cites W1546851689 @default.
- W2947861305 cites W1594216983 @default.
- W2947861305 cites W1625390266 @default.
- W2947861305 cites W1757796397 @default.
- W2947861305 cites W1758031947 @default.
- W2947861305 cites W1870822514 @default.
- W2947861305 cites W1940174779 @default.
- W2947861305 cites W1944672 @default.
- W2947861305 cites W1953057174 @default.
- W2947861305 cites W1980264541 @default.
- W2947861305 cites W1981031568 @default.
- W2947861305 cites W2048226872 @default.
- W2947861305 cites W2049633694 @default.
- W2947861305 cites W2109910161 @default.
- W2947861305 cites W2113913482 @default.
- W2947861305 cites W2115008305 @default.
- W2947861305 cites W2119567691 @default.
- W2947861305 cites W2121103318 @default.
- W2947861305 cites W2121863487 @default.
- W2947861305 cites W2122410182 @default.
- W2947861305 cites W2130535800 @default.
- W2947861305 cites W2132602063 @default.
- W2947861305 cites W2140135625 @default.
- W2947861305 cites W2145860152 @default.
- W2947861305 cites W2155027007 @default.
- W2947861305 cites W2156067405 @default.
- W2947861305 cites W2168020168 @default.
- W2947861305 cites W2257979135 @default.
- W2947861305 cites W2268617045 @default.
- W2947861305 cites W2404689820 @default.
- W2947861305 cites W2489939061 @default.
- W2947861305 cites W2579923771 @default.
- W2947861305 cites W2593237273 @default.
- W2947861305 cites W2610719643 @default.
- W2947861305 cites W2618318883 @default.
- W2947861305 cites W2753511062 @default.
- W2947861305 cites W2788953735 @default.
- W2947861305 cites W2807366257 @default.
- W2947861305 cites W2808335646 @default.
- W2947861305 cites W2859967432 @default.
- W2947861305 cites W2898585858 @default.
- W2947861305 cites W2901345551 @default.
- W2947861305 cites W2951326042 @default.
- W2947861305 cites W2962708723 @default.
- W2947861305 cites W2962841471 @default.
- W2947861305 cites W2962872206 @default.
- W2947861305 cites W2962951703 @default.
- W2947861305 cites W2962957031 @default.
- W2947861305 cites W2963395712 @default.
- W2947861305 cites W2963403143 @default.
- W2947861305 cites W2963521487 @default.
- W2947861305 cites W2963604043 @default.
- W2947861305 cites W2963846183 @default.
- W2947861305 cites W2964043796 @default.
- W2947861305 cites W2964299116 @default.
- W2947861305 cites W2965004202 @default.
- W2947861305 cites W3139377883 @default.
- W2947861305 cites W607505555 @default.
- W2947861305 cites W648786980 @default.
- W2947861305 hasPublicationYear "2019" @default.
- W2947861305 type Work @default.
- W2947861305 sameAs 2947861305 @default.
- W2947861305 citedByCount "14" @default.
- W2947861305 countsByYear W29478613052019 @default.
- W2947861305 countsByYear W29478613052020 @default.
- W2947861305 countsByYear W29478613052021 @default.
- W2947861305 crossrefType "posted-content" @default.
- W2947861305 hasAuthorship W2947861305A5009722403 @default.
- W2947861305 hasAuthorship W2947861305A5037667167 @default.
- W2947861305 hasAuthorship W2947861305A5060909278 @default.
- W2947861305 hasAuthorship W2947861305A5088218114 @default.
- W2947861305 hasConcept C119857082 @default.
- W2947861305 hasConcept C121332964 @default.
- W2947861305 hasConcept C126255220 @default.
- W2947861305 hasConcept C14036430 @default.
- W2947861305 hasConcept C14646407 @default.
- W2947861305 hasConcept C151730666 @default.
- W2947861305 hasConcept C154945302 @default.
- W2947861305 hasConcept C159110408 @default.
- W2947861305 hasConcept C166109690 @default.
- W2947861305 hasConcept C166957645 @default.
- W2947861305 hasConcept C169760540 @default.
- W2947861305 hasConcept C207673951 @default.
- W2947861305 hasConcept C26760741 @default.
- W2947861305 hasConcept C2776505523 @default.
- W2947861305 hasConcept C2778112365 @default.
- W2947861305 hasConcept C2779343474 @default.