Matches in SemOpenAlex for { <https://semopenalex.org/work/W2785389871> ?p ?o ?g. }
- W2785389871 abstract "Model-free reinforcement learning (RL) methods are succeeding in a growing number of tasks, aided by recent advances in deep learning. However, they tend to suffer from high sample complexity, which hinders their use in real-world domains. Alternatively, model-based reinforcement learning promises to reduce sample complexity, but tends to require careful tuning and to date has succeeded mainly in restrictive domains where simple models are sufficient for learning. In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training. To overcome this issue, we propose to use an ensemble of models to maintain the model uncertainty and regularize the learning process. We further show that the use of likelihood ratio derivatives yields much more stable learning than backpropagation through time. Altogether, our approach, Model-Ensemble Trust-Region Policy Optimization (ME-TRPO), significantly reduces the sample complexity compared to model-free deep RL methods on challenging continuous control benchmark tasks." @default.
- W2785389871 created "2018-02-23" @default.
- W2785389871 creator A5020699288 @default.
- W2785389871 creator A5027941146 @default.
- W2785389871 creator A5049349154 @default.
- W2785389871 creator A5073533020 @default.
- W2785389871 creator A5087643379 @default.
- W2785389871 date "2018-02-28" @default.
- W2785389871 modified "2023-09-27" @default.
- W2785389871 title "Model-Ensemble Trust-Region Policy Optimization" @default.
- W2785389871 cites W1491843047 @default.
- W2785389871 cites W1522301498 @default.
- W2785389871 cites W1980035368 @default.
- W2785389871 cites W2117629901 @default.
- W2785389871 cites W2118688707 @default.
- W2785389871 cites W2121103318 @default.
- W2785389871 cites W2127107099 @default.
- W2785389871 cites W2130105540 @default.
- W2785389871 cites W2132602063 @default.
- W2785389871 cites W2139769245 @default.
- W2785389871 cites W2140135625 @default.
- W2785389871 cites W2145339207 @default.
- W2785389871 cites W2151268438 @default.
- W2785389871 cites W2153244676 @default.
- W2785389871 cites W2201912979 @default.
- W2785389871 cites W2257979135 @default.
- W2785389871 cites W2281096776 @default.
- W2785389871 cites W2293467699 @default.
- W2785389871 cites W2342662072 @default.
- W2785389871 cites W2416041116 @default.
- W2785389871 cites W2528489519 @default.
- W2785389871 cites W2592285981 @default.
- W2785389871 cites W2595180411 @default.
- W2785389871 cites W2766447205 @default.
- W2785389871 cites W2949608212 @default.
- W2785389871 cites W2962872206 @default.
- W2785389871 cites W2963184621 @default.
- W2785389871 cites W2963280855 @default.
- W2785389871 cites W2963430173 @default.
- W2785389871 cites W2964161785 @default.
- W2785389871 cites W2964174623 @default.
- W2785389871 hasPublicationYear "2018" @default.
- W2785389871 type Work @default.
- W2785389871 sameAs 2785389871 @default.
- W2785389871 citedByCount "104" @default.
- W2785389871 countsByYear W27853898712018 @default.
- W2785389871 countsByYear W27853898712019 @default.
- W2785389871 countsByYear W27853898712020 @default.
- W2785389871 countsByYear W27853898712021 @default.
- W2785389871 countsByYear W27853898712022 @default.
- W2785389871 crossrefType "posted-content" @default.
- W2785389871 hasAuthorship W2785389871A5020699288 @default.
- W2785389871 hasAuthorship W2785389871A5027941146 @default.
- W2785389871 hasAuthorship W2785389871A5049349154 @default.
- W2785389871 hasAuthorship W2785389871A5073533020 @default.
- W2785389871 hasAuthorship W2785389871A5087643379 @default.
- W2785389871 hasConcept C108583219 @default.
- W2785389871 hasConcept C111919701 @default.
- W2785389871 hasConcept C112972136 @default.
- W2785389871 hasConcept C119857082 @default.
- W2785389871 hasConcept C13280743 @default.
- W2785389871 hasConcept C154945302 @default.
- W2785389871 hasConcept C155032097 @default.
- W2785389871 hasConcept C165696696 @default.
- W2785389871 hasConcept C178635117 @default.
- W2785389871 hasConcept C185592680 @default.
- W2785389871 hasConcept C185798385 @default.
- W2785389871 hasConcept C198531522 @default.
- W2785389871 hasConcept C205649164 @default.
- W2785389871 hasConcept C38652104 @default.
- W2785389871 hasConcept C41008148 @default.
- W2785389871 hasConcept C43617362 @default.
- W2785389871 hasConcept C45942800 @default.
- W2785389871 hasConcept C50644808 @default.
- W2785389871 hasConcept C89109886 @default.
- W2785389871 hasConcept C97541855 @default.
- W2785389871 hasConcept C98045186 @default.
- W2785389871 hasConceptScore W2785389871C108583219 @default.
- W2785389871 hasConceptScore W2785389871C111919701 @default.
- W2785389871 hasConceptScore W2785389871C112972136 @default.
- W2785389871 hasConceptScore W2785389871C119857082 @default.
- W2785389871 hasConceptScore W2785389871C13280743 @default.
- W2785389871 hasConceptScore W2785389871C154945302 @default.
- W2785389871 hasConceptScore W2785389871C155032097 @default.
- W2785389871 hasConceptScore W2785389871C165696696 @default.
- W2785389871 hasConceptScore W2785389871C178635117 @default.
- W2785389871 hasConceptScore W2785389871C185592680 @default.
- W2785389871 hasConceptScore W2785389871C185798385 @default.
- W2785389871 hasConceptScore W2785389871C198531522 @default.
- W2785389871 hasConceptScore W2785389871C205649164 @default.
- W2785389871 hasConceptScore W2785389871C38652104 @default.
- W2785389871 hasConceptScore W2785389871C41008148 @default.
- W2785389871 hasConceptScore W2785389871C43617362 @default.
- W2785389871 hasConceptScore W2785389871C45942800 @default.
- W2785389871 hasConceptScore W2785389871C50644808 @default.
- W2785389871 hasConceptScore W2785389871C89109886 @default.
- W2785389871 hasConceptScore W2785389871C97541855 @default.
- W2785389871 hasConceptScore W2785389871C98045186 @default.
- W2785389871 hasLocation W27853898711 @default.
- W2785389871 hasOpenAccess W2785389871 @default.
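
The listing above is the result of matching the quad pattern `?p ?o ?g` against a single work, so each line carries a subject, a predicate, an object, and a graph name. As a minimal sketch of how such a listing could be read back into structured form, the Python below parses lines of this shape and counts matches per predicate. The helper name `parse_listing` and the regular expression are assumptions made for illustration; they are not part of SemOpenAlex, and the sketch only handles the line layout shown here (for example, it does not handle literals that themselves end in text resembling ` @graph.`).

```python
import re
from collections import Counter

# Hypothetical pattern for one listing line of the form
# "- <subject> <predicate> <object> @<graph>."
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_listing(text):
    """Return (subject, predicate, object, graph) tuples for each match line."""
    quads = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip the header line and anything unrecognized
        subject, predicate, obj, graph = m.groups()
        # Literal objects are wrapped in double quotes in the listing; strip them.
        if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
            obj = obj[1:-1]
        quads.append((subject, predicate, obj, graph))
    return quads


# A few lines copied from the listing above.
sample = """\
Matches in SemOpenAlex for { <https://semopenalex.org/work/W2785389871> ?p ?o ?g. }
- W2785389871 title "Model-Ensemble Trust-Region Policy Optimization" @default.
- W2785389871 cites W1491843047 @default.
- W2785389871 cites W1522301498 @default.
- W2785389871 citedByCount "104" @default.
"""

quads = parse_listing(sample)
counts = Counter(predicate for _, predicate, _, _ in quads)
print(counts)
```

Run on the full listing instead of the short sample, the same counter would group the `cites`, `hasConcept`, and `hasConceptScore` lines by predicate, which is a quick way to check that each concept has a matching score entry.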