Matches in SemOpenAlex for { <https://semopenalex.org/work/W3048481719> ?p ?o ?g. }
- W3048481719 abstract "State-of-the-art reinforcement learning algorithms mostly rely on being allowed to directly interact with their environment to collect millions of observations. This makes it hard to transfer their success to industrial control problems, where simulations are often very costly or do not exist, and exploring in the real environment can potentially lead to catastrophic events. Recently developed model-free offline RL algorithms can learn from a single dataset (containing limited exploration) by mitigating extrapolation error in value functions. However, the robustness of the training process is still comparatively low, a problem known from methods using value functions. To improve robustness and stability of the learning process, we use dynamics models to assess policy performance instead of value functions, resulting in MOOSE (MOdel-based Offline policy Search with Ensembles), an algorithm which ensures low model bias by keeping the policy within the support of the data. We compare MOOSE with state-of-the-art model-free, offline RL algorithms BRAC, BEAR and BCQ on the Industrial Benchmark and MuJoCo continuous control tasks in terms of robust performance, and find that MOOSE outperforms its model-free counterparts in almost all considered cases, often even by far." @default.
- W3048481719 created "2020-08-18" @default.
- W3048481719 creator A5002518694 @default.
- W3048481719 creator A5005487112 @default.
- W3048481719 creator A5035246650 @default.
- W3048481719 date "2020-08-12" @default.
- W3048481719 modified "2023-09-26" @default.
- W3048481719 title "Overcoming Model Bias for Robust Offline Deep Reinforcement Learning." @default.
- W3048481719 cites W1491843047 @default.
- W3048481719 cites W1522301498 @default.
- W3048481719 cites W1552327263 @default.
- W3048481719 cites W166862392 @default.
- W3048481719 cites W1757796397 @default.
- W3048481719 cites W1771410628 @default.
- W3048481719 cites W192920577 @default.
- W3048481719 cites W1982262386 @default.
- W3048481719 cites W2019671144 @default.
- W3048481719 cites W2061562262 @default.
- W3048481719 cites W2061868368 @default.
- W3048481719 cites W2119717200 @default.
- W3048481719 cites W2120346334 @default.
- W3048481719 cites W2127412976 @default.
- W3048481719 cites W2130005627 @default.
- W3048481719 cites W2140135625 @default.
- W3048481719 cites W2142641780 @default.
- W3048481719 cites W2151702863 @default.
- W3048481719 cites W2158782408 @default.
- W3048481719 cites W2160308170 @default.
- W3048481719 cites W2165150801 @default.
- W3048481719 cites W2173248099 @default.
- W3048481719 cites W2212660284 @default.
- W3048481719 cites W2396820603 @default.
- W3048481719 cites W2404067440 @default.
- W3048481719 cites W2556958149 @default.
- W3048481719 cites W2733961795 @default.
- W3048481719 cites W2736601468 @default.
- W3048481719 cites W2774527530 @default.
- W3048481719 cites W2780045768 @default.
- W3048481719 cites W2781726626 @default.
- W3048481719 cites W2785389871 @default.
- W3048481719 cites W2787757704 @default.
- W3048481719 cites W2803308811 @default.
- W3048481719 cites W2949608212 @default.
- W3048481719 cites W2951004968 @default.
- W3048481719 cites W2953021786 @default.
- W3048481719 cites W2953981431 @default.
- W3048481719 cites W2958416396 @default.
- W3048481719 cites W2962872206 @default.
- W3048481719 cites W2963099939 @default.
- W3048481719 cites W2963276097 @default.
- W3048481719 cites W2963277051 @default.
- W3048481719 cites W2963672746 @default.
- W3048481719 cites W2963685250 @default.
- W3048481719 cites W2963704132 @default.
- W3048481719 cites W2964001908 @default.
- W3048481719 cites W2971262355 @default.
- W3048481719 cites W2974778612 @default.
- W3048481719 cites W2979211489 @default.
- W3048481719 cites W3007369745 @default.
- W3048481719 cites W3009593063 @default.
- W3048481719 cites W3015838160 @default.
- W3048481719 cites W3025606523 @default.
- W3048481719 cites W3028766998 @default.
- W3048481719 cites W3035267056 @default.
- W3048481719 cites W3037207827 @default.
- W3048481719 hasPublicationYear "2020" @default.
- W3048481719 type Work @default.
- W3048481719 sameAs 3048481719 @default.
- W3048481719 citedByCount "6" @default.
- W3048481719 countsByYear W30484817192020 @default.
- W3048481719 countsByYear W30484817192021 @default.
- W3048481719 crossrefType "posted-content" @default.
- W3048481719 hasAuthorship W3048481719A5002518694 @default.
- W3048481719 hasAuthorship W3048481719A5005487112 @default.
- W3048481719 hasAuthorship W3048481719A5035246650 @default.
- W3048481719 hasConcept C104317684 @default.
- W3048481719 hasConcept C119857082 @default.
- W3048481719 hasConcept C132459708 @default.
- W3048481719 hasConcept C13280743 @default.
- W3048481719 hasConcept C134306372 @default.
- W3048481719 hasConcept C154945302 @default.
- W3048481719 hasConcept C185592680 @default.
- W3048481719 hasConcept C185798385 @default.
- W3048481719 hasConcept C205649164 @default.
- W3048481719 hasConcept C33923547 @default.
- W3048481719 hasConcept C41008148 @default.
- W3048481719 hasConcept C55493867 @default.
- W3048481719 hasConcept C63479239 @default.
- W3048481719 hasConcept C97541855 @default.
- W3048481719 hasConceptScore W3048481719C104317684 @default.
- W3048481719 hasConceptScore W3048481719C119857082 @default.
- W3048481719 hasConceptScore W3048481719C132459708 @default.
- W3048481719 hasConceptScore W3048481719C13280743 @default.
- W3048481719 hasConceptScore W3048481719C134306372 @default.
- W3048481719 hasConceptScore W3048481719C154945302 @default.
- W3048481719 hasConceptScore W3048481719C185592680 @default.
- W3048481719 hasConceptScore W3048481719C185798385 @default.
- W3048481719 hasConceptScore W3048481719C205649164 @default.
- W3048481719 hasConceptScore W3048481719C33923547 @default.
- W3048481719 hasConceptScore W3048481719C41008148 @default.