Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387560012> ?p ?o ?g. }
Showing items 1 to 71 of 71, with 100 items per page.
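
For reference, a listing like this can be reproduced programmatically. The following is a minimal sketch, assuming SemOpenAlex's public SPARQL endpoint at https://semopenalex.org/sparql and the Python SPARQLWrapper package; it runs a SELECT query equivalent to the match pattern above (the named-graph variable ?g is dropped for brevity) and is an illustration, not part of the SemOpenAlex output.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public SemOpenAlex SPARQL endpoint

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    SELECT ?p ?o
    WHERE { <https://semopenalex.org/work/W4387560012> ?p ?o . }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    # Each row corresponds to one "- W4387560012 <predicate> <object>" line below.
    print(binding["p"]["value"], binding["o"]["value"])
```
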
- W4387560012 abstract "Learning a precise dynamics model can be crucial for offline reinforcement learning, which, unfortunately, has been found to be quite challenging. Dynamics models that are learned by fitting historical transitions often struggle to generalize to unseen transitions. In this study, we identify a hidden but pivotal factor termed dynamics reward that remains consistent across transitions, offering a pathway to better generalization. Therefore, we propose the idea of reward-consistent dynamics models: any trajectory generated by the dynamics model should maximize the dynamics reward derived from the data. We implement this idea as the MOREC (Model-based Offline reinforcement learning with Reward Consistency) method, which can be seamlessly integrated into previous offline model-based reinforcement learning (MBRL) methods. MOREC learns a generalizable dynamics reward function from offline data, which is subsequently employed as a transition filter in any offline MBRL method: when generating transitions, the dynamics model generates a batch of transitions and selects the one with the highest dynamics reward value. On a synthetic task, we visualize that MOREC has a strong generalization ability and can surprisingly recover some distant unseen transitions. On 21 offline tasks in D4RL and NeoRL benchmarks, MOREC improves the previous state-of-the-art performance by a significant margin, i.e., 4.6% on D4RL tasks and 25.9% on NeoRL tasks. Notably, MOREC is the first method that can achieve above 95% online RL performance in 6 out of 12 D4RL tasks and 3 out of 9 NeoRL tasks." @default.
- W4387560012 created "2023-10-12" @default.
- W4387560012 creator A5001317343 @default.
- W4387560012 creator A5011788131 @default.
- W4387560012 creator A5012971809 @default.
- W4387560012 creator A5024630977 @default.
- W4387560012 date "2023-10-09" @default.
- W4387560012 modified "2023-10-18" @default.
- W4387560012 title "Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning" @default.
- W4387560012 doi "https://doi.org/10.48550/arxiv.2310.05422" @default.
- W4387560012 hasPublicationYear "2023" @default.
- W4387560012 type Work @default.
- W4387560012 citedByCount "0" @default.
- W4387560012 crossrefType "posted-content" @default.
- W4387560012 hasAuthorship W4387560012A5001317343 @default.
- W4387560012 hasAuthorship W4387560012A5011788131 @default.
- W4387560012 hasAuthorship W4387560012A5012971809 @default.
- W4387560012 hasAuthorship W4387560012A5024630977 @default.
- W4387560012 hasBestOaLocation W43875600121 @default.
- W4387560012 hasConcept C119857082 @default.
- W4387560012 hasConcept C127413603 @default.
- W4387560012 hasConcept C134306372 @default.
- W4387560012 hasConcept C14036430 @default.
- W4387560012 hasConcept C145912823 @default.
- W4387560012 hasConcept C154945302 @default.
- W4387560012 hasConcept C15744967 @default.
- W4387560012 hasConcept C177148314 @default.
- W4387560012 hasConcept C19417346 @default.
- W4387560012 hasConcept C201995342 @default.
- W4387560012 hasConcept C2776436953 @default.
- W4387560012 hasConcept C2780451532 @default.
- W4387560012 hasConcept C33923547 @default.
- W4387560012 hasConcept C41008148 @default.
- W4387560012 hasConcept C774472 @default.
- W4387560012 hasConcept C78458016 @default.
- W4387560012 hasConcept C86803240 @default.
- W4387560012 hasConcept C97541855 @default.
- W4387560012 hasConceptScore W4387560012C119857082 @default.
- W4387560012 hasConceptScore W4387560012C127413603 @default.
- W4387560012 hasConceptScore W4387560012C134306372 @default.
- W4387560012 hasConceptScore W4387560012C14036430 @default.
- W4387560012 hasConceptScore W4387560012C145912823 @default.
- W4387560012 hasConceptScore W4387560012C154945302 @default.
- W4387560012 hasConceptScore W4387560012C15744967 @default.
- W4387560012 hasConceptScore W4387560012C177148314 @default.
- W4387560012 hasConceptScore W4387560012C19417346 @default.
- W4387560012 hasConceptScore W4387560012C201995342 @default.
- W4387560012 hasConceptScore W4387560012C2776436953 @default.
- W4387560012 hasConceptScore W4387560012C2780451532 @default.
- W4387560012 hasConceptScore W4387560012C33923547 @default.
- W4387560012 hasConceptScore W4387560012C41008148 @default.
- W4387560012 hasConceptScore W4387560012C774472 @default.
- W4387560012 hasConceptScore W4387560012C78458016 @default.
- W4387560012 hasConceptScore W4387560012C86803240 @default.
- W4387560012 hasConceptScore W4387560012C97541855 @default.
- W4387560012 hasLocation W43875600121 @default.
- W4387560012 hasOpenAccess W4387560012 @default.
- W4387560012 hasPrimaryLocation W43875600121 @default.
- W4387560012 hasRelatedWork W1508631387 @default.
- W4387560012 hasRelatedWork W2017776670 @default.
- W4387560012 hasRelatedWork W2347897961 @default.
- W4387560012 hasRelatedWork W2370917603 @default.
- W4387560012 hasRelatedWork W2952760143 @default.
- W4387560012 hasRelatedWork W3125011624 @default.
- W4387560012 hasRelatedWork W4306904969 @default.
- W4387560012 hasRelatedWork W4362501864 @default.
- W4387560012 hasRelatedWork W4377293004 @default.
- W4387560012 hasRelatedWork W4380318855 @default.
- W4387560012 isParatext "false" @default.
- W4387560012 isRetracted "false" @default.
- W4387560012 workType "article" @default.
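
As a side note, the abstract quoted above describes how MOREC uses its learned dynamics reward as a transition filter: the dynamics model proposes a batch of candidate transitions and the one with the highest dynamics-reward value is kept. The sketch below illustrates only that selection step; the names (filtered_transition, sample_next_state, dynamics_reward, n_candidates) are hypothetical stand-ins and are not taken from the authors' code.

```python
import random

def filtered_transition(state, action, sample_next_state, dynamics_reward, n_candidates=10):
    """Illustrative transition filter: sample a batch of candidate next states from a
    learned dynamics model and keep the one the learned dynamics reward scores highest.
    All names here are hypothetical, not the authors' API."""
    candidates = [sample_next_state(state, action) for _ in range(n_candidates)]
    scores = [dynamics_reward(state, action, s_next) for s_next in candidates]
    best = max(range(n_candidates), key=lambda i: scores[i])
    return candidates[best]

# Toy usage with stand-in callables, purely to show the call shape:
next_state = filtered_transition(
    state=0.0,
    action=1.0,
    sample_next_state=lambda s, a: s + a + random.gauss(0.0, 0.1),
    dynamics_reward=lambda s, a, sn: -abs(sn - (s + a)),  # scores "consistent" candidates higher
)
```
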