SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386875113> ?p ?o ?g. }
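
The pattern above fixes the subject and leaves the predicate, object, and graph as variables. As a minimal sketch, the same lookup can be written as a SPARQL query string built in Python. The helper name `build_work_query` is hypothetical, and because every row in this listing reports its graph as `@default`, the sketch assumes it is enough to query the default graph and omits the graph variable. No endpoint is contacted; the function only builds the query text.

```python
def build_work_query(work_iri: str) -> str:
    """Build a SPARQL SELECT query for all predicate/object pairs of one subject.

    Hypothetical helper: it mirrors the pattern { <work_iri> ?p ?o } against
    the default graph and does not send the query anywhere.
    """
    # Reject characters that are not allowed inside an IRI reference in SPARQL.
    forbidden = set('<>"{}|\\^` ')
    if not work_iri or any(ch in forbidden for ch in work_iri):
        raise ValueError(f"not a usable IRI: {work_iri!r}")
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{work_iri}> ?p ?o .\n"
        "}"
    )


query = build_work_query("https://semopenalex.org/work/W4386875113")
print(query)
```

The resulting string can be passed to any SPARQL client; which endpoint to use is left open here.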

W4386875113 abstract "Model-based reinforcement learning (RL), which learns an environment model from an offline dataset and generates additional out-of-distribution model data, has become an effective approach to the problem of distribution shift in offline RL. Because of the gap between the learned and actual environments, conservatism should be incorporated into the algorithm to balance accurate offline data against imprecise model data. The conservatism of current algorithms mostly relies on model uncertainty estimation. However, uncertainty estimation is unreliable and leads to poor performance in certain scenarios, and previous methods ignore differences among the model data, which results in excessive conservatism. Therefore, this paper proposes a milDly cOnservative Model-bAsed offlINe RL algorithm (DOMAIN) that does not estimate model uncertainty to address the above issues. DOMAIN introduces an adaptive sampling distribution of model samples, which can adaptively adjust the model data penalty. In this paper, we theoretically demonstrate that the Q value learned by DOMAIN outside the region is a lower bound of the true Q value, that DOMAIN is less conservative than previous model-based offline RL algorithms, and that it has the guarantee of security policy improvement. The results of extensive experiments show that DOMAIN outperforms prior RL algorithms on the D4RL dataset benchmark and achieves better performance than other RL algorithms on tasks that require generalization." @default.
W4386875113 created "2023-09-20" @default.
W4386875113 creator A5012231588 @default.
W4386875113 creator A5019560977 @default.
W4386875113 creator A5035211277 @default.
W4386875113 creator A5036534440 @default.
W4386875113 creator A5038613343 @default.
W4386875113 creator A5038907015 @default.
W4386875113 creator A5045616680 @default.
W4386875113 creator A5068304525 @default.
W4386875113 creator A5081798115 @default.
W4386875113 creator A5083474841 @default.
W4386875113 date "2023-09-16" @default.
W4386875113 modified "2023-09-27" @default.
W4386875113 title "DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning" @default.
W4386875113 doi "https://doi.org/10.48550/arxiv.2309.08925" @default.
W4386875113 hasPublicationYear "2023" @default.
W4386875113 type Work @default.
W4386875113 citedByCount "0" @default.
W4386875113 crossrefType "posted-content" @default.
W4386875113 hasAuthorship W4386875113A5012231588 @default.
W4386875113 hasAuthorship W4386875113A5019560977 @default.
W4386875113 hasAuthorship W4386875113A5035211277 @default.
W4386875113 hasAuthorship W4386875113A5036534440 @default.
W4386875113 hasAuthorship W4386875113A5038613343 @default.
W4386875113 hasAuthorship W4386875113A5038907015 @default.
W4386875113 hasAuthorship W4386875113A5045616680 @default.
W4386875113 hasAuthorship W4386875113A5068304525 @default.
W4386875113 hasAuthorship W4386875113A5081798115 @default.
W4386875113 hasAuthorship W4386875113A5083474841 @default.
W4386875113 hasBestOaLocation W43868751131 @default.
W4386875113 hasConcept C11413529 @default.
W4386875113 hasConcept C119857082 @default.
W4386875113 hasConcept C124101348 @default.
W4386875113 hasConcept C13280743 @default.
W4386875113 hasConcept C134306372 @default.
W4386875113 hasConcept C154945302 @default.
W4386875113 hasConcept C177148314 @default.
W4386875113 hasConcept C17744445 @default.
W4386875113 hasConcept C185798385 @default.
W4386875113 hasConcept C199539241 @default.
W4386875113 hasConcept C205649164 @default.
W4386875113 hasConcept C33923547 @default.
W4386875113 hasConcept C36503486 @default.
W4386875113 hasConcept C41008148 @default.
W4386875113 hasConcept C94625758 @default.
W4386875113 hasConcept C96640997 @default.
W4386875113 hasConcept C97541855 @default.
W4386875113 hasConceptScore W4386875113C11413529 @default.
W4386875113 hasConceptScore W4386875113C119857082 @default.
W4386875113 hasConceptScore W4386875113C124101348 @default.
W4386875113 hasConceptScore W4386875113C13280743 @default.
W4386875113 hasConceptScore W4386875113C134306372 @default.
W4386875113 hasConceptScore W4386875113C154945302 @default.
W4386875113 hasConceptScore W4386875113C177148314 @default.
W4386875113 hasConceptScore W4386875113C17744445 @default.
W4386875113 hasConceptScore W4386875113C185798385 @default.
W4386875113 hasConceptScore W4386875113C199539241 @default.
W4386875113 hasConceptScore W4386875113C205649164 @default.
W4386875113 hasConceptScore W4386875113C33923547 @default.
W4386875113 hasConceptScore W4386875113C36503486 @default.
W4386875113 hasConceptScore W4386875113C41008148 @default.
W4386875113 hasConceptScore W4386875113C94625758 @default.
W4386875113 hasConceptScore W4386875113C96640997 @default.
W4386875113 hasConceptScore W4386875113C97541855 @default.
W4386875113 hasLocation W43868751131 @default.
W4386875113 hasOpenAccess W4386875113 @default.
W4386875113 hasPrimaryLocation W43868751131 @default.
W4386875113 hasRelatedWork W3147214434 @default.
W4386875113 hasRelatedWork W3196841879 @default.
W4386875113 hasRelatedWork W4221150964 @default.
W4386875113 hasRelatedWork W4225747855 @default.
W4386875113 hasRelatedWork W4282813201 @default.
W4386875113 hasRelatedWork W4287327031 @default.
W4386875113 hasRelatedWork W4296474751 @default.
W4386875113 hasRelatedWork W4319083788 @default.
W4386875113 hasRelatedWork W4385488677 @default.
W4386875113 hasRelatedWork W4386072117 @default.
W4386875113 isParatext "false" @default.
W4386875113 isRetracted "false" @default.
W4386875113 workType "article" @default.
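
Each row in the listing above has the shape `subject predicate object @graph.`, where the object is either a bare identifier or a double-quoted literal. The following is a minimal sketch of how such rows could be parsed and grouped by predicate. The names `Row`, `parse_row`, and `group_by_predicate` are hypothetical, and the regular expression assumes the exact plain-text layout shown on this page, not a standard RDF serialization such as N-Quads.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Row(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str
    is_literal: bool


# subject, predicate, then a quoted literal or a bare token, then "@graph."
ROW_PATTERN = re.compile(r'^(\S+) (\S+) (".*"|\S+) @(\S+)\.$')


def parse_row(line: str) -> Row:
    """Parse one listing row; raise ValueError if it does not fit the layout."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    subject, predicate, obj, graph = match.groups()
    is_literal = obj.startswith('"')
    if is_literal:
        obj = obj[1:-1]  # drop the surrounding double quotes
    return Row(subject, predicate, obj, graph, is_literal)


def group_by_predicate(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect object values per predicate, skipping blank lines."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if line.strip():
            row = parse_row(line)
            grouped[row.predicate].append(row.obj)
    return dict(grouped)


sample = [
    'W4386875113 creator A5012231588 @default.',
    'W4386875113 creator A5019560977 @default.',
    'W4386875113 date "2023-09-16" @default.',
]
grouped = group_by_predicate(sample)
print(sorted(grouped))
```

Grouping this way makes it easy to count, for example, how many `creator` or `hasRelatedWork` rows a work has.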