Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386869676> ?p ?o ?g. }
Showing items 1 to 76 of 76 with 100 items per page.
- W4386869676 endingPage "16" @default.
- W4386869676 startingPage "1" @default.
- W4386869676 abstract "In reinforcement learning, a promising direction to avoid online trial-and-error costs is learning from an offline dataset. Current offline reinforcement learning methods commonly learn in the policy space constrained to in-support regions by the offline dataset, in order to ensure the robustness of the outcome policies. Such constraints, however, also limit the potential of the outcome policies. In this paper, to release the potential of offline policy learning, we investigate the decision-making problems in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). By this approach, instead of learning in in-support regions, we learn an adaptable policy that can adapt its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE via meta-learning techniques and ensemble model learning techniques. We conduct experiments on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieve better performance than SOTA algorithms." @default.
- W4386869676 created "2023-09-20" @default.
- W4386869676 creator A5002985691 @default.
- W4386869676 creator A5016024688 @default.
- W4386869676 creator A5019686106 @default.
- W4386869676 creator A5024630977 @default.
- W4386869676 creator A5048915407 @default.
- W4386869676 creator A5059867781 @default.
- W4386869676 creator A5085579946 @default.
- W4386869676 date "2023-01-01" @default.
- W4386869676 modified "2023-09-27" @default.
- W4386869676 title "Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions" @default.
- W4386869676 doi "https://doi.org/10.1109/tpami.2023.3317131" @default.
- W4386869676 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37725727" @default.
- W4386869676 hasPublicationYear "2023" @default.
- W4386869676 type Work @default.
- W4386869676 citedByCount "0" @default.
- W4386869676 crossrefType "journal-article" @default.
- W4386869676 hasAuthorship W4386869676A5002985691 @default.
- W4386869676 hasAuthorship W4386869676A5016024688 @default.
- W4386869676 hasAuthorship W4386869676A5019686106 @default.
- W4386869676 hasAuthorship W4386869676A5024630977 @default.
- W4386869676 hasAuthorship W4386869676A5048915407 @default.
- W4386869676 hasAuthorship W4386869676A5059867781 @default.
- W4386869676 hasAuthorship W4386869676A5085579946 @default.
- W4386869676 hasConcept C104317684 @default.
- W4386869676 hasConcept C119857082 @default.
- W4386869676 hasConcept C136764020 @default.
- W4386869676 hasConcept C144237770 @default.
- W4386869676 hasConcept C148220186 @default.
- W4386869676 hasConcept C154945302 @default.
- W4386869676 hasConcept C185592680 @default.
- W4386869676 hasConcept C2780490138 @default.
- W4386869676 hasConcept C2986087404 @default.
- W4386869676 hasConcept C33923547 @default.
- W4386869676 hasConcept C41008148 @default.
- W4386869676 hasConcept C45942800 @default.
- W4386869676 hasConcept C55493867 @default.
- W4386869676 hasConcept C63479239 @default.
- W4386869676 hasConcept C84525736 @default.
- W4386869676 hasConcept C97541855 @default.
- W4386869676 hasConceptScore W4386869676C104317684 @default.
- W4386869676 hasConceptScore W4386869676C119857082 @default.
- W4386869676 hasConceptScore W4386869676C136764020 @default.
- W4386869676 hasConceptScore W4386869676C144237770 @default.
- W4386869676 hasConceptScore W4386869676C148220186 @default.
- W4386869676 hasConceptScore W4386869676C154945302 @default.
- W4386869676 hasConceptScore W4386869676C185592680 @default.
- W4386869676 hasConceptScore W4386869676C2780490138 @default.
- W4386869676 hasConceptScore W4386869676C2986087404 @default.
- W4386869676 hasConceptScore W4386869676C33923547 @default.
- W4386869676 hasConceptScore W4386869676C41008148 @default.
- W4386869676 hasConceptScore W4386869676C45942800 @default.
- W4386869676 hasConceptScore W4386869676C55493867 @default.
- W4386869676 hasConceptScore W4386869676C63479239 @default.
- W4386869676 hasConceptScore W4386869676C84525736 @default.
- W4386869676 hasConceptScore W4386869676C97541855 @default.
- W4386869676 hasLocation W43868696761 @default.
- W4386869676 hasLocation W43868696762 @default.
- W4386869676 hasOpenAccess W4386869676 @default.
- W4386869676 hasPrimaryLocation W43868696761 @default.
- W4386869676 hasRelatedWork W1470425429 @default.
- W4386869676 hasRelatedWork W2997671848 @default.
- W4386869676 hasRelatedWork W3136871737 @default.
- W4386869676 hasRelatedWork W4205478082 @default.
- W4386869676 hasRelatedWork W4285046548 @default.
- W4386869676 hasRelatedWork W4285741730 @default.
- W4386869676 hasRelatedWork W4313488044 @default.
- W4386869676 hasRelatedWork W4318350883 @default.
- W4386869676 hasRelatedWork W4319083788 @default.
- W4386869676 hasRelatedWork W4328134586 @default.
- W4386869676 isParatext "false" @default.
- W4386869676 isRetracted "false" @default.
- W4386869676 workType "article" @default.