Matches in SemOpenAlex for { <https://semopenalex.org/work/W4379924892> ?p ?o ?g. }
Showing items 1 to 67 of 67 with 100 items per page.
- W4379924892 abstract "Offline reinforcement learning (RL) methodologies enforce constraints on the policy to adhere closely to the behavior policy, thereby stabilizing value learning and mitigating the selection of out-of-distribution (OOD) actions during test time. Conventional approaches apply identical constraints for both value learning and test time inference. However, our findings indicate that the constraints suitable for value estimation may in fact be excessively restrictive for action selection during test time. To address this issue, we propose a Mildly Constrained Evaluation Policy (MCEP) for test time inference with a more constrained target policy for value estimation. Since the target policy has been adopted in various prior approaches, MCEP can be seamlessly integrated with them as a plug-in. We instantiate MCEP based on TD3-BC [Fujimoto and Gu, 2021] and AWAC [Nair et al., 2020] algorithms. The empirical results on MuJoCo locomotion tasks show that the MCEP significantly outperforms the target policy and achieves competitive results to state-of-the-art offline RL methods. The codes are open-sourced at https://github.com/egg-west/MCEP.git." @default.
- W4379924892 created "2023-06-09" @default.
- W4379924892 creator A5007508021 @default.
- W4379924892 creator A5018864173 @default.
- W4379924892 creator A5030951014 @default.
- W4379924892 creator A5048947860 @default.
- W4379924892 creator A5052124868 @default.
- W4379924892 date "2023-06-06" @default.
- W4379924892 modified "2023-10-16" @default.
- W4379924892 title "Mildly Constrained Evaluation Policy for Offline Reinforcement Learning" @default.
- W4379924892 doi "https://doi.org/10.48550/arxiv.2306.03680" @default.
- W4379924892 hasPublicationYear "2023" @default.
- W4379924892 type Work @default.
- W4379924892 citedByCount "0" @default.
- W4379924892 crossrefType "posted-content" @default.
- W4379924892 hasAuthorship W4379924892A5007508021 @default.
- W4379924892 hasAuthorship W4379924892A5018864173 @default.
- W4379924892 hasAuthorship W4379924892A5030951014 @default.
- W4379924892 hasAuthorship W4379924892A5048947860 @default.
- W4379924892 hasAuthorship W4379924892A5052124868 @default.
- W4379924892 hasBestOaLocation W43799248921 @default.
- W4379924892 hasConcept C119857082 @default.
- W4379924892 hasConcept C136764020 @default.
- W4379924892 hasConcept C154945302 @default.
- W4379924892 hasConcept C166109690 @default.
- W4379924892 hasConcept C169760540 @default.
- W4379924892 hasConcept C188116033 @default.
- W4379924892 hasConcept C26760741 @default.
- W4379924892 hasConcept C2776214188 @default.
- W4379924892 hasConcept C2776291640 @default.
- W4379924892 hasConcept C2780490138 @default.
- W4379924892 hasConcept C2986087404 @default.
- W4379924892 hasConcept C41008148 @default.
- W4379924892 hasConcept C81917197 @default.
- W4379924892 hasConcept C86803240 @default.
- W4379924892 hasConcept C97541855 @default.
- W4379924892 hasConceptScore W4379924892C119857082 @default.
- W4379924892 hasConceptScore W4379924892C136764020 @default.
- W4379924892 hasConceptScore W4379924892C154945302 @default.
- W4379924892 hasConceptScore W4379924892C166109690 @default.
- W4379924892 hasConceptScore W4379924892C169760540 @default.
- W4379924892 hasConceptScore W4379924892C188116033 @default.
- W4379924892 hasConceptScore W4379924892C26760741 @default.
- W4379924892 hasConceptScore W4379924892C2776214188 @default.
- W4379924892 hasConceptScore W4379924892C2776291640 @default.
- W4379924892 hasConceptScore W4379924892C2780490138 @default.
- W4379924892 hasConceptScore W4379924892C2986087404 @default.
- W4379924892 hasConceptScore W4379924892C41008148 @default.
- W4379924892 hasConceptScore W4379924892C81917197 @default.
- W4379924892 hasConceptScore W4379924892C86803240 @default.
- W4379924892 hasConceptScore W4379924892C97541855 @default.
- W4379924892 hasLocation W43799248921 @default.
- W4379924892 hasOpenAccess W4379924892 @default.
- W4379924892 hasPrimaryLocation W43799248921 @default.
- W4379924892 hasRelatedWork W1587318060 @default.
- W4379924892 hasRelatedWork W2041176007 @default.
- W4379924892 hasRelatedWork W2120968583 @default.
- W4379924892 hasRelatedWork W2123899227 @default.
- W4379924892 hasRelatedWork W2923653485 @default.
- W4379924892 hasRelatedWork W2935889419 @default.
- W4379924892 hasRelatedWork W3022038857 @default.
- W4379924892 hasRelatedWork W39810663 @default.
- W4379924892 hasRelatedWork W4319083788 @default.
- W4379924892 hasRelatedWork W66717747 @default.
- W4379924892 isParatext "false" @default.
- W4379924892 isRetracted "false" @default.
- W4379924892 workType "article" @default.
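
The abstract in this record describes using a more constrained target policy for value estimation and a mildly constrained evaluation policy (MCEP) for test-time inference, instantiated on TD3-BC among others. The sketch below is one possible illustration of that idea, not the authors' implementation. It assumes the TD3-BC-style actor objective (a normalized Q term plus a behavior-cloning penalty), and the function name, the `alpha_target` and `alpha_eval` values, and the toy numbers are all hypothetical.

```python
from statistics import fmean


def td3bc_style_actor_loss(q_values, policy_actions, dataset_actions, alpha):
    """TD3-BC-style actor loss: -lambda * mean(Q) + mean squared BC error.

    lambda = alpha / mean(|Q|), so a larger alpha weights the Q term more
    heavily relative to the behavior-cloning term (a milder constraint).
    """
    lam = alpha / (fmean(abs(q) for q in q_values) + 1e-8)
    q_term = fmean(q_values)
    bc_term = fmean(
        (p - a) ** 2 for p, a in zip(policy_actions, dataset_actions)
    )
    return -lam * q_term + bc_term


# Hypothetical toy batch: critic values Q(s, pi(s)), policy actions, dataset actions.
q_values = [1.0, 3.0]
policy_actions = [0.5, 0.5]
dataset_actions = [0.0, 1.0]

# Assumed constraint strengths (illustrative values only).
alpha_target = 2.5   # stronger constraint: this policy is used for value estimation
alpha_eval = 10.0    # milder constraint: this policy is used only at test time

target_loss = td3bc_style_actor_loss(
    q_values, policy_actions, dataset_actions, alpha_target
)
eval_loss = td3bc_style_actor_loss(
    q_values, policy_actions, dataset_actions, alpha_eval
)
```

In this sketch both policies are scored against the same critic values, and only the constraint weight differs. In a full training loop, which is omitted here, only the target policy would supply actions for the critic update, in line with the abstract's description of MCEP as a plug-in.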