Matches in SemOpenAlex for { <https://semopenalex.org/work/W4308166345> ?p ?o ?g. }
Showing items 1 to 80 of 80 with 100 items per page.
- W4308166345 abstract "Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple, yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm. Theoretically, we prove that BPR carries out performance guarantees when integrated into algorithms that have either policy improvement guarantees (conservative algorithms) or produce lower bounds of the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks. The code is available at https://github.com/bit1029public/offline_bpr." @default.
- W4308166345 created "2022-11-08" @default.
- W4308166345 creator A5003360183 @default.
- W4308166345 creator A5022526821 @default.
- W4308166345 creator A5031746269 @default.
- W4308166345 creator A5057136360 @default.
- W4308166345 creator A5079926596 @default.
- W4308166345 creator A5080042640 @default.
- W4308166345 creator A5089214987 @default.
- W4308166345 date "2022-11-02" @default.
- W4308166345 modified "2023-10-18" @default.
- W4308166345 title "Behavior Prior Representation learning for Offline Reinforcement Learning" @default.
- W4308166345 doi "https://doi.org/10.48550/arxiv.2211.00863" @default.
- W4308166345 hasPublicationYear "2022" @default.
- W4308166345 type Work @default.
- W4308166345 citedByCount "1" @default.
- W4308166345 countsByYear W43081663452023 @default.
- W4308166345 crossrefType "posted-content" @default.
- W4308166345 hasAuthorship W4308166345A5003360183 @default.
- W4308166345 hasAuthorship W4308166345A5022526821 @default.
- W4308166345 hasAuthorship W4308166345A5031746269 @default.
- W4308166345 hasAuthorship W4308166345A5057136360 @default.
- W4308166345 hasAuthorship W4308166345A5079926596 @default.
- W4308166345 hasAuthorship W4308166345A5080042640 @default.
- W4308166345 hasAuthorship W4308166345A5089214987 @default.
- W4308166345 hasBestOaLocation W43081663451 @default.
- W4308166345 hasConcept C111919701 @default.
- W4308166345 hasConcept C11413529 @default.
- W4308166345 hasConcept C119857082 @default.
- W4308166345 hasConcept C137335462 @default.
- W4308166345 hasConcept C154945302 @default.
- W4308166345 hasConcept C162324750 @default.
- W4308166345 hasConcept C177264268 @default.
- W4308166345 hasConcept C17744445 @default.
- W4308166345 hasConcept C199360897 @default.
- W4308166345 hasConcept C199539241 @default.
- W4308166345 hasConcept C21547014 @default.
- W4308166345 hasConcept C2776359362 @default.
- W4308166345 hasConcept C2776760102 @default.
- W4308166345 hasConcept C2780102126 @default.
- W4308166345 hasConcept C29143872 @default.
- W4308166345 hasConcept C41008148 @default.
- W4308166345 hasConcept C48103436 @default.
- W4308166345 hasConcept C94625758 @default.
- W4308166345 hasConcept C97541855 @default.
- W4308166345 hasConceptScore W4308166345C111919701 @default.
- W4308166345 hasConceptScore W4308166345C11413529 @default.
- W4308166345 hasConceptScore W4308166345C119857082 @default.
- W4308166345 hasConceptScore W4308166345C137335462 @default.
- W4308166345 hasConceptScore W4308166345C154945302 @default.
- W4308166345 hasConceptScore W4308166345C162324750 @default.
- W4308166345 hasConceptScore W4308166345C177264268 @default.
- W4308166345 hasConceptScore W4308166345C17744445 @default.
- W4308166345 hasConceptScore W4308166345C199360897 @default.
- W4308166345 hasConceptScore W4308166345C199539241 @default.
- W4308166345 hasConceptScore W4308166345C21547014 @default.
- W4308166345 hasConceptScore W4308166345C2776359362 @default.
- W4308166345 hasConceptScore W4308166345C2776760102 @default.
- W4308166345 hasConceptScore W4308166345C2780102126 @default.
- W4308166345 hasConceptScore W4308166345C29143872 @default.
- W4308166345 hasConceptScore W4308166345C41008148 @default.
- W4308166345 hasConceptScore W4308166345C48103436 @default.
- W4308166345 hasConceptScore W4308166345C94625758 @default.
- W4308166345 hasConceptScore W4308166345C97541855 @default.
- W4308166345 hasLocation W43081663451 @default.
- W4308166345 hasOpenAccess W4308166345 @default.
- W4308166345 hasPrimaryLocation W43081663451 @default.
- W4308166345 hasRelatedWork W2923653485 @default.
- W4308166345 hasRelatedWork W2990162188 @default.
- W4308166345 hasRelatedWork W3022038857 @default.
- W4308166345 hasRelatedWork W3034786558 @default.
- W4308166345 hasRelatedWork W3173984942 @default.
- W4308166345 hasRelatedWork W3214578249 @default.
- W4308166345 hasRelatedWork W4224863981 @default.
- W4308166345 hasRelatedWork W4311991951 @default.
- W4308166345 hasRelatedWork W4318621078 @default.
- W4308166345 hasRelatedWork W4319083788 @default.
- W4308166345 isParatext "false" @default.
- W4308166345 isRetracted "false" @default.
- W4308166345 workType "article" @default.