Matches in SemOpenAlex for { <https://semopenalex.org/work/W4380558537> ?p ?o ?g. }
Showing items 1 to 67 of 67 with 100 items per page.
- W4380558537 abstract "Offline reinforcement learning (RL) is a learning paradigm where an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit the performance due to the lack of exploration. To overcome it, offline-to-online RL combines offline pre-training with online fine-tuning, which enables the agent to further refine its policy by interacting with the environment in real-time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called Ensemble-based Offline-to-Online (E2O) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance enhancement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that E2O can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion and navigation tasks, significantly outperforming existing offline-to-online RL methods." @default.
- W4380558537 created "2023-06-14" @default.
- W4380558537 creator A5005604730 @default.
- W4380558537 creator A5012251323 @default.
- W4380558537 creator A5016856595 @default.
- W4380558537 creator A5029760958 @default.
- W4380558537 creator A5082032813 @default.
- W4380558537 date "2023-06-12" @default.
- W4380558537 modified "2023-09-23" @default.
- W4380558537 title "Improving Offline-to-Online Reinforcement Learning with Q-Ensembles" @default.
- W4380558537 doi "https://doi.org/10.48550/arxiv.2306.06871" @default.
- W4380558537 hasPublicationYear "2023" @default.
- W4380558537 type Work @default.
- W4380558537 citedByCount "0" @default.
- W4380558537 crossrefType "posted-content" @default.
- W4380558537 hasAuthorship W4380558537A5005604730 @default.
- W4380558537 hasAuthorship W4380558537A5012251323 @default.
- W4380558537 hasAuthorship W4380558537A5016856595 @default.
- W4380558537 hasAuthorship W4380558537A5029760958 @default.
- W4380558537 hasAuthorship W4380558537A5082032813 @default.
- W4380558537 hasBestOaLocation W43805585371 @default.
- W4380558537 hasConcept C111919701 @default.
- W4380558537 hasConcept C112972136 @default.
- W4380558537 hasConcept C119857082 @default.
- W4380558537 hasConcept C127413603 @default.
- W4380558537 hasConcept C146978453 @default.
- W4380558537 hasConcept C154945302 @default.
- W4380558537 hasConcept C196921405 @default.
- W4380558537 hasConcept C199360897 @default.
- W4380558537 hasConcept C204323151 @default.
- W4380558537 hasConcept C2780102126 @default.
- W4380558537 hasConcept C2780490138 @default.
- W4380558537 hasConcept C2986087404 @default.
- W4380558537 hasConcept C41008148 @default.
- W4380558537 hasConcept C49774154 @default.
- W4380558537 hasConcept C97541855 @default.
- W4380558537 hasConceptScore W4380558537C111919701 @default.
- W4380558537 hasConceptScore W4380558537C112972136 @default.
- W4380558537 hasConceptScore W4380558537C119857082 @default.
- W4380558537 hasConceptScore W4380558537C127413603 @default.
- W4380558537 hasConceptScore W4380558537C146978453 @default.
- W4380558537 hasConceptScore W4380558537C154945302 @default.
- W4380558537 hasConceptScore W4380558537C196921405 @default.
- W4380558537 hasConceptScore W4380558537C199360897 @default.
- W4380558537 hasConceptScore W4380558537C204323151 @default.
- W4380558537 hasConceptScore W4380558537C2780102126 @default.
- W4380558537 hasConceptScore W4380558537C2780490138 @default.
- W4380558537 hasConceptScore W4380558537C2986087404 @default.
- W4380558537 hasConceptScore W4380558537C41008148 @default.
- W4380558537 hasConceptScore W4380558537C49774154 @default.
- W4380558537 hasConceptScore W4380558537C97541855 @default.
- W4380558537 hasLocation W43805585371 @default.
- W4380558537 hasOpenAccess W4380558537 @default.
- W4380558537 hasPrimaryLocation W43805585371 @default.
- W4380558537 hasRelatedWork W3138976442 @default.
- W4380558537 hasRelatedWork W3206880326 @default.
- W4380558537 hasRelatedWork W4225619808 @default.
- W4380558537 hasRelatedWork W4308759076 @default.
- W4380558537 hasRelatedWork W4312614522 @default.
- W4380558537 hasRelatedWork W4361865079 @default.
- W4380558537 hasRelatedWork W4377161554 @default.
- W4380558537 hasRelatedWork W4380558537 @default.
- W4380558537 hasRelatedWork W4381248033 @default.
- W4380558537 hasRelatedWork W4382463554 @default.
- W4380558537 isParatext "false" @default.
- W4380558537 isRetracted "false" @default.
- W4380558537 workType "article" @default.
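
The listing above follows a regular "- subject predicate object @graph." shape, so it can be grouped by predicate with a few lines of Python. The following is a minimal sketch, assuming that shape holds for every line; the names `LINE_RE` and `parse_listing` are hypothetical and are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Hypothetical pattern for one listing line:
#   "- <subject> <predicate> <object> @<graph>."
# This mirrors the text layout above; it is not an official SemOpenAlex format.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')

def parse_listing(lines):
    """Group object values by predicate for lines that match the listing shape."""
    by_predicate = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip header lines such as "Showing items ..."
        _subject, predicate, obj, _graph = match.groups()
        # Strip the surrounding double quotes from literal values.
        if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
            obj = obj[1:-1]
        by_predicate[predicate].append(obj)
    return dict(by_predicate)

sample = [
    '- W4380558537 created "2023-06-14" @default.',
    '- W4380558537 creator A5005604730 @default.',
    '- W4380558537 creator A5012251323 @default.',
    '- W4380558537 type Work @default.',
]
parsed = parse_listing(sample)
print(parsed["creator"])
```

Grouping by predicate keeps multi-valued properties such as creator, hasConcept, and hasRelatedWork together as lists, while single-valued ones such as title or doi become one-element lists.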