Matches in SemOpenAlex for { <https://semopenalex.org/work/W4382173453> ?p ?o ?g. }
Showing items 1 to 77 of 77 with 100 items per page.
- W4382173453 abstract "Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected and labeled datasets, which eliminates the time-consuming data collection in online RL. However, offline RL still bears a large burden of specifying/handcrafting extrinsic rewards for each transition in the offline data. As a remedy for the labor-intensive labeling, we propose to endow offline RL tasks with a few expert data and utilize the limited expert data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve that, we introduce Calibrated Latent gUidancE (CLUE), which utilizes a conditional variational auto-encoder to learn a latent space such that intrinsic rewards can be directly qualified over the latent space. CLUE's key idea is to align the intrinsic rewards consistent with the expert intention via enforcing the embeddings of expert data to a calibrated contextual representation. We instantiate the expert-driven intrinsic rewards in sparse-reward offline RL tasks, offline imitation learning (IL) tasks, and unsupervised offline RL tasks. Empirically, we find that CLUE can effectively improve the sparse-reward offline RL performance, outperform the state-of-the-art offline IL baselines, and discover diverse skills from static reward-free offline data." @default.
- W4382173453 created "2023-06-27" @default.
- W4382173453 creator A5000359400 @default.
- W4382173453 creator A5011973939 @default.
- W4382173453 creator A5032326710 @default.
- W4382173453 creator A5043584847 @default.
- W4382173453 date "2023-06-23" @default.
- W4382173453 modified "2023-10-17" @default.
- W4382173453 title "CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning" @default.
- W4382173453 doi "https://doi.org/10.48550/arxiv.2306.13412" @default.
- W4382173453 hasPublicationYear "2023" @default.
- W4382173453 type Work @default.
- W4382173453 citedByCount "0" @default.
- W4382173453 crossrefType "posted-content" @default.
- W4382173453 hasAuthorship W4382173453A5000359400 @default.
- W4382173453 hasAuthorship W4382173453A5011973939 @default.
- W4382173453 hasAuthorship W4382173453A5032326710 @default.
- W4382173453 hasAuthorship W4382173453A5043584847 @default.
- W4382173453 hasBestOaLocation W43821734531 @default.
- W4382173453 hasConcept C105795698 @default.
- W4382173453 hasConcept C111919701 @default.
- W4382173453 hasConcept C118505674 @default.
- W4382173453 hasConcept C119857082 @default.
- W4382173453 hasConcept C126388530 @default.
- W4382173453 hasConcept C136764020 @default.
- W4382173453 hasConcept C154945302 @default.
- W4382173453 hasConcept C15744967 @default.
- W4382173453 hasConcept C17744445 @default.
- W4382173453 hasConcept C199539241 @default.
- W4382173453 hasConcept C2776359362 @default.
- W4382173453 hasConcept C2780102126 @default.
- W4382173453 hasConcept C2780490138 @default.
- W4382173453 hasConcept C2986087404 @default.
- W4382173453 hasConcept C33923547 @default.
- W4382173453 hasConcept C41008148 @default.
- W4382173453 hasConcept C51167844 @default.
- W4382173453 hasConcept C72434380 @default.
- W4382173453 hasConcept C77805123 @default.
- W4382173453 hasConcept C94625758 @default.
- W4382173453 hasConcept C97541855 @default.
- W4382173453 hasConceptScore W4382173453C105795698 @default.
- W4382173453 hasConceptScore W4382173453C111919701 @default.
- W4382173453 hasConceptScore W4382173453C118505674 @default.
- W4382173453 hasConceptScore W4382173453C119857082 @default.
- W4382173453 hasConceptScore W4382173453C126388530 @default.
- W4382173453 hasConceptScore W4382173453C136764020 @default.
- W4382173453 hasConceptScore W4382173453C154945302 @default.
- W4382173453 hasConceptScore W4382173453C15744967 @default.
- W4382173453 hasConceptScore W4382173453C17744445 @default.
- W4382173453 hasConceptScore W4382173453C199539241 @default.
- W4382173453 hasConceptScore W4382173453C2776359362 @default.
- W4382173453 hasConceptScore W4382173453C2780102126 @default.
- W4382173453 hasConceptScore W4382173453C2780490138 @default.
- W4382173453 hasConceptScore W4382173453C2986087404 @default.
- W4382173453 hasConceptScore W4382173453C33923547 @default.
- W4382173453 hasConceptScore W4382173453C41008148 @default.
- W4382173453 hasConceptScore W4382173453C51167844 @default.
- W4382173453 hasConceptScore W4382173453C72434380 @default.
- W4382173453 hasConceptScore W4382173453C77805123 @default.
- W4382173453 hasConceptScore W4382173453C94625758 @default.
- W4382173453 hasConceptScore W4382173453C97541855 @default.
- W4382173453 hasLocation W43821734531 @default.
- W4382173453 hasOpenAccess W4382173453 @default.
- W4382173453 hasPrimaryLocation W43821734531 @default.
- W4382173453 hasRelatedWork W2795910581 @default.
- W4382173453 hasRelatedWork W3022038857 @default.
- W4382173453 hasRelatedWork W4221145086 @default.
- W4382173453 hasRelatedWork W4225619808 @default.
- W4382173453 hasRelatedWork W4226283576 @default.
- W4382173453 hasRelatedWork W4247881572 @default.
- W4382173453 hasRelatedWork W4283797777 @default.
- W4382173453 hasRelatedWork W4311991951 @default.
- W4382173453 hasRelatedWork W4319083788 @default.
- W4382173453 hasRelatedWork W4226221094 @default.
- W4382173453 isParatext "false" @default.
- W4382173453 isRetracted "false" @default.
- W4382173453 workType "article" @default.