Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387156749> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4387156749 abstract "Zero-shot reinforcement learning (RL) promises to provide agents that can perform any task in an environment after an offline pre-training phase. Forward-backward (FB) representations represent remarkable progress towards this ideal, achieving 85% of the performance of task-specific agents in this setting. However, such performance is contingent on access to large and diverse datasets for pre-training, which cannot be expected for most real problems. Here, we explore how FB performance degrades when trained on small datasets that lack diversity, and mitigate it with conservatism, a well-established feature of performant offline RL algorithms. We evaluate our family of methods across various datasets, domains and tasks, reaching 150% of vanilla FB performance in aggregate. Somewhat surprisingly, conservative FB algorithms also outperform the task-specific baseline, despite lacking access to reward labels and being required to maintain policies for all tasks. Conservative FB algorithms perform no worse than FB on full datasets, and so present little downside over their predecessor. Our code is available open-source via https://enjeeneer.io/projects/conservative-world-models/." @default.
- W4387156749 created "2023-09-30" @default.
- W4387156749 creator A5016873176 @default.
- W4387156749 creator A5017714952 @default.
- W4387156749 creator A5032532921 @default.
- W4387156749 date "2023-09-26" @default.
- W4387156749 modified "2023-10-13" @default.
- W4387156749 title "Conservative World Models" @default.
- W4387156749 doi "https://doi.org/10.48550/arxiv.2309.15178" @default.
- W4387156749 hasPublicationYear "2023" @default.
- W4387156749 type Work @default.
- W4387156749 citedByCount "0" @default.
- W4387156749 crossrefType "posted-content" @default.
- W4387156749 hasAuthorship W4387156749A5016873176 @default.
- W4387156749 hasAuthorship W4387156749A5017714952 @default.
- W4387156749 hasAuthorship W4387156749A5032532921 @default.
- W4387156749 hasBestOaLocation W43871567491 @default.
- W4387156749 hasConcept C111368507 @default.
- W4387156749 hasConcept C119857082 @default.
- W4387156749 hasConcept C12725497 @default.
- W4387156749 hasConcept C127313418 @default.
- W4387156749 hasConcept C138885662 @default.
- W4387156749 hasConcept C154945302 @default.
- W4387156749 hasConcept C159985019 @default.
- W4387156749 hasConcept C162324750 @default.
- W4387156749 hasConcept C177264268 @default.
- W4387156749 hasConcept C17744445 @default.
- W4387156749 hasConcept C187736073 @default.
- W4387156749 hasConcept C192562407 @default.
- W4387156749 hasConcept C199360897 @default.
- W4387156749 hasConcept C199539241 @default.
- W4387156749 hasConcept C2776401178 @default.
- W4387156749 hasConcept C2776760102 @default.
- W4387156749 hasConcept C2780451532 @default.
- W4387156749 hasConcept C41008148 @default.
- W4387156749 hasConcept C41895202 @default.
- W4387156749 hasConcept C4679612 @default.
- W4387156749 hasConcept C94625758 @default.
- W4387156749 hasConcept C96640997 @default.
- W4387156749 hasConcept C97541855 @default.
- W4387156749 hasConceptScore W4387156749C111368507 @default.
- W4387156749 hasConceptScore W4387156749C119857082 @default.
- W4387156749 hasConceptScore W4387156749C12725497 @default.
- W4387156749 hasConceptScore W4387156749C127313418 @default.
- W4387156749 hasConceptScore W4387156749C138885662 @default.
- W4387156749 hasConceptScore W4387156749C154945302 @default.
- W4387156749 hasConceptScore W4387156749C159985019 @default.
- W4387156749 hasConceptScore W4387156749C162324750 @default.
- W4387156749 hasConceptScore W4387156749C177264268 @default.
- W4387156749 hasConceptScore W4387156749C17744445 @default.
- W4387156749 hasConceptScore W4387156749C187736073 @default.
- W4387156749 hasConceptScore W4387156749C192562407 @default.
- W4387156749 hasConceptScore W4387156749C199360897 @default.
- W4387156749 hasConceptScore W4387156749C199539241 @default.
- W4387156749 hasConceptScore W4387156749C2776401178 @default.
- W4387156749 hasConceptScore W4387156749C2776760102 @default.
- W4387156749 hasConceptScore W4387156749C2780451532 @default.
- W4387156749 hasConceptScore W4387156749C41008148 @default.
- W4387156749 hasConceptScore W4387156749C41895202 @default.
- W4387156749 hasConceptScore W4387156749C4679612 @default.
- W4387156749 hasConceptScore W4387156749C94625758 @default.
- W4387156749 hasConceptScore W4387156749C96640997 @default.
- W4387156749 hasConceptScore W4387156749C97541855 @default.
- W4387156749 hasLocation W43871567491 @default.
- W4387156749 hasOpenAccess W4387156749 @default.
- W4387156749 hasPrimaryLocation W43871567491 @default.
- W4387156749 hasRelatedWork W1592764918 @default.
- W4387156749 hasRelatedWork W2092346941 @default.
- W4387156749 hasRelatedWork W2347774719 @default.
- W4387156749 hasRelatedWork W2376556533 @default.
- W4387156749 hasRelatedWork W2381997239 @default.
- W4387156749 hasRelatedWork W4233324169 @default.
- W4387156749 hasRelatedWork W4245885004 @default.
- W4387156749 hasRelatedWork W4246523182 @default.
- W4387156749 hasRelatedWork W4385983104 @default.
- W4387156749 hasRelatedWork W1975687574 @default.
- W4387156749 isParatext "false" @default.
- W4387156749 isRetracted "false" @default.
- W4387156749 workType "article" @default.