Matches in SemOpenAlex for { <https://semopenalex.org/work/W3126498497> ?p ?o ?g. }
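Each entry in the listing below has the shape `- SUBJECT PREDICATE OBJECT @GRAPH.`, mirroring the `?p ?o ?g` variables of the pattern above. As a minimal sketch, assuming that line shape holds for every entry, the listing can be parsed into tuples and grouped by predicate. The names `parse_line` and `group_by_predicate` are hypothetical helpers for illustration, not part of SemOpenAlex or any library.

```python
import re

# Hypothetical helper pattern (an assumption about the listing layout):
# "- SUBJECT PREDICATE OBJECT @GRAPH." with the object possibly containing spaces.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\w+)\.$', re.DOTALL)


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    subject, predicate, obj, graph = m.groups()
    # Literal values are wrapped in double quotes in the listing; strip them.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that do not parse."""
    grouped = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, predicate, obj, _ = parsed
        grouped.setdefault(predicate, []).append(obj)
    return grouped


# A few lines copied from the listing, used as sample input.
sample = [
    '- W3126498497 created "2021-02-15" @default.',
    '- W3126498497 creator A5026322200 @default.',
    '- W3126498497 creator A5040424522 @default.',
    '- W3126498497 cites W1499669280 @default.',
]
grouped = group_by_predicate(sample)
print(grouped["creator"])
```

Grouping by predicate makes it easy to pull out, for example, all `cites` or `hasConcept` identifiers for the work without scanning the full listing by hand.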
- W3126498497 abstract "Reinforcement learning (RL) has achieved impressive performance in a variety of online settings in which an agent's ability to query the environment for transitions and rewards is effectively unlimited. However, in many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited. In this work, we focus on this offline setting. Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors before using these primitives for downstream task learning. Primitives extracted in this way serve two purposes: they delineate the behaviors that are supported by the data from those that are not, making them useful for avoiding distributional shift in offline RL; and they provide a degree of temporal abstraction, which reduces the effective horizon yielding better learning in theory, and improved offline RL in practice. In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning as well as exploration and transfer in online RL on a variety of benchmark domains. Visualizations are available at https://sites.google.com/view/opal-iclr" @default.
- W3126498497 created "2021-02-15" @default.
- W3126498497 creator A5026322200 @default.
- W3126498497 creator A5040424522 @default.
- W3126498497 creator A5047802575 @default.
- W3126498497 creator A5057773393 @default.
- W3126498497 creator A5076889780 @default.
- W3126498497 date "2021-05-03" @default.
- W3126498497 modified "2023-10-01" @default.
- W3126498497 title "OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning" @default.
- W3126498497 cites W1499669280 @default.
- W3126498497 cites W2119567691 @default.
- W3126498497 cites W2201912979 @default.
- W3126498497 cites W2257979135 @default.
- W3126498497 cites W2736601468 @default.
- W3126498497 cites W2753738274 @default.
- W3126498497 cites W2785342287 @default.
- W3126498497 cites W2787259794 @default.
- W3126498497 cites W2803281228 @default.
- W3126498497 cites W2804596319 @default.
- W3126498497 cites W2894605519 @default.
- W3126498497 cites W2902711054 @default.
- W3126498497 cites W2942608247 @default.
- W3126498497 cites W2949267040 @default.
- W3126498497 cites W2953981431 @default.
- W3126498497 cites W2954142106 @default.
- W3126498497 cites W2960876848 @default.
- W3126498497 cites W2962779270 @default.
- W3126498497 cites W2962803570 @default.
- W3126498497 cites W2963438456 @default.
- W3126498497 cites W2963704132 @default.
- W3126498497 cites W2964121744 @default.
- W3126498497 cites W2964161785 @default.
- W3126498497 cites W2964227312 @default.
- W3126498497 cites W2964311356 @default.
- W3126498497 cites W2970384648 @default.
- W3126498497 cites W2970990801 @default.
- W3126498497 cites W2976657239 @default.
- W3126498497 cites W2991355586 @default.
- W3126498497 cites W3016525976 @default.
- W3126498497 cites W3022566517 @default.
- W3126498497 cites W3028676366 @default.
- W3126498497 cites W3030163244 @default.
- W3126498497 cites W3030605021 @default.
- W3126498497 cites W3031840745 @default.
- W3126498497 cites W3032377877 @default.
- W3126498497 cites W3033324992 @default.
- W3126498497 cites W3034084488 @default.
- W3126498497 cites W3035406442 @default.
- W3126498497 hasPublicationYear "2021" @default.
- W3126498497 type Work @default.
- W3126498497 sameAs 3126498497 @default.
- W3126498497 citedByCount "5" @default.
- W3126498497 countsByYear W31264984972021 @default.
- W3126498497 crossrefType "proceedings-article" @default.
- W3126498497 hasAuthorship W3126498497A5026322200 @default.
- W3126498497 hasAuthorship W3126498497A5040424522 @default.
- W3126498497 hasAuthorship W3126498497A5047802575 @default.
- W3126498497 hasAuthorship W3126498497A5057773393 @default.
- W3126498497 hasAuthorship W3126498497A5076889780 @default.
- W3126498497 hasConcept C107457646 @default.
- W3126498497 hasConcept C111472728 @default.
- W3126498497 hasConcept C111919701 @default.
- W3126498497 hasConcept C119857082 @default.
- W3126498497 hasConcept C124304363 @default.
- W3126498497 hasConcept C13280743 @default.
- W3126498497 hasConcept C136197465 @default.
- W3126498497 hasConcept C136764020 @default.
- W3126498497 hasConcept C138885662 @default.
- W3126498497 hasConcept C153083717 @default.
- W3126498497 hasConcept C154945302 @default.
- W3126498497 hasConcept C162324750 @default.
- W3126498497 hasConcept C185798385 @default.
- W3126498497 hasConcept C187736073 @default.
- W3126498497 hasConcept C205649164 @default.
- W3126498497 hasConcept C2780102126 @default.
- W3126498497 hasConcept C2780451532 @default.
- W3126498497 hasConcept C2780490138 @default.
- W3126498497 hasConcept C2986087404 @default.
- W3126498497 hasConcept C41008148 @default.
- W3126498497 hasConcept C97541855 @default.
- W3126498497 hasConceptScore W3126498497C107457646 @default.
- W3126498497 hasConceptScore W3126498497C111472728 @default.
- W3126498497 hasConceptScore W3126498497C111919701 @default.
- W3126498497 hasConceptScore W3126498497C119857082 @default.
- W3126498497 hasConceptScore W3126498497C124304363 @default.
- W3126498497 hasConceptScore W3126498497C13280743 @default.
- W3126498497 hasConceptScore W3126498497C136197465 @default.
- W3126498497 hasConceptScore W3126498497C136764020 @default.
- W3126498497 hasConceptScore W3126498497C138885662 @default.
- W3126498497 hasConceptScore W3126498497C153083717 @default.
- W3126498497 hasConceptScore W3126498497C154945302 @default.
- W3126498497 hasConceptScore W3126498497C162324750 @default.
- W3126498497 hasConceptScore W3126498497C185798385 @default.
- W3126498497 hasConceptScore W3126498497C187736073 @default.
- W3126498497 hasConceptScore W3126498497C205649164 @default.
- W3126498497 hasConceptScore W3126498497C2780102126 @default.
- W3126498497 hasConceptScore W3126498497C2780451532 @default.
- W3126498497 hasConceptScore W3126498497C2780490138 @default.
- W3126498497 hasConceptScore W3126498497C2986087404 @default.