Matches in SemOpenAlex for { <https://semopenalex.org/work/W4310390787> ?p ?o ?g. }
- W4310390787 abstract "Recently, methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline RL tasks. However, simply conditioning a probabilistic model on a desired return and taking the predicted action can fail dramatically in stochastic environments since trajectories that result in a return may have only achieved that return due to luck. In this work, we describe the limitations of RvS approaches in stochastic environments and propose a solution. Rather than simply conditioning on the return of a single trajectory as is standard practice, our proposed method, ESPER, learns to cluster trajectories and conditions on average cluster returns, which are independent of environment stochasticity. Doing so allows ESPER to achieve strong alignment between target return and expected performance in real environments. We demonstrate this in several challenging stochastic offline-RL tasks including the puzzle game 2048 and Connect Four playing against a stochastic opponent. In all tested domains, ESPER achieves significantly better alignment between the target return and achieved return than simply conditioning on returns. ESPER also achieves higher maximum performance than even the value-based baselines." @default.
- W4310390787 created "2022-12-10" @default.
- W4310390787 creator A5012276327 @default.
- W4310390787 creator A5027198497 @default.
- W4310390787 creator A5041982650 @default.
- W4310390787 date "2022-05-31" @default.
- W4310390787 modified "2023-09-27" @default.
- W4310390787 title "You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments" @default.
- W4310390787 doi "https://doi.org/10.48550/arxiv.2205.15967" @default.
- W4310390787 hasPublicationYear "2022" @default.
- W4310390787 type Work @default.
- W4310390787 citedByCount "0" @default.
- W4310390787 crossrefType "posted-content" @default.
- W4310390787 hasAuthorship W4310390787A5012276327 @default.
- W4310390787 hasAuthorship W4310390787A5027198497 @default.
- W4310390787 hasAuthorship W4310390787A5041982650 @default.
- W4310390787 hasBestOaLocation W43103907871 @default.
- W4310390787 hasConcept C10138342 @default.
- W4310390787 hasConcept C104317684 @default.
- W4310390787 hasConcept C119857082 @default.
- W4310390787 hasConcept C154611145 @default.
- W4310390787 hasConcept C154945302 @default.
- W4310390787 hasConcept C162324750 @default.
- W4310390787 hasConcept C185592680 @default.
- W4310390787 hasConcept C2780821815 @default.
- W4310390787 hasConcept C41008148 @default.
- W4310390787 hasConcept C49937458 @default.
- W4310390787 hasConcept C55493867 @default.
- W4310390787 hasConcept C63479239 @default.
- W4310390787 hasConcept C97541855 @default.
- W4310390787 hasConceptScore W4310390787C10138342 @default.
- W4310390787 hasConceptScore W4310390787C104317684 @default.
- W4310390787 hasConceptScore W4310390787C119857082 @default.
- W4310390787 hasConceptScore W4310390787C154611145 @default.
- W4310390787 hasConceptScore W4310390787C154945302 @default.
- W4310390787 hasConceptScore W4310390787C162324750 @default.
- W4310390787 hasConceptScore W4310390787C185592680 @default.
- W4310390787 hasConceptScore W4310390787C2780821815 @default.
- W4310390787 hasConceptScore W4310390787C41008148 @default.
- W4310390787 hasConceptScore W4310390787C49937458 @default.
- W4310390787 hasConceptScore W4310390787C55493867 @default.
- W4310390787 hasConceptScore W4310390787C63479239 @default.
- W4310390787 hasConceptScore W4310390787C97541855 @default.
- W4310390787 hasLocation W43103907871 @default.
- W4310390787 hasLocation W43103907872 @default.
- W4310390787 hasOpenAccess W4310390787 @default.
- W4310390787 hasPrimaryLocation W43103907871 @default.
- W4310390787 hasRelatedWork W260766989 @default.
- W4310390787 hasRelatedWork W2959276766 @default.
- W4310390787 hasRelatedWork W2961085424 @default.
- W4310390787 hasRelatedWork W3074294383 @default.
- W4310390787 hasRelatedWork W3139193008 @default.
- W4310390787 hasRelatedWork W4206669594 @default.
- W4310390787 hasRelatedWork W4290792893 @default.
- W4310390787 hasRelatedWork W4295941380 @default.
- W4310390787 hasRelatedWork W4306674287 @default.
- W4310390787 hasRelatedWork W4319083788 @default.
- W4310390787 isParatext "false" @default.
- W4310390787 isRetracted "false" @default.
- W4310390787 workType "article" @default.