Matches in SemOpenAlex for { <https://semopenalex.org/work/W4226435594> ?p ?o ?g. }
Showing items 1 to 47 of 47 with 100 items per page.
- W4226435594 abstract "How to extract as much learning signal from each trajectory data has been a key problem in reinforcement learning (RL), where sample inefficiency has posed serious challenges for practical applications. Recent works have shown that using expressive policy function approximators and conditioning on future trajectory information -- such as future states in hindsight experience replay or returns-to-go in Decision Transformer (DT) -- enables efficient learning of multi-task policies, where at times online RL is fully replaced by offline behavioral cloning, e.g. sequence modeling. We demonstrate that all these approaches are doing hindsight information matching (HIM) -- training policies that can output the rest of trajectory that matches some statistics of future state information. We present Generalized Decision Transformer (GDT) for solving any HIM problem, and show how different choices for the feature function and the anti-causal aggregator not only recover DT as a special case, but also lead to novel Categorical DT (CDT) and Bi-directional DT (BDT) for matching different statistics of the future. For evaluating CDT and BDT, we define offline multi-task state-marginal matching (SMM) and imitation learning (IL) as two generic HIM problems, propose a Wasserstein distance loss as a metric for both, and empirically study them on MuJoCo continuous control benchmarks. CDT, which simply replaces anti-causal summation with anti-causal binning in DT, enables the first effective offline multi-task SMM algorithm that generalizes well to unseen and even synthetic multi-modal state-feature distributions. BDT, which uses an anti-causal second transformer as the aggregator, can learn to model any statistics of the future and outperforms DT variants in offline multi-task IL. Our generalized formulations from HIM and GDT greatly expand the role of powerful sequence modeling architectures in modern RL." @default.
- W4226435594 created "2022-05-05" @default.
- W4226435594 creator A5012231366 @default.
- W4226435594 creator A5061613634 @default.
- W4226435594 creator A5074059447 @default.
- W4226435594 date "2021-11-19" @default.
- W4226435594 modified "2023-10-16" @default.
- W4226435594 title "Generalized Decision Transformer for Offline Hindsight Information Matching" @default.
- W4226435594 doi "https://doi.org/10.48550/arxiv.2111.10364" @default.
- W4226435594 hasPublicationYear "2021" @default.
- W4226435594 type Work @default.
- W4226435594 citedByCount "0" @default.
- W4226435594 crossrefType "posted-content" @default.
- W4226435594 hasAuthorship W4226435594A5012231366 @default.
- W4226435594 hasAuthorship W4226435594A5061613634 @default.
- W4226435594 hasAuthorship W4226435594A5074059447 @default.
- W4226435594 hasBestOaLocation W42264355941 @default.
- W4226435594 hasConcept C10347200 @default.
- W4226435594 hasConcept C119857082 @default.
- W4226435594 hasConcept C154945302 @default.
- W4226435594 hasConcept C15744967 @default.
- W4226435594 hasConcept C180747234 @default.
- W4226435594 hasConcept C41008148 @default.
- W4226435594 hasConcept C97541855 @default.
- W4226435594 hasConceptScore W4226435594C10347200 @default.
- W4226435594 hasConceptScore W4226435594C119857082 @default.
- W4226435594 hasConceptScore W4226435594C154945302 @default.
- W4226435594 hasConceptScore W4226435594C15744967 @default.
- W4226435594 hasConceptScore W4226435594C180747234 @default.
- W4226435594 hasConceptScore W4226435594C41008148 @default.
- W4226435594 hasConceptScore W4226435594C97541855 @default.
- W4226435594 hasLocation W42264355941 @default.
- W4226435594 hasOpenAccess W4226435594 @default.
- W4226435594 hasPrimaryLocation W42264355941 @default.
- W4226435594 hasRelatedWork W10379689 @default.
- W4226435594 hasRelatedWork W12291563 @default.
- W4226435594 hasRelatedWork W15135299 @default.
- W4226435594 hasRelatedWork W4412456 @default.
- W4226435594 hasRelatedWork W547392 @default.
- W4226435594 hasRelatedWork W5547603 @default.
- W4226435594 hasRelatedWork W7084024 @default.
- W4226435594 hasRelatedWork W8447228 @default.
- W4226435594 hasRelatedWork W8539471 @default.
- W4226435594 hasRelatedWork W868042 @default.
- W4226435594 isParatext "false" @default.
- W4226435594 isRetracted "false" @default.
- W4226435594 workType "article" @default.
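
The items above all follow one shape: a leading "- ", the subject, the predicate, an object that is either a double-quoted literal or a bare identifier, and a trailing "@default." naming the graph. As a minimal sketch, assuming that shape holds for every item, the listing can be split into quads as below. The names `Quad`, `LINE_RE`, and `parse_line` are hypothetical helpers introduced for illustration, not part of SemOpenAlex or any library.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One listing item: subject, predicate, object, graph (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed item shape: "- SUBJECT PREDICATE OBJECT @GRAPH."
# OBJECT is either a double-quoted literal or a bare identifier.
LINE_RE = re.compile(r'^- (\S+) (\S+) (?:"(.*)"|(\S+)) @(\w+)\.$', re.DOTALL)


def parse_line(line: str) -> Optional[Quad]:
    """Return the quad for one listing item, or None for non-item lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, literal, bare, graph = match.groups()
    # Exactly one of the two object groups is set by the alternation.
    obj = literal if literal is not None else bare
    return Quad(subject, predicate, obj, graph)
```

Calling `parse_line` on each line and discarding the `None` results leaves only the items; the header and pagination lines do not match the pattern and are skipped.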