Matches in SemOpenAlex for { <https://semopenalex.org/work/W4295356277> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4295356277 abstract "Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results. The Decision Transformer (DT) combines the conditional policy approach and a transformer architecture, showing competitive performance against several benchmarks. However, DT lacks stitching ability -- one of the critical abilities for offline RL to learn the optimal policy from sub-optimal trajectories. This issue becomes particularly significant when the offline dataset only contains sub-optimal trajectories. On the other hand, the conventional RL approaches based on Dynamic Programming (such as Q-learning) do not have the same limitation; however, they suffer from unstable learning behaviours, especially when they rely on function approximation in an off-policy learning setting. In this paper, we propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). It utilises the Dynamic Programming results to relabel the return-to-go in the training data to then train the DT with the relabelled data. Our approach efficiently exploits the benefits of these two approaches and compensates for each other's shortcomings to achieve better performance. We empirically show these in both simple toy environments and the more complex D4RL benchmark, showing competitive performance gains." @default.
- W4295356277 created "2022-09-13" @default.
- W4295356277 creator A5028764929 @default.
- W4295356277 creator A5043261884 @default.
- W4295356277 creator A5049069908 @default.
- W4295356277 date "2022-09-08" @default.
- W4295356277 modified "2023-09-28" @default.
- W4295356277 title "Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL" @default.
- W4295356277 doi "https://doi.org/10.48550/arxiv.2209.03993" @default.
- W4295356277 hasPublicationYear "2022" @default.
- W4295356277 type Work @default.
- W4295356277 citedByCount "0" @default.
- W4295356277 crossrefType "posted-content" @default.
- W4295356277 hasAuthorship W4295356277A5028764929 @default.
- W4295356277 hasAuthorship W4295356277A5043261884 @default.
- W4295356277 hasAuthorship W4295356277A5049069908 @default.
- W4295356277 hasBestOaLocation W42953562771 @default.
- W4295356277 hasConcept C11413529 @default.
- W4295356277 hasConcept C119599485 @default.
- W4295356277 hasConcept C119857082 @default.
- W4295356277 hasConcept C127413603 @default.
- W4295356277 hasConcept C13280743 @default.
- W4295356277 hasConcept C136764020 @default.
- W4295356277 hasConcept C154945302 @default.
- W4295356277 hasConcept C165696696 @default.
- W4295356277 hasConcept C165801399 @default.
- W4295356277 hasConcept C185798385 @default.
- W4295356277 hasConcept C188116033 @default.
- W4295356277 hasConcept C205649164 @default.
- W4295356277 hasConcept C2780490138 @default.
- W4295356277 hasConcept C2986087404 @default.
- W4295356277 hasConcept C37404715 @default.
- W4295356277 hasConcept C38652104 @default.
- W4295356277 hasConcept C41008148 @default.
- W4295356277 hasConcept C66322947 @default.
- W4295356277 hasConcept C97541855 @default.
- W4295356277 hasConceptScore W4295356277C11413529 @default.
- W4295356277 hasConceptScore W4295356277C119599485 @default.
- W4295356277 hasConceptScore W4295356277C119857082 @default.
- W4295356277 hasConceptScore W4295356277C127413603 @default.
- W4295356277 hasConceptScore W4295356277C13280743 @default.
- W4295356277 hasConceptScore W4295356277C136764020 @default.
- W4295356277 hasConceptScore W4295356277C154945302 @default.
- W4295356277 hasConceptScore W4295356277C165696696 @default.
- W4295356277 hasConceptScore W4295356277C165801399 @default.
- W4295356277 hasConceptScore W4295356277C185798385 @default.
- W4295356277 hasConceptScore W4295356277C188116033 @default.
- W4295356277 hasConceptScore W4295356277C205649164 @default.
- W4295356277 hasConceptScore W4295356277C2780490138 @default.
- W4295356277 hasConceptScore W4295356277C2986087404 @default.
- W4295356277 hasConceptScore W4295356277C37404715 @default.
- W4295356277 hasConceptScore W4295356277C38652104 @default.
- W4295356277 hasConceptScore W4295356277C41008148 @default.
- W4295356277 hasConceptScore W4295356277C66322947 @default.
- W4295356277 hasConceptScore W4295356277C97541855 @default.
- W4295356277 hasLocation W42953562771 @default.
- W4295356277 hasOpenAccess W4295356277 @default.
- W4295356277 hasPrimaryLocation W42953562771 @default.
- W4295356277 hasRelatedWork W2923653485 @default.
- W4295356277 hasRelatedWork W3022038857 @default.
- W4295356277 hasRelatedWork W3131920644 @default.
- W4295356277 hasRelatedWork W3150098721 @default.
- W4295356277 hasRelatedWork W3153007185 @default.
- W4295356277 hasRelatedWork W3212439828 @default.
- W4295356277 hasRelatedWork W4225619808 @default.
- W4295356277 hasRelatedWork W4311991951 @default.
- W4295356277 hasRelatedWork W4318621078 @default.
- W4295356277 hasRelatedWork W4319083788 @default.
- W4295356277 isParatext "false" @default.
- W4295356277 isRetracted "false" @default.
- W4295356277 workType "article" @default.