Matches in SemOpenAlex for { <https://semopenalex.org/work/W4309804912> ?p ?o ?g. }
Showing items 1 to 85 of 85 with 100 items per page.
- W4309804912 abstract "In many real-world applications, collecting large and high-quality datasets may be too costly or impractical. Offline reinforcement learning (RL) aims to infer an optimal decision-making policy from a fixed set of data. Getting the most information from historical data is then vital for good performance once the policy is deployed. We propose a model-based data augmentation strategy, Trajectory Stitching (TS), to improve the quality of sub-optimal historical trajectories. TS introduces unseen actions joining previously disconnected states: using a probabilistic notion of state reachability, it effectively 'stitches' together parts of the historical demonstrations to generate new, higher quality ones. A stitching event consists of a transition between a pair of observed states through a synthetic and highly probable action. New actions are introduced only when they are expected to be beneficial, according to an estimated state-value function. We show that using this data augmentation strategy jointly with behavioural cloning (BC) leads to improvements over the behaviour-cloned policy from the original dataset. Improving over the BC policy could then be used as a launchpad for online RL through planning and demonstration-guided RL." @default.
- W4309804912 created "2022-11-29" @default.
- W4309804912 creator A5006965239 @default.
- W4309804912 creator A5010581004 @default.
- W4309804912 date "2022-11-21" @default.
- W4309804912 modified "2023-09-28" @default.
- W4309804912 title "Model-based Trajectory Stitching for Improved Offline Reinforcement Learning" @default.
- W4309804912 doi "https://doi.org/10.48550/arxiv.2211.11603" @default.
- W4309804912 hasPublicationYear "2022" @default.
- W4309804912 type Work @default.
- W4309804912 citedByCount "0" @default.
- W4309804912 crossrefType "posted-content" @default.
- W4309804912 hasAuthorship W4309804912A5006965239 @default.
- W4309804912 hasAuthorship W4309804912A5010581004 @default.
- W4309804912 hasBestOaLocation W43098049121 @default.
- W4309804912 hasConcept C111472728 @default.
- W4309804912 hasConcept C11413529 @default.
- W4309804912 hasConcept C119857082 @default.
- W4309804912 hasConcept C121050878 @default.
- W4309804912 hasConcept C121332964 @default.
- W4309804912 hasConcept C126255220 @default.
- W4309804912 hasConcept C1276947 @default.
- W4309804912 hasConcept C13662910 @default.
- W4309804912 hasConcept C136643341 @default.
- W4309804912 hasConcept C138885662 @default.
- W4309804912 hasConcept C14036430 @default.
- W4309804912 hasConcept C14646407 @default.
- W4309804912 hasConcept C154945302 @default.
- W4309804912 hasConcept C177264268 @default.
- W4309804912 hasConcept C199360897 @default.
- W4309804912 hasConcept C2779530757 @default.
- W4309804912 hasConcept C2779662365 @default.
- W4309804912 hasConcept C29081049 @default.
- W4309804912 hasConcept C33923547 @default.
- W4309804912 hasConcept C41008148 @default.
- W4309804912 hasConcept C48103436 @default.
- W4309804912 hasConcept C49937458 @default.
- W4309804912 hasConcept C62520636 @default.
- W4309804912 hasConcept C78458016 @default.
- W4309804912 hasConcept C80444323 @default.
- W4309804912 hasConcept C86803240 @default.
- W4309804912 hasConcept C97541855 @default.
- W4309804912 hasConceptScore W4309804912C111472728 @default.
- W4309804912 hasConceptScore W4309804912C11413529 @default.
- W4309804912 hasConceptScore W4309804912C119857082 @default.
- W4309804912 hasConceptScore W4309804912C121050878 @default.
- W4309804912 hasConceptScore W4309804912C121332964 @default.
- W4309804912 hasConceptScore W4309804912C126255220 @default.
- W4309804912 hasConceptScore W4309804912C1276947 @default.
- W4309804912 hasConceptScore W4309804912C13662910 @default.
- W4309804912 hasConceptScore W4309804912C136643341 @default.
- W4309804912 hasConceptScore W4309804912C138885662 @default.
- W4309804912 hasConceptScore W4309804912C14036430 @default.
- W4309804912 hasConceptScore W4309804912C14646407 @default.
- W4309804912 hasConceptScore W4309804912C154945302 @default.
- W4309804912 hasConceptScore W4309804912C177264268 @default.
- W4309804912 hasConceptScore W4309804912C199360897 @default.
- W4309804912 hasConceptScore W4309804912C2779530757 @default.
- W4309804912 hasConceptScore W4309804912C2779662365 @default.
- W4309804912 hasConceptScore W4309804912C29081049 @default.
- W4309804912 hasConceptScore W4309804912C33923547 @default.
- W4309804912 hasConceptScore W4309804912C41008148 @default.
- W4309804912 hasConceptScore W4309804912C48103436 @default.
- W4309804912 hasConceptScore W4309804912C49937458 @default.
- W4309804912 hasConceptScore W4309804912C62520636 @default.
- W4309804912 hasConceptScore W4309804912C78458016 @default.
- W4309804912 hasConceptScore W4309804912C80444323 @default.
- W4309804912 hasConceptScore W4309804912C86803240 @default.
- W4309804912 hasConceptScore W4309804912C97541855 @default.
- W4309804912 hasLocation W43098049121 @default.
- W4309804912 hasOpenAccess W4309804912 @default.
- W4309804912 hasPrimaryLocation W43098049121 @default.
- W4309804912 hasRelatedWork W2061469207 @default.
- W4309804912 hasRelatedWork W2767781638 @default.
- W4309804912 hasRelatedWork W2950892788 @default.
- W4309804912 hasRelatedWork W2974826883 @default.
- W4309804912 hasRelatedWork W3103643887 @default.
- W4309804912 hasRelatedWork W3208753255 @default.
- W4309804912 hasRelatedWork W329326438 @default.
- W4309804912 hasRelatedWork W4309804912 @default.
- W4309804912 hasRelatedWork W4319083788 @default.
- W4309804912 hasRelatedWork W4378771262 @default.
- W4309804912 isParatext "false" @default.
- W4309804912 isRetracted "false" @default.
- W4309804912 workType "article" @default.
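
Each entry in the listing above appears to follow the shape `- subject predicate object @graph.`, with literal values in double quotes. As a minimal sketch, assuming that shape holds for every line, the entries could be grouped by predicate as shown below. The `LINE_RE` pattern and the `parse_listing` helper are hypothetical names introduced here for illustration; they are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed line shape (inferred from the listing, not from a specification):
#   - <subject> <predicate> <object> @<graph>.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_listing(lines):
    """Group object values by predicate for lines matching the assumed shape."""
    record = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip page headers and anything not in the assumed shape
        _subject, predicate, obj, _graph = match.groups()
        # Literal values are wrapped in double quotes; strip one layer.
        if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
            obj = obj[1:-1]
        record[predicate].append(obj)
    return dict(record)


sample = [
    '- W4309804912 title "Model-based Trajectory Stitching for Improved Offline Reinforcement Learning" @default.',
    '- W4309804912 creator A5006965239 @default.',
    '- W4309804912 creator A5010581004 @default.',
    '- W4309804912 citedByCount "0" @default.',
]
record = parse_listing(sample)
```

Repeated predicates such as `creator` or `hasConcept` end up as lists with one entry per line, which is why every predicate maps to a list rather than a single value.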