Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387687172> ?p ?o ?g. }
Showing items 1 to 75 of 75, with 100 items per page.
- W4387687172 abstract "Most Reinforcement Learning (RL) methods are traditionally studied in an active learning setting, where agents directly interact with their environments, observe action outcomes, and learn through trial and error. However, allowing partially trained agents to interact with real physical systems poses significant challenges, including high costs, safety risks, and the need for constant supervision. Offline RL addresses these cost and safety concerns by leveraging existing datasets and reducing the need for resource-intensive real-time interactions. Nevertheless, a substantial challenge lies in the demand for these datasets to be meticulously annotated with rewards. In this paper, we introduce Optimal Transport Reward (OTR) labelling, an innovative algorithm designed to assign rewards to offline trajectories, using a small number of high-quality expert demonstrations. The core principle of OTR involves employing Optimal Transport (OT) to calculate an optimal alignment between an unlabeled trajectory from the dataset and an expert demonstration. This alignment yields a similarity measure that is effectively interpreted as a reward signal. An offline RL algorithm can then utilize these reward signals to learn a policy. This approach circumvents the need for handcrafted rewards, unlocking the potential to harness vast datasets for policy learning. Leveraging the SurRoL simulation platform tailored for surgical robot learning, we generate datasets and employ them to train policies using the OTR algorithm. By demonstrating the efficacy of OTR in a different domain, we emphasize its versatility and its potential to expedite RL deployment across a wide range of fields." @default.
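The abstract describes computing an Optimal Transport alignment between an unlabeled trajectory and an expert demonstration, then reading the alignment off as per-step rewards. The paper itself is not reproduced here, so the following is only a minimal sketch of that idea, assuming squared-Euclidean state costs, uniform marginals, entropic-regularized OT solved by Sinkhorn iterations, and a hypothetical reward scale `alpha`; the actual OTR algorithm may differ in all of these choices.

```python
import numpy as np

def sinkhorn(a, b, C, reg=1.0, n_iters=200):
    """Entropic-regularized OT plan between marginals a, b for cost matrix C."""
    K = np.exp(-C / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):      # alternate marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan P

def otr_rewards(traj, expert, reg=1.0, alpha=5.0):
    """Per-step rewards for `traj` from its OT alignment with `expert`.

    traj, expert: arrays of shape (steps, state_dim).
    """
    # pairwise squared-Euclidean cost between trajectory and expert states
    C = ((traj[:, None, :] - expert[None, :, :]) ** 2).sum(-1)
    a = np.full(len(traj), 1.0 / len(traj))      # uniform marginals
    b = np.full(len(expert), 1.0 / len(expert))
    P = sinkhorn(a, b, C, reg)
    # reward for step i: negative transported cost of row i
    # (less cost to align with the expert => higher reward)
    return -alpha * (P * C).sum(axis=1)
```

Under this sketch, a trajectory identical to the expert demonstration receives near-zero (maximal) rewards, while one far from it receives strongly negative rewards, and any offline RL algorithm can then consume the relabeled transitions.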
- W4387687172 created "2023-10-17" @default.
- W4387687172 creator A5059557438 @default.
- W4387687172 creator A5079334517 @default.
- W4387687172 creator A5088897776 @default.
- W4387687172 date "2023-10-12" @default.
- W4387687172 modified "2023-10-18" @default.
- W4387687172 title "Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments" @default.
- W4387687172 doi "https://doi.org/10.48550/arxiv.2310.08841" @default.
- W4387687172 hasPublicationYear "2023" @default.
- W4387687172 type Work @default.
- W4387687172 citedByCount "0" @default.
- W4387687172 crossrefType "posted-content" @default.
- W4387687172 hasAuthorship W4387687172A5059557438 @default.
- W4387687172 hasAuthorship W4387687172A5079334517 @default.
- W4387687172 hasAuthorship W4387687172A5088897776 @default.
- W4387687172 hasBestOaLocation W43876871721 @default.
- W4387687172 hasConcept C105339364 @default.
- W4387687172 hasConcept C111472728 @default.
- W4387687172 hasConcept C111919701 @default.
- W4387687172 hasConcept C119857082 @default.
- W4387687172 hasConcept C134306372 @default.
- W4387687172 hasConcept C136764020 @default.
- W4387687172 hasConcept C138885662 @default.
- W4387687172 hasConcept C154945302 @default.
- W4387687172 hasConcept C159985019 @default.
- W4387687172 hasConcept C192562407 @default.
- W4387687172 hasConcept C204323151 @default.
- W4387687172 hasConcept C206345919 @default.
- W4387687172 hasConcept C2779530757 @default.
- W4387687172 hasConcept C2780490138 @default.
- W4387687172 hasConcept C2986087404 @default.
- W4387687172 hasConcept C31258907 @default.
- W4387687172 hasConcept C33923547 @default.
- W4387687172 hasConcept C36503486 @default.
- W4387687172 hasConcept C41008148 @default.
- W4387687172 hasConcept C90509273 @default.
- W4387687172 hasConcept C97541855 @default.
- W4387687172 hasConceptScore W4387687172C105339364 @default.
- W4387687172 hasConceptScore W4387687172C111472728 @default.
- W4387687172 hasConceptScore W4387687172C111919701 @default.
- W4387687172 hasConceptScore W4387687172C119857082 @default.
- W4387687172 hasConceptScore W4387687172C134306372 @default.
- W4387687172 hasConceptScore W4387687172C136764020 @default.
- W4387687172 hasConceptScore W4387687172C138885662 @default.
- W4387687172 hasConceptScore W4387687172C154945302 @default.
- W4387687172 hasConceptScore W4387687172C159985019 @default.
- W4387687172 hasConceptScore W4387687172C192562407 @default.
- W4387687172 hasConceptScore W4387687172C204323151 @default.
- W4387687172 hasConceptScore W4387687172C206345919 @default.
- W4387687172 hasConceptScore W4387687172C2779530757 @default.
- W4387687172 hasConceptScore W4387687172C2780490138 @default.
- W4387687172 hasConceptScore W4387687172C2986087404 @default.
- W4387687172 hasConceptScore W4387687172C31258907 @default.
- W4387687172 hasConceptScore W4387687172C33923547 @default.
- W4387687172 hasConceptScore W4387687172C36503486 @default.
- W4387687172 hasConceptScore W4387687172C41008148 @default.
- W4387687172 hasConceptScore W4387687172C90509273 @default.
- W4387687172 hasConceptScore W4387687172C97541855 @default.
- W4387687172 hasLocation W43876871721 @default.
- W4387687172 hasOpenAccess W4387687172 @default.
- W4387687172 hasPrimaryLocation W43876871721 @default.
- W4387687172 hasRelatedWork W2566006169 @default.
- W4387687172 hasRelatedWork W2770234245 @default.
- W4387687172 hasRelatedWork W2987774938 @default.
- W4387687172 hasRelatedWork W4225619808 @default.
- W4387687172 hasRelatedWork W4226042081 @default.
- W4387687172 hasRelatedWork W4283712691 @default.
- W4387687172 hasRelatedWork W4308935744 @default.
- W4387687172 hasRelatedWork W4376223516 @default.
- W4387687172 hasRelatedWork W4386160446 @default.
- W4387687172 hasRelatedWork W4387545330 @default.
- W4387687172 isParatext "false" @default.
- W4387687172 isRetracted "false" @default.
- W4387687172 workType "article" @default.