Matches in SemOpenAlex for { <https://semopenalex.org/work/W2287850282> ?p ?o ?g. }
- W2287850282 endingPage "1068" @default.
- W2287850282 startingPage "1060" @default.
- W2287850282 abstract "Inverse reinforcement learning (IRL) allows autonomous agents to learn to solve complex tasks from successful demonstrations. However, in many settings, e.g., when a human learns the task by trial and error, failed demonstrations are also readily available. In addition, in some tasks, purposely generating failed demonstrations may be easier than generating successful ones. Since existing IRL methods cannot make use of failed demonstrations, in this paper we propose inverse reinforcement learning from failure (IRLF) which exploits both successful and failed demonstrations. Starting from the state-of-the-art maximum causal entropy IRL method, we propose a new constrained optimisation formulation that accommodates both types of demonstrations while remaining convex. We then derive update rules for learning reward functions and policies. Experiments on both simulated and real-robot data demonstrate that IRLF converges faster and generalises better than maximum causal entropy IRL, especially when few successful demonstrations are available." @default.
- W2287850282 created "2016-06-24" @default.
- W2287850282 creator A5035251638 @default.
- W2287850282 creator A5056879203 @default.
- W2287850282 creator A5076409021 @default.
- W2287850282 date "2016-05-09" @default.
- W2287850282 modified "2023-09-23" @default.
- W2287850282 title "Inverse Reinforcement Learning from Failure" @default.
- W2287850282 cites W1527702126 @default.
- W2287850282 cites W1559267142 @default.
- W2287850282 cites W1567876833 @default.
- W2287850282 cites W1579130168 @default.
- W2287850282 cites W1591675293 @default.
- W2287850282 cites W1846353404 @default.
- W2287850282 cites W1963649406 @default.
- W2287850282 cites W1999874108 @default.
- W2287850282 cites W2033697467 @default.
- W2287850282 cites W2051944318 @default.
- W2287850282 cites W2057134775 @default.
- W2287850282 cites W2061562262 @default.
- W2287850282 cites W2098774185 @default.
- W2287850282 cites W2113023245 @default.
- W2287850282 cites W2117675763 @default.
- W2287850282 cites W2119785746 @default.
- W2287850282 cites W2126105931 @default.
- W2287850282 cites W2169498096 @default.
- W2287850282 cites W2171054284 @default.
- W2287850282 cites W2181849516 @default.
- W2287850282 cites W2402734324 @default.
- W2287850282 cites W64088143 @default.
- W2287850282 doi "https://doi.org/10.5555/2936924.2937079" @default.
- W2287850282 hasPublicationYear "2016" @default.
- W2287850282 type Work @default.
- W2287850282 sameAs 2287850282 @default.
- W2287850282 citedByCount "30" @default.
- W2287850282 countsByYear W22878502822016 @default.
- W2287850282 countsByYear W22878502822017 @default.
- W2287850282 countsByYear W22878502822018 @default.
- W2287850282 countsByYear W22878502822019 @default.
- W2287850282 countsByYear W22878502822020 @default.
- W2287850282 countsByYear W22878502822021 @default.
- W2287850282 crossrefType "proceedings-article" @default.
- W2287850282 hasAuthorship W2287850282A5035251638 @default.
- W2287850282 hasAuthorship W2287850282A5056879203 @default.
- W2287850282 hasAuthorship W2287850282A5076409021 @default.
- W2287850282 hasConcept C106301342 @default.
- W2287850282 hasConcept C119857082 @default.
- W2287850282 hasConcept C121332964 @default.
- W2287850282 hasConcept C127413603 @default.
- W2287850282 hasConcept C154945302 @default.
- W2287850282 hasConcept C165696696 @default.
- W2287850282 hasConcept C201995342 @default.
- W2287850282 hasConcept C207467116 @default.
- W2287850282 hasConcept C2524010 @default.
- W2287850282 hasConcept C2780451532 @default.
- W2287850282 hasConcept C33923547 @default.
- W2287850282 hasConcept C38652104 @default.
- W2287850282 hasConcept C41008148 @default.
- W2287850282 hasConcept C62520636 @default.
- W2287850282 hasConcept C90509273 @default.
- W2287850282 hasConcept C9679016 @default.
- W2287850282 hasConcept C97541855 @default.
- W2287850282 hasConceptScore W2287850282C106301342 @default.
- W2287850282 hasConceptScore W2287850282C119857082 @default.
- W2287850282 hasConceptScore W2287850282C121332964 @default.
- W2287850282 hasConceptScore W2287850282C127413603 @default.
- W2287850282 hasConceptScore W2287850282C154945302 @default.
- W2287850282 hasConceptScore W2287850282C165696696 @default.
- W2287850282 hasConceptScore W2287850282C201995342 @default.
- W2287850282 hasConceptScore W2287850282C207467116 @default.
- W2287850282 hasConceptScore W2287850282C2524010 @default.
- W2287850282 hasConceptScore W2287850282C2780451532 @default.
- W2287850282 hasConceptScore W2287850282C33923547 @default.
- W2287850282 hasConceptScore W2287850282C38652104 @default.
- W2287850282 hasConceptScore W2287850282C41008148 @default.
- W2287850282 hasConceptScore W2287850282C62520636 @default.
- W2287850282 hasConceptScore W2287850282C90509273 @default.
- W2287850282 hasConceptScore W2287850282C9679016 @default.
- W2287850282 hasConceptScore W2287850282C97541855 @default.
- W2287850282 hasLocation W22878502821 @default.
- W2287850282 hasOpenAccess W2287850282 @default.
- W2287850282 hasPrimaryLocation W22878502821 @default.
- W2287850282 hasRelatedWork W1591675293 @default.
- W2287850282 hasRelatedWork W1986014385 @default.
- W2287850282 hasRelatedWork W1994648061 @default.
- W2287850282 hasRelatedWork W1999874108 @default.
- W2287850282 hasRelatedWork W2061562262 @default.
- W2287850282 hasRelatedWork W2098774185 @default.
- W2287850282 hasRelatedWork W2113023245 @default.
- W2287850282 hasRelatedWork W2117675763 @default.
- W2287850282 hasRelatedWork W2119567691 @default.
- W2287850282 hasRelatedWork W2121863487 @default.
- W2287850282 hasRelatedWork W2133068870 @default.
- W2287850282 hasRelatedWork W2145339207 @default.
- W2287850282 hasRelatedWork W2169498096 @default.
- W2287850282 hasRelatedWork W2171054284 @default.
- W2287850282 hasRelatedWork W2402734324 @default.
- W2287850282 hasRelatedWork W2794908222 @default.