Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313641965> ?p ?o ?g. }
- W4313641965 endingPage "103856" @default.
- W4313641965 startingPage "103856" @default.
- W4313641965 abstract "In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. However, many existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). We address two limitations of existing IRL techniques. First, they require an excessive amount of data due to the information asymmetry between the expert and the learner. Second, most of these IRL techniques require solving the computationally intractable forward problem (computing an optimal policy given a reward function) in POMDPs. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations. Further, the algorithm avoids a common source of algorithmic complexity by building on causal entropy, rather than entropy, as the measure of the likelihood of the demonstrations. Nevertheless, the resulting problem is nonconvex due to the so-called forward problem. We address the intrinsic nonconvexity of the forward problem in a scalable manner through a sequential linear programming scheme that is guaranteed to converge to a locally optimal policy. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the task while inducing similar behavior to the expert by leveraging the provided side information." @default.
- W4313641965 created "2023-01-07" @default.
- W4313641965 creator A5024122189 @default.
- W4313641965 creator A5050465036 @default.
- W4313641965 creator A5068441112 @default.
- W4313641965 creator A5077138859 @default.
- W4313641965 creator A5086083363 @default.
- W4313641965 date "2023-04-01" @default.
- W4313641965 modified "2023-10-08" @default.
- W4313641965 title "Task-guided IRL in POMDPs that scales" @default.
- W4313641965 cites W2025217226 @default.
- W4313641965 cites W2051944318 @default.
- W4313641965 cites W2070469928 @default.
- W4313641965 cites W2080108722 @default.
- W4313641965 cites W2087992130 @default.
- W4313641965 cites W2105925198 @default.
- W4313641965 cites W2128823230 @default.
- W4313641965 cites W2134491302 @default.
- W4313641965 cites W2142477416 @default.
- W4313641965 cites W2594789366 @default.
- W4313641965 cites W2601943843 @default.
- W4313641965 cites W2914120968 @default.
- W4313641965 cites W2962946825 @default.
- W4313641965 cites W3179132544 @default.
- W4313641965 doi "https://doi.org/10.1016/j.artint.2023.103856" @default.
- W4313641965 hasPublicationYear "2023" @default.
- W4313641965 type Work @default.
- W4313641965 citedByCount "0" @default.
- W4313641965 crossrefType "journal-article" @default.
- W4313641965 hasAuthorship W4313641965A5024122189 @default.
- W4313641965 hasAuthorship W4313641965A5050465036 @default.
- W4313641965 hasAuthorship W4313641965A5068441112 @default.
- W4313641965 hasAuthorship W4313641965A5077138859 @default.
- W4313641965 hasAuthorship W4313641965A5086083363 @default.
- W4313641965 hasBestOaLocation W43136419652 @default.
- W4313641965 hasConcept C105795698 @default.
- W4313641965 hasConcept C106189395 @default.
- W4313641965 hasConcept C111472728 @default.
- W4313641965 hasConcept C11413529 @default.
- W4313641965 hasConcept C119857082 @default.
- W4313641965 hasConcept C126255220 @default.
- W4313641965 hasConcept C138885662 @default.
- W4313641965 hasConcept C14036430 @default.
- W4313641965 hasConcept C154945302 @default.
- W4313641965 hasConcept C159886148 @default.
- W4313641965 hasConcept C162324750 @default.
- W4313641965 hasConcept C163836022 @default.
- W4313641965 hasConcept C17098449 @default.
- W4313641965 hasConcept C187736073 @default.
- W4313641965 hasConcept C2780451532 @default.
- W4313641965 hasConcept C33923547 @default.
- W4313641965 hasConcept C37404715 @default.
- W4313641965 hasConcept C41008148 @default.
- W4313641965 hasConcept C48044578 @default.
- W4313641965 hasConcept C75553542 @default.
- W4313641965 hasConcept C77088390 @default.
- W4313641965 hasConcept C78458016 @default.
- W4313641965 hasConcept C86803240 @default.
- W4313641965 hasConcept C97541855 @default.
- W4313641965 hasConcept C98763669 @default.
- W4313641965 hasConceptScore W4313641965C105795698 @default.
- W4313641965 hasConceptScore W4313641965C106189395 @default.
- W4313641965 hasConceptScore W4313641965C111472728 @default.
- W4313641965 hasConceptScore W4313641965C11413529 @default.
- W4313641965 hasConceptScore W4313641965C119857082 @default.
- W4313641965 hasConceptScore W4313641965C126255220 @default.
- W4313641965 hasConceptScore W4313641965C138885662 @default.
- W4313641965 hasConceptScore W4313641965C14036430 @default.
- W4313641965 hasConceptScore W4313641965C154945302 @default.
- W4313641965 hasConceptScore W4313641965C159886148 @default.
- W4313641965 hasConceptScore W4313641965C162324750 @default.
- W4313641965 hasConceptScore W4313641965C163836022 @default.
- W4313641965 hasConceptScore W4313641965C17098449 @default.
- W4313641965 hasConceptScore W4313641965C187736073 @default.
- W4313641965 hasConceptScore W4313641965C2780451532 @default.
- W4313641965 hasConceptScore W4313641965C33923547 @default.
- W4313641965 hasConceptScore W4313641965C37404715 @default.
- W4313641965 hasConceptScore W4313641965C41008148 @default.
- W4313641965 hasConceptScore W4313641965C48044578 @default.
- W4313641965 hasConceptScore W4313641965C75553542 @default.
- W4313641965 hasConceptScore W4313641965C77088390 @default.
- W4313641965 hasConceptScore W4313641965C78458016 @default.
- W4313641965 hasConceptScore W4313641965C86803240 @default.
- W4313641965 hasConceptScore W4313641965C97541855 @default.
- W4313641965 hasConceptScore W4313641965C98763669 @default.
- W4313641965 hasFunder F4320337345 @default.
- W4313641965 hasFunder F4320338295 @default.
- W4313641965 hasLocation W43136419651 @default.
- W4313641965 hasLocation W43136419652 @default.
- W4313641965 hasOpenAccess W4313641965 @default.
- W4313641965 hasPrimaryLocation W43136419651 @default.
- W4313641965 hasRelatedWork W1511927616 @default.
- W4313641965 hasRelatedWork W1994682696 @default.
- W4313641965 hasRelatedWork W2103061585 @default.
- W4313641965 hasRelatedWork W2133764300 @default.
- W4313641965 hasRelatedWork W2161367706 @default.
- W4313641965 hasRelatedWork W2330493680 @default.
- W4313641965 hasRelatedWork W3092987701 @default.
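
Each match above follows the `?p ?o ?g` pattern from the header: a subject, a predicate, an object, and a graph name. As a minimal sketch, the lines could be parsed into tuples as shown below. It assumes the `- SUBJECT PREDICATE OBJECT @GRAPH.` layout used in this listing. The names `Quad` and `parse_line` are hypothetical and are not part of any SemOpenAlex tooling.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One match from the listing (hypothetical container, for illustration)."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed line layout, taken from the listing itself:
#   - SUBJECT PREDICATE OBJECT @GRAPH.
# The object may be a bare identifier or a double-quoted literal with spaces.
_LINE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*)\s+@(\S+?)\.$')


def parse_line(line: str) -> Optional[Quad]:
    """Parse one listing line; return None if it does not fit the layout."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    subject, predicate, obj, graph = m.groups()
    # Strip the surrounding quotes from literal values such as "2023-04-01".
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return Quad(subject, predicate, obj, graph)


def cited_works(lines):
    """Collect the objects of all `cites` matches, in listing order."""
    quads = (parse_line(line) for line in lines)
    return [q.obj for q in quads if q is not None and q.predicate == "cites"]
```

Applied to this listing, `cited_works` would return the identifiers on the `cites` lines; the header line does not fit the layout and is skipped.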