Matches in SemOpenAlex for { <https://semopenalex.org/work/W3130395087> ?p ?o ?g. }
- W3130395087 endingPage "8248" @default.
- W3130395087 startingPage "8240" @default.
- W3130395087 abstract "In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the agent is expected to perform comparably to the expert while interacting with the environment. We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms: one for policy optimization and another for learning the worst-case cost. By employing optimistic exploration, we derive a convergent algorithm with O(sqrt(K)) regret, where K is the number of interactions with the MDP, and an additional linear error term that depends on the number of expert trajectories available. Importantly, our algorithm avoids the need to solve an MDP at each iteration, making it more practical compared to prior AL methods. Finally, we implement a deep variant of our algorithm which shares some similarities with GAIL, but where the discriminator is replaced with the costs learned by OAL. Our simulations suggest that OAL performs well in high-dimensional control problems." @default.
- W3130395087 created "2021-03-01" @default.
- W3130395087 creator A5018613019 @default.
- W3130395087 creator A5036260775 @default.
- W3130395087 creator A5049062714 @default.
- W3130395087 date "2022-06-28" @default.
- W3130395087 modified "2023-09-23" @default.
- W3130395087 title "Online Apprenticeship Learning" @default.
- W3130395087 cites W1505731132 @default.
- W3130395087 cites W1575592356 @default.
- W3130395087 cites W1771410628 @default.
- W3130395087 cites W1850488217 @default.
- W3130395087 cites W1986014385 @default.
- W3130395087 cites W1988790447 @default.
- W3130395087 cites W1996625075 @default.
- W3130395087 cites W1999874108 @default.
- W3130395087 cites W2016384870 @default.
- W3130395087 cites W2093825590 @default.
- W3130395087 cites W2099471712 @default.
- W3130395087 cites W2102847492 @default.
- W3130395087 cites W2106887613 @default.
- W3130395087 cites W2113023245 @default.
- W3130395087 cites W2115738253 @default.
- W3130395087 cites W2119567691 @default.
- W3130395087 cites W2121863487 @default.
- W3130395087 cites W2148112459 @default.
- W3130395087 cites W2158782408 @default.
- W3130395087 cites W21934178 @default.
- W3130395087 cites W2513180554 @default.
- W3130395087 cites W2753339894 @default.
- W3130395087 cites W2781726626 @default.
- W3130395087 cites W2914920107 @default.
- W3130395087 cites W2949608212 @default.
- W3130395087 cites W2949916679 @default.
- W3130395087 cites W2952854274 @default.
- W3130395087 cites W2962723383 @default.
- W3130395087 cites W2962879692 @default.
- W3130395087 cites W2963014947 @default.
- W3130395087 cites W2963277051 @default.
- W3130395087 cites W2963301010 @default.
- W3130395087 cites W2963582321 @default.
- W3130395087 cites W2970870329 @default.
- W3130395087 cites W2995551516 @default.
- W3130395087 cites W2997976910 @default.
- W3130395087 cites W2998050631 @default.
- W3130395087 cites W2998111914 @default.
- W3130395087 cites W2999385649 @default.
- W3130395087 cites W3007034372 @default.
- W3130395087 cites W3009820880 @default.
- W3130395087 cites W3026615607 @default.
- W3130395087 cites W3033836998 @default.
- W3130395087 cites W3034871777 @default.
- W3130395087 cites W3046395471 @default.
- W3130395087 cites W3046626913 @default.
- W3130395087 doi "https://doi.org/10.1609/aaai.v36i8.20798" @default.
- W3130395087 hasPublicationYear "2022" @default.
- W3130395087 type Work @default.
- W3130395087 sameAs 3130395087 @default.
- W3130395087 citedByCount "3" @default.
- W3130395087 countsByYear W31303950872021 @default.
- W3130395087 countsByYear W31303950872023 @default.
- W3130395087 crossrefType "journal-article" @default.
- W3130395087 hasAuthorship W3130395087A5018613019 @default.
- W3130395087 hasAuthorship W3130395087A5036260775 @default.
- W3130395087 hasAuthorship W3130395087A5049062714 @default.
- W3130395087 hasBestOaLocation W31303950871 @default.
- W3130395087 hasConcept C105795698 @default.
- W3130395087 hasConcept C106189395 @default.
- W3130395087 hasConcept C107806365 @default.
- W3130395087 hasConcept C119857082 @default.
- W3130395087 hasConcept C126255220 @default.
- W3130395087 hasConcept C138885662 @default.
- W3130395087 hasConcept C154945302 @default.
- W3130395087 hasConcept C159886148 @default.
- W3130395087 hasConcept C2779803651 @default.
- W3130395087 hasConcept C33923547 @default.
- W3130395087 hasConcept C41008148 @default.
- W3130395087 hasConcept C41895202 @default.
- W3130395087 hasConcept C50817715 @default.
- W3130395087 hasConcept C76155785 @default.
- W3130395087 hasConcept C94915269 @default.
- W3130395087 hasConceptScore W3130395087C105795698 @default.
- W3130395087 hasConceptScore W3130395087C106189395 @default.
- W3130395087 hasConceptScore W3130395087C107806365 @default.
- W3130395087 hasConceptScore W3130395087C119857082 @default.
- W3130395087 hasConceptScore W3130395087C126255220 @default.
- W3130395087 hasConceptScore W3130395087C138885662 @default.
- W3130395087 hasConceptScore W3130395087C154945302 @default.
- W3130395087 hasConceptScore W3130395087C159886148 @default.
- W3130395087 hasConceptScore W3130395087C2779803651 @default.
- W3130395087 hasConceptScore W3130395087C33923547 @default.
- W3130395087 hasConceptScore W3130395087C41008148 @default.
- W3130395087 hasConceptScore W3130395087C41895202 @default.
- W3130395087 hasConceptScore W3130395087C50817715 @default.
- W3130395087 hasConceptScore W3130395087C76155785 @default.
- W3130395087 hasConceptScore W3130395087C94915269 @default.
- W3130395087 hasIssue "8" @default.
- W3130395087 hasLocation W31303950871 @default.