Matches in SemOpenAlex for { <https://semopenalex.org/work/W2805805280> ?p ?o ?g. }
- W2805805280 abstract "In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go oracle on the planning horizon and demonstrate that the cost-to-go oracle shortens the learner's planning horizon as a function of its accuracy: a globally optimal oracle can shorten the planning horizon to one, leading to a one-step greedy Markov Decision Process which is much easier to optimize, while an oracle that is far from optimal requires planning over a longer horizon to achieve near-optimal performance. Hence our new insight bridges the gap and interpolates between imitation learning and reinforcement learning. Motivated by the above-mentioned insights, we propose Truncated HORizon Policy Search (THOR), a method that focuses on searching for policies that maximize the total reshaped reward over a finite planning horizon when the oracle is sub-optimal. We experimentally demonstrate that a gradient-based implementation of THOR can achieve superior performance compared to RL baselines and IL baselines even when the oracle is sub-optimal." @default.
- W2805805280 created "2018-06-13" @default.
- W2805805280 creator A5032266950 @default.
- W2805805280 creator A5034064327 @default.
- W2805805280 creator A5052552981 @default.
- W2805805280 date "2018-05-29" @default.
- W2805805280 modified "2023-09-27" @default.
- W2805805280 title "Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning" @default.
- W2805805280 cites W112666333 @default.
- W2805805280 cites W142858861 @default.
- W2805805280 cites W1564755532 @default.
- W2805805280 cites W1575592356 @default.
- W2805805280 cites W1777239053 @default.
- W2805805280 cites W1999874108 @default.
- W2805805280 cites W2098618477 @default.
- W2805805280 cites W2100677568 @default.
- W2805805280 cites W2119717200 @default.
- W2805805280 cites W2130801532 @default.
- W2805805280 cites W2165421048 @default.
- W2805805280 cites W2173248099 @default.
- W2805805280 cites W2257979135 @default.
- W2805805280 cites W2487501366 @default.
- W2805805280 cites W2594640072 @default.
- W2805805280 cites W2741122588 @default.
- W2805805280 cites W2757631751 @default.
- W2805805280 cites W2949600092 @default.
- W2805805280 cites W2949608212 @default.
- W2805805280 cites W2952840881 @default.
- W2805805280 cites W2962957031 @default.
- W2805805280 cites W2963099939 @default.
- W2805805280 hasPublicationYear "2018" @default.
- W2805805280 type Work @default.
- W2805805280 sameAs 2805805280 @default.
- W2805805280 citedByCount "21" @default.
- W2805805280 countsByYear W28058052802018 @default.
- W2805805280 countsByYear W28058052802019 @default.
- W2805805280 countsByYear W28058052802020 @default.
- W2805805280 countsByYear W28058052802021 @default.
- W2805805280 crossrefType "posted-content" @default.
- W2805805280 hasAuthorship W2805805280A5032266950 @default.
- W2805805280 hasAuthorship W2805805280A5034064327 @default.
- W2805805280 hasAuthorship W2805805280A5052552981 @default.
- W2805805280 hasConcept C105795698 @default.
- W2805805280 hasConcept C106189395 @default.
- W2805805280 hasConcept C115903868 @default.
- W2805805280 hasConcept C126255220 @default.
- W2805805280 hasConcept C126388530 @default.
- W2805805280 hasConcept C14036430 @default.
- W2805805280 hasConcept C154945302 @default.
- W2805805280 hasConcept C15744967 @default.
- W2805805280 hasConcept C159176650 @default.
- W2805805280 hasConcept C159886148 @default.
- W2805805280 hasConcept C2524010 @default.
- W2805805280 hasConcept C28761237 @default.
- W2805805280 hasConcept C33923547 @default.
- W2805805280 hasConcept C41008148 @default.
- W2805805280 hasConcept C55166926 @default.
- W2805805280 hasConcept C77805123 @default.
- W2805805280 hasConcept C78458016 @default.
- W2805805280 hasConcept C86803240 @default.
- W2805805280 hasConcept C97541855 @default.
- W2805805280 hasConceptScore W2805805280C105795698 @default.
- W2805805280 hasConceptScore W2805805280C106189395 @default.
- W2805805280 hasConceptScore W2805805280C115903868 @default.
- W2805805280 hasConceptScore W2805805280C126255220 @default.
- W2805805280 hasConceptScore W2805805280C126388530 @default.
- W2805805280 hasConceptScore W2805805280C14036430 @default.
- W2805805280 hasConceptScore W2805805280C154945302 @default.
- W2805805280 hasConceptScore W2805805280C15744967 @default.
- W2805805280 hasConceptScore W2805805280C159176650 @default.
- W2805805280 hasConceptScore W2805805280C159886148 @default.
- W2805805280 hasConceptScore W2805805280C2524010 @default.
- W2805805280 hasConceptScore W2805805280C28761237 @default.
- W2805805280 hasConceptScore W2805805280C33923547 @default.
- W2805805280 hasConceptScore W2805805280C41008148 @default.
- W2805805280 hasConceptScore W2805805280C55166926 @default.
- W2805805280 hasConceptScore W2805805280C77805123 @default.
- W2805805280 hasConceptScore W2805805280C78458016 @default.
- W2805805280 hasConceptScore W2805805280C86803240 @default.
- W2805805280 hasConceptScore W2805805280C97541855 @default.
- W2805805280 hasLocation W28058052801 @default.
- W2805805280 hasOpenAccess W2805805280 @default.
- W2805805280 hasPrimaryLocation W28058052801 @default.
- W2805805280 hasRelatedWork W112666333 @default.
- W2805805280 hasRelatedWork W1771410628 @default.
- W2805805280 hasRelatedWork W1986014385 @default.
- W2805805280 hasRelatedWork W2098774185 @default.
- W2805805280 hasRelatedWork W2145339207 @default.
- W2805805280 hasRelatedWork W2155968351 @default.
- W2805805280 hasRelatedWork W2158782408 @default.
- W2805805280 hasRelatedWork W2173248099 @default.
- W2805805280 hasRelatedWork W2257979135 @default.
- W2805805280 hasRelatedWork W2594640072 @default.
- W2805805280 hasRelatedWork W2736601468 @default.
- W2805805280 hasRelatedWork W2741122588 @default.
- W2805805280 hasRelatedWork W2804930149 @default.
- W2805805280 hasRelatedWork W2962957031 @default.
- W2805805280 hasRelatedWork W2963098081 @default.
- W2805805280 hasRelatedWork W2963099939 @default.
- W2805805280 hasRelatedWork W2963277051 @default.