Matches in SemOpenAlex for { <https://semopenalex.org/work/W3049102318> ?p ?o ?g. }
- W3049102318 abstract "This paper proposes Entropy-Regularized Imitation Learning (ERIL), which is a combination of forward and inverse reinforcement learning under the framework of the entropy-regularized Markov decision process. ERIL minimizes the reverse Kullback-Leibler (KL) divergence between two probability distributions induced by a learner and an expert. Inverse reinforcement learning (RL) in ERIL evaluates the log-ratio between two distributions using the density ratio trick, which is widely used in generative adversarial networks. More specifically, the log-ratio is estimated by building two binary discriminators. The first discriminator is a state-only function, and it tries to distinguish the state generated by the forward RL step from the expert's state. The second discriminator is a function of current state, action, and transitioned state, and it distinguishes the generated experiences from the ones provided by the expert. Since the second discriminator has the same hyperparameters as the forward RL step, it can be used to control the discriminator's ability. The forward RL minimizes the reverse KL estimated by the inverse RL. We show that minimizing the reverse KL divergence is equivalent to finding an optimal policy under entropy regularization. Consequently, a new policy is derived from an algorithm that resembles Dynamic Policy Programming and Soft Actor-Critic. Our experimental results on MuJoCo-simulated environments show that ERIL is more sample-efficient than such previous methods. We further apply the method to human behaviors in performing a pole-balancing task and show that the estimated reward functions show how every subject achieves the goal." @default.
- W3049102318 created "2020-08-21" @default.
- W3049102318 creator A5004840638 @default.
- W3049102318 creator A5031054137 @default.
- W3049102318 date "2020-08-17" @default.
- W3049102318 modified "2023-09-27" @default.
- W3049102318 title "Imitation learning based on entropy-regularized forward and inverse reinforcement learning." @default.
- W3049102318 cites W1771410628 @default.
- W3049102318 cites W1977655452 @default.
- W3049102318 cites W1977828796 @default.
- W3049102318 cites W1999874108 @default.
- W3049102318 cites W2021472139 @default.
- W3049102318 cites W2031067035 @default.
- W3049102318 cites W2061562262 @default.
- W3049102318 cites W2091789872 @default.
- W3049102318 cites W2098774185 @default.
- W3049102318 cites W2114984060 @default.
- W3049102318 cites W2119785746 @default.
- W3049102318 cites W2125612430 @default.
- W3049102318 cites W2145339207 @default.
- W3049102318 cites W2156347136 @default.
- W3049102318 cites W2158782408 @default.
- W3049102318 cites W2167224731 @default.
- W3049102318 cites W2171302338 @default.
- W3049102318 cites W2227909145 @default.
- W3049102318 cites W2398964619 @default.
- W3049102318 cites W2466175722 @default.
- W3049102318 cites W2751530711 @default.
- W3049102318 cites W2765861418 @default.
- W3049102318 cites W2774527530 @default.
- W3049102318 cites W2884247313 @default.
- W3049102318 cites W2911283634 @default.
- W3049102318 cites W2911383979 @default.
- W3049102318 cites W2921114252 @default.
- W3049102318 cites W2943868761 @default.
- W3049102318 cites W2944911388 @default.
- W3049102318 cites W2962787969 @default.
- W3049102318 cites W2962845991 @default.
- W3049102318 cites W2962901215 @default.
- W3049102318 cites W2962902376 @default.
- W3049102318 cites W2962957031 @default.
- W3049102318 cites W2963277051 @default.
- W3049102318 cites W2963301010 @default.
- W3049102318 cites W2963508354 @default.
- W3049102318 cites W2963864421 @default.
- W3049102318 cites W2963923407 @default.
- W3049102318 cites W2964121744 @default.
- W3049102318 cites W2966208741 @default.
- W3049102318 cites W2979776030 @default.
- W3049102318 cites W2983294627 @default.
- W3049102318 cites W3028821797 @default.
- W3049102318 cites W3036472058 @default.
- W3049102318 cites W3182474098 @default.
- W3049102318 cites W567721252 @default.
- W3049102318 cites W91088564 @default.
- W3049102318 hasPublicationYear "2020" @default.
- W3049102318 type Work @default.
- W3049102318 sameAs 3049102318 @default.
- W3049102318 citedByCount "0" @default.
- W3049102318 crossrefType "posted-content" @default.
- W3049102318 hasAuthorship W3049102318A5004840638 @default.
- W3049102318 hasAuthorship W3049102318A5031054137 @default.
- W3049102318 hasConcept C105795698 @default.
- W3049102318 hasConcept C106189395 @default.
- W3049102318 hasConcept C106301342 @default.
- W3049102318 hasConcept C11413529 @default.
- W3049102318 hasConcept C119857082 @default.
- W3049102318 hasConcept C121332964 @default.
- W3049102318 hasConcept C126255220 @default.
- W3049102318 hasConcept C138885662 @default.
- W3049102318 hasConcept C154945302 @default.
- W3049102318 hasConcept C159886148 @default.
- W3049102318 hasConcept C171752962 @default.
- W3049102318 hasConcept C207390915 @default.
- W3049102318 hasConcept C207467116 @default.
- W3049102318 hasConcept C2524010 @default.
- W3049102318 hasConcept C2776135515 @default.
- W3049102318 hasConcept C2779803651 @default.
- W3049102318 hasConcept C33923547 @default.
- W3049102318 hasConcept C41008148 @default.
- W3049102318 hasConcept C41895202 @default.
- W3049102318 hasConcept C62520636 @default.
- W3049102318 hasConcept C76155785 @default.
- W3049102318 hasConcept C94915269 @default.
- W3049102318 hasConcept C9679016 @default.
- W3049102318 hasConcept C97541855 @default.
- W3049102318 hasConceptScore W3049102318C105795698 @default.
- W3049102318 hasConceptScore W3049102318C106189395 @default.
- W3049102318 hasConceptScore W3049102318C106301342 @default.
- W3049102318 hasConceptScore W3049102318C11413529 @default.
- W3049102318 hasConceptScore W3049102318C119857082 @default.
- W3049102318 hasConceptScore W3049102318C121332964 @default.
- W3049102318 hasConceptScore W3049102318C126255220 @default.
- W3049102318 hasConceptScore W3049102318C138885662 @default.
- W3049102318 hasConceptScore W3049102318C154945302 @default.
- W3049102318 hasConceptScore W3049102318C159886148 @default.
- W3049102318 hasConceptScore W3049102318C171752962 @default.
- W3049102318 hasConceptScore W3049102318C207390915 @default.
- W3049102318 hasConceptScore W3049102318C207467116 @default.
- W3049102318 hasConceptScore W3049102318C2524010 @default.