Matches in SemOpenAlex for { <https://semopenalex.org/work/W3201210810> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W3201210810 abstract "Fluid human-agent communication is essential for the future of human-in-the-loop reinforcement learning. An agent must respond appropriately to feedback from its human trainer even before they have significant experience working together. Therefore, it is important that learning agents respond well to various feedback schemes human trainers are likely to provide. This work analyzes the COnvergent Actor-Critic by Humans (COACH) algorithm under three different types of feedback: policy feedback, reward feedback, and advantage feedback. For these three feedback types, we find that COACH can behave sub-optimally. We propose a variant of COACH, episodic COACH (E-COACH), which we prove converges for all three types. We compare our COACH variant with two other reinforcement-learning algorithms: Q-learning and TAMER." @default.
- W3201210810 created "2021-09-27" @default.
- W3201210810 creator A5009722403 @default.
- W3201210810 creator A5015320314 @default.
- W3201210810 creator A5022949932 @default.
- W3201210810 creator A5037667167 @default.
- W3201210810 date "2021-09-15" @default.
- W3201210810 modified "2023-09-27" @default.
- W3201210810 title "Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback." @default.
- W3201210810 cites W1626977535 @default.
- W3201210810 cites W2082261506 @default.
- W3201210810 cites W2150339816 @default.
- W3201210810 cites W2155027007 @default.
- W3201210810 cites W2489939061 @default.
- W3201210810 cites W2580300496 @default.
- W3201210810 cites W2592651140 @default.
- W3201210810 cites W2695227890 @default.
- W3201210810 cites W2916746008 @default.
- W3201210810 cites W2965622170 @default.
- W3201210810 cites W3011120880 @default.
- W3201210810 cites W3023407077 @default.
- W3201210810 cites W3039845099 @default.
- W3201210810 cites W3109546547 @default.
- W3201210810 hasPublicationYear "2021" @default.
- W3201210810 type Work @default.
- W3201210810 sameAs 3201210810 @default.
- W3201210810 citedByCount "0" @default.
- W3201210810 crossrefType "posted-content" @default.
- W3201210810 hasAuthorship W3201210810A5009722403 @default.
- W3201210810 hasAuthorship W3201210810A5015320314 @default.
- W3201210810 hasAuthorship W3201210810A5022949932 @default.
- W3201210810 hasAuthorship W3201210810A5037667167 @default.
- W3201210810 hasConcept C114614502 @default.
- W3201210810 hasConcept C138885662 @default.
- W3201210810 hasConcept C154945302 @default.
- W3201210810 hasConcept C162324750 @default.
- W3201210810 hasConcept C184670325 @default.
- W3201210810 hasConcept C186886427 @default.
- W3201210810 hasConcept C199360897 @default.
- W3201210810 hasConcept C2775924081 @default.
- W3201210810 hasConcept C2777303404 @default.
- W3201210810 hasConcept C2780463512 @default.
- W3201210810 hasConcept C2780626000 @default.
- W3201210810 hasConcept C33923547 @default.
- W3201210810 hasConcept C38652104 @default.
- W3201210810 hasConcept C41008148 @default.
- W3201210810 hasConcept C41895202 @default.
- W3201210810 hasConcept C47446073 @default.
- W3201210810 hasConcept C50522688 @default.
- W3201210810 hasConcept C75291252 @default.
- W3201210810 hasConcept C97541855 @default.
- W3201210810 hasConceptScore W3201210810C114614502 @default.
- W3201210810 hasConceptScore W3201210810C138885662 @default.
- W3201210810 hasConceptScore W3201210810C154945302 @default.
- W3201210810 hasConceptScore W3201210810C162324750 @default.
- W3201210810 hasConceptScore W3201210810C184670325 @default.
- W3201210810 hasConceptScore W3201210810C186886427 @default.
- W3201210810 hasConceptScore W3201210810C199360897 @default.
- W3201210810 hasConceptScore W3201210810C2775924081 @default.
- W3201210810 hasConceptScore W3201210810C2777303404 @default.
- W3201210810 hasConceptScore W3201210810C2780463512 @default.
- W3201210810 hasConceptScore W3201210810C2780626000 @default.
- W3201210810 hasConceptScore W3201210810C33923547 @default.
- W3201210810 hasConceptScore W3201210810C38652104 @default.
- W3201210810 hasConceptScore W3201210810C41008148 @default.
- W3201210810 hasConceptScore W3201210810C41895202 @default.
- W3201210810 hasConceptScore W3201210810C47446073 @default.
- W3201210810 hasConceptScore W3201210810C50522688 @default.
- W3201210810 hasConceptScore W3201210810C75291252 @default.
- W3201210810 hasConceptScore W3201210810C97541855 @default.
- W3201210810 hasLocation W32012108101 @default.
- W3201210810 hasOpenAccess W3201210810 @default.
- W3201210810 hasPrimaryLocation W32012108101 @default.
- W3201210810 hasRelatedWork W194754089 @default.
- W3201210810 hasRelatedWork W1979308911 @default.
- W3201210810 hasRelatedWork W2116157560 @default.
- W3201210810 hasRelatedWork W2145273096 @default.
- W3201210810 hasRelatedWork W2345574366 @default.
- W3201210810 hasRelatedWork W2405915117 @default.
- W3201210810 hasRelatedWork W2515409829 @default.
- W3201210810 hasRelatedWork W2576787642 @default.
- W3201210810 hasRelatedWork W2617694506 @default.
- W3201210810 hasRelatedWork W2897200624 @default.
- W3201210810 hasRelatedWork W2933599700 @default.
- W3201210810 hasRelatedWork W2963452950 @default.
- W3201210810 hasRelatedWork W3032032218 @default.
- W3201210810 hasRelatedWork W3037551211 @default.
- W3201210810 hasRelatedWork W3048454540 @default.
- W3201210810 hasRelatedWork W309123013 @default.
- W3201210810 hasRelatedWork W3105184920 @default.
- W3201210810 hasRelatedWork W3155957742 @default.
- W3201210810 hasRelatedWork W3173218700 @default.
- W3201210810 hasRelatedWork W3197006425 @default.
- W3201210810 isParatext "false" @default.
- W3201210810 isRetracted "false" @default.
- W3201210810 magId "3201210810" @default.
- W3201210810 workType "article" @default.