Matches in SemOpenAlex for { <https://semopenalex.org/work/W4309804935> ?p ?o ?g. }
Showing items 1 to 86 of 86 with 100 items per page.
- W4309804935 abstract "An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulated 3D world. We then asked annotators to record moments where they believed that agents either progressed toward or regressed from their human-instructed goal. Using this annotation data we leveraged a novel method - which we call Inter-temporal Bradley-Terry (IBT) modelling - to build a reward model that captures human judgments. Agents trained to optimise rewards delivered from IBT reward models improved with respect to all of our metrics, including subsequent human judgment during live interactions with agents. Altogether our results demonstrate how one can successfully leverage human judgments to improve agent behaviour, allowing us to use reinforcement learning in complex, embodied domains without programmatic reward functions. Videos of agent behaviour may be found at https://youtu.be/v_Z9F2_eKk4." @default.
- W4309804935 created "2022-11-29" @default.
- W4309804935 creator A5008613026 @default.
- W4309804935 creator A5010781971 @default.
- W4309804935 creator A5013028446 @default.
- W4309804935 creator A5014071624 @default.
- W4309804935 creator A5030538328 @default.
- W4309804935 creator A5033390440 @default.
- W4309804935 creator A5033437530 @default.
- W4309804935 creator A5037458498 @default.
- W4309804935 creator A5039426831 @default.
- W4309804935 creator A5040662871 @default.
- W4309804935 creator A5044961078 @default.
- W4309804935 creator A5045881704 @default.
- W4309804935 creator A5049124418 @default.
- W4309804935 creator A5054905400 @default.
- W4309804935 creator A5059348157 @default.
- W4309804935 creator A5066294254 @default.
- W4309804935 creator A5080601982 @default.
- W4309804935 creator A5081473628 @default.
- W4309804935 creator A5089917436 @default.
- W4309804935 date "2022-11-21" @default.
- W4309804935 modified "2023-09-30" @default.
- W4309804935 title "Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback" @default.
- W4309804935 doi "https://doi.org/10.48550/arxiv.2211.11602" @default.
- W4309804935 hasPublicationYear "2022" @default.
- W4309804935 type Work @default.
- W4309804935 citedByCount "1" @default.
- W4309804935 countsByYear W43098049352023 @default.
- W4309804935 crossrefType "posted-content" @default.
- W4309804935 hasAuthorship W4309804935A5008613026 @default.
- W4309804935 hasAuthorship W4309804935A5010781971 @default.
- W4309804935 hasAuthorship W4309804935A5013028446 @default.
- W4309804935 hasAuthorship W4309804935A5014071624 @default.
- W4309804935 hasAuthorship W4309804935A5030538328 @default.
- W4309804935 hasAuthorship W4309804935A5033390440 @default.
- W4309804935 hasAuthorship W4309804935A5033437530 @default.
- W4309804935 hasAuthorship W4309804935A5037458498 @default.
- W4309804935 hasAuthorship W4309804935A5039426831 @default.
- W4309804935 hasAuthorship W4309804935A5040662871 @default.
- W4309804935 hasAuthorship W4309804935A5044961078 @default.
- W4309804935 hasAuthorship W4309804935A5045881704 @default.
- W4309804935 hasAuthorship W4309804935A5049124418 @default.
- W4309804935 hasAuthorship W4309804935A5054905400 @default.
- W4309804935 hasAuthorship W4309804935A5059348157 @default.
- W4309804935 hasAuthorship W4309804935A5066294254 @default.
- W4309804935 hasAuthorship W4309804935A5080601982 @default.
- W4309804935 hasAuthorship W4309804935A5081473628 @default.
- W4309804935 hasAuthorship W4309804935A5089917436 @default.
- W4309804935 hasBestOaLocation W43098049351 @default.
- W4309804935 hasConcept C100609095 @default.
- W4309804935 hasConcept C103683099 @default.
- W4309804935 hasConcept C107457646 @default.
- W4309804935 hasConcept C126388530 @default.
- W4309804935 hasConcept C153083717 @default.
- W4309804935 hasConcept C154945302 @default.
- W4309804935 hasConcept C15744967 @default.
- W4309804935 hasConcept C41008148 @default.
- W4309804935 hasConcept C77805123 @default.
- W4309804935 hasConcept C97541855 @default.
- W4309804935 hasConceptScore W4309804935C100609095 @default.
- W4309804935 hasConceptScore W4309804935C103683099 @default.
- W4309804935 hasConceptScore W4309804935C107457646 @default.
- W4309804935 hasConceptScore W4309804935C126388530 @default.
- W4309804935 hasConceptScore W4309804935C153083717 @default.
- W4309804935 hasConceptScore W4309804935C154945302 @default.
- W4309804935 hasConceptScore W4309804935C15744967 @default.
- W4309804935 hasConceptScore W4309804935C41008148 @default.
- W4309804935 hasConceptScore W4309804935C77805123 @default.
- W4309804935 hasConceptScore W4309804935C97541855 @default.
- W4309804935 hasLocation W43098049351 @default.
- W4309804935 hasOpenAccess W4309804935 @default.
- W4309804935 hasPrimaryLocation W43098049351 @default.
- W4309804935 hasRelatedWork W1488803062 @default.
- W4309804935 hasRelatedWork W1522117956 @default.
- W4309804935 hasRelatedWork W1595897272 @default.
- W4309804935 hasRelatedWork W2028773068 @default.
- W4309804935 hasRelatedWork W2031296774 @default.
- W4309804935 hasRelatedWork W2068486122 @default.
- W4309804935 hasRelatedWork W2111004320 @default.
- W4309804935 hasRelatedWork W2742181818 @default.
- W4309804935 hasRelatedWork W2942109448 @default.
- W4309804935 hasRelatedWork W3109657217 @default.
- W4309804935 isParatext "false" @default.
- W4309804935 isRetracted "false" @default.
- W4309804935 workType "article" @default.