Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287752772> ?p ?o ?g. }
Showing items 1 to 72 of 72, with 100 items per page.
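The listing below can be reproduced programmatically. As a minimal sketch, assuming the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql and the SPARQLWrapper Python package, the quad pattern in the header reduces to an ordinary triple query (the `@default` suffix on each row indicates the default graph, so no `GRAPH` clause is needed):

```python
# Minimal sketch: fetch all predicate/object pairs for work W4287752772.
# The endpoint URL is an assumption; adjust it if it differs.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://semopenalex.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE {
      <https://semopenalex.org/work/W4287752772> ?p ?o .
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Print each match in the same "predicate object" shape as the listing below.
for row in results["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```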
- W4287752772 abstract "Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries which are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and the usability of our integrated framework." @default.
- W4287752772 created "2022-07-26" @default.
- W4287752772 creator A5026703888 @default.
- W4287752772 creator A5031426401 @default.
- W4287752772 creator A5059508573 @default.
- W4287752772 creator A5063608480 @default.
- W4287752772 creator A5074251295 @default.
- W4287752772 creator A5080725225 @default.
- W4287752772 date "2020-06-24" @default.
- W4287752772 modified "2023-09-27" @default.
- W4287752772 title "Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences" @default.
- W4287752772 doi "https://doi.org/10.48550/arxiv.2006.14091" @default.
- W4287752772 hasPublicationYear "2020" @default.
- W4287752772 type Work @default.
- W4287752772 citedByCount "0" @default.
- W4287752772 crossrefType "posted-content" @default.
- W4287752772 hasAuthorship W4287752772A5026703888 @default.
- W4287752772 hasAuthorship W4287752772A5031426401 @default.
- W4287752772 hasAuthorship W4287752772A5059508573 @default.
- W4287752772 hasAuthorship W4287752772A5063608480 @default.
- W4287752772 hasAuthorship W4287752772A5074251295 @default.
- W4287752772 hasAuthorship W4287752772A5080725225 @default.
- W4287752772 hasBestOaLocation W42877527721 @default.
- W4287752772 hasConcept C105795698 @default.
- W4287752772 hasConcept C107457646 @default.
- W4287752772 hasConcept C119857082 @default.
- W4287752772 hasConcept C136197465 @default.
- W4287752772 hasConcept C14036430 @default.
- W4287752772 hasConcept C145420912 @default.
- W4287752772 hasConcept C153083717 @default.
- W4287752772 hasConcept C154945302 @default.
- W4287752772 hasConcept C170130773 @default.
- W4287752772 hasConcept C2781249084 @default.
- W4287752772 hasConcept C33923547 @default.
- W4287752772 hasConcept C41008148 @default.
- W4287752772 hasConcept C55457006 @default.
- W4287752772 hasConcept C78458016 @default.
- W4287752772 hasConcept C86803240 @default.
- W4287752772 hasConcept C97541855 @default.
- W4287752772 hasConceptScore W4287752772C105795698 @default.
- W4287752772 hasConceptScore W4287752772C107457646 @default.
- W4287752772 hasConceptScore W4287752772C119857082 @default.
- W4287752772 hasConceptScore W4287752772C136197465 @default.
- W4287752772 hasConceptScore W4287752772C14036430 @default.
- W4287752772 hasConceptScore W4287752772C145420912 @default.
- W4287752772 hasConceptScore W4287752772C153083717 @default.
- W4287752772 hasConceptScore W4287752772C154945302 @default.
- W4287752772 hasConceptScore W4287752772C170130773 @default.
- W4287752772 hasConceptScore W4287752772C2781249084 @default.
- W4287752772 hasConceptScore W4287752772C33923547 @default.
- W4287752772 hasConceptScore W4287752772C41008148 @default.
- W4287752772 hasConceptScore W4287752772C55457006 @default.
- W4287752772 hasConceptScore W4287752772C78458016 @default.
- W4287752772 hasConceptScore W4287752772C86803240 @default.
- W4287752772 hasConceptScore W4287752772C97541855 @default.
- W4287752772 hasLocation W42877527721 @default.
- W4287752772 hasLocation W42877527722 @default.
- W4287752772 hasOpenAccess W4287752772 @default.
- W4287752772 hasPrimaryLocation W42877527721 @default.
- W4287752772 hasRelatedWork W1994147225 @default.
- W4287752772 hasRelatedWork W2049870071 @default.
- W4287752772 hasRelatedWork W2908092972 @default.
- W4287752772 hasRelatedWork W3022038857 @default.
- W4287752772 hasRelatedWork W3080901762 @default.
- W4287752772 hasRelatedWork W3203000071 @default.
- W4287752772 hasRelatedWork W4287626175 @default.
- W4287752772 hasRelatedWork W4307308173 @default.
- W4287752772 hasRelatedWork W4318719223 @default.
- W4287752772 hasRelatedWork W4319083788 @default.
- W4287752772 isParatext "false" @default.
- W4287752772 isRetracted "false" @default.
- W4287752772 workType "article" @default.
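The abstract above outlines a two-stage algorithm: demonstrations initialize a belief over the reward function, and actively selected preference queries then refine it. The following is a minimal sketch of that idea only, assuming a reward linear in trajectory features and Boltzmann-rational human responses; the uncertainty-based query selection is a cheap stand-in for the paper's information-theoretic criterion, and every name and number here is illustrative rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, BETA = 3, 5.0  # feature dimensionality and rationality coefficient (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- Stage 1: a demonstration initializes the belief over reward weights w. ---
# The belief is a set of importance weights over prior samples of w, with the
# reward assumed linear in trajectory features phi (an assumption of this sketch).
samples = rng.normal(size=(5000, DIM))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)

phi_demo = np.array([1.0, 0.2, -0.3])   # features of the human demonstration (toy)
phi_alts = rng.normal(size=(20, DIM))   # features of alternative trajectories (toy)
# Boltzmann-rational demo likelihood: the demo was chosen over the alternatives.
demo_scores = np.exp(BETA * samples @ phi_demo)              # shape (5000,)
alt_scores = np.exp(BETA * samples @ phi_alts.T).sum(axis=1) # shape (5000,)
belief = demo_scores / (demo_scores + alt_scores)
belief /= belief.sum()

# --- Stage 2: actively chosen preference queries refine the belief. ---
true_w = np.array([0.9, 0.3, -0.3])
true_w /= np.linalg.norm(true_w)        # hidden reward used to simulate the user
for step in range(10):
    pairs = rng.normal(size=(50, 2, DIM))   # candidate (A, B) trajectory queries
    diff = pairs[:, 0] - pairs[:, 1]        # feature differences, shape (50, DIM)
    p_a = sigmoid(BETA * samples @ diff.T)  # P(user prefers A | w), shape (5000, 50)
    p_bar = belief @ p_a                    # belief-averaged answer probability
    q = int(np.argmin(np.abs(p_bar - 0.5))) # most uncertain query: a cheap proxy
                                            # for the paper's optimal criterion
    user_says_a = rng.random() < sigmoid(BETA * true_w @ diff[q])  # simulated answer
    belief *= p_a[:, q] if user_says_a else (1.0 - p_a[:, q])      # Bayes update
    belief /= belief.sum()

estimate = belief @ samples
print("alignment with true reward:", estimate @ true_w / np.linalg.norm(estimate))
```

Representing the belief as importance weights over fixed prior samples keeps each Bayes update to one line; a real implementation would resample or run MCMC instead, since degenerate weights are a known failure mode of this shortcut.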