Matches in SemOpenAlex for { <https://semopenalex.org/work/W3208414618> ?p ?o ?g. }
- W3208414618 abstract "Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but these are notoriously hard to specify for complex tasks. Preference-based RL provides an alternative: learning policies using a teacher's preferences without pre-defined rewards, thus overcoming concerns associated with reward engineering. However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark. In this paper, we introduce B-Pref: a benchmark specially designed for preference-based RL. A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly, which makes relying on real human input for evaluation prohibitive. At the same time, simulating human input as giving perfect preferences for the ground truth reward function is unrealistic. B-Pref alleviates this by simulating teachers with a wide array of irrationalities, and proposes metrics not solely for performance but also for robustness to these potential irrationalities. We showcase the utility of B-Pref by using it to analyze algorithmic design choices, such as selecting informative queries, for state-of-the-art preference-based RL algorithms. We hope that B-Pref can serve as a common starting point to study preference-based RL more systematically. Source code is available at https://github.com/rll-research/B-Pref." @default.
- W3208414618 created "2021-11-08" @default.
- W3208414618 creator A5005997281 @default.
- W3208414618 creator A5033571780 @default.
- W3208414618 creator A5049349154 @default.
- W3208414618 creator A5057441223 @default.
- W3208414618 date "2021-11-04" @default.
- W3208414618 modified "2023-09-27" @default.
- W3208414618 title "B-Pref: Benchmarking Preference-Based Reinforcement Learning." @default.
- W3208414618 cites W122021961 @default.
- W3208414618 cites W1771410628 @default.
- W3208414618 cites W1977655452 @default.
- W3208414618 cites W2012392077 @default.
- W3208414618 cites W2034806191 @default.
- W3208414618 cites W2098584016 @default.
- W3208414618 cites W2101524054 @default.
- W3208414618 cites W2114188922 @default.
- W3208414618 cites W2116671302 @default.
- W3208414618 cites W2121863487 @default.
- W3208414618 cites W2122350398 @default.
- W3208414618 cites W2145339207 @default.
- W3208414618 cites W2156869222 @default.
- W3208414618 cites W2183341477 @default.
- W3208414618 cites W2462906003 @default.
- W3208414618 cites W2580300496 @default.
- W3208414618 cites W2735318784 @default.
- W3208414618 cites W2736601468 @default.
- W3208414618 cites W2763110165 @default.
- W3208414618 cites W2766447205 @default.
- W3208414618 cites W2781585732 @default.
- W3208414618 cites W2901707424 @default.
- W3208414618 cites W2902298341 @default.
- W3208414618 cites W2902907165 @default.
- W3208414618 cites W2911719076 @default.
- W3208414618 cites W2911940799 @default.
- W3208414618 cites W2951952563 @default.
- W3208414618 cites W2953326529 @default.
- W3208414618 cites W2962943921 @default.
- W3208414618 cites W2963120839 @default.
- W3208414618 cites W2963484919 @default.
- W3208414618 cites W2963489214 @default.
- W3208414618 cites W2963641140 @default.
- W3208414618 cites W2963646405 @default.
- W3208414618 cites W2963680188 @default.
- W3208414618 cites W2963827721 @default.
- W3208414618 cites W2964043796 @default.
- W3208414618 cites W2964059111 @default.
- W3208414618 cites W2964121744 @default.
- W3208414618 cites W2964263543 @default.
- W3208414618 cites W2964296021 @default.
- W3208414618 cites W2979211489 @default.
- W3208414618 cites W2981344907 @default.
- W3208414618 cites W2982316857 @default.
- W3208414618 cites W2990747716 @default.
- W3208414618 cites W2995709298 @default.
- W3208414618 cites W2996037775 @default.
- W3208414618 cites W3016525976 @default.
- W3208414618 cites W3030163527 @default.
- W3208414618 cites W3032377877 @default.
- W3208414618 cites W3034946435 @default.
- W3208414618 cites W3035015331 @default.
- W3208414618 cites W3035644784 @default.
- W3208414618 cites W3036619998 @default.
- W3208414618 cites W3098053103 @default.
- W3208414618 cites W3098584432 @default.
- W3208414618 cites W3098985263 @default.
- W3208414618 cites W3101009114 @default.
- W3208414618 cites W3107153805 @default.
- W3208414618 cites W3107646372 @default.
- W3208414618 cites W3115293622 @default.
- W3208414618 cites W3119908121 @default.
- W3208414618 cites W3131944163 @default.
- W3208414618 cites W3134444185 @default.
- W3208414618 cites W3166834300 @default.
- W3208414618 cites W3168856269 @default.
- W3208414618 cites W3171136997 @default.
- W3208414618 cites W3200980294 @default.
- W3208414618 cites W3211095199 @default.
- W3208414618 cites W3214152028 @default.
- W3208414618 cites W64088143 @default.
- W3208414618 hasPublicationYear "2021" @default.
- W3208414618 type Work @default.
- W3208414618 sameAs 3208414618 @default.
- W3208414618 citedByCount "0" @default.
- W3208414618 crossrefType "posted-content" @default.
- W3208414618 hasAuthorship W3208414618A5005997281 @default.
- W3208414618 hasAuthorship W3208414618A5033571780 @default.
- W3208414618 hasAuthorship W3208414618A5049349154 @default.
- W3208414618 hasAuthorship W3208414618A5057441223 @default.
- W3208414618 hasConcept C104317684 @default.
- W3208414618 hasConcept C105795698 @default.
- W3208414618 hasConcept C111919701 @default.
- W3208414618 hasConcept C119857082 @default.
- W3208414618 hasConcept C13280743 @default.
- W3208414618 hasConcept C14036430 @default.
- W3208414618 hasConcept C144133560 @default.
- W3208414618 hasConcept C154945302 @default.
- W3208414618 hasConcept C162853370 @default.
- W3208414618 hasConcept C177264268 @default.
- W3208414618 hasConcept C181204326 @default.