Matches in SemOpenAlex for { <https://semopenalex.org/work/W2968771440> ?p ?o ?g. }
- W2968771440 abstract "Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principles that prevent instrumental goals for two different types of reward tampering (reward function tampering and RF-input tampering). Combined, the design principles can prevent both types of reward tampering from being instrumental goals. The analysis benefits from causal influence diagrams to provide intuitive yet precise formalizations." @default.
- W2968771440 created "2019-08-22" @default.
- W2968771440 creator A5020224050 @default.
- W2968771440 creator A5052300917 @default.
- W2968771440 creator A5073944062 @default.
- W2968771440 creator A5082761102 @default.
- W2968771440 date "2019-08-13" @default.
- W2968771440 modified "2023-09-27" @default.
- W2968771440 title "Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective" @default.
- W2968771440 cites W133789137 @default.
- W2968771440 cites W145787216 @default.
- W2968771440 cites W1516659296 @default.
- W2968771440 cites W1534384411 @default.
- W2968771440 cites W1567742585 @default.
- W2968771440 cites W1581742186 @default.
- W2968771440 cites W1586718744 @default.
- W2968771440 cites W1624823515 @default.
- W2968771440 cites W163024403 @default.
- W2968771440 cites W1689629105 @default.
- W2968771440 cites W1993091619 @default.
- W2968771440 cites W1998754086 @default.
- W2968771440 cites W2020595559 @default.
- W2968771440 cites W2025440394 @default.
- W2968771440 cites W2038908222 @default.
- W2968771440 cites W2059214603 @default.
- W2968771440 cites W2061562262 @default.
- W2968771440 cites W2103561211 @default.
- W2968771440 cites W2121863487 @default.
- W2968771440 cites W2133752818 @default.
- W2968771440 cites W2139774323 @default.
- W2968771440 cites W2143891888 @default.
- W2968771440 cites W2144240978 @default.
- W2968771440 cites W2144349330 @default.
- W2968771440 cites W2144863733 @default.
- W2968771440 cites W2145339207 @default.
- W2968771440 cites W2168359464 @default.
- W2968771440 cites W2188233853 @default.
- W2968771440 cites W2215775476 @default.
- W2968771440 cites W2224222633 @default.
- W2968771440 cites W2257979135 @default.
- W2968771440 cites W2350695713 @default.
- W2968771440 cites W2383406194 @default.
- W2968771440 cites W2416133397 @default.
- W2968771440 cites W2574075983 @default.
- W2968771440 cites W2736629007 @default.
- W2968771440 cites W2738669288 @default.
- W2968771440 cites W2738675347 @default.
- W2968771440 cites W2759471388 @default.
- W2968771440 cites W2761873684 @default.
- W2968771440 cites W2768908787 @default.
- W2968771440 cites W2770150859 @default.
- W2968771440 cites W2772709170 @default.
- W2968771440 cites W2792012198 @default.
- W2968771440 cites W2888826999 @default.
- W2968771440 cites W2896930824 @default.
- W2968771440 cites W2900559324 @default.
- W2968771440 cites W2901707424 @default.
- W2968771440 cites W2902125520 @default.
- W2968771440 cites W2913758949 @default.
- W2968771440 cites W2917742641 @default.
- W2968771440 cites W2917770073 @default.
- W2968771440 cites W2920362155 @default.
- W2968771440 cites W2948625193 @default.
- W2968771440 cites W2949800005 @default.
- W2968771440 cites W2951273977 @default.
- W2968771440 cites W2955240493 @default.
- W2968771440 cites W2963289505 @default.
- W2968771440 cites W2963569233 @default.
- W2968771440 cites W2963646405 @default.
- W2968771440 cites W2963960193 @default.
- W2968771440 cites W2964043796 @default.
- W2968771440 cites W2964263543 @default.
- W2968771440 cites W2964281483 @default.
- W2968771440 cites W3022566517 @default.
- W2968771440 cites W3035644784 @default.
- W2968771440 cites W3082042211 @default.
- W2968771440 cites W3094020431 @default.
- W2968771440 cites W3101172017 @default.
- W2968771440 cites W3101852789 @default.
- W2968771440 cites W3103451896 @default.
- W2968771440 cites W3104160712 @default.
- W2968771440 cites W3105871743 @default.
- W2968771440 cites W3115918552 @default.
- W2968771440 cites W3118210634 @default.
- W2968771440 cites W3151924600 @default.
- W2968771440 cites W591538471 @default.
- W2968771440 cites W648152870 @default.
- W2968771440 hasPublicationYear "2019" @default.
- W2968771440 type Work @default.
- W2968771440 sameAs 2968771440 @default.
- W2968771440 citedByCount "14" @default.
- W2968771440 countsByYear W29687714402018 @default.
- W2968771440 countsByYear W29687714402019 @default.
- W2968771440 countsByYear W29687714402020 @default.
- W2968771440 countsByYear W29687714402021 @default.
- W2968771440 crossrefType "posted-content" @default.
- W2968771440 hasAuthorship W2968771440A5020224050 @default.
- W2968771440 hasAuthorship W2968771440A5052300917 @default.
- W2968771440 hasAuthorship W2968771440A5073944062 @default.
- W2968771440 hasAuthorship W2968771440A5082761102 @default.