Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387687940> ?p ?o ?g. }
Showing items 1 to 81 of 81 with 100 items per page.
- W4387687940 abstract "Implementing a reward function that perfectly captures a complex task in the real world is impractical. As a result, it is often appropriate to think of the reward function as a proxy for the true objective rather than as its definition. We study this phenomenon through the lens of Goodhart's law, which predicts that increasing optimisation of an imperfect proxy beyond some critical point decreases performance on the true objective. First, we propose a way to quantify the magnitude of this effect and show empirically that optimising an imperfect proxy reward often leads to the behaviour predicted by Goodhart's law for a wide range of environments and reward functions. We then provide a geometric explanation for why Goodhart's law occurs in Markov decision processes. We use these theoretical insights to propose an optimal early stopping method that provably avoids the aforementioned pitfall and derive theoretical regret bounds for this method. Moreover, we derive a training method that maximises worst-case reward, for the setting where there is uncertainty about the true reward function. Finally, we evaluate our early stopping method experimentally. Our results support a foundation for a theoretically-principled study of reinforcement learning under reward misspecification." @default.
- W4387687940 created "2023-10-17" @default.
- W4387687940 creator A5008517862 @default.
- W4387687940 creator A5041653578 @default.
- W4387687940 creator A5060020102 @default.
- W4387687940 creator A5079709391 @default.
- W4387687940 creator A5093073519 @default.
- W4387687940 creator A5093073520 @default.
- W4387687940 date "2023-10-13" @default.
- W4387687940 modified "2023-10-18" @default.
- W4387687940 title "Goodhart's Law in Reinforcement Learning" @default.
- W4387687940 doi "https://doi.org/10.48550/arxiv.2310.09144" @default.
- W4387687940 hasPublicationYear "2023" @default.
- W4387687940 type Work @default.
- W4387687940 citedByCount "0" @default.
- W4387687940 crossrefType "posted-content" @default.
- W4387687940 hasAuthorship W4387687940A5008517862 @default.
- W4387687940 hasAuthorship W4387687940A5041653578 @default.
- W4387687940 hasAuthorship W4387687940A5060020102 @default.
- W4387687940 hasAuthorship W4387687940A5079709391 @default.
- W4387687940 hasAuthorship W4387687940A5093073519 @default.
- W4387687940 hasAuthorship W4387687940A5093073520 @default.
- W4387687940 hasBestOaLocation W43876879401 @default.
- W4387687940 hasConcept C119857082 @default.
- W4387687940 hasConcept C121332964 @default.
- W4387687940 hasConcept C132010649 @default.
- W4387687940 hasConcept C138885662 @default.
- W4387687940 hasConcept C14036430 @default.
- W4387687940 hasConcept C144237770 @default.
- W4387687940 hasConcept C149782125 @default.
- W4387687940 hasConcept C154945302 @default.
- W4387687940 hasConcept C15744967 @default.
- W4387687940 hasConcept C162324750 @default.
- W4387687940 hasConcept C188147891 @default.
- W4387687940 hasConcept C2780148112 @default.
- W4387687940 hasConcept C2780310539 @default.
- W4387687940 hasConcept C41008148 @default.
- W4387687940 hasConcept C41895202 @default.
- W4387687940 hasConcept C50335755 @default.
- W4387687940 hasConcept C50817715 @default.
- W4387687940 hasConcept C62520636 @default.
- W4387687940 hasConcept C78458016 @default.
- W4387687940 hasConcept C86803240 @default.
- W4387687940 hasConcept C97541855 @default.
- W4387687940 hasConceptScore W4387687940C119857082 @default.
- W4387687940 hasConceptScore W4387687940C121332964 @default.
- W4387687940 hasConceptScore W4387687940C132010649 @default.
- W4387687940 hasConceptScore W4387687940C138885662 @default.
- W4387687940 hasConceptScore W4387687940C14036430 @default.
- W4387687940 hasConceptScore W4387687940C144237770 @default.
- W4387687940 hasConceptScore W4387687940C149782125 @default.
- W4387687940 hasConceptScore W4387687940C154945302 @default.
- W4387687940 hasConceptScore W4387687940C15744967 @default.
- W4387687940 hasConceptScore W4387687940C162324750 @default.
- W4387687940 hasConceptScore W4387687940C188147891 @default.
- W4387687940 hasConceptScore W4387687940C2780148112 @default.
- W4387687940 hasConceptScore W4387687940C2780310539 @default.
- W4387687940 hasConceptScore W4387687940C41008148 @default.
- W4387687940 hasConceptScore W4387687940C41895202 @default.
- W4387687940 hasConceptScore W4387687940C50335755 @default.
- W4387687940 hasConceptScore W4387687940C50817715 @default.
- W4387687940 hasConceptScore W4387687940C62520636 @default.
- W4387687940 hasConceptScore W4387687940C78458016 @default.
- W4387687940 hasConceptScore W4387687940C86803240 @default.
- W4387687940 hasConceptScore W4387687940C97541855 @default.
- W4387687940 hasLocation W43876879401 @default.
- W4387687940 hasOpenAccess W4387687940 @default.
- W4387687940 hasPrimaryLocation W43876879401 @default.
- W4387687940 hasRelatedWork W118270247 @default.
- W4387687940 hasRelatedWork W1947085858 @default.
- W4387687940 hasRelatedWork W2101991911 @default.
- W4387687940 hasRelatedWork W2155070487 @default.
- W4387687940 hasRelatedWork W2174986909 @default.
- W4387687940 hasRelatedWork W2527791220 @default.
- W4387687940 hasRelatedWork W3123835761 @default.
- W4387687940 hasRelatedWork W4292701710 @default.
- W4387687940 hasRelatedWork W4311589891 @default.
- W4387687940 hasRelatedWork W4376155396 @default.
- W4387687940 isParatext "false" @default.
- W4387687940 isRetracted "false" @default.
- W4387687940 workType "article" @default.