Matches in SemOpenAlex for { <https://semopenalex.org/work/W4224256562> ?p ?o ?g. }
Showing items 1 to 61 of 61 with 100 items per page.
- W4224256562 abstract "We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals." @default.
- W4224256562 created "2022-04-26" @default.
- W4224256562 creator A5020224050 @default.
- W4224256562 creator A5051690066 @default.
- W4224256562 creator A5061187034 @default.
- W4224256562 date "2022-04-21" @default.
- W4224256562 modified "2023-09-27" @default.
- W4224256562 title "Path-Specific Objectives for Safer Agent Incentives" @default.
- W4224256562 doi "https://doi.org/10.48550/arxiv.2204.10018" @default.
- W4224256562 hasPublicationYear "2022" @default.
- W4224256562 type Work @default.
- W4224256562 citedByCount "0" @default.
- W4224256562 crossrefType "posted-content" @default.
- W4224256562 hasAuthorship W4224256562A5020224050 @default.
- W4224256562 hasAuthorship W4224256562A5051690066 @default.
- W4224256562 hasAuthorship W4224256562A5061187034 @default.
- W4224256562 hasBestOaLocation W42242565621 @default.
- W4224256562 hasConcept C112930515 @default.
- W4224256562 hasConcept C11413529 @default.
- W4224256562 hasConcept C144133560 @default.
- W4224256562 hasConcept C154945302 @default.
- W4224256562 hasConcept C162324750 @default.
- W4224256562 hasConcept C175444787 @default.
- W4224256562 hasConcept C199360897 @default.
- W4224256562 hasConcept C2775924081 @default.
- W4224256562 hasConcept C2776654903 @default.
- W4224256562 hasConcept C2777735758 @default.
- W4224256562 hasConcept C29122968 @default.
- W4224256562 hasConcept C38652104 @default.
- W4224256562 hasConcept C41008148 @default.
- W4224256562 hasConcept C48103436 @default.
- W4224256562 hasConceptScore W4224256562C112930515 @default.
- W4224256562 hasConceptScore W4224256562C11413529 @default.
- W4224256562 hasConceptScore W4224256562C144133560 @default.
- W4224256562 hasConceptScore W4224256562C154945302 @default.
- W4224256562 hasConceptScore W4224256562C162324750 @default.
- W4224256562 hasConceptScore W4224256562C175444787 @default.
- W4224256562 hasConceptScore W4224256562C199360897 @default.
- W4224256562 hasConceptScore W4224256562C2775924081 @default.
- W4224256562 hasConceptScore W4224256562C2776654903 @default.
- W4224256562 hasConceptScore W4224256562C2777735758 @default.
- W4224256562 hasConceptScore W4224256562C29122968 @default.
- W4224256562 hasConceptScore W4224256562C38652104 @default.
- W4224256562 hasConceptScore W4224256562C41008148 @default.
- W4224256562 hasConceptScore W4224256562C48103436 @default.
- W4224256562 hasLocation W42242565621 @default.
- W4224256562 hasOpenAccess W4224256562 @default.
- W4224256562 hasPrimaryLocation W42242565621 @default.
- W4224256562 hasRelatedWork W1498800420 @default.
- W4224256562 hasRelatedWork W1515663861 @default.
- W4224256562 hasRelatedWork W2007572891 @default.
- W4224256562 hasRelatedWork W2030361773 @default.
- W4224256562 hasRelatedWork W2085855985 @default.
- W4224256562 hasRelatedWork W2108420737 @default.
- W4224256562 hasRelatedWork W2370114627 @default.
- W4224256562 hasRelatedWork W2381194467 @default.
- W4224256562 hasRelatedWork W2381984008 @default.
- W4224256562 hasRelatedWork W4238355463 @default.
- W4224256562 isParatext "false" @default.
- W4224256562 isRetracted "false" @default.
- W4224256562 workType "article" @default.