Matches in SemOpenAlex for { <https://semopenalex.org/work/W4288596469> ?p ?o ?g. }
Showing items 1 to 77 of 77 with 100 items per page.
- W4288596469 abstract "Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuous settings, with fixed discount factor $\gamma < 1$, or in episodic settings, with $\gamma = 1$. While this has proven effective for specific tasks with well-defined objectives (e.g., games), it has never been established that fixed discounting is suitable for general purpose use (e.g., as a model of human preferences). This paper characterizes rationality in sequential decision making using a set of seven axioms and arrives at a form of discounting that generalizes traditional fixed discounting. In particular, our framework admits a state-action dependent discount factor that is not constrained to be less than 1, so long as there is eventual long run discounting. Although this broadens the range of possible preference structures in continuous settings, we show that there exists a unique optimizing MDP with fixed $\gamma < 1$ whose optimal value function matches the true utility of the optimal policy, and we quantify the difference between value and utility for suboptimal policies. Our work can be seen as providing a normative justification for (a slight generalization of) Martha White's RL task formalism (2017) and other recent departures from the traditional RL, and is relevant to task specification in RL, inverse RL and preference-based RL." @default.
- W4288596469 created "2022-07-29" @default.
- W4288596469 creator A5010938443 @default.
- W4288596469 date "2019-02-07" @default.
- W4288596469 modified "2023-09-25" @default.
- W4288596469 title "Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach" @default.
- W4288596469 doi "https://doi.org/10.48550/arxiv.1902.02893" @default.
- W4288596469 hasPublicationYear "2019" @default.
- W4288596469 type Work @default.
- W4288596469 citedByCount "0" @default.
- W4288596469 crossrefType "posted-content" @default.
- W4288596469 hasAuthorship W4288596469A5010938443 @default.
- W4288596469 hasBestOaLocation W42885964691 @default.
- W4288596469 hasConcept C10138342 @default.
- W4288596469 hasConcept C105795698 @default.
- W4288596469 hasConcept C106189395 @default.
- W4288596469 hasConcept C111472728 @default.
- W4288596469 hasConcept C134306372 @default.
- W4288596469 hasConcept C138885662 @default.
- W4288596469 hasConcept C144237770 @default.
- W4288596469 hasConcept C14646407 @default.
- W4288596469 hasConcept C149782125 @default.
- W4288596469 hasConcept C154945302 @default.
- W4288596469 hasConcept C159886148 @default.
- W4288596469 hasConcept C162324750 @default.
- W4288596469 hasConcept C167729594 @default.
- W4288596469 hasConcept C175444787 @default.
- W4288596469 hasConcept C177148314 @default.
- W4288596469 hasConcept C205706631 @default.
- W4288596469 hasConcept C2524010 @default.
- W4288596469 hasConcept C2779110102 @default.
- W4288596469 hasConcept C2781249084 @default.
- W4288596469 hasConcept C33923547 @default.
- W4288596469 hasConcept C41008148 @default.
- W4288596469 hasConcept C44725695 @default.
- W4288596469 hasConcept C6177178 @default.
- W4288596469 hasConcept C97541855 @default.
- W4288596469 hasConceptScore W4288596469C10138342 @default.
- W4288596469 hasConceptScore W4288596469C105795698 @default.
- W4288596469 hasConceptScore W4288596469C106189395 @default.
- W4288596469 hasConceptScore W4288596469C111472728 @default.
- W4288596469 hasConceptScore W4288596469C134306372 @default.
- W4288596469 hasConceptScore W4288596469C138885662 @default.
- W4288596469 hasConceptScore W4288596469C144237770 @default.
- W4288596469 hasConceptScore W4288596469C14646407 @default.
- W4288596469 hasConceptScore W4288596469C149782125 @default.
- W4288596469 hasConceptScore W4288596469C154945302 @default.
- W4288596469 hasConceptScore W4288596469C159886148 @default.
- W4288596469 hasConceptScore W4288596469C162324750 @default.
- W4288596469 hasConceptScore W4288596469C167729594 @default.
- W4288596469 hasConceptScore W4288596469C175444787 @default.
- W4288596469 hasConceptScore W4288596469C177148314 @default.
- W4288596469 hasConceptScore W4288596469C205706631 @default.
- W4288596469 hasConceptScore W4288596469C2524010 @default.
- W4288596469 hasConceptScore W4288596469C2779110102 @default.
- W4288596469 hasConceptScore W4288596469C2781249084 @default.
- W4288596469 hasConceptScore W4288596469C33923547 @default.
- W4288596469 hasConceptScore W4288596469C41008148 @default.
- W4288596469 hasConceptScore W4288596469C44725695 @default.
- W4288596469 hasConceptScore W4288596469C6177178 @default.
- W4288596469 hasConceptScore W4288596469C97541855 @default.
- W4288596469 hasLocation W42885964691 @default.
- W4288596469 hasOpenAccess W4288596469 @default.
- W4288596469 hasPrimaryLocation W42885964691 @default.
- W4288596469 hasRelatedWork W1604295828 @default.
- W4288596469 hasRelatedWork W1966929897 @default.
- W4288596469 hasRelatedWork W2029308864 @default.
- W4288596469 hasRelatedWork W2093681361 @default.
- W4288596469 hasRelatedWork W2139465701 @default.
- W4288596469 hasRelatedWork W2919328485 @default.
- W4288596469 hasRelatedWork W3125240153 @default.
- W4288596469 hasRelatedWork W3148334763 @default.
- W4288596469 hasRelatedWork W2042930332 @default.
- W4288596469 hasRelatedWork W2424302625 @default.
- W4288596469 isParatext "false" @default.
- W4288596469 isRetracted "false" @default.
- W4288596469 workType "article" @default.