Matches in SemOpenAlex for { <https://semopenalex.org/work/W3007455034> ?p ?o ?g. }
- W3007455034 abstract "We study the estimation of risk-sensitive policies in reinforcement learning problems defined by Markov Decision Processes (MDPs) whose state and action spaces are finite. Prior efforts are predominantly afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent. To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning. Caution measures the distributional risk of a policy, which is a function of the policy's long-term state occupancy distribution. To solve this problem in an online, model-free manner, we propose a stochastic variant of the primal-dual method that uses the Kullback-Leibler (KL) divergence as its proximal term. We establish that the number of iterations/samples required to attain approximately optimal solutions of this scheme matches tight dependencies on the cardinality of the state and action spaces, but differs in its dependence on the infinity norm of the gradient of the risk measure. Experiments demonstrate the merits of this approach for improving the reliability of reward accumulation without additional computational burden." @default.
- W3007455034 created "2020-03-06" @default.
- W3007455034 creator A5025896653 @default.
- W3007455034 creator A5039563144 @default.
- W3007455034 creator A5052683643 @default.
- W3007455034 creator A5073029812 @default.
- W3007455034 date "2020-02-27" @default.
- W3007455034 modified "2023-10-02" @default.
- W3007455034 title "Cautious Reinforcement Learning via Distributional Risk in the Dual Domain" @default.
- W3007455034 cites W1191599655 @default.
- W3007455034 cites W1529558080 @default.
- W3007455034 cites W1557287189 @default.
- W3007455034 cites W1745373831 @default.
- W3007455034 cites W1757796397 @default.
- W3007455034 cites W1856231595 @default.
- W3007455034 cites W1969147614 @default.
- W3007455034 cites W1972190631 @default.
- W3007455034 cites W1992586302 @default.
- W3007455034 cites W2001009060 @default.
- W3007455034 cites W2019291268 @default.
- W3007455034 cites W2099506495 @default.
- W3007455034 cites W2121863487 @default.
- W3007455034 cites W2126282494 @default.
- W3007455034 cites W2128521145 @default.
- W3007455034 cites W2170923204 @default.
- W3007455034 cites W2462780152 @default.
- W3007455034 cites W2562316362 @default.
- W3007455034 cites W2765415241 @default.
- W3007455034 cites W2765892966 @default.
- W3007455034 cites W2798543980 @default.
- W3007455034 cites W2913434950 @default.
- W3007455034 cites W2962803570 @default.
- W3007455034 cites W2963082979 @default.
- W3007455034 cites W2963423916 @default.
- W3007455034 cites W2970036354 @default.
- W3007455034 cites W2970927156 @default.
- W3007455034 cites W299346747 @default.
- W3007455034 cites W2998059334 @default.
- W3007455034 cites W3011338904 @default.
- W3007455034 cites W3123298421 @default.
- W3007455034 cites W3124407081 @default.
- W3007455034 cites W51049863 @default.
- W3007455034 doi "https://doi.org/10.48550/arxiv.2002.12475" @default.
- W3007455034 hasPublicationYear "2020" @default.
- W3007455034 type Work @default.
- W3007455034 sameAs 3007455034 @default.
- W3007455034 citedByCount "7" @default.
- W3007455034 countsByYear W30074550342020 @default.
- W3007455034 countsByYear W30074550342021 @default.
- W3007455034 crossrefType "posted-content" @default.
- W3007455034 hasAuthorship W3007455034A5025896653 @default.
- W3007455034 hasAuthorship W3007455034A5039563144 @default.
- W3007455034 hasAuthorship W3007455034A5052683643 @default.
- W3007455034 hasAuthorship W3007455034A5073029812 @default.
- W3007455034 hasBestOaLocation W30074550341 @default.
- W3007455034 hasConcept C10138342 @default.
- W3007455034 hasConcept C105795698 @default.
- W3007455034 hasConcept C106189395 @default.
- W3007455034 hasConcept C124101348 @default.
- W3007455034 hasConcept C124952713 @default.
- W3007455034 hasConcept C126255220 @default.
- W3007455034 hasConcept C142362112 @default.
- W3007455034 hasConcept C14646407 @default.
- W3007455034 hasConcept C154945302 @default.
- W3007455034 hasConcept C159886148 @default.
- W3007455034 hasConcept C162324750 @default.
- W3007455034 hasConcept C2780821815 @default.
- W3007455034 hasConcept C2780980858 @default.
- W3007455034 hasConcept C2781472820 @default.
- W3007455034 hasConcept C33923547 @default.
- W3007455034 hasConcept C41008148 @default.
- W3007455034 hasConcept C87117476 @default.
- W3007455034 hasConcept C97541855 @default.
- W3007455034 hasConceptScore W3007455034C10138342 @default.
- W3007455034 hasConceptScore W3007455034C105795698 @default.
- W3007455034 hasConceptScore W3007455034C106189395 @default.
- W3007455034 hasConceptScore W3007455034C124101348 @default.
- W3007455034 hasConceptScore W3007455034C124952713 @default.
- W3007455034 hasConceptScore W3007455034C126255220 @default.
- W3007455034 hasConceptScore W3007455034C142362112 @default.
- W3007455034 hasConceptScore W3007455034C14646407 @default.
- W3007455034 hasConceptScore W3007455034C154945302 @default.
- W3007455034 hasConceptScore W3007455034C159886148 @default.
- W3007455034 hasConceptScore W3007455034C162324750 @default.
- W3007455034 hasConceptScore W3007455034C2780821815 @default.
- W3007455034 hasConceptScore W3007455034C2780980858 @default.
- W3007455034 hasConceptScore W3007455034C2781472820 @default.
- W3007455034 hasConceptScore W3007455034C33923547 @default.
- W3007455034 hasConceptScore W3007455034C41008148 @default.
- W3007455034 hasConceptScore W3007455034C87117476 @default.
- W3007455034 hasConceptScore W3007455034C97541855 @default.
- W3007455034 hasLocation W30074550341 @default.
- W3007455034 hasOpenAccess W3007455034 @default.
- W3007455034 hasPrimaryLocation W30074550341 @default.
- W3007455034 hasRelatedWork W1521228173 @default.
- W3007455034 hasRelatedWork W1556532828 @default.
- W3007455034 hasRelatedWork W1985560493 @default.
- W3007455034 hasRelatedWork W2128775537 @default.
- W3007455034 hasRelatedWork W2156021013 @default.
- W3007455034 hasRelatedWork W2482498454 @default.