Matches in SemOpenAlex for { <https://semopenalex.org/work/W4382317710> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4382317710 endingPage "15027" @default.
- W4382317710 startingPage "15019" @default.
- W4382317710 abstract "Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy. This differs from human decision-making, where gains and losses are valued differently and outlying outcomes are given increased consideration. It also fails to capitalize on opportunities to improve safety and/or performance through the incorporation of distributional context. Several approaches to distributional DRL have been investigated, with one popular strategy being to evaluate the projected distribution of returns for possible actions. We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized. This approach allows for outcomes to be weighed based on relative quality, can be used for both continuous and discrete action spaces, and may naturally be applied in both constrained and unconstrained settings. We show how to compute an asymptotically consistent estimate of the policy gradient for a broad class of risk-sensitive objectives via sampling, subsequently incorporating variance reduction and regularization measures to facilitate effective on-policy learning. We then demonstrate that the use of moderately pessimistic risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies. We test the approach using different risk profiles in six OpenAI Safety Gym environments, comparing to state of the art on-policy methods. Without cost constraints, we find that pessimistic risk profiles can be used to reduce cost while improving total reward accumulation. With cost constraints, they are seen to provide higher positive rewards than risk-neutral approaches at the prescribed allowable cost." @default.
- W4382317710 created "2023-06-28" @default.
- W4382317710 creator A5017757715 @default.
- W4382317710 creator A5020285476 @default.
- W4382317710 creator A5063457506 @default.
- W4382317710 creator A5068872460 @default.
- W4382317710 creator A5074196075 @default.
- W4382317710 date "2023-06-26" @default.
- W4382317710 modified "2023-09-23" @default.
- W4382317710 title "A Risk-Sensitive Approach to Policy Optimization" @default.
- W4382317710 doi "https://doi.org/10.1609/aaai.v37i12.26753" @default.
- W4382317710 hasPublicationYear "2023" @default.
- W4382317710 type Work @default.
- W4382317710 citedByCount "0" @default.
- W4382317710 crossrefType "journal-article" @default.
- W4382317710 hasAuthorship W4382317710A5017757715 @default.
- W4382317710 hasAuthorship W4382317710A5020285476 @default.
- W4382317710 hasAuthorship W4382317710A5063457506 @default.
- W4382317710 hasAuthorship W4382317710A5068872460 @default.
- W4382317710 hasAuthorship W4382317710A5074196075 @default.
- W4382317710 hasBestOaLocation W43823177101 @default.
- W4382317710 hasConcept C119857082 @default.
- W4382317710 hasConcept C121955636 @default.
- W4382317710 hasConcept C126255220 @default.
- W4382317710 hasConcept C149782125 @default.
- W4382317710 hasConcept C151730666 @default.
- W4382317710 hasConcept C154945302 @default.
- W4382317710 hasConcept C162324750 @default.
- W4382317710 hasConcept C196083921 @default.
- W4382317710 hasConcept C2776135515 @default.
- W4382317710 hasConcept C2779343474 @default.
- W4382317710 hasConcept C33923547 @default.
- W4382317710 hasConcept C41008148 @default.
- W4382317710 hasConcept C62644790 @default.
- W4382317710 hasConcept C86803240 @default.
- W4382317710 hasConcept C97541855 @default.
- W4382317710 hasConceptScore W4382317710C119857082 @default.
- W4382317710 hasConceptScore W4382317710C121955636 @default.
- W4382317710 hasConceptScore W4382317710C126255220 @default.
- W4382317710 hasConceptScore W4382317710C149782125 @default.
- W4382317710 hasConceptScore W4382317710C151730666 @default.
- W4382317710 hasConceptScore W4382317710C154945302 @default.
- W4382317710 hasConceptScore W4382317710C162324750 @default.
- W4382317710 hasConceptScore W4382317710C196083921 @default.
- W4382317710 hasConceptScore W4382317710C2776135515 @default.
- W4382317710 hasConceptScore W4382317710C2779343474 @default.
- W4382317710 hasConceptScore W4382317710C33923547 @default.
- W4382317710 hasConceptScore W4382317710C41008148 @default.
- W4382317710 hasConceptScore W4382317710C62644790 @default.
- W4382317710 hasConceptScore W4382317710C86803240 @default.
- W4382317710 hasConceptScore W4382317710C97541855 @default.
- W4382317710 hasIssue "12" @default.
- W4382317710 hasLocation W43823177101 @default.
- W4382317710 hasOpenAccess W4382317710 @default.
- W4382317710 hasPrimaryLocation W43823177101 @default.
- W4382317710 hasRelatedWork W2029278774 @default.
- W4382317710 hasRelatedWork W2338705551 @default.
- W4382317710 hasRelatedWork W2800191438 @default.
- W4382317710 hasRelatedWork W2930446528 @default.
- W4382317710 hasRelatedWork W2946396478 @default.
- W4382317710 hasRelatedWork W2953144887 @default.
- W4382317710 hasRelatedWork W2953299687 @default.
- W4382317710 hasRelatedWork W4286962126 @default.
- W4382317710 hasRelatedWork W4294789573 @default.
- W4382317710 hasRelatedWork W2859870583 @default.
- W4382317710 hasVolume "37" @default.
- W4382317710 isParatext "false" @default.
- W4382317710 isRetracted "false" @default.
- W4382317710 workType "article" @default.