Matches in SemOpenAlex for { <https://semopenalex.org/work/W2956137335> ?p ?o ?g. }
Showing items 1 to 98 of 98 with 100 items per page.
- W2956137335 abstract "Current reinforcement learning methods fail if the reward function is imperfect, i.e. if the agent observes reward different from what it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently spiky, the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP, and introduce an algorithm that is able to detect its corrupt states. We show that this algorithm can be used to learn the optimal policy with any common reinforcement learning algorithm. Finally, we investigate our algorithm in a pair of simple gridworld environments, finding that our algorithm can detect the corrupt states and learn the optimal policy despite the corruption." @default.
- W2956137335 created "2019-07-12" @default.
- W2956137335 creator A5045783983 @default.
- W2956137335 creator A5059597480 @default.
- W2956137335 creator A5080249452 @default.
- W2956137335 creator A5089812227 @default.
- W2956137335 date "2019-06-30" @default.
- W2956137335 modified "2023-09-23" @default.
- W2956137335 title "Detecting Spiky Corruption in Markov Decision Processes" @default.
- W2956137335 cites W2061562262 @default.
- W2956137335 cites W2095671661 @default.
- W2956137335 cites W2100960835 @default.
- W2956137335 cites W2105486945 @default.
- W2956137335 cites W2118979615 @default.
- W2956137335 cites W2154549708 @default.
- W2956137335 cites W2462906003 @default.
- W2956137335 cites W2558634851 @default.
- W2956137335 cites W2618574054 @default.
- W2956137335 cites W2736601468 @default.
- W2956137335 cites W2963289505 @default.
- W2956137335 cites W2964263543 @default.
- W2956137335 hasPublicationYear "2019" @default.
- W2956137335 type Work @default.
- W2956137335 sameAs 2956137335 @default.
- W2956137335 citedByCount "1" @default.
- W2956137335 countsByYear W29561373352020 @default.
- W2956137335 crossrefType "posted-content" @default.
- W2956137335 hasAuthorship W2956137335A5045783983 @default.
- W2956137335 hasAuthorship W2956137335A5059597480 @default.
- W2956137335 hasAuthorship W2956137335A5080249452 @default.
- W2956137335 hasAuthorship W2956137335A5089812227 @default.
- W2956137335 hasConcept C105795698 @default.
- W2956137335 hasConcept C106189395 @default.
- W2956137335 hasConcept C11413529 @default.
- W2956137335 hasConcept C119857082 @default.
- W2956137335 hasConcept C124952713 @default.
- W2956137335 hasConcept C126255220 @default.
- W2956137335 hasConcept C138885662 @default.
- W2956137335 hasConcept C142362112 @default.
- W2956137335 hasConcept C153349607 @default.
- W2956137335 hasConcept C154945302 @default.
- W2956137335 hasConcept C159886148 @default.
- W2956137335 hasConcept C2780027415 @default.
- W2956137335 hasConcept C2780310539 @default.
- W2956137335 hasConcept C33923547 @default.
- W2956137335 hasConcept C41008148 @default.
- W2956137335 hasConcept C41895202 @default.
- W2956137335 hasConcept C50817715 @default.
- W2956137335 hasConcept C558565934 @default.
- W2956137335 hasConcept C73301696 @default.
- W2956137335 hasConcept C97541855 @default.
- W2956137335 hasConcept C98763669 @default.
- W2956137335 hasConceptScore W2956137335C105795698 @default.
- W2956137335 hasConceptScore W2956137335C106189395 @default.
- W2956137335 hasConceptScore W2956137335C11413529 @default.
- W2956137335 hasConceptScore W2956137335C119857082 @default.
- W2956137335 hasConceptScore W2956137335C124952713 @default.
- W2956137335 hasConceptScore W2956137335C126255220 @default.
- W2956137335 hasConceptScore W2956137335C138885662 @default.
- W2956137335 hasConceptScore W2956137335C142362112 @default.
- W2956137335 hasConceptScore W2956137335C153349607 @default.
- W2956137335 hasConceptScore W2956137335C154945302 @default.
- W2956137335 hasConceptScore W2956137335C159886148 @default.
- W2956137335 hasConceptScore W2956137335C2780027415 @default.
- W2956137335 hasConceptScore W2956137335C2780310539 @default.
- W2956137335 hasConceptScore W2956137335C33923547 @default.
- W2956137335 hasConceptScore W2956137335C41008148 @default.
- W2956137335 hasConceptScore W2956137335C41895202 @default.
- W2956137335 hasConceptScore W2956137335C50817715 @default.
- W2956137335 hasConceptScore W2956137335C558565934 @default.
- W2956137335 hasConceptScore W2956137335C73301696 @default.
- W2956137335 hasConceptScore W2956137335C97541855 @default.
- W2956137335 hasConceptScore W2956137335C98763669 @default.
- W2956137335 hasOpenAccess W2956137335 @default.
- W2956137335 hasRelatedWork W1484441361 @default.
- W2956137335 hasRelatedWork W1495870736 @default.
- W2956137335 hasRelatedWork W2172596127 @default.
- W2956137335 hasRelatedWork W2410288126 @default.
- W2956137335 hasRelatedWork W2551398049 @default.
- W2956137335 hasRelatedWork W2572029559 @default.
- W2956137335 hasRelatedWork W2606820277 @default.
- W2956137335 hasRelatedWork W2621642732 @default.
- W2956137335 hasRelatedWork W2751659792 @default.
- W2956137335 hasRelatedWork W2801695465 @default.
- W2956137335 hasRelatedWork W2898895787 @default.
- W2956137335 hasRelatedWork W2914941842 @default.
- W2956137335 hasRelatedWork W2951986729 @default.
- W2956137335 hasRelatedWork W2970011083 @default.
- W2956137335 hasRelatedWork W3005438958 @default.
- W2956137335 hasRelatedWork W3046372382 @default.
- W2956137335 hasRelatedWork W3157598639 @default.
- W2956137335 hasRelatedWork W3165905482 @default.
- W2956137335 hasRelatedWork W3189755998 @default.
- W2956137335 hasRelatedWork W641459143 @default.
- W2956137335 isParatext "false" @default.
- W2956137335 isRetracted "false" @default.
- W2956137335 magId "2956137335" @default.
- W2956137335 workType "article" @default.
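
The listing above is the result of matching every statement whose subject is the work IRI. As a minimal sketch, the same pattern can be written as a SPARQL query string. The helper name `build_work_query` and the `limit` parameter are hypothetical and chosen only for illustration. Because every match above is reported in the `@default` graph, the sketch assumes a plain triple pattern against the default graph is sufficient. It only builds the query text and does not contact any endpoint.

```python
def build_work_query(work_iri: str, limit: int = 100) -> str:
    """Build a SPARQL SELECT query for all (?p, ?o) pairs of one subject IRI.

    Hypothetical helper for illustration; it only formats a string.
    """
    # Reject anything that is not an absolute HTTP(S) IRI, and reject
    # characters that would break out of the <...> IRI delimiters.
    if not work_iri.startswith(("http://", "https://")):
        raise ValueError("work_iri must be an absolute http(s) IRI")
    if any(ch in work_iri for ch in "<> \"{}|\\^`"):
        raise ValueError("work_iri contains characters not allowed in an IRI")
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    # Assumption: the statements live in the default graph (shown as
    # '@default' in the listing), so no GRAPH clause is used.
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{work_iri}> ?p ?o .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = build_work_query("https://semopenalex.org/work/W2956137335")
print(query)
```

The limit of 100 mirrors the page size stated at the top of the listing. Any SPARQL client could then submit the resulting string to an endpoint that serves this data.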