Matches in SemOpenAlex for { <https://semopenalex.org/work/W2990062287> ?p ?o ?g. }
- W2990062287 abstract "We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits. We provide a framework which modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on optimism in the face of uncertainty, by complementing them with principles from elimination. Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms which (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels of corruption, enjoying regret guarantees which degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) as well as linear-function-approximation settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee which accommodates any deviation from purely i.i.d. transitions in the bandit-feedback model for episodic reinforcement learning." @default.
- W2990062287 created "2019-12-05" @default.
- W2990062287 creator A5032266950 @default.
- W2990062287 creator A5037154191 @default.
- W2990062287 creator A5058550942 @default.
- W2990062287 creator A5091288542 @default.
- W2990062287 date "2019-11-20" @default.
- W2990062287 modified "2023-09-27" @default.
- W2990062287 title "Corruption robust exploration in episodic reinforcement learning" @default.
- W2990062287 cites W107583932 @default.
- W2990062287 cites W165458731 @default.
- W2990062287 cites W1850488217 @default.
- W2990062287 cites W2074680702 @default.
- W2990062287 cites W2077902449 @default.
- W2990062287 cites W2119738618 @default.
- W2990062287 cites W2147967768 @default.
- W2990062287 cites W2150234726 @default.
- W2990062287 cites W2154806059 @default.
- W2990062287 cites W2157016390 @default.
- W2990062287 cites W2225232604 @default.
- W2990062287 cites W2241126168 @default.
- W2990062287 cites W2591610401 @default.
- W2990062287 cites W2794925984 @default.
- W2990062287 cites W2907502549 @default.
- W2990062287 cites W2946284958 @default.
- W2990062287 cites W2946300093 @default.
- W2990062287 cites W2948080528 @default.
- W2990062287 cites W2962874735 @default.
- W2990062287 cites W2963049774 @default.
- W2990062287 cites W2963582321 @default.
- W2990062287 cites W2963635789 @default.
- W2990062287 cites W2963771282 @default.
- W2990062287 cites W2963921604 @default.
- W2990062287 cites W2964054583 @default.
- W2990062287 cites W2964299116 @default.
- W2990062287 cites W2964675730 @default.
- W2990062287 cites W2971249033 @default.
- W2990062287 cites W2971986223 @default.
- W2990062287 cites W2979828492 @default.
- W2990062287 cites W2991929641 @default.
- W2990062287 cites W2991935368 @default.
- W2990062287 cites W3037101165 @default.
- W2990062287 cites W3039913305 @default.
- W2990062287 cites W3041881490 @default.
- W2990062287 cites W3046395471 @default.
- W2990062287 cites W3098434978 @default.
- W2990062287 cites W3101144313 @default.
- W2990062287 cites W3103681770 @default.
- W2990062287 cites W3106393551 @default.
- W2990062287 cites W3107215175 @default.
- W2990062287 cites W3107747824 @default.
- W2990062287 cites W3182934104 @default.
- W2990062287 hasPublicationYear "2019" @default.
- W2990062287 type Work @default.
- W2990062287 sameAs 2990062287 @default.
- W2990062287 citedByCount "4" @default.
- W2990062287 countsByYear W29900622872021 @default.
- W2990062287 crossrefType "posted-content" @default.
- W2990062287 hasAuthorship W2990062287A5032266950 @default.
- W2990062287 hasAuthorship W2990062287A5037154191 @default.
- W2990062287 hasAuthorship W2990062287A5058550942 @default.
- W2990062287 hasAuthorship W2990062287A5091288542 @default.
- W2990062287 hasConcept C117160843 @default.
- W2990062287 hasConcept C119857082 @default.
- W2990062287 hasConcept C126255220 @default.
- W2990062287 hasConcept C134306372 @default.
- W2990062287 hasConcept C154945302 @default.
- W2990062287 hasConcept C162324750 @default.
- W2990062287 hasConcept C17744445 @default.
- W2990062287 hasConcept C187736073 @default.
- W2990062287 hasConcept C199539241 @default.
- W2990062287 hasConcept C2776359362 @default.
- W2990062287 hasConcept C2780767217 @default.
- W2990062287 hasConcept C33923547 @default.
- W2990062287 hasConcept C41008148 @default.
- W2990062287 hasConcept C50817715 @default.
- W2990062287 hasConcept C94625758 @default.
- W2990062287 hasConcept C97541855 @default.
- W2990062287 hasConceptScore W2990062287C117160843 @default.
- W2990062287 hasConceptScore W2990062287C119857082 @default.
- W2990062287 hasConceptScore W2990062287C126255220 @default.
- W2990062287 hasConceptScore W2990062287C134306372 @default.
- W2990062287 hasConceptScore W2990062287C154945302 @default.
- W2990062287 hasConceptScore W2990062287C162324750 @default.
- W2990062287 hasConceptScore W2990062287C17744445 @default.
- W2990062287 hasConceptScore W2990062287C187736073 @default.
- W2990062287 hasConceptScore W2990062287C199539241 @default.
- W2990062287 hasConceptScore W2990062287C2776359362 @default.
- W2990062287 hasConceptScore W2990062287C2780767217 @default.
- W2990062287 hasConceptScore W2990062287C33923547 @default.
- W2990062287 hasConceptScore W2990062287C41008148 @default.
- W2990062287 hasConceptScore W2990062287C50817715 @default.
- W2990062287 hasConceptScore W2990062287C94625758 @default.
- W2990062287 hasConceptScore W2990062287C97541855 @default.
- W2990062287 hasLocation W29900622871 @default.
- W2990062287 hasOpenAccess W2990062287 @default.
- W2990062287 hasPrimaryLocation W29900622871 @default.
- W2990062287 hasRelatedWork W1982948368 @default.
- W2990062287 hasRelatedWork W1996625075 @default.
- W2990062287 hasRelatedWork W2100415632 @default.