Matches in SemOpenAlex for { <https://semopenalex.org/work/W3035880215> ?p ?o ?g. }
- W3035880215 abstract "The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL), often coming with strong theoretical guarantees. However, it remains a challenge to scale these approaches to the deep RL paradigm, which has achieved a great deal of attention in recent years. In this paper, we introduce a tractable approach to optimism via noise augmented Markov Decision Processes (MDPs), which we show can obtain a competitive regret bound: $\tilde{\mathcal{O}}(|\mathcal{S}| H \sqrt{|\mathcal{S}||\mathcal{A}| T})$ when augmenting using Gaussian noise, where $T$ is the total number of environment steps. This tractability allows us to apply our approach to the deep RL setting, where we rigorously evaluate the key factors for success of optimistic model-based RL algorithms, bridging the gap between theory and practice." @default.
- W3035880215 created "2020-06-25" @default.
- W3035880215 creator A5006761546 @default.
- W3035880215 creator A5031842812 @default.
- W3035880215 creator A5049818013 @default.
- W3035880215 creator A5058617210 @default.
- W3035880215 creator A5083828420 @default.
- W3035880215 date "2020-06-21" @default.
- W3035880215 modified "2023-09-27" @default.
- W3035880215 title "On Optimism in Model-Based Reinforcement Learning." @default.
- W3035880215 cites W1515851193 @default.
- W3035880215 cites W1521230616 @default.
- W3035880215 cites W1560021816 @default.
- W3035880215 cites W1583155004 @default.
- W3035880215 cites W1625390266 @default.
- W3035880215 cites W1821977771 @default.
- W3035880215 cites W1850488217 @default.
- W3035880215 cites W1980035368 @default.
- W3035880215 cites W1995945562 @default.
- W3035880215 cites W2119738618 @default.
- W3035880215 cites W2140135625 @default.
- W3035880215 cites W2257979135 @default.
- W3035880215 cites W2526781987 @default.
- W3035880215 cites W2561776174 @default.
- W3035880215 cites W2599934248 @default.
- W3035880215 cites W2750990725 @default.
- W3035880215 cites W2769648743 @default.
- W3035880215 cites W2772709170 @default.
- W3035880215 cites W2773557179 @default.
- W3035880215 cites W2774354230 @default.
- W3035880215 cites W2794095813 @default.
- W3035880215 cites W2804585439 @default.
- W3035880215 cites W2892230114 @default.
- W3035880215 cites W2904246096 @default.
- W3035880215 cites W2947782105 @default.
- W3035880215 cites W2950624398 @default.
- W3035880215 cites W2952136328 @default.
- W3035880215 cites W2952606116 @default.
- W3035880215 cites W2962723954 @default.
- W3035880215 cites W2962804251 @default.
- W3035880215 cites W2962902376 @default.
- W3035880215 cites W2963049774 @default.
- W3035880215 cites W2963276097 @default.
- W3035880215 cites W2963335248 @default.
- W3035880215 cites W2963484919 @default.
- W3035880215 cites W2963695785 @default.
- W3035880215 cites W2963767098 @default.
- W3035880215 cites W2963846183 @default.
- W3035880215 cites W2963923407 @default.
- W3035880215 cites W2963938771 @default.
- W3035880215 cites W2963960193 @default.
- W3035880215 cites W2963971282 @default.
- W3035880215 cites W2964054583 @default.
- W3035880215 cites W2964067469 @default.
- W3035880215 cites W2967357336 @default.
- W3035880215 cites W2970770768 @default.
- W3035880215 cites W2970961171 @default.
- W3035880215 cites W2971249033 @default.
- W3035880215 cites W2973815009 @default.
- W3035880215 cites W2985924367 @default.
- W3035880215 cites W2994714051 @default.
- W3035880215 cites W2995300516 @default.
- W3035880215 cites W2996070979 @default.
- W3035880215 cites W3006049620 @default.
- W3035880215 cites W3008003828 @default.
- W3035880215 cites W3025660841 @default.
- W3035880215 cites W3034973310 @default.
- W3035880215 cites W3046395471 @default.
- W3035880215 cites W3101676093 @default.
- W3035880215 cites W3158726102 @default.
- W3035880215 cites W51508254 @default.
- W3035880215 cites W2337490104 @default.
- W3035880215 hasPublicationYear "2020" @default.
- W3035880215 type Work @default.
- W3035880215 sameAs 3035880215 @default.
- W3035880215 citedByCount "11" @default.
- W3035880215 countsByYear W30358802152020 @default.
- W3035880215 countsByYear W30358802152021 @default.
- W3035880215 countsByYear W30358802152022 @default.
- W3035880215 crossrefType "posted-content" @default.
- W3035880215 hasAuthorship W3035880215A5006761546 @default.
- W3035880215 hasAuthorship W3035880215A5031842812 @default.
- W3035880215 hasAuthorship W3035880215A5049818013 @default.
- W3035880215 hasAuthorship W3035880215A5058617210 @default.
- W3035880215 hasAuthorship W3035880215A5083828420 @default.
- W3035880215 hasConcept C105795698 @default.
- W3035880215 hasConcept C106189395 @default.
- W3035880215 hasConcept C111472728 @default.
- W3035880215 hasConcept C115961682 @default.
- W3035880215 hasConcept C119857082 @default.
- W3035880215 hasConcept C121332964 @default.
- W3035880215 hasConcept C126255220 @default.
- W3035880215 hasConcept C138885662 @default.
- W3035880215 hasConcept C144237770 @default.
- W3035880215 hasConcept C154945302 @default.
- W3035880215 hasConcept C159886148 @default.
- W3035880215 hasConcept C163716315 @default.
- W3035880215 hasConcept C174348530 @default.
- W3035880215 hasConcept C204017024 @default.
- W3035880215 hasConcept C31258907 @default.