Matches in SemOpenAlex for { <https://semopenalex.org/work/W2103708221> ?p ?o ?g. }
- W2103708221 abstract "We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather than cardinality, of the system. We characterize this dependence explicitly as $\tilde{O}(\sqrt{d_K d_E T})$ where $T$ is time elapsed, $d_K$ is the Kolmogorov dimension and $d_E$ is the \emph{eluder dimension}. These represent the first unified regret bounds for model-based reinforcement learning and provide state of the art guarantees in several important settings. Moreover, we present a simple and computationally efficient algorithm \emph{posterior sampling for reinforcement learning} (PSRL) that satisfies these bounds." @default.
- W2103708221 created "2016-06-24" @default.
- W2103708221 creator A5015899120 @default.
- W2103708221 creator A5045543562 @default.
- W2103708221 date "2014-06-07" @default.
- W2103708221 modified "2023-09-25" @default.
- W2103708221 title "Model-based Reinforcement Learning and the Eluder Dimension" @default.
- W2103708221 cites W1542595278 @default.
- W2103708221 cites W1850488217 @default.
- W2103708221 cites W2019363670 @default.
- W2103708221 cites W2039522160 @default.
- W2103708221 cites W2098432798 @default.
- W2103708221 cites W2119738618 @default.
- W2103708221 cites W2129113961 @default.
- W2103708221 cites W2129670787 @default.
- W2103708221 cites W2133419240 @default.
- W2103708221 cites W2134807560 @default.
- W2103708221 cites W2163840227 @default.
- W2103708221 cites W2488247662 @default.
- W2103708221 cites W2949366694 @default.
- W2103708221 cites W2953295707 @default.
- W2103708221 hasPublicationYear "2014" @default.
- W2103708221 type Work @default.
- W2103708221 sameAs 2103708221 @default.
- W2103708221 citedByCount "32" @default.
- W2103708221 countsByYear W21037082212014 @default.
- W2103708221 countsByYear W21037082212015 @default.
- W2103708221 countsByYear W21037082212016 @default.
- W2103708221 countsByYear W21037082212017 @default.
- W2103708221 countsByYear W21037082212018 @default.
- W2103708221 countsByYear W21037082212019 @default.
- W2103708221 countsByYear W21037082212020 @default.
- W2103708221 countsByYear W21037082212021 @default.
- W2103708221 crossrefType "posted-content" @default.
- W2103708221 hasAuthorship W2103708221A5015899120 @default.
- W2103708221 hasAuthorship W2103708221A5045543562 @default.
- W2103708221 hasConcept C105795698 @default.
- W2103708221 hasConcept C106189395 @default.
- W2103708221 hasConcept C111030470 @default.
- W2103708221 hasConcept C111472728 @default.
- W2103708221 hasConcept C114614502 @default.
- W2103708221 hasConcept C118615104 @default.
- W2103708221 hasConcept C119857082 @default.
- W2103708221 hasConcept C124101348 @default.
- W2103708221 hasConcept C126255220 @default.
- W2103708221 hasConcept C138885662 @default.
- W2103708221 hasConcept C154945302 @default.
- W2103708221 hasConcept C159886148 @default.
- W2103708221 hasConcept C165464430 @default.
- W2103708221 hasConcept C2778445095 @default.
- W2103708221 hasConcept C2780586882 @default.
- W2103708221 hasConcept C33676613 @default.
- W2103708221 hasConcept C33923547 @default.
- W2103708221 hasConcept C36686422 @default.
- W2103708221 hasConcept C41008148 @default.
- W2103708221 hasConcept C50817715 @default.
- W2103708221 hasConcept C87117476 @default.
- W2103708221 hasConcept C97541855 @default.
- W2103708221 hasConceptScore W2103708221C105795698 @default.
- W2103708221 hasConceptScore W2103708221C106189395 @default.
- W2103708221 hasConceptScore W2103708221C111030470 @default.
- W2103708221 hasConceptScore W2103708221C111472728 @default.
- W2103708221 hasConceptScore W2103708221C114614502 @default.
- W2103708221 hasConceptScore W2103708221C118615104 @default.
- W2103708221 hasConceptScore W2103708221C119857082 @default.
- W2103708221 hasConceptScore W2103708221C124101348 @default.
- W2103708221 hasConceptScore W2103708221C126255220 @default.
- W2103708221 hasConceptScore W2103708221C138885662 @default.
- W2103708221 hasConceptScore W2103708221C154945302 @default.
- W2103708221 hasConceptScore W2103708221C159886148 @default.
- W2103708221 hasConceptScore W2103708221C165464430 @default.
- W2103708221 hasConceptScore W2103708221C2778445095 @default.
- W2103708221 hasConceptScore W2103708221C2780586882 @default.
- W2103708221 hasConceptScore W2103708221C33676613 @default.
- W2103708221 hasConceptScore W2103708221C33923547 @default.
- W2103708221 hasConceptScore W2103708221C36686422 @default.
- W2103708221 hasConceptScore W2103708221C41008148 @default.
- W2103708221 hasConceptScore W2103708221C50817715 @default.
- W2103708221 hasConceptScore W2103708221C87117476 @default.
- W2103708221 hasConceptScore W2103708221C97541855 @default.
- W2103708221 hasLocation W21037082211 @default.
- W2103708221 hasOpenAccess W2103708221 @default.
- W2103708221 hasPrimaryLocation W21037082211 @default.
- W2103708221 hasRelatedWork W107583932 @default.
- W2103708221 hasRelatedWork W1505937442 @default.
- W2103708221 hasRelatedWork W1582436621 @default.
- W2103708221 hasRelatedWork W1850488217 @default.
- W2103708221 hasRelatedWork W2039522160 @default.
- W2103708221 hasRelatedWork W2111764152 @default.
- W2103708221 hasRelatedWork W2119567691 @default.
- W2103708221 hasRelatedWork W2119738618 @default.
- W2103708221 hasRelatedWork W2129670787 @default.
- W2103708221 hasRelatedWork W2145339207 @default.
- W2103708221 hasRelatedWork W2149721706 @default.
- W2103708221 hasRelatedWork W2163840227 @default.
- W2103708221 hasRelatedWork W2489939061 @default.
- W2103708221 hasRelatedWork W2545659366 @default.
- W2103708221 hasRelatedWork W2953295707 @default.
- W2103708221 hasRelatedWork W2963049774 @default.
- W2103708221 hasRelatedWork W2963713569 @default.