Matches in SemOpenAlex for { <https://semopenalex.org/work/W3091279148> ?p ?o ?g. }
Showing items 1 to 88 of 88 with 100 items per page.
- W3091279148 abstract "We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) in the tabular setting. We propose a model-based algorithm named UCBVI-$\gamma$, which is based on the optimism in the face of uncertainty principle and the Bernstein-type bonus. It achieves $\tilde{O}\big({\sqrt{SAT}}/{(1-\gamma)^{1.5}}\big)$ regret, where $S$ is the number of states, $A$ is the number of actions, $\gamma$ is the discount factor and $T$ is the number of steps. In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\tilde{\Omega}\big({\sqrt{SAT}}/{(1-\gamma)^{1.5}}\big)$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-$\gamma$ is near optimal for discounted MDPs." @default.
- W3091279148 created "2020-10-08" @default.
- W3091279148 creator A5051448391 @default.
- W3091279148 creator A5057534674 @default.
- W3091279148 creator A5080386620 @default.
- W3091279148 date "2020-10-01" @default.
- W3091279148 modified "2023-09-27" @default.
- W3091279148 title "Minimax Optimal Reinforcement Learning for Discounted MDPs" @default.
- W3091279148 cites W107583932 @default.
- W3091279148 cites W1505937442 @default.
- W3091279148 cites W1526654727 @default.
- W3091279148 cites W1850488217 @default.
- W3091279148 cites W1867103660 @default.
- W3091279148 cites W1988526405 @default.
- W3091279148 cites W2120678009 @default.
- W3091279148 cites W2122701159 @default.
- W3091279148 cites W2129670787 @default.
- W3091279148 cites W2805861379 @default.
- W3091279148 cites W2944264312 @default.
- W3091279148 cites W2948080528 @default.
- W3091279148 cites W2963158178 @default.
- W3091279148 cites W2963872309 @default.
- W3091279148 cites W2964054583 @default.
- W3091279148 cites W3006008954 @default.
- W3091279148 hasPublicationYear "2020" @default.
- W3091279148 type Work @default.
- W3091279148 sameAs 3091279148 @default.
- W3091279148 citedByCount "1" @default.
- W3091279148 countsByYear W30912791482020 @default.
- W3091279148 crossrefType "posted-content" @default.
- W3091279148 hasAuthorship W3091279148A5051448391 @default.
- W3091279148 hasAuthorship W3091279148A5057534674 @default.
- W3091279148 hasAuthorship W3091279148A5080386620 @default.
- W3091279148 hasConcept C105795698 @default.
- W3091279148 hasConcept C106189395 @default.
- W3091279148 hasConcept C114614502 @default.
- W3091279148 hasConcept C126255220 @default.
- W3091279148 hasConcept C134306372 @default.
- W3091279148 hasConcept C149728462 @default.
- W3091279148 hasConcept C154945302 @default.
- W3091279148 hasConcept C159886148 @default.
- W3091279148 hasConcept C33923547 @default.
- W3091279148 hasConcept C39927690 @default.
- W3091279148 hasConcept C41008148 @default.
- W3091279148 hasConcept C50817715 @default.
- W3091279148 hasConcept C77553402 @default.
- W3091279148 hasConcept C97541855 @default.
- W3091279148 hasConceptScore W3091279148C105795698 @default.
- W3091279148 hasConceptScore W3091279148C106189395 @default.
- W3091279148 hasConceptScore W3091279148C114614502 @default.
- W3091279148 hasConceptScore W3091279148C126255220 @default.
- W3091279148 hasConceptScore W3091279148C134306372 @default.
- W3091279148 hasConceptScore W3091279148C149728462 @default.
- W3091279148 hasConceptScore W3091279148C154945302 @default.
- W3091279148 hasConceptScore W3091279148C159886148 @default.
- W3091279148 hasConceptScore W3091279148C33923547 @default.
- W3091279148 hasConceptScore W3091279148C39927690 @default.
- W3091279148 hasConceptScore W3091279148C41008148 @default.
- W3091279148 hasConceptScore W3091279148C50817715 @default.
- W3091279148 hasConceptScore W3091279148C77553402 @default.
- W3091279148 hasConceptScore W3091279148C97541855 @default.
- W3091279148 hasLocation W30912791481 @default.
- W3091279148 hasOpenAccess W3091279148 @default.
- W3091279148 hasPrimaryLocation W30912791481 @default.
- W3091279148 hasRelatedWork W1743526214 @default.
- W3091279148 hasRelatedWork W1878258996 @default.
- W3091279148 hasRelatedWork W2944461362 @default.
- W3091279148 hasRelatedWork W2945119207 @default.
- W3091279148 hasRelatedWork W2946500988 @default.
- W3091279148 hasRelatedWork W2951831808 @default.
- W3091279148 hasRelatedWork W2953295707 @default.
- W3091279148 hasRelatedWork W2964226284 @default.
- W3091279148 hasRelatedWork W2970720882 @default.
- W3091279148 hasRelatedWork W2970884920 @default.
- W3091279148 hasRelatedWork W3087784034 @default.
- W3091279148 hasRelatedWork W3089536325 @default.
- W3091279148 hasRelatedWork W3091875946 @default.
- W3091279148 hasRelatedWork W3092302772 @default.
- W3091279148 hasRelatedWork W3128235563 @default.
- W3091279148 hasRelatedWork W3133034649 @default.
- W3091279148 hasRelatedWork W3157247518 @default.
- W3091279148 hasRelatedWork W3177489476 @default.
- W3091279148 hasRelatedWork W3209987675 @default.
- W3091279148 hasRelatedWork W3212343910 @default.
- W3091279148 isParatext "false" @default.
- W3091279148 isRetracted "false" @default.
- W3091279148 magId "3091279148" @default.
- W3091279148 workType "article" @default.