Matches in SemOpenAlex for { <https://semopenalex.org/work/W3167202906> ?p ?o ?g. }
Showing items 1 to 91 of 91 with 100 items per page.
- W3167202906 abstract "We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. Unlike existing approaches with regret guarantees, it does not use any kind of partitioning of the state-action space. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $O\left(H^3 K^{\max\left(\frac{1}{2}, \frac{2d}{2d+1}\right)}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards." @default.
- W3167202906 created "2021-06-22" @default.
- W3167202906 creator A5009541020 @default.
- W3167202906 creator A5037297959 @default.
- W3167202906 creator A5070500506 @default.
- W3167202906 creator A5083219425 @default.
- W3167202906 creator A5091526684 @default.
- W3167202906 date "2021-07-01" @default.
- W3167202906 modified "2023-10-14" @default.
- W3167202906 title "Kernel-Based Reinforcement Learning: A Finite-Time Analysis" @default.
- W3167202906 cites W1002758560 @default.
- W3167202906 cites W1582436621 @default.
- W3167202906 cites W1701974503 @default.
- W3167202906 cites W1747856733 @default.
- W3167202906 cites W1850488217 @default.
- W3167202906 cites W1867447366 @default.
- W3167202906 cites W1998376807 @default.
- W3167202906 cites W2009533501 @default.
- W3167202906 cites W2103708221 @default.
- W3167202906 cites W2119567691 @default.
- W3167202906 cites W2119738618 @default.
- W3167202906 cites W2133419240 @default.
- W3167202906 cites W21891419 @default.
- W3167202906 cites W2489939061 @default.
- W3167202906 cites W2783502267 @default.
- W3167202906 cites W2907502549 @default.
- W3167202906 cites W2943632322 @default.
- W3167202906 cites W2946019081 @default.
- W3167202906 cites W2963465244 @default.
- W3167202906 cites W2964054583 @default.
- W3167202906 cites W2964178973 @default.
- W3167202906 cites W2964284806 @default.
- W3167202906 cites W2970720882 @default.
- W3167202906 cites W2970870329 @default.
- W3167202906 cites W3034442282 @default.
- W3167202906 cites W3035273634 @default.
- W3167202906 cites W3037341018 @default.
- W3167202906 cites W3046395471 @default.
- W3167202906 cites W3097990964 @default.
- W3167202906 cites W53582479 @default.
- W3167202906 hasPublicationYear "2021" @default.
- W3167202906 type Work @default.
- W3167202906 sameAs 3167202906 @default.
- W3167202906 citedByCount "0" @default.
- W3167202906 crossrefType "proceedings-article" @default.
- W3167202906 hasAuthorship W3167202906A5009541020 @default.
- W3167202906 hasAuthorship W3167202906A5037297959 @default.
- W3167202906 hasAuthorship W3167202906A5070500506 @default.
- W3167202906 hasAuthorship W3167202906A5083219425 @default.
- W3167202906 hasAuthorship W3167202906A5091526684 @default.
- W3167202906 hasBestOaLocation W31672029061 @default.
- W3167202906 hasConcept C118615104 @default.
- W3167202906 hasConcept C119857082 @default.
- W3167202906 hasConcept C127413603 @default.
- W3167202906 hasConcept C154945302 @default.
- W3167202906 hasConcept C33923547 @default.
- W3167202906 hasConcept C41008148 @default.
- W3167202906 hasConcept C66938386 @default.
- W3167202906 hasConcept C67203356 @default.
- W3167202906 hasConcept C74193536 @default.
- W3167202906 hasConcept C97541855 @default.
- W3167202906 hasConceptScore W3167202906C118615104 @default.
- W3167202906 hasConceptScore W3167202906C119857082 @default.
- W3167202906 hasConceptScore W3167202906C127413603 @default.
- W3167202906 hasConceptScore W3167202906C154945302 @default.
- W3167202906 hasConceptScore W3167202906C33923547 @default.
- W3167202906 hasConceptScore W3167202906C41008148 @default.
- W3167202906 hasConceptScore W3167202906C66938386 @default.
- W3167202906 hasConceptScore W3167202906C67203356 @default.
- W3167202906 hasConceptScore W3167202906C74193536 @default.
- W3167202906 hasConceptScore W3167202906C97541855 @default.
- W3167202906 hasLocation W31672029061 @default.
- W3167202906 hasLocation W31672029062 @default.
- W3167202906 hasLocation W31672029063 @default.
- W3167202906 hasLocation W31672029064 @default.
- W3167202906 hasOpenAccess W3167202906 @default.
- W3167202906 hasPrimaryLocation W31672029061 @default.
- W3167202906 hasRelatedWork W260766989 @default.
- W3167202906 hasRelatedWork W2909304650 @default.
- W3167202906 hasRelatedWork W2959276766 @default.
- W3167202906 hasRelatedWork W2961085424 @default.
- W3167202906 hasRelatedWork W3074294383 @default.
- W3167202906 hasRelatedWork W3139193008 @default.
- W3167202906 hasRelatedWork W4206669594 @default.
- W3167202906 hasRelatedWork W4295941380 @default.
- W3167202906 hasRelatedWork W4306674287 @default.
- W3167202906 hasRelatedWork W4319083788 @default.
- W3167202906 isParatext "false" @default.
- W3167202906 isRetracted "false" @default.
- W3167202906 magId "3167202906" @default.
- W3167202906 workType "article" @default.