Matches in SemOpenAlex for { <https://semopenalex.org/work/W3009498344> ?p ?o ?g. }
- W3009498344 abstract "Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces. In this paper, we consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric that characterizes the proximity between different states and actions. We propose ZoomRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space by zooming in more promising and frequently visited regions while carefully balancing the exploitation-exploration trade-off. We show that ZoomRL achieves a worst-case regret $\tilde{O}(H^{\frac{5}{2}} K^{\frac{d+1}{d+2}})$ where $H$ is the planning horizon, $K$ is the number of episodes and $d$ is the covering dimension of the space with respect to the metric. Moreover, our algorithm enjoys improved metric-dependent guarantees that reflect the geometry of the underlying space. Finally, we show that our algorithm is robust to small misspecification errors." @default.
- W3009498344 created "2020-03-13" @default.
- W3009498344 creator A5001087292 @default.
- W3009498344 creator A5045338436 @default.
- W3009498344 creator A5067427868 @default.
- W3009498344 date "2020-03-09" @default.
- W3009498344 modified "2023-10-01" @default.
- W3009498344 title "Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces" @default.
- W3009498344 cites W1509780496 @default.
- W3009498344 cites W1701974503 @default.
- W3009498344 cites W1747856733 @default.
- W3009498344 cites W1850488217 @default.
- W3009498344 cites W1981121741 @default.
- W3009498344 cites W2073107347 @default.
- W3009498344 cites W2082691056 @default.
- W3009498344 cites W2099179721 @default.
- W3009498344 cites W2103581319 @default.
- W3009498344 cites W2133419240 @default.
- W3009498344 cites W2148434045 @default.
- W3009498344 cites W2150153095 @default.
- W3009498344 cites W2151544200 @default.
- W3009498344 cites W21891419 @default.
- W3009498344 cites W2268509491 @default.
- W3009498344 cites W2489939061 @default.
- W3009498344 cites W2568646110 @default.
- W3009498344 cites W2797811993 @default.
- W3009498344 cites W2903569780 @default.
- W3009498344 cites W2943357710 @default.
- W3009498344 cites W2943632322 @default.
- W3009498344 cites W2963049774 @default.
- W3009498344 cites W2964054583 @default.
- W3009498344 cites W2970720882 @default.
- W3009498344 cites W2980603164 @default.
- W3009498344 hasPublicationYear "2020" @default.
- W3009498344 type Work @default.
- W3009498344 sameAs 3009498344 @default.
- W3009498344 citedByCount "10" @default.
- W3009498344 countsByYear W30094983442020 @default.
- W3009498344 countsByYear W30094983442021 @default.
- W3009498344 crossrefType "posted-content" @default.
- W3009498344 hasAuthorship W3009498344A5001087292 @default.
- W3009498344 hasAuthorship W3009498344A5045338436 @default.
- W3009498344 hasAuthorship W3009498344A5067427868 @default.
- W3009498344 hasConcept C105795698 @default.
- W3009498344 hasConcept C111919701 @default.
- W3009498344 hasConcept C11413529 @default.
- W3009498344 hasConcept C114614502 @default.
- W3009498344 hasConcept C118615104 @default.
- W3009498344 hasConcept C119857082 @default.
- W3009498344 hasConcept C121332964 @default.
- W3009498344 hasConcept C127413603 @default.
- W3009498344 hasConcept C134306372 @default.
- W3009498344 hasConcept C154945302 @default.
- W3009498344 hasConcept C176217482 @default.
- W3009498344 hasConcept C17744445 @default.
- W3009498344 hasConcept C198043062 @default.
- W3009498344 hasConcept C199539241 @default.
- W3009498344 hasConcept C21547014 @default.
- W3009498344 hasConcept C2776359362 @default.
- W3009498344 hasConcept C2778572836 @default.
- W3009498344 hasConcept C2780791683 @default.
- W3009498344 hasConcept C33676613 @default.
- W3009498344 hasConcept C33923547 @default.
- W3009498344 hasConcept C41008148 @default.
- W3009498344 hasConcept C48103436 @default.
- W3009498344 hasConcept C50817715 @default.
- W3009498344 hasConcept C62520636 @default.
- W3009498344 hasConcept C72434380 @default.
- W3009498344 hasConcept C73000952 @default.
- W3009498344 hasConcept C80444323 @default.
- W3009498344 hasConcept C94625758 @default.
- W3009498344 hasConcept C97541855 @default.
- W3009498344 hasConceptScore W3009498344C105795698 @default.
- W3009498344 hasConceptScore W3009498344C111919701 @default.
- W3009498344 hasConceptScore W3009498344C11413529 @default.
- W3009498344 hasConceptScore W3009498344C114614502 @default.
- W3009498344 hasConceptScore W3009498344C118615104 @default.
- W3009498344 hasConceptScore W3009498344C119857082 @default.
- W3009498344 hasConceptScore W3009498344C121332964 @default.
- W3009498344 hasConceptScore W3009498344C127413603 @default.
- W3009498344 hasConceptScore W3009498344C134306372 @default.
- W3009498344 hasConceptScore W3009498344C154945302 @default.
- W3009498344 hasConceptScore W3009498344C176217482 @default.
- W3009498344 hasConceptScore W3009498344C17744445 @default.
- W3009498344 hasConceptScore W3009498344C198043062 @default.
- W3009498344 hasConceptScore W3009498344C199539241 @default.
- W3009498344 hasConceptScore W3009498344C21547014 @default.
- W3009498344 hasConceptScore W3009498344C2776359362 @default.
- W3009498344 hasConceptScore W3009498344C2778572836 @default.
- W3009498344 hasConceptScore W3009498344C2780791683 @default.
- W3009498344 hasConceptScore W3009498344C33676613 @default.
- W3009498344 hasConceptScore W3009498344C33923547 @default.
- W3009498344 hasConceptScore W3009498344C41008148 @default.
- W3009498344 hasConceptScore W3009498344C48103436 @default.
- W3009498344 hasConceptScore W3009498344C50817715 @default.
- W3009498344 hasConceptScore W3009498344C62520636 @default.
- W3009498344 hasConceptScore W3009498344C72434380 @default.
- W3009498344 hasConceptScore W3009498344C73000952 @default.
- W3009498344 hasConceptScore W3009498344C80444323 @default.
- W3009498344 hasConceptScore W3009498344C94625758 @default.