Matches in SemOpenAlex for { <https://semopenalex.org/work/W2993386228> ?p ?o ?g. }
- W2993386228 abstract "Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost. Despite the popularity of this setting, the exploration-exploitation dilemma has been sparsely studied in general SSP problems, with most of the theoretical literature focusing on different problems (i.e., fixed-horizon and infinite-horizon) or making the restrictive loop-free SSP assumption (i.e., no state can be visited twice during an episode). In this paper, we study the general SSP problem with no assumption on its dynamics (some policies may actually never reach the goal). We introduce UC-SSP, the first no-regret algorithm in this setting, and prove a regret bound scaling as $\displaystyle \widetilde{\mathcal{O}}(D S \sqrt{A D K})$ after $K$ episodes for any unknown SSP with $S$ states, $A$ actions, positive costs and SSP-diameter $D$, defined as the smallest expected hitting time from any starting state to the goal. We achieve this result by crafting a novel stopping rule, such that UC-SSP may interrupt the current policy if it is taking too long to achieve the goal and switch to alternative policies that are designed to rapidly terminate the episode." @default.
- W2993386228 created "2019-12-13" @default.
- W2993386228 creator A5000492848 @default.
- W2993386228 creator A5014791481 @default.
- W2993386228 creator A5070500506 @default.
- W2993386228 creator A5071798388 @default.
- W2993386228 creator A5091526684 @default.
- W2993386228 date "2020-01-01" @default.
- W2993386228 modified "2023-10-03" @default.
- W2993386228 title "No-regret exploration in goal-oriented reinforcement learning" @default.
- W2993386228 cites W1850488217 @default.
- W2993386228 cites W202180931 @default.
- W2993386228 cites W2062573385 @default.
- W2993386228 cites W2098432798 @default.
- W2993386228 cites W2114091628 @default.
- W2993386228 cites W2114202040 @default.
- W2993386228 cites W2150234726 @default.
- W2993386228 cites W21934178 @default.
- W2993386228 cites W2241126168 @default.
- W2993386228 cites W2290051315 @default.
- W2993386228 cites W2552231430 @default.
- W2993386228 cites W2592489447 @default.
- W2993386228 cites W2832404192 @default.
- W2993386228 cites W2946284958 @default.
- W2993386228 cites W2962723383 @default.
- W2993386228 cites W2963554715 @default.
- W2993386228 cites W2963767098 @default.
- W2993386228 cites W2963906652 @default.
- W2993386228 cites W2964054583 @default.
- W2993386228 cites W2967785980 @default.
- W2993386228 cites W2971085818 @default.
- W2993386228 cites W3034870712 @default.
- W2993386228 cites W3035759338 @default.
- W2993386228 cites W3041070598 @default.
- W2993386228 hasPublicationYear "2020" @default.
- W2993386228 type Work @default.
- W2993386228 sameAs 2993386228 @default.
- W2993386228 citedByCount "1" @default.
- W2993386228 countsByYear W29933862282020 @default.
- W2993386228 crossrefType "proceedings-article" @default.
- W2993386228 hasAuthorship W2993386228A5000492848 @default.
- W2993386228 hasAuthorship W2993386228A5014791481 @default.
- W2993386228 hasAuthorship W2993386228A5070500506 @default.
- W2993386228 hasAuthorship W2993386228A5071798388 @default.
- W2993386228 hasAuthorship W2993386228A5091526684 @default.
- W2993386228 hasBestOaLocation W29933862281 @default.
- W2993386228 hasConcept C107457646 @default.
- W2993386228 hasConcept C119857082 @default.
- W2993386228 hasConcept C154945302 @default.
- W2993386228 hasConcept C15744967 @default.
- W2993386228 hasConcept C41008148 @default.
- W2993386228 hasConcept C50817715 @default.
- W2993386228 hasConcept C77805123 @default.
- W2993386228 hasConcept C84653758 @default.
- W2993386228 hasConcept C97541855 @default.
- W2993386228 hasConceptScore W2993386228C107457646 @default.
- W2993386228 hasConceptScore W2993386228C119857082 @default.
- W2993386228 hasConceptScore W2993386228C154945302 @default.
- W2993386228 hasConceptScore W2993386228C15744967 @default.
- W2993386228 hasConceptScore W2993386228C41008148 @default.
- W2993386228 hasConceptScore W2993386228C50817715 @default.
- W2993386228 hasConceptScore W2993386228C77805123 @default.
- W2993386228 hasConceptScore W2993386228C84653758 @default.
- W2993386228 hasConceptScore W2993386228C97541855 @default.
- W2993386228 hasLocation W29933862281 @default.
- W2993386228 hasLocation W29933862282 @default.
- W2993386228 hasLocation W29933862283 @default.
- W2993386228 hasOpenAccess W2993386228 @default.
- W2993386228 hasPrimaryLocation W29933862281 @default.
- W2993386228 hasRelatedWork W2002805310 @default.
- W2993386228 hasRelatedWork W2132908009 @default.
- W2993386228 hasRelatedWork W2558906668 @default.
- W2993386228 hasRelatedWork W2945119207 @default.
- W2993386228 hasRelatedWork W4284890489 @default.
- W2993386228 hasRelatedWork W4285324069 @default.
- W2993386228 hasRelatedWork W4292701710 @default.
- W2993386228 hasRelatedWork W4294827289 @default.
- W2993386228 hasRelatedWork W4296078469 @default.
- W2993386228 hasRelatedWork W4313913681 @default.
- W2993386228 isParatext "false" @default.
- W2993386228 isRetracted "false" @default.
- W2993386228 magId "2993386228" @default.
- W2993386228 workType "article" @default.