Matches in SemOpenAlex for { <https://semopenalex.org/work/W3209907215> ?p ?o ?g. }
- W3209907215 abstract "We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an $\tilde{\mathcal{O}}(d B_{\star}^{1.5}\sqrt{K/c_{\min}})$ regret. Here $K$ is the number of episodes, $d$ is the dimension of the feature mapping in the mixture model, $B_{\star}$ bounds the expected cumulative cost of the optimal policy, and $c_{\min}>0$ is the lower bound of the cost function. Our algorithm also applies to the case when $c_{\min} = 0$, and an $\tilde{\mathcal{O}}(K^{2/3})$ regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an $\tilde{\mathcal{O}}(d B_{\star}\sqrt{K/c_{\min}})$ regret. In complement to the regret upper bounds, we also prove a lower bound of $\Omega(dB_{\star} \sqrt{K})$. Hence, our improved algorithm matches the lower bound up to a $1/\sqrt{c_{\min}}$ factor and poly-logarithmic factors, achieving a near-optimal regret guarantee." @default.
- W3209907215 created "2021-11-08" @default.
- W3209907215 creator A5005741793 @default.
- W3209907215 creator A5051448391 @default.
- W3209907215 creator A5057534674 @default.
- W3209907215 creator A5065562247 @default.
- W3209907215 date "2021-10-25" @default.
- W3209907215 modified "2023-09-23" @default.
- W3209907215 title "Learning Stochastic Shortest Path with Linear Function Approximation" @default.
- W3209907215 cites W1850488217 @default.
- W3209907215 cites W1867103660 @default.
- W3209907215 cites W2081287319 @default.
- W3209907215 cites W2098432798 @default.
- W3209907215 cites W2119738618 @default.
- W3209907215 cites W2140019566 @default.
- W3209907215 cites W2293729149 @default.
- W3209907215 cites W2405532007 @default.
- W3209907215 cites W2592489447 @default.
- W3209907215 cites W2956123884 @default.
- W3209907215 cites W2964001908 @default.
- W3209907215 cites W2970534317 @default.
- W3209907215 cites W2995638039 @default.
- W3209907215 cites W3034335560 @default.
- W3209907215 cites W3034452038 @default.
- W3209907215 cites W3034870712 @default.
- W3209907215 cites W3034871777 @default.
- W3209907215 cites W3035273634 @default.
- W3209907215 cites W3037341018 @default.
- W3209907215 cites W3037850847 @default.
- W3209907215 cites W3046395471 @default.
- W3209907215 cites W3095342738 @default.
- W3209907215 cites W3099009443 @default.
- W3209907215 cites W3100499156 @default.
- W3209907215 cites W3104032756 @default.
- W3209907215 cites W3109462044 @default.
- W3209907215 cites W3126554436 @default.
- W3209907215 cites W3127390000 @default.
- W3209907215 cites W3138931520 @default.
- W3209907215 cites W3145375328 @default.
- W3209907215 cites W3156636067 @default.
- W3209907215 cites W3158811078 @default.
- W3209907215 cites W3168056074 @default.
- W3209907215 cites W3169423158 @default.
- W3209907215 cites W3169492666 @default.
- W3209907215 cites W3170614855 @default.
- W3209907215 cites W3175554921 @default.
- W3209907215 cites W3197999671 @default.
- W3209907215 doi "https://doi.org/10.48550/arxiv.2110.12727" @default.
- W3209907215 hasPublicationYear "2021" @default.
- W3209907215 type Work @default.
- W3209907215 sameAs 3209907215 @default.
- W3209907215 citedByCount "1" @default.
- W3209907215 countsByYear W32099072152021 @default.
- W3209907215 crossrefType "posted-content" @default.
- W3209907215 hasAuthorship W3209907215A5005741793 @default.
- W3209907215 hasAuthorship W3209907215A5051448391 @default.
- W3209907215 hasAuthorship W3209907215A5057534674 @default.
- W3209907215 hasAuthorship W3209907215A5065562247 @default.
- W3209907215 hasBestOaLocation W32099072151 @default.
- W3209907215 hasConcept C105795698 @default.
- W3209907215 hasConcept C114614502 @default.
- W3209907215 hasConcept C117160843 @default.
- W3209907215 hasConcept C118615104 @default.
- W3209907215 hasConcept C132525143 @default.
- W3209907215 hasConcept C134306372 @default.
- W3209907215 hasConcept C14036430 @default.
- W3209907215 hasConcept C199360897 @default.
- W3209907215 hasConcept C22590252 @default.
- W3209907215 hasConcept C2777735758 @default.
- W3209907215 hasConcept C2780897414 @default.
- W3209907215 hasConcept C33676613 @default.
- W3209907215 hasConcept C33923547 @default.
- W3209907215 hasConcept C39927690 @default.
- W3209907215 hasConcept C41008148 @default.
- W3209907215 hasConcept C50817715 @default.
- W3209907215 hasConcept C77553402 @default.
- W3209907215 hasConcept C78458016 @default.
- W3209907215 hasConcept C86803240 @default.
- W3209907215 hasConceptScore W3209907215C105795698 @default.
- W3209907215 hasConceptScore W3209907215C114614502 @default.
- W3209907215 hasConceptScore W3209907215C117160843 @default.
- W3209907215 hasConceptScore W3209907215C118615104 @default.
- W3209907215 hasConceptScore W3209907215C132525143 @default.
- W3209907215 hasConceptScore W3209907215C134306372 @default.
- W3209907215 hasConceptScore W3209907215C14036430 @default.
- W3209907215 hasConceptScore W3209907215C199360897 @default.
- W3209907215 hasConceptScore W3209907215C22590252 @default.
- W3209907215 hasConceptScore W3209907215C2777735758 @default.
- W3209907215 hasConceptScore W3209907215C2780897414 @default.
- W3209907215 hasConceptScore W3209907215C33676613 @default.
- W3209907215 hasConceptScore W3209907215C33923547 @default.
- W3209907215 hasConceptScore W3209907215C39927690 @default.
- W3209907215 hasConceptScore W3209907215C41008148 @default.
- W3209907215 hasConceptScore W3209907215C50817715 @default.
- W3209907215 hasConceptScore W3209907215C77553402 @default.
- W3209907215 hasConceptScore W3209907215C78458016 @default.
- W3209907215 hasConceptScore W3209907215C86803240 @default.
- W3209907215 hasLocation W32099072151 @default.
- W3209907215 hasOpenAccess W3209907215 @default.
- W3209907215 hasPrimaryLocation W32099072151 @default.