Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287333228> ?p ?o ?g. }
Showing items 1 to 68 of 68 with 100 items per page.
- W4287333228 abstract "This paper develops a unified framework to study finite-sample convergence guarantees of a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do this by first reformulating the RL algorithms as \textit{Markovian Stochastic Approximation} (SA) algorithms to solve fixed-point equations. We then develop a Lyapunov analysis and derive mean-square error bounds on the convergence of the Markovian SA. Based on this result, we establish finite-sample mean-square convergence bounds for asynchronous RL algorithms such as $Q$-learning, $n$-step TD, TD$(\lambda)$, and off-policy TD algorithms including V-trace. As a by-product, by analyzing the convergence bounds of $n$-step TD and TD$(\lambda)$, we provide theoretical insights into the bias-variance trade-off, i.e., efficiency of bootstrapping in RL. This was first posed as an open problem in (Sutton, 1999)." @default.
- W4287333228 created "2022-07-25" @default.
- W4287333228 creator A5021806638 @default.
- W4287333228 creator A5028903768 @default.
- W4287333228 creator A5034837735 @default.
- W4287333228 creator A5058269077 @default.
- W4287333228 date "2021-02-02" @default.
- W4287333228 modified "2023-10-07" @default.
- W4287333228 title "A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants" @default.
- W4287333228 hasPublicationYear "2021" @default.
- W4287333228 type Work @default.
- W4287333228 citedByCount "0" @default.
- W4287333228 crossrefType "posted-content" @default.
- W4287333228 hasAuthorship W4287333228A5021806638 @default.
- W4287333228 hasAuthorship W4287333228A5028903768 @default.
- W4287333228 hasAuthorship W4287333228A5034837735 @default.
- W4287333228 hasAuthorship W4287333228A5058269077 @default.
- W4287333228 hasBestOaLocation W42873332281 @default.
- W4287333228 hasConcept C11413529 @default.
- W4287333228 hasConcept C126255220 @default.
- W4287333228 hasConcept C135692309 @default.
- W4287333228 hasConcept C149782125 @default.
- W4287333228 hasConcept C151319957 @default.
- W4287333228 hasConcept C154945302 @default.
- W4287333228 hasConcept C162324750 @default.
- W4287333228 hasConcept C207609745 @default.
- W4287333228 hasConcept C2524010 @default.
- W4287333228 hasConcept C2777303404 @default.
- W4287333228 hasConcept C28826006 @default.
- W4287333228 hasConcept C31258907 @default.
- W4287333228 hasConcept C33923547 @default.
- W4287333228 hasConcept C41008148 @default.
- W4287333228 hasConcept C50522688 @default.
- W4287333228 hasConcept C55479107 @default.
- W4287333228 hasConcept C97541855 @default.
- W4287333228 hasConceptScore W4287333228C11413529 @default.
- W4287333228 hasConceptScore W4287333228C126255220 @default.
- W4287333228 hasConceptScore W4287333228C135692309 @default.
- W4287333228 hasConceptScore W4287333228C149782125 @default.
- W4287333228 hasConceptScore W4287333228C151319957 @default.
- W4287333228 hasConceptScore W4287333228C154945302 @default.
- W4287333228 hasConceptScore W4287333228C162324750 @default.
- W4287333228 hasConceptScore W4287333228C207609745 @default.
- W4287333228 hasConceptScore W4287333228C2524010 @default.
- W4287333228 hasConceptScore W4287333228C2777303404 @default.
- W4287333228 hasConceptScore W4287333228C28826006 @default.
- W4287333228 hasConceptScore W4287333228C31258907 @default.
- W4287333228 hasConceptScore W4287333228C33923547 @default.
- W4287333228 hasConceptScore W4287333228C41008148 @default.
- W4287333228 hasConceptScore W4287333228C50522688 @default.
- W4287333228 hasConceptScore W4287333228C55479107 @default.
- W4287333228 hasConceptScore W4287333228C97541855 @default.
- W4287333228 hasLocation W42873332281 @default.
- W4287333228 hasOpenAccess W4287333228 @default.
- W4287333228 hasPrimaryLocation W42873332281 @default.
- W4287333228 hasRelatedWork W10376161 @default.
- W4287333228 hasRelatedWork W10980626 @default.
- W4287333228 hasRelatedWork W1119538 @default.
- W4287333228 hasRelatedWork W11613682 @default.
- W4287333228 hasRelatedWork W222915 @default.
- W4287333228 hasRelatedWork W5532710 @default.
- W4287333228 hasRelatedWork W5709971 @default.
- W4287333228 hasRelatedWork W6614468 @default.
- W4287333228 hasRelatedWork W7366101 @default.
- W4287333228 hasRelatedWork W822699 @default.
- W4287333228 isParatext "false" @default.
- W4287333228 isRetracted "false" @default.
- W4287333228 workType "article" @default.