Matches in SemOpenAlex for { <https://semopenalex.org/work/W2970999177> ?p ?o ?g. }
Showing items 1 to 98 of 98 with 100 items per page.
- W2970999177 endingPage "8675" @default.
- W2970999177 startingPage "8665" @default.
- W2970999177 abstract "SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement learning. We investigate the SARSA algorithm with linear function approximation under the setting, where a single sample trajectory is available. With a Lipschitz continuous policy improvement operator that is smooth enough, SARSA has been shown to converge asymptotically. However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. samples, and the fact that the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this type of stochastic approximation algorithms, which may be of independent interest. Using our bias characterization technique and a gradient descent type of analysis, we further provide the finite-sample analysis on the mean square error of the SARSA algorithm. In the end, we present a fitted SARSA algorithm, which includes the original SARSA algorithm and its variant as special cases. This fitted SARSA algorithm provides a framework for iterative on-policy fitted policy iteration, which is more memory and computationally efficient. For this fitted SARSA algorithm, we also present its finite-sample analysis." @default.
- W2970999177 created "2019-09-05" @default.
- W2970999177 creator A5012545205 @default.
- W2970999177 creator A5013771140 @default.
- W2970999177 creator A5042048777 @default.
- W2970999177 date "2019-02-01" @default.
- W2970999177 modified "2023-09-24" @default.
- W2970999177 title "Finite-Sample Analysis for SARSA with Linear Function Approximation" @default.
- W2970999177 hasPublicationYear "2019" @default.
- W2970999177 type Work @default.
- W2970999177 sameAs 2970999177 @default.
- W2970999177 citedByCount "45" @default.
- W2970999177 countsByYear W29709991772019 @default.
- W2970999177 countsByYear W29709991772020 @default.
- W2970999177 countsByYear W29709991772021 @default.
- W2970999177 countsByYear W29709991772022 @default.
- W2970999177 crossrefType "proceedings-article" @default.
- W2970999177 hasAuthorship W2970999177A5012545205 @default.
- W2970999177 hasAuthorship W2970999177A5013771140 @default.
- W2970999177 hasAuthorship W2970999177A5042048777 @default.
- W2970999177 hasConcept C105795698 @default.
- W2970999177 hasConcept C106189395 @default.
- W2970999177 hasConcept C11413529 @default.
- W2970999177 hasConcept C126255220 @default.
- W2970999177 hasConcept C127162648 @default.
- W2970999177 hasConcept C134306372 @default.
- W2970999177 hasConcept C154945302 @default.
- W2970999177 hasConcept C159886148 @default.
- W2970999177 hasConcept C162324750 @default.
- W2970999177 hasConcept C22324862 @default.
- W2970999177 hasConcept C26517878 @default.
- W2970999177 hasConcept C2777303404 @default.
- W2970999177 hasConcept C28826006 @default.
- W2970999177 hasConcept C31258907 @default.
- W2970999177 hasConcept C33923547 @default.
- W2970999177 hasConcept C38652104 @default.
- W2970999177 hasConcept C41008148 @default.
- W2970999177 hasConcept C50522688 @default.
- W2970999177 hasConcept C50644808 @default.
- W2970999177 hasConcept C55479107 @default.
- W2970999177 hasConcept C57869625 @default.
- W2970999177 hasConcept C91873725 @default.
- W2970999177 hasConcept C97541855 @default.
- W2970999177 hasConcept C98763669 @default.
- W2970999177 hasConceptScore W2970999177C105795698 @default.
- W2970999177 hasConceptScore W2970999177C106189395 @default.
- W2970999177 hasConceptScore W2970999177C11413529 @default.
- W2970999177 hasConceptScore W2970999177C126255220 @default.
- W2970999177 hasConceptScore W2970999177C127162648 @default.
- W2970999177 hasConceptScore W2970999177C134306372 @default.
- W2970999177 hasConceptScore W2970999177C154945302 @default.
- W2970999177 hasConceptScore W2970999177C159886148 @default.
- W2970999177 hasConceptScore W2970999177C162324750 @default.
- W2970999177 hasConceptScore W2970999177C22324862 @default.
- W2970999177 hasConceptScore W2970999177C26517878 @default.
- W2970999177 hasConceptScore W2970999177C2777303404 @default.
- W2970999177 hasConceptScore W2970999177C28826006 @default.
- W2970999177 hasConceptScore W2970999177C31258907 @default.
- W2970999177 hasConceptScore W2970999177C33923547 @default.
- W2970999177 hasConceptScore W2970999177C38652104 @default.
- W2970999177 hasConceptScore W2970999177C41008148 @default.
- W2970999177 hasConceptScore W2970999177C50522688 @default.
- W2970999177 hasConceptScore W2970999177C50644808 @default.
- W2970999177 hasConceptScore W2970999177C55479107 @default.
- W2970999177 hasConceptScore W2970999177C57869625 @default.
- W2970999177 hasConceptScore W2970999177C91873725 @default.
- W2970999177 hasConceptScore W2970999177C97541855 @default.
- W2970999177 hasConceptScore W2970999177C98763669 @default.
- W2970999177 hasLocation W29709991771 @default.
- W2970999177 hasOpenAccess W2970999177 @default.
- W2970999177 hasPrimaryLocation W29709991771 @default.
- W2970999177 hasRelatedWork W13294968 @default.
- W2970999177 hasRelatedWork W1646707810 @default.
- W2970999177 hasRelatedWork W2071983464 @default.
- W2970999177 hasRelatedWork W2075268401 @default.
- W2970999177 hasRelatedWork W2100677568 @default.
- W2970999177 hasRelatedWork W2121863487 @default.
- W2970999177 hasRelatedWork W2145339207 @default.
- W2970999177 hasRelatedWork W2151661095 @default.
- W2970999177 hasRelatedWork W2155027007 @default.
- W2970999177 hasRelatedWork W2156737235 @default.
- W2970999177 hasRelatedWork W2165905123 @default.
- W2970999177 hasRelatedWork W2257979135 @default.
- W2970999177 hasRelatedWork W2395162158 @default.
- W2970999177 hasRelatedWork W2964043796 @default.
- W2970999177 hasRelatedWork W2964123095 @default.
- W2970999177 hasRelatedWork W2966363432 @default.
- W2970999177 hasRelatedWork W2970128053 @default.
- W2970999177 hasRelatedWork W2970961807 @default.
- W2970999177 hasRelatedWork W2977813751 @default.
- W2970999177 hasRelatedWork W3136903997 @default.
- W2970999177 hasVolume "32" @default.
- W2970999177 isParatext "false" @default.
- W2970999177 isRetracted "false" @default.
- W2970999177 magId "2970999177" @default.
- W2970999177 workType "article" @default.