Matches in SemOpenAlex for { <https://semopenalex.org/work/W2964112145> ?p ?o ?g. }
- W2964112145 endingPage "7802" @default.
- W2964112145 startingPage "7745" @default.
- W2964112145 abstract "We consider the emphatic temporal-difference (TD) algorithm, ETD(λ), for learning the value functions of stationary policies in a discounted, finite state and action Markov decision process. The ETD(λ) algorithm was recently proposed by Sutton, Mahmood, and White (2016) to solve a long-standing divergence problem of the standard TD algorithm when it is applied to off-policy training, where data from an exploratory policy are used to evaluate other policies of interest. The almost sure convergence of ETD(λ) has been proved in our recent work under general off-policy training conditions, but for a narrow range of diminishing stepsize. In this paper we present convergence results for constrained versions of ETD(λ) with constant stepsize and with diminishing stepsize from a broad range. Our results characterize the asymptotic behavior of the trajectory of iterates produced by those algorithms, and are derived by combining key properties of ETD(λ) with powerful convergence theorems from the weak convergence methods in stochastic approximation theory. For the case of constant stepsize, in addition to analyzing the behavior of the algorithms in the limit as the stepsize parameter approaches zero, we also analyze their behavior for a fixed stepsize and bound the deviations of their averaged iterates from the desired solution. These results are obtained by exploiting the weak Feller property of the Markov chains associated with the algorithms, and by using ergodic theorems for weak Feller Markov chains, in conjunction with the convergence results we get from the weak convergence methods. Besides ETD(λ), our analysis also applies to the off-policy TD(λ) algorithm, when the divergence issue is avoided by setting λ sufficiently large. It yields, for that case, new results on the asymptotic convergence properties of constrained off-policy TD(λ) with constant or slowly diminishing stepsize." @default.
- W2964112145 created "2019-07-30" @default.
- W2964112145 creator A5014937312 @default.
- W2964112145 date "2016-01-01" @default.
- W2964112145 modified "2023-09-24" @default.
- W2964112145 title "Weak convergence properties of constrained emphatic temporal-difference learning with constant and slowly diminishing stepsize" @default.
- W2964112145 cites W1499021337 @default.
- W2964112145 cites W1514587017 @default.
- W2964112145 cites W1547925194 @default.
- W2964112145 cites W1576452626 @default.
- W2964112145 cites W1594216983 @default.
- W2964112145 cites W1597303641 @default.
- W2964112145 cites W1600046456 @default.
- W2964112145 cites W1646707810 @default.
- W2964112145 cites W1835716857 @default.
- W2964112145 cites W1995713768 @default.
- W2964112145 cites W2002142848 @default.
- W2964112145 cites W2019172585 @default.
- W2964112145 cites W2072931156 @default.
- W2964112145 cites W2073733021 @default.
- W2964112145 cites W2075167161 @default.
- W2964112145 cites W2075268401 @default.
- W2964112145 cites W2086161653 @default.
- W2964112145 cites W2087018957 @default.
- W2964112145 cites W2100677568 @default.
- W2964112145 cites W2104753538 @default.
- W2964112145 cites W2107741520 @default.
- W2964112145 cites W2114901408 @default.
- W2964112145 cites W2117355432 @default.
- W2964112145 cites W2119567691 @default.
- W2964112145 cites W2121703796 @default.
- W2964112145 cites W2130599357 @default.
- W2964112145 cites W2139418546 @default.
- W2964112145 cites W2140778663 @default.
- W2964112145 cites W2141022000 @default.
- W2964112145 cites W2156737235 @default.
- W2964112145 cites W2158126207 @default.
- W2964112145 cites W2198988526 @default.
- W2964112145 cites W2235056388 @default.
- W2964112145 cites W2286297039 @default.
- W2964112145 cites W2374285318 @default.
- W2964112145 cites W2395162158 @default.
- W2964112145 cites W2398850217 @default.
- W2964112145 cites W2473364827 @default.
- W2964112145 cites W2799137445 @default.
- W2964112145 cites W2802739963 @default.
- W2964112145 cites W2911283634 @default.
- W2964112145 cites W2949257336 @default.
- W2964112145 cites W2949510529 @default.
- W2964112145 cites W2950545402 @default.
- W2964112145 cites W2951143668 @default.
- W2964112145 cites W3150304496 @default.
- W2964112145 cites W359568995 @default.
- W2964112145 cites W649943522 @default.
- W2964112145 cites W779665318 @default.
- W2964112145 hasPublicationYear "2016" @default.
- W2964112145 type Work @default.
- W2964112145 sameAs 2964112145 @default.
- W2964112145 citedByCount "11" @default.
- W2964112145 countsByYear W29641121452015 @default.
- W2964112145 countsByYear W29641121452016 @default.
- W2964112145 countsByYear W29641121452017 @default.
- W2964112145 countsByYear W29641121452018 @default.
- W2964112145 countsByYear W29641121452020 @default.
- W2964112145 countsByYear W29641121452021 @default.
- W2964112145 crossrefType "journal-article" @default.
- W2964112145 hasAuthorship W2964112145A5014937312 @default.
- W2964112145 hasConcept C105795698 @default.
- W2964112145 hasConcept C106189395 @default.
- W2964112145 hasConcept C122044880 @default.
- W2964112145 hasConcept C126255220 @default.
- W2964112145 hasConcept C134306372 @default.
- W2964112145 hasConcept C138885662 @default.
- W2964112145 hasConcept C140479938 @default.
- W2964112145 hasConcept C151201525 @default.
- W2964112145 hasConcept C159886148 @default.
- W2964112145 hasConcept C162324750 @default.
- W2964112145 hasConcept C199360897 @default.
- W2964112145 hasConcept C207390915 @default.
- W2964112145 hasConcept C26517878 @default.
- W2964112145 hasConcept C2777027219 @default.
- W2964112145 hasConcept C2777303404 @default.
- W2964112145 hasConcept C28826006 @default.
- W2964112145 hasConcept C33923547 @default.
- W2964112145 hasConcept C38652104 @default.
- W2964112145 hasConcept C41008148 @default.
- W2964112145 hasConcept C41895202 @default.
- W2964112145 hasConcept C50522688 @default.
- W2964112145 hasConcept C55479107 @default.
- W2964112145 hasConcept C57869625 @default.
- W2964112145 hasConcept C57945734 @default.
- W2964112145 hasConcept C76178495 @default.
- W2964112145 hasConcept C98763669 @default.
- W2964112145 hasConceptScore W2964112145C105795698 @default.
- W2964112145 hasConceptScore W2964112145C106189395 @default.
- W2964112145 hasConceptScore W2964112145C122044880 @default.
- W2964112145 hasConceptScore W2964112145C126255220 @default.
- W2964112145 hasConceptScore W2964112145C134306372 @default.