Matches in SemOpenAlex for { <https://semopenalex.org/work/W359568995> ?p ?o ?g. }
- W359568995 endingPage "1616" @default.
- W359568995 startingPage "1609" @default.
- W359568995 abstract "We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, behavior policy, and target policy, and whose complexity scales linearly in the number of parameters. We consider an i.i.d. policy-evaluation setting in which the data need not come from on-policy experience. The gradient temporal-difference (GTD) algorithm estimates the expected update vector of the TD(0) algorithm and performs stochastic gradient descent on its L2 norm. We prove that this algorithm is stable and convergent under the usual stochastic approximation conditions to the same least-squares solution as found by the LSTD, but without LSTD's quadratic computational complexity. GTD is online and incremental, and does not involve multiplying by products of likelihood ratios as in importance-sampling methods." @default.
- W359568995 created "2016-06-24" @default.
- W359568995 creator A5004923102 @default.
- W359568995 creator A5069856068 @default.
- W359568995 creator A5084305551 @default.
- W359568995 date "2008-12-08" @default.
- W359568995 modified "2023-10-05" @default.
- W359568995 title "A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation" @default.
- W359568995 cites W103174825 @default.
- W359568995 cites W1514587017 @default.
- W359568995 cites W1515851193 @default.
- W359568995 cites W1547105496 @default.
- W359568995 cites W1575327902 @default.
- W359568995 cites W1576452626 @default.
- W359568995 cites W1597317312 @default.
- W359568995 cites W1600046456 @default.
- W359568995 cites W1646707810 @default.
- W359568995 cites W1778554682 @default.
- W359568995 cites W182596629 @default.
- W359568995 cites W2071983464 @default.
- W359568995 cites W2072931156 @default.
- W359568995 cites W2100677568 @default.
- W359568995 cites W2109910161 @default.
- W359568995 cites W2114537044 @default.
- W359568995 cites W2115253045 @default.
- W359568995 cites W2121863487 @default.
- W359568995 cites W2130005627 @default.
- W359568995 cites W2132351269 @default.
- W359568995 cites W2139418546 @default.
- W359568995 cites W2146823374 @default.
- W359568995 cites W2158091072 @default.
- W359568995 cites W3011120880 @default.
- W359568995 cites W86816279 @default.
- W359568995 hasPublicationYear "2008" @default.
- W359568995 type Work @default.
- W359568995 sameAs 359568995 @default.
- W359568995 citedByCount "76" @default.
- W359568995 countsByYear W3595689952012 @default.
- W359568995 countsByYear W3595689952013 @default.
- W359568995 countsByYear W3595689952014 @default.
- W359568995 countsByYear W3595689952015 @default.
- W359568995 countsByYear W3595689952016 @default.
- W359568995 countsByYear W3595689952017 @default.
- W359568995 countsByYear W3595689952018 @default.
- W359568995 countsByYear W3595689952019 @default.
- W359568995 countsByYear W3595689952020 @default.
- W359568995 countsByYear W3595689952021 @default.
- W359568995 crossrefType "proceedings-article" @default.
- W359568995 hasAuthorship W359568995A5004923102 @default.
- W359568995 hasAuthorship W359568995A5069856068 @default.
- W359568995 hasAuthorship W359568995A5084305551 @default.
- W359568995 hasConcept C105795698 @default.
- W359568995 hasConcept C106189395 @default.
- W359568995 hasConcept C11413529 @default.
- W359568995 hasConcept C126255220 @default.
- W359568995 hasConcept C129844170 @default.
- W359568995 hasConcept C14036430 @default.
- W359568995 hasConcept C148764684 @default.
- W359568995 hasConcept C154945302 @default.
- W359568995 hasConcept C159886148 @default.
- W359568995 hasConcept C17744445 @default.
- W359568995 hasConcept C191795146 @default.
- W359568995 hasConcept C196340769 @default.
- W359568995 hasConcept C199539241 @default.
- W359568995 hasConcept C206688291 @default.
- W359568995 hasConcept C2524010 @default.
- W359568995 hasConcept C26517878 @default.
- W359568995 hasConcept C28826006 @default.
- W359568995 hasConcept C33923547 @default.
- W359568995 hasConcept C38652104 @default.
- W359568995 hasConcept C41008148 @default.
- W359568995 hasConcept C50644808 @default.
- W359568995 hasConcept C55479107 @default.
- W359568995 hasConcept C78458016 @default.
- W359568995 hasConcept C86803240 @default.
- W359568995 hasConcept C91873725 @default.
- W359568995 hasConcept C97541855 @default.
- W359568995 hasConceptScore W359568995C105795698 @default.
- W359568995 hasConceptScore W359568995C106189395 @default.
- W359568995 hasConceptScore W359568995C11413529 @default.
- W359568995 hasConceptScore W359568995C126255220 @default.
- W359568995 hasConceptScore W359568995C129844170 @default.
- W359568995 hasConceptScore W359568995C14036430 @default.
- W359568995 hasConceptScore W359568995C148764684 @default.
- W359568995 hasConceptScore W359568995C154945302 @default.
- W359568995 hasConceptScore W359568995C159886148 @default.
- W359568995 hasConceptScore W359568995C17744445 @default.
- W359568995 hasConceptScore W359568995C191795146 @default.
- W359568995 hasConceptScore W359568995C196340769 @default.
- W359568995 hasConceptScore W359568995C199539241 @default.
- W359568995 hasConceptScore W359568995C206688291 @default.
- W359568995 hasConceptScore W359568995C2524010 @default.
- W359568995 hasConceptScore W359568995C26517878 @default.
- W359568995 hasConceptScore W359568995C28826006 @default.
- W359568995 hasConceptScore W359568995C33923547 @default.
- W359568995 hasConceptScore W359568995C38652104 @default.
- W359568995 hasConceptScore W359568995C41008148 @default.
- W359568995 hasConceptScore W359568995C50644808 @default.