Matches in SemOpenAlex for { <https://semopenalex.org/work/W3049337243> ?p ?o ?g. }
- W3049337243 abstract "Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning. This paper studies the use of TD(0), a canonical TD algorithm, to estimate the value function of a given policy from a batch of data. In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch, not the true probability of the action under the given policy. To address this limitation, we introduce policy sampling error corrected TD(0) (PSEC-TD(0)). PSEC-TD(0) first estimates the empirical distribution of actions in each state in the batch and then uses importance sampling to correct for the mismatch between the empirical weighting and the correct weighting for updates following each action. We refine the concept of a certainty-equivalence estimate and argue that PSEC-TD(0) is a more data-efficient estimator than TD(0) for a fixed batch of data. Finally, we conduct an empirical evaluation of PSEC-TD(0) on three batch value function learning tasks, with a hyperparameter sensitivity analysis, and show that PSEC-TD(0) produces value function estimates with lower mean squared error than TD(0)." @default.
- W3049337243 created "2020-08-21" @default.
- W3049337243 creator A5001594330 @default.
- W3049337243 creator A5008014974 @default.
- W3049337243 creator A5018095069 @default.
- W3049337243 creator A5070355024 @default.
- W3049337243 date "2020-08-15" @default.
- W3049337243 modified "2023-09-27" @default.
- W3049337243 title "Reducing Sampling Error in Batch Temporal Difference Learning" @default.
- W3049337243 cites W1514587017 @default.
- W3049337243 cites W1545148916 @default.
- W3049337243 cites W1646707810 @default.
- W3049337243 cites W191658262 @default.
- W3049337243 cites W1977655452 @default.
- W3049337243 cites W2020609518 @default.
- W3049337243 cites W2034806082 @default.
- W3049337243 cites W2045569659 @default.
- W3049337243 cites W2062989416 @default.
- W3049337243 cites W2072931156 @default.
- W3049337243 cites W2075268401 @default.
- W3049337243 cites W2100677568 @default.
- W3049337243 cites W2100752967 @default.
- W3049337243 cites W2119567691 @default.
- W3049337243 cites W2119717200 @default.
- W3049337243 cites W2121863487 @default.
- W3049337243 cites W2124175081 @default.
- W3049337243 cites W2156737235 @default.
- W3049337243 cites W2158782408 @default.
- W3049337243 cites W2205490832 @default.
- W3049337243 cites W2341171179 @default.
- W3049337243 cites W2557999663 @default.
- W3049337243 cites W2593237273 @default.
- W3049337243 cites W2736601468 @default.
- W3049337243 cites W2890951405 @default.
- W3049337243 cites W2899652047 @default.
- W3049337243 cites W2963052985 @default.
- W3049337243 cites W2964043796 @default.
- W3049337243 cites W2964055673 @default.
- W3049337243 cites W2964121744 @default.
- W3049337243 cites W3037207827 @default.
- W3049337243 cites W3100944043 @default.
- W3049337243 cites W3122193054 @default.
- W3049337243 cites W3139377883 @default.
- W3049337243 cites W779665318 @default.
- W3049337243 cites W2529573675 @default.
- W3049337243 hasPublicationYear "2020" @default.
- W3049337243 type Work @default.
- W3049337243 sameAs 3049337243 @default.
- W3049337243 citedByCount "1" @default.
- W3049337243 countsByYear W30493372432021 @default.
- W3049337243 crossrefType "posted-content" @default.
- W3049337243 hasAuthorship W3049337243A5001594330 @default.
- W3049337243 hasAuthorship W3049337243A5008014974 @default.
- W3049337243 hasAuthorship W3049337243A5018095069 @default.
- W3049337243 hasAuthorship W3049337243A5070355024 @default.
- W3049337243 hasConcept C105795698 @default.
- W3049337243 hasConcept C106131492 @default.
- W3049337243 hasConcept C11413529 @default.
- W3049337243 hasConcept C126838900 @default.
- W3049337243 hasConcept C14036430 @default.
- W3049337243 hasConcept C140779682 @default.
- W3049337243 hasConcept C154945302 @default.
- W3049337243 hasConcept C183115368 @default.
- W3049337243 hasConcept C185429906 @default.
- W3049337243 hasConcept C196340769 @default.
- W3049337243 hasConcept C2776291640 @default.
- W3049337243 hasConcept C31972630 @default.
- W3049337243 hasConcept C33923547 @default.
- W3049337243 hasConcept C41008148 @default.
- W3049337243 hasConcept C71924100 @default.
- W3049337243 hasConcept C78458016 @default.
- W3049337243 hasConcept C8642999 @default.
- W3049337243 hasConcept C86803240 @default.
- W3049337243 hasConcept C97541855 @default.
- W3049337243 hasConcept C98385598 @default.
- W3049337243 hasConceptScore W3049337243C105795698 @default.
- W3049337243 hasConceptScore W3049337243C106131492 @default.
- W3049337243 hasConceptScore W3049337243C11413529 @default.
- W3049337243 hasConceptScore W3049337243C126838900 @default.
- W3049337243 hasConceptScore W3049337243C14036430 @default.
- W3049337243 hasConceptScore W3049337243C140779682 @default.
- W3049337243 hasConceptScore W3049337243C154945302 @default.
- W3049337243 hasConceptScore W3049337243C183115368 @default.
- W3049337243 hasConceptScore W3049337243C185429906 @default.
- W3049337243 hasConceptScore W3049337243C196340769 @default.
- W3049337243 hasConceptScore W3049337243C2776291640 @default.
- W3049337243 hasConceptScore W3049337243C31972630 @default.
- W3049337243 hasConceptScore W3049337243C33923547 @default.
- W3049337243 hasConceptScore W3049337243C41008148 @default.
- W3049337243 hasConceptScore W3049337243C71924100 @default.
- W3049337243 hasConceptScore W3049337243C78458016 @default.
- W3049337243 hasConceptScore W3049337243C8642999 @default.
- W3049337243 hasConceptScore W3049337243C86803240 @default.
- W3049337243 hasConceptScore W3049337243C97541855 @default.
- W3049337243 hasConceptScore W3049337243C98385598 @default.
- W3049337243 hasLocation W30493372431 @default.
- W3049337243 hasOpenAccess W3049337243 @default.
- W3049337243 hasPrimaryLocation W30493372431 @default.
- W3049337243 hasRelatedWork W1483082734 @default.
- W3049337243 hasRelatedWork W1506728749 @default.
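The abstract describes weighting each TD(0) update by the ratio of the evaluation policy's action probability to the empirical action frequency in the batch. The sketch below illustrates that idea for the tabular case only; it is not the authors' implementation. Assumptions: discrete states and actions, a known evaluation policy `policy[s][a]`, batch transitions stored as `(s, a, r, s_next, done)` tuples, and increments summed over the whole batch before each update. The function name `batch_td0` and all parameters are hypothetical.

```python
from collections import defaultdict


def batch_td0(batch, num_states, policy=None, gamma=1.0, alpha=0.05, passes=1000):
    """Hypothetical tabular batch TD(0).

    If `policy` is None this is plain batch TD(0). Otherwise each update is
    scaled by pi(a|s) / pi_hat(a|s), where pi_hat is the empirical action
    frequency in the batch (a PSEC-style correction, as sketched here).
    """
    # Count state and state-action occurrences to form pi_hat(a|s).
    state_counts = defaultdict(int)
    sa_counts = defaultdict(int)
    for s, a, _, _, _ in batch:
        state_counts[s] += 1
        sa_counts[(s, a)] += 1

    v = [0.0] * num_states
    for _ in range(passes):
        # Accumulate TD errors over the full batch, then apply them once.
        delta_sum = [0.0] * num_states
        for s, a, r, s_next, done in batch:
            target = r + (0.0 if done else gamma * v[s_next])
            weight = 1.0
            if policy is not None:
                pi_hat = sa_counts[(s, a)] / state_counts[s]
                weight = policy[s][a] / pi_hat  # importance-sampling ratio
            delta_sum[s] += weight * (target - v[s])
        for s in range(num_states):
            v[s] += alpha * delta_sum[s]
    return v


# Toy batch: one state, two equally likely actions under the evaluation
# policy; action 0 (reward 1) happens to appear 3 times, action 1 (reward 0) once.
batch = [(0, 0, 1.0, 0, True)] * 3 + [(0, 1, 0.0, 0, True)]
policy = [[0.5, 0.5]]
td = batch_td0(batch, num_states=1)
psec = batch_td0(batch, num_states=1, policy=policy)
print(round(td[0], 3), round(psec[0], 3))  # TD(0) -> 0.75, PSEC -> 0.5
```

In this toy batch the true value under the evaluation policy is 0.5. Plain batch TD(0) settles on the sample mean of observed returns (0.75) because action 0 is over-represented, while the ratio-weighted update settles on 0.5.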