Matches in SemOpenAlex for { <https://semopenalex.org/work/W3118117029> ?p ?o ?g. }
Showing items 1 to 86 of 86 with 100 items per page.
- W3118117029 abstract "For model-free reinforcement learning, one of the main difficulties of stochastic Bellman residual minimization is the double sampling problem, i.e., while only one single sample for the next state is available in the model-free setting, two independent samples for the next state are required in order to perform unbiased stochastic gradient descent. We propose new algorithms for addressing this problem based on the idea of borrowing extra randomness from the future. When the transition kernel varies slowly with respect to the state, it is shown that the training trajectory of the new algorithms is close to that of unbiased stochastic gradient descent. Numerical results for policy evaluation in both tabular and neural network settings are provided to confirm the theoretical findings." @default.
- W3118117029 created "2021-01-05" @default.
- W3118117029 creator A5011918131 @default.
- W3118117029 creator A5063403085 @default.
- W3118117029 date "2019-11-30" @default.
- W3118117029 modified "2023-09-23" @default.
- W3118117029 title "Borrowing From the Future: An Attempt to Address Double Sampling" @default.
- W3118117029 doi "https://doi.org/10.48550/arxiv.1912.00304" @default.
- W3118117029 hasPublicationYear "2019" @default.
- W3118117029 type Work @default.
- W3118117029 sameAs 3118117029 @default.
- W3118117029 citedByCount "1" @default.
- W3118117029 countsByYear W31181170292022 @default.
- W3118117029 crossrefType "posted-content" @default.
- W3118117029 hasAuthorship W3118117029A5011918131 @default.
- W3118117029 hasAuthorship W3118117029A5063403085 @default.
- W3118117029 hasBestOaLocation W31181170291 @default.
- W3118117029 hasConcept C105795698 @default.
- W3118117029 hasConcept C106131492 @default.
- W3118117029 hasConcept C11413529 @default.
- W3118117029 hasConcept C114614502 @default.
- W3118117029 hasConcept C121332964 @default.
- W3118117029 hasConcept C125112378 @default.
- W3118117029 hasConcept C126255220 @default.
- W3118117029 hasConcept C1276947 @default.
- W3118117029 hasConcept C13662910 @default.
- W3118117029 hasConcept C140779682 @default.
- W3118117029 hasConcept C147764199 @default.
- W3118117029 hasConcept C154945302 @default.
- W3118117029 hasConcept C155512373 @default.
- W3118117029 hasConcept C185592680 @default.
- W3118117029 hasConcept C19499675 @default.
- W3118117029 hasConcept C198531522 @default.
- W3118117029 hasConcept C206688291 @default.
- W3118117029 hasConcept C31972630 @default.
- W3118117029 hasConcept C33923547 @default.
- W3118117029 hasConcept C41008148 @default.
- W3118117029 hasConcept C43617362 @default.
- W3118117029 hasConcept C48103436 @default.
- W3118117029 hasConcept C50644808 @default.
- W3118117029 hasConcept C52740198 @default.
- W3118117029 hasConcept C74193536 @default.
- W3118117029 hasConcept C97541855 @default.
- W3118117029 hasConceptScore W3118117029C105795698 @default.
- W3118117029 hasConceptScore W3118117029C106131492 @default.
- W3118117029 hasConceptScore W3118117029C11413529 @default.
- W3118117029 hasConceptScore W3118117029C114614502 @default.
- W3118117029 hasConceptScore W3118117029C121332964 @default.
- W3118117029 hasConceptScore W3118117029C125112378 @default.
- W3118117029 hasConceptScore W3118117029C126255220 @default.
- W3118117029 hasConceptScore W3118117029C1276947 @default.
- W3118117029 hasConceptScore W3118117029C13662910 @default.
- W3118117029 hasConceptScore W3118117029C140779682 @default.
- W3118117029 hasConceptScore W3118117029C147764199 @default.
- W3118117029 hasConceptScore W3118117029C154945302 @default.
- W3118117029 hasConceptScore W3118117029C155512373 @default.
- W3118117029 hasConceptScore W3118117029C185592680 @default.
- W3118117029 hasConceptScore W3118117029C19499675 @default.
- W3118117029 hasConceptScore W3118117029C198531522 @default.
- W3118117029 hasConceptScore W3118117029C206688291 @default.
- W3118117029 hasConceptScore W3118117029C31972630 @default.
- W3118117029 hasConceptScore W3118117029C33923547 @default.
- W3118117029 hasConceptScore W3118117029C41008148 @default.
- W3118117029 hasConceptScore W3118117029C43617362 @default.
- W3118117029 hasConceptScore W3118117029C48103436 @default.
- W3118117029 hasConceptScore W3118117029C50644808 @default.
- W3118117029 hasConceptScore W3118117029C52740198 @default.
- W3118117029 hasConceptScore W3118117029C74193536 @default.
- W3118117029 hasConceptScore W3118117029C97541855 @default.
- W3118117029 hasLocation W31181170291 @default.
- W3118117029 hasOpenAccess W3118117029 @default.
- W3118117029 hasPrimaryLocation W31181170291 @default.
- W3118117029 hasRelatedWork W1606274310 @default.
- W3118117029 hasRelatedWork W2002748013 @default.
- W3118117029 hasRelatedWork W2074639794 @default.
- W3118117029 hasRelatedWork W2359549665 @default.
- W3118117029 hasRelatedWork W3009360357 @default.
- W3118117029 hasRelatedWork W3159953444 @default.
- W3118117029 hasRelatedWork W4205266671 @default.
- W3118117029 hasRelatedWork W4244907869 @default.
- W3118117029 hasRelatedWork W4307415184 @default.
- W3118117029 hasRelatedWork W4372259970 @default.
- W3118117029 isParatext "false" @default.
- W3118117029 isRetracted "false" @default.
- W3118117029 magId "3118117029" @default.
- W3118117029 workType "article" @default.
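
The double sampling problem named in the abstract can be illustrated with a small exact computation. The sketch below is not the authors' algorithm and does not implement "borrowing from the future"; it only shows, on a hypothetical three-state toy chain with arbitrary tabular values (all names and numbers are assumptions made for illustration), why reusing one next-state sample biases the gradient of the squared Bellman residual while two independent samples do not.

```python
import itertools
import numpy as np

# Hypothetical toy setup (not from the paper): state 0 transitions to
# state 1 or state 2 with equal probability, with zero reward.
gamma = 0.9
reward = 0.0
transitions = [(1, 0.5), (2, 0.5)]   # (next state, probability)
V = np.array([0.0, 1.0, -1.0])       # arbitrary tabular value estimates


def delta(s_next):
    """Sampled Bellman residual at state 0 for a given next state."""
    return reward + gamma * V[s_next] - V[0]


def grad_delta(s_next):
    """Gradient of the sampled residual with respect to the table V."""
    g = np.zeros_like(V)
    g[s_next] += gamma
    g[0] -= 1.0
    return g


# Exact gradient of the loss 0.5 * (E[delta])^2.
mean_delta = sum(p * delta(s) for s, p in transitions)
mean_grad = sum(p * grad_delta(s) for s, p in transitions)
true_grad = mean_delta * mean_grad

# Expected single-sample estimate: the same s' appears in both factors.
single_sample = sum(p * delta(s) * grad_delta(s) for s, p in transitions)

# Expected two-sample estimate: independent s'_1 and s'_2 in the two factors.
double_sample = sum(
    p1 * p2 * delta(s1) * grad_delta(s2)
    for (s1, p1), (s2, p2) in itertools.product(transitions, repeat=2)
)

print("true gradient:       ", true_grad)
print("single-sample mean:  ", single_sample)
print("two-sample mean:     ", double_sample)
print("single-sample bias:  ", single_sample - true_grad)
```

In this toy case the expected residual is zero, so the exact gradient vanishes, yet the single-sample estimate has a nonzero mean on the two successor states. The gap is the covariance between the sampled residual and its gradient, which is the term a second independent sample removes.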