Matches in SemOpenAlex for { <https://semopenalex.org/work/W2996233539> ?p ?o ?g. }
- W2996233539 abstract "We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In particular, we introduce a novel doubly-robust estimator for the OPE problem in RL, based on the Targeted Maximum Likelihood Estimation principle from the statistical causal inference literature. We also introduce several variance reduction techniques that lead to impressive performance gains in off-policy evaluation. We show empirically that our estimator uniformly wins over existing off-policy evaluation methods across multiple RL environments and various levels of model misspecification. Finally, we further the existing theoretical analysis of estimators for the RL off-policy estimation problem by showing their $O_P(1/\sqrt{n})$ rate of convergence and characterizing their asymptotic distribution." @default.
- W2996233539 created "2019-12-26" @default.
- W2996233539 creator A5053658670 @default.
- W2996233539 creator A5055204699 @default.
- W2996233539 creator A5056851835 @default.
- W2996233539 creator A5082325811 @default.
- W2996233539 date "2019-12-13" @default.
- W2996233539 modified "2023-09-24" @default.
- W2996233539 title "More Efficient Off-Policy Evaluation through Regularized Targeted Learning" @default.
- W2996233539 cites W130914654 @default.
- W2996233539 cites W1514587017 @default.
- W2996233539 cites W1515851193 @default.
- W2996233539 cites W1585861384 @default.
- W2996233539 cites W1809653203 @default.
- W2996233539 cites W1971712327 @default.
- W2996233539 cites W2008949192 @default.
- W2996233539 cites W2009187570 @default.
- W2996233539 cites W2014373672 @default.
- W2996233539 cites W2022450888 @default.
- W2996233539 cites W2039811614 @default.
- W2996233539 cites W2091580968 @default.
- W2996233539 cites W2108692343 @default.
- W2996233539 cites W2121506959 @default.
- W2996233539 cites W2137370054 @default.
- W2996233539 cites W2150291618 @default.
- W2996233539 cites W2234859443 @default.
- W2996233539 cites W2275802500 @default.
- W2996233539 cites W2425855276 @default.
- W2996233539 cites W2790990039 @default.
- W2996233539 cites W2914656440 @default.
- W2996233539 cites W2962785510 @default.
- W2996233539 cites W2964297722 @default.
- W2996233539 hasPublicationYear "2019" @default.
- W2996233539 type Work @default.
- W2996233539 sameAs 2996233539 @default.
- W2996233539 citedByCount "3" @default.
- W2996233539 countsByYear W29962335392020 @default.
- W2996233539 countsByYear W29962335392021 @default.
- W2996233539 crossrefType "posted-content" @default.
- W2996233539 hasAuthorship W2996233539A5053658670 @default.
- W2996233539 hasAuthorship W2996233539A5055204699 @default.
- W2996233539 hasAuthorship W2996233539A5056851835 @default.
- W2996233539 hasAuthorship W2996233539A5082325811 @default.
- W2996233539 hasConcept C105795698 @default.
- W2996233539 hasConcept C110121322 @default.
- W2996233539 hasConcept C111335779 @default.
- W2996233539 hasConcept C119857082 @default.
- W2996233539 hasConcept C121955636 @default.
- W2996233539 hasConcept C126255220 @default.
- W2996233539 hasConcept C134306372 @default.
- W2996233539 hasConcept C149782125 @default.
- W2996233539 hasConcept C154945302 @default.
- W2996233539 hasConcept C162324750 @default.
- W2996233539 hasConcept C185429906 @default.
- W2996233539 hasConcept C196083921 @default.
- W2996233539 hasConcept C2524010 @default.
- W2996233539 hasConcept C26517878 @default.
- W2996233539 hasConcept C2776214188 @default.
- W2996233539 hasConcept C2777303404 @default.
- W2996233539 hasConcept C2779436431 @default.
- W2996233539 hasConcept C33923547 @default.
- W2996233539 hasConcept C38652104 @default.
- W2996233539 hasConcept C41008148 @default.
- W2996233539 hasConcept C50522688 @default.
- W2996233539 hasConcept C57869625 @default.
- W2996233539 hasConcept C62644790 @default.
- W2996233539 hasConcept C65778772 @default.
- W2996233539 hasConcept C97541855 @default.
- W2996233539 hasConceptScore W2996233539C105795698 @default.
- W2996233539 hasConceptScore W2996233539C110121322 @default.
- W2996233539 hasConceptScore W2996233539C111335779 @default.
- W2996233539 hasConceptScore W2996233539C119857082 @default.
- W2996233539 hasConceptScore W2996233539C121955636 @default.
- W2996233539 hasConceptScore W2996233539C126255220 @default.
- W2996233539 hasConceptScore W2996233539C134306372 @default.
- W2996233539 hasConceptScore W2996233539C149782125 @default.
- W2996233539 hasConceptScore W2996233539C154945302 @default.
- W2996233539 hasConceptScore W2996233539C162324750 @default.
- W2996233539 hasConceptScore W2996233539C185429906 @default.
- W2996233539 hasConceptScore W2996233539C196083921 @default.
- W2996233539 hasConceptScore W2996233539C2524010 @default.
- W2996233539 hasConceptScore W2996233539C26517878 @default.
- W2996233539 hasConceptScore W2996233539C2776214188 @default.
- W2996233539 hasConceptScore W2996233539C2777303404 @default.
- W2996233539 hasConceptScore W2996233539C2779436431 @default.
- W2996233539 hasConceptScore W2996233539C33923547 @default.
- W2996233539 hasConceptScore W2996233539C38652104 @default.
- W2996233539 hasConceptScore W2996233539C41008148 @default.
- W2996233539 hasConceptScore W2996233539C50522688 @default.
- W2996233539 hasConceptScore W2996233539C57869625 @default.
- W2996233539 hasConceptScore W2996233539C62644790 @default.
- W2996233539 hasConceptScore W2996233539C65778772 @default.
- W2996233539 hasConceptScore W2996233539C97541855 @default.
- W2996233539 hasLocation W29962335391 @default.
- W2996233539 hasOpenAccess W2996233539 @default.
- W2996233539 hasPrimaryLocation W29962335391 @default.
- W2996233539 hasRelatedWork W2107741520 @default.
- W2996233539 hasRelatedWork W2114901408 @default.
- W2996233539 hasRelatedWork W2188353343 @default.
- W2996233539 hasRelatedWork W2460675832 @default.