Matches in SemOpenAlex for { <https://semopenalex.org/work/W4380136160> ?p ?o ?g. }
Showing items 1 to 67 of 67 with 100 items per page.
- W4380136160 abstract "We propose a novel $K$-nearest neighbor resampling procedure for estimating the performance of a policy from historical data containing realized episodes of a decision process generated under a different policy. We focus on feedback policies that depend deterministically on the current state in environments with continuous state-action spaces and system-inherent stochasticity effected by chosen actions. Such settings are common in a wide range of high-stake applications and are actively investigated in the context of stochastic control. Our procedure exploits that similar state/action pairs (in a metric sense) are associated with similar rewards and state transitions. This enables our resampling procedure to tackle the counterfactual estimation problem underlying off-policy evaluation (OPE) by simulating trajectories similarly to Monte Carlo methods. Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization and does not explicitly assume a parametric model for the environment's dynamics. These properties make the proposed resampling algorithm particularly useful for stochastic control environments. We prove that our method is statistically consistent in estimating the performance of a policy in the OPE setting under weak assumptions and for data sets containing entire episodes rather than independent transitions. To establish the consistency, we generalize Stone's Theorem, a well-known result in nonparametric statistics on local averaging, to include episodic data and the counterfactual estimation underlying OPE. Numerical experiments demonstrate the effectiveness of the algorithm in a variety of stochastic control settings including a linear quadratic regulator, trade execution in limit order books and online stochastic bin packing." @default.
- W4380136160 created "2023-06-10" @default.
- W4380136160 creator A5013285444 @default.
- W4380136160 creator A5053452059 @default.
- W4380136160 creator A5086027523 @default.
- W4380136160 date "2023-06-07" @default.
- W4380136160 modified "2023-09-24" @default.
- W4380136160 title "$K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control" @default.
- W4380136160 doi "https://doi.org/10.48550/arxiv.2306.04836" @default.
- W4380136160 hasPublicationYear "2023" @default.
- W4380136160 type Work @default.
- W4380136160 citedByCount "0" @default.
- W4380136160 crossrefType "posted-content" @default.
- W4380136160 hasAuthorship W4380136160A5013285444 @default.
- W4380136160 hasAuthorship W4380136160A5053452059 @default.
- W4380136160 hasAuthorship W4380136160A5086027523 @default.
- W4380136160 hasBestOaLocation W43801361601 @default.
- W4380136160 hasConcept C102366305 @default.
- W4380136160 hasConcept C108650721 @default.
- W4380136160 hasConcept C111472728 @default.
- W4380136160 hasConcept C113238511 @default.
- W4380136160 hasConcept C11413529 @default.
- W4380136160 hasConcept C119857082 @default.
- W4380136160 hasConcept C126255220 @default.
- W4380136160 hasConcept C138885662 @default.
- W4380136160 hasConcept C149782125 @default.
- W4380136160 hasConcept C150921843 @default.
- W4380136160 hasConcept C151730666 @default.
- W4380136160 hasConcept C154945302 @default.
- W4380136160 hasConcept C2776436953 @default.
- W4380136160 hasConcept C2779343474 @default.
- W4380136160 hasConcept C33923547 @default.
- W4380136160 hasConcept C41008148 @default.
- W4380136160 hasConcept C86803240 @default.
- W4380136160 hasConceptScore W4380136160C102366305 @default.
- W4380136160 hasConceptScore W4380136160C108650721 @default.
- W4380136160 hasConceptScore W4380136160C111472728 @default.
- W4380136160 hasConceptScore W4380136160C113238511 @default.
- W4380136160 hasConceptScore W4380136160C11413529 @default.
- W4380136160 hasConceptScore W4380136160C119857082 @default.
- W4380136160 hasConceptScore W4380136160C126255220 @default.
- W4380136160 hasConceptScore W4380136160C138885662 @default.
- W4380136160 hasConceptScore W4380136160C149782125 @default.
- W4380136160 hasConceptScore W4380136160C150921843 @default.
- W4380136160 hasConceptScore W4380136160C151730666 @default.
- W4380136160 hasConceptScore W4380136160C154945302 @default.
- W4380136160 hasConceptScore W4380136160C2776436953 @default.
- W4380136160 hasConceptScore W4380136160C2779343474 @default.
- W4380136160 hasConceptScore W4380136160C33923547 @default.
- W4380136160 hasConceptScore W4380136160C41008148 @default.
- W4380136160 hasConceptScore W4380136160C86803240 @default.
- W4380136160 hasLocation W43801361601 @default.
- W4380136160 hasOpenAccess W4380136160 @default.
- W4380136160 hasPrimaryLocation W43801361601 @default.
- W4380136160 hasRelatedWork W2005061125 @default.
- W4380136160 hasRelatedWork W2062746856 @default.
- W4380136160 hasRelatedWork W2072131684 @default.
- W4380136160 hasRelatedWork W2096696702 @default.
- W4380136160 hasRelatedWork W2357332593 @default.
- W4380136160 hasRelatedWork W2368650154 @default.
- W4380136160 hasRelatedWork W2384698507 @default.
- W4380136160 hasRelatedWork W2608754646 @default.
- W4380136160 hasRelatedWork W2954077720 @default.
- W4380136160 hasRelatedWork W3190162855 @default.
- W4380136160 isParatext "false" @default.
- W4380136160 isRetracted "false" @default.
- W4380136160 workType "article" @default.