Matches in SemOpenAlex for { <https://semopenalex.org/work/W4310997710> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4310997710 abstract "In off-policy reinforcement learning, a behaviour policy performs exploratory interactions with the environment to obtain state-action-reward samples which are then used to learn a target policy that optimises the expected return. This leads to a problem of off-policy evaluation, where one needs to evaluate the target policy from samples collected by the often unrelated behaviour policy. Importance sampling is a traditional statistical technique that is often applied to off-policy evaluation. While importance sampling estimators are unbiased, their variance increases exponentially with the horizon of the decision process due to computing the importance weight as a product of action probability ratios, yielding estimates with low accuracy for domains involving long-term planning. This paper proposes state-based importance sampling (SIS), which drops the action probability ratios of sub-trajectories with negligible states -- roughly speaking, those for which the chosen actions have no impact on the return estimate -- from the computation of the importance weight. Theoretical results show that this results in a reduction of the exponent in the variance upper bound as well as improving the mean squared error. An automated search algorithm based on covariance testing is proposed to identify a negligible state set which has minimal MSE when performing state-based importance sampling. Experiments are conducted on a lift domain, which includes lift states where the action has no impact on the following state and reward. The results demonstrate that using the search algorithm, SIS yields reduced variance and improved accuracy compared to traditional importance sampling, per-decision importance sampling, and incremental importance sampling." @default.
- W4310997710 created "2022-12-22" @default.
- W4310997710 creator A5011218628 @default.
- W4310997710 creator A5042047114 @default.
- W4310997710 date "2022-12-07" @default.
- W4310997710 modified "2023-09-30" @default.
- W4310997710 title "Low Variance Off-policy Evaluation with State-based Importance Sampling" @default.
- W4310997710 doi "https://doi.org/10.48550/arxiv.2212.03932" @default.
- W4310997710 hasPublicationYear "2022" @default.
- W4310997710 type Work @default.
- W4310997710 citedByCount "0" @default.
- W4310997710 crossrefType "posted-content" @default.
- W4310997710 hasAuthorship W4310997710A5011218628 @default.
- W4310997710 hasAuthorship W4310997710A5042047114 @default.
- W4310997710 hasBestOaLocation W43109977101 @default.
- W4310997710 hasConcept C105795698 @default.
- W4310997710 hasConcept C106131492 @default.
- W4310997710 hasConcept C119857082 @default.
- W4310997710 hasConcept C121955636 @default.
- W4310997710 hasConcept C126255220 @default.
- W4310997710 hasConcept C139002025 @default.
- W4310997710 hasConcept C139945424 @default.
- W4310997710 hasConcept C140779682 @default.
- W4310997710 hasConcept C149782125 @default.
- W4310997710 hasConcept C154945302 @default.
- W4310997710 hasConcept C162324750 @default.
- W4310997710 hasConcept C178650346 @default.
- W4310997710 hasConcept C185429906 @default.
- W4310997710 hasConcept C19499675 @default.
- W4310997710 hasConcept C196083921 @default.
- W4310997710 hasConcept C31972630 @default.
- W4310997710 hasConcept C33923547 @default.
- W4310997710 hasConcept C41008148 @default.
- W4310997710 hasConcept C52740198 @default.
- W4310997710 hasConcept C62644790 @default.
- W4310997710 hasConcept C97541855 @default.
- W4310997710 hasConceptScore W4310997710C105795698 @default.
- W4310997710 hasConceptScore W4310997710C106131492 @default.
- W4310997710 hasConceptScore W4310997710C119857082 @default.
- W4310997710 hasConceptScore W4310997710C121955636 @default.
- W4310997710 hasConceptScore W4310997710C126255220 @default.
- W4310997710 hasConceptScore W4310997710C139002025 @default.
- W4310997710 hasConceptScore W4310997710C139945424 @default.
- W4310997710 hasConceptScore W4310997710C140779682 @default.
- W4310997710 hasConceptScore W4310997710C149782125 @default.
- W4310997710 hasConceptScore W4310997710C154945302 @default.
- W4310997710 hasConceptScore W4310997710C162324750 @default.
- W4310997710 hasConceptScore W4310997710C178650346 @default.
- W4310997710 hasConceptScore W4310997710C185429906 @default.
- W4310997710 hasConceptScore W4310997710C19499675 @default.
- W4310997710 hasConceptScore W4310997710C196083921 @default.
- W4310997710 hasConceptScore W4310997710C31972630 @default.
- W4310997710 hasConceptScore W4310997710C33923547 @default.
- W4310997710 hasConceptScore W4310997710C41008148 @default.
- W4310997710 hasConceptScore W4310997710C52740198 @default.
- W4310997710 hasConceptScore W4310997710C62644790 @default.
- W4310997710 hasConceptScore W4310997710C97541855 @default.
- W4310997710 hasLocation W43109977101 @default.
- W4310997710 hasOpenAccess W4310997710 @default.
- W4310997710 hasPrimaryLocation W43109977101 @default.
- W4310997710 hasRelatedWork W2002748013 @default.
- W4310997710 hasRelatedWork W2017052259 @default.
- W4310997710 hasRelatedWork W2031492799 @default.
- W4310997710 hasRelatedWork W2167006297 @default.
- W4310997710 hasRelatedWork W2313211764 @default.
- W4310997710 hasRelatedWork W2781092925 @default.
- W4310997710 hasRelatedWork W2980964960 @default.
- W4310997710 hasRelatedWork W3034780828 @default.
- W4310997710 hasRelatedWork W3123644762 @default.
- W4310997710 hasRelatedWork W3124567442 @default.
- W4310997710 isParatext "false" @default.
- W4310997710 isRetracted "false" @default.
- W4310997710 workType "article" @default.
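
The abstract of W4310997710 describes the importance weight as a product of action probability ratios, and state-based importance sampling as dropping the ratios for negligible states. A minimal Python sketch of that idea follows. It is an illustration only, not the authors' implementation: the function names, the dictionary-based policy representation, the undiscounted return, and the hand-picked negligible state set are all assumptions made here, and the covariance-based search for the negligible set is not shown.

```python
def importance_weight(trajectory, target, behaviour, negligible=frozenset()):
    """Product of target/behaviour action probability ratios along a trajectory.

    trajectory: list of (state, action, reward) tuples.
    target, behaviour: dicts mapping (state, action) -> probability
        (a hypothetical tabular policy representation).
    negligible: states whose ratios are skipped; the empty set gives
        ordinary (trajectory-wise) importance sampling.
    """
    weight = 1.0
    for state, action, _reward in trajectory:
        if state in negligible:
            continue  # drop the ratio for states assumed not to affect the return
        weight *= target[(state, action)] / behaviour[(state, action)]
    return weight


def estimate_return(trajectories, target, behaviour, negligible=frozenset()):
    """Average of weight * undiscounted return over the sampled trajectories."""
    total = 0.0
    for trajectory in trajectories:
        ret = sum(reward for _state, _action, reward in trajectory)
        total += importance_weight(trajectory, target, behaviour, negligible) * ret
    return total / len(trajectories)


# Toy data: in state "idle" the action is assumed to have no effect.
target = {("idle", "a"): 0.5, ("move", "a"): 0.8}
behaviour = {("idle", "a"): 0.25, ("move", "a"): 0.4}
trajectories = [[("idle", "a", 0.0), ("move", "a", 1.0)]]

ordinary = estimate_return(trajectories, target, behaviour)
state_based = estimate_return(trajectories, target, behaviour, negligible={"idle"})
```

In this toy case the ordinary weight multiplies both ratios, while the state-based weight skips the ratio for "idle", so fewer factors enter the product. Whether dropping a given state's ratio is justified depends on that state being negligible in the sense the abstract describes.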