Matches in SemOpenAlex for { <https://semopenalex.org/work/W3090442079> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W3090442079 endingPage "585" @default.
- W3090442079 startingPage "566" @default.
- W3090442079 abstract "Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated under the synchronous model, we study the problem of estimating the value function of an infinite-horizon discounted MRP with finite state space in the ℓ<sub>∞</sub>-norm. We analyze both the standard plug-in approach to this problem and a more robust variant, and establish non-asymptotic bounds that depend on the (unknown) problem instance, as well as data-dependent bounds that can be evaluated based on the observations of state-transitions and rewards. We show that these approaches are minimax-optimal up to constant factors over natural sub-classes of MRPs. Our analysis makes use of a leave-one-out decoupling argument tailored to the policy evaluation problem, one which may be of independent interest." @default.
- W3090442079 created "2020-10-08" @default.
- W3090442079 creator A5038379562 @default.
- W3090442079 creator A5085869933 @default.
- W3090442079 date "2021-01-01" @default.
- W3090442079 modified "2023-09-30" @default.
- W3090442079 title "Instance-Dependent ℓ<sub>∞</sub>-Bounds for Policy Evaluation in Tabular Reinforcement Learning" @default.
- W3090442079 cites W1969276875 @default.
- W3090442079 cites W1969589598 @default.
- W3090442079 cites W1989151402 @default.
- W3090442079 cites W1992208280 @default.
- W3090442079 cites W2071983464 @default.
- W3090442079 cites W2072931156 @default.
- W3090442079 cites W2075672181 @default.
- W3090442079 cites W2080631849 @default.
- W3090442079 cites W2086161653 @default.
- W3090442079 cites W2093638026 @default.
- W3090442079 cites W2095487261 @default.
- W3090442079 cites W2098152875 @default.
- W3090442079 cites W2120339885 @default.
- W3090442079 cites W2120678009 @default.
- W3090442079 cites W2132351269 @default.
- W3090442079 cites W2165131254 @default.
- W3090442079 cites W2328571519 @default.
- W3090442079 cites W2594128370 @default.
- W3090442079 cites W3010140118 @default.
- W3090442079 cites W3012549477 @default.
- W3090442079 cites W3041202696 @default.
- W3090442079 cites W3098956361 @default.
- W3090442079 cites W3099973750 @default.
- W3090442079 cites W3111890340 @default.
- W3090442079 cites W391578156 @default.
- W3090442079 cites W4211030719 @default.
- W3090442079 doi "https://doi.org/10.1109/tit.2020.3027316" @default.
- W3090442079 hasPublicationYear "2021" @default.
- W3090442079 type Work @default.
- W3090442079 sameAs 3090442079 @default.
- W3090442079 citedByCount "19" @default.
- W3090442079 countsByYear W30904420792020 @default.
- W3090442079 countsByYear W30904420792021 @default.
- W3090442079 countsByYear W30904420792022 @default.
- W3090442079 countsByYear W30904420792023 @default.
- W3090442079 crossrefType "journal-article" @default.
- W3090442079 hasAuthorship W3090442079A5038379562 @default.
- W3090442079 hasAuthorship W3090442079A5085869933 @default.
- W3090442079 hasBestOaLocation W30904420791 @default.
- W3090442079 hasConcept C105795698 @default.
- W3090442079 hasConcept C106189395 @default.
- W3090442079 hasConcept C126255220 @default.
- W3090442079 hasConcept C144024400 @default.
- W3090442079 hasConcept C14646407 @default.
- W3090442079 hasConcept C149728462 @default.
- W3090442079 hasConcept C149923435 @default.
- W3090442079 hasConcept C154945302 @default.
- W3090442079 hasConcept C159886148 @default.
- W3090442079 hasConcept C188116033 @default.
- W3090442079 hasConcept C2908647359 @default.
- W3090442079 hasConcept C33923547 @default.
- W3090442079 hasConcept C41008148 @default.
- W3090442079 hasConcept C97541855 @default.
- W3090442079 hasConceptScore W3090442079C105795698 @default.
- W3090442079 hasConceptScore W3090442079C106189395 @default.
- W3090442079 hasConceptScore W3090442079C126255220 @default.
- W3090442079 hasConceptScore W3090442079C144024400 @default.
- W3090442079 hasConceptScore W3090442079C14646407 @default.
- W3090442079 hasConceptScore W3090442079C149728462 @default.
- W3090442079 hasConceptScore W3090442079C149923435 @default.
- W3090442079 hasConceptScore W3090442079C154945302 @default.
- W3090442079 hasConceptScore W3090442079C159886148 @default.
- W3090442079 hasConceptScore W3090442079C188116033 @default.
- W3090442079 hasConceptScore W3090442079C2908647359 @default.
- W3090442079 hasConceptScore W3090442079C33923547 @default.
- W3090442079 hasConceptScore W3090442079C41008148 @default.
- W3090442079 hasConceptScore W3090442079C97541855 @default.
- W3090442079 hasFunder F4320306076 @default.
- W3090442079 hasFunder F4320337345 @default.
- W3090442079 hasIssue "1" @default.
- W3090442079 hasLocation W30904420791 @default.
- W3090442079 hasOpenAccess W3090442079 @default.
- W3090442079 hasPrimaryLocation W30904420791 @default.
- W3090442079 hasRelatedWork W1511927616 @default.
- W3090442079 hasRelatedWork W1556532828 @default.
- W3090442079 hasRelatedWork W1985560493 @default.
- W3090442079 hasRelatedWork W2089415692 @default.
- W3090442079 hasRelatedWork W2114876262 @default.
- W3090442079 hasRelatedWork W2182304831 @default.
- W3090442079 hasRelatedWork W2386410636 @default.
- W3090442079 hasRelatedWork W2937181779 @default.
- W3090442079 hasRelatedWork W3115089987 @default.
- W3090442079 hasRelatedWork W4308702637 @default.
- W3090442079 hasVolume "67" @default.
- W3090442079 isParatext "false" @default.
- W3090442079 isRetracted "false" @default.
- W3090442079 magId "3090442079" @default.
- W3090442079 workType "article" @default.