Matches in SemOpenAlex for { <https://semopenalex.org/work/W3094261094> ?p ?o ?g. }
- W3094261094 abstract "Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity burden in modern sequential decision making problems. However, the extent to which this broader approach can be effective is not well understood, where the literature largely consists of sufficient conditions. This work focuses on the basic question of what are necessary representational and distributional conditions that permit provable sample-efficient offline reinforcement learning. Perhaps surprisingly, our main result shows that even if: i) we have realizability in that the true value function of every policy is linear in a given set of features and ii) our off-policy data has good coverage over all features (under a strong spectral condition), then any algorithm still (information-theoretically) requires a number of offline samples that is exponential in the problem horizon in order to non-trivially estimate the value of any given policy. Our results highlight that sample-efficient offline policy evaluation is simply not possible unless significantly stronger conditions hold; such conditions include either having low distribution shift (where the offline data distribution is close to the distribution of the policy to be evaluated) or significantly stronger representational conditions (beyond realizability)." @default.
- W3094261094 created "2020-10-29" @default.
- W3094261094 creator A5002177720 @default.
- W3094261094 creator A5018792915 @default.
- W3094261094 creator A5062744744 @default.
- W3094261094 date "2020-10-22" @default.
- W3094261094 modified "2023-09-26" @default.
- W3094261094 title "What are the Statistical Limits of Offline RL with Linear Function Approximation?" @default.
- W3094261094 cites W107583932 @default.
- W3094261094 cites W1506832649 @default.
- W3094261094 cites W1514587017 @default.
- W3094261094 cites W1542886316 @default.
- W3094261094 cites W1547105496 @default.
- W3094261094 cites W1550698229 @default.
- W3094261094 cites W1553598118 @default.
- W3094261094 cites W1730555343 @default.
- W3094261094 cites W1809653203 @default.
- W3094261094 cites W191658262 @default.
- W3094261094 cites W1941445455 @default.
- W3094261094 cites W2002844873 @default.
- W3094261094 cites W2052044664 @default.
- W3094261094 cites W2101533993 @default.
- W3094261094 cites W2104335146 @default.
- W3094261094 cites W2104753538 @default.
- W3094261094 cites W2117355432 @default.
- W3094261094 cites W2121506959 @default.
- W3094261094 cites W2122689259 @default.
- W3094261094 cites W2130005627 @default.
- W3094261094 cites W2132876566 @default.
- W3094261094 cites W2142409110 @default.
- W3094261094 cites W2145339207 @default.
- W3094261094 cites W2251007294 @default.
- W3094261094 cites W2273088453 @default.
- W3094261094 cites W2396213570 @default.
- W3094261094 cites W2594640072 @default.
- W3094261094 cites W2739559388 @default.
- W3094261094 cites W2751325639 @default.
- W3094261094 cites W2806905826 @default.
- W3094261094 cites W2890022552 @default.
- W3094261094 cites W2962785510 @default.
- W3094261094 cites W2962785728 @default.
- W3094261094 cites W2962802563 @default.
- W3094261094 cites W2963323139 @default.
- W3094261094 cites W2963561234 @default.
- W3094261094 cites W2963670858 @default.
- W3094261094 cites W2963704132 @default.
- W3094261094 cites W2963971282 @default.
- W3094261094 cites W2964068481 @default.
- W3094261094 cites W2964089577 @default.
- W3094261094 cites W2964297722 @default.
- W3094261094 cites W2964349150 @default.
- W3094261094 cites W2971165405 @default.
- W3094261094 cites W2981182167 @default.
- W3094261094 cites W2981972696 @default.
- W3094261094 cites W2991522342 @default.
- W3094261094 cites W2991598122 @default.
- W3094261094 cites W2991641017 @default.
- W3094261094 cites W2994709386 @default.
- W3094261094 cites W2995638039 @default.
- W3094261094 cites W3005407492 @default.
- W3094261094 cites W3009962997 @default.
- W3094261094 cites W3012148463 @default.
- W3094261094 cites W3017586084 @default.
- W3094261094 cites W3034607397 @default.
- W3094261094 cites W3036498527 @default.
- W3094261094 cites W3048911479 @default.
- W3094261094 cites W3049166411 @default.
- W3094261094 cites W3049328922 @default.
- W3094261094 cites W3085521361 @default.
- W3094261094 cites W3091826069 @default.
- W3094261094 cites W3093206925 @default.
- W3094261094 doi "https://doi.org/10.48550/arxiv.2010.11895" @default.
- W3094261094 hasPublicationYear "2020" @default.
- W3094261094 type Work @default.
- W3094261094 sameAs 3094261094 @default.
- W3094261094 citedByCount "27" @default.
- W3094261094 countsByYear W30942610942020 @default.
- W3094261094 countsByYear W30942610942021 @default.
- W3094261094 crossrefType "posted-content" @default.
- W3094261094 hasAuthorship W3094261094A5002177720 @default.
- W3094261094 hasAuthorship W3094261094A5018792915 @default.
- W3094261094 hasAuthorship W3094261094A5062744744 @default.
- W3094261094 hasBestOaLocation W30942610941 @default.
- W3094261094 hasConcept C111030470 @default.
- W3094261094 hasConcept C111919701 @default.
- W3094261094 hasConcept C11413529 @default.
- W3094261094 hasConcept C119857082 @default.
- W3094261094 hasConcept C126255220 @default.
- W3094261094 hasConcept C136764020 @default.
- W3094261094 hasConcept C14036430 @default.
- W3094261094 hasConcept C14646407 @default.
- W3094261094 hasConcept C154945302 @default.
- W3094261094 hasConcept C177264268 @default.
- W3094261094 hasConcept C185592680 @default.
- W3094261094 hasConcept C196340769 @default.
- W3094261094 hasConcept C198531522 @default.
- W3094261094 hasConcept C199360897 @default.
- W3094261094 hasConcept C2776291640 @default.
- W3094261094 hasConcept C2776378722 @default.
- W3094261094 hasConcept C2780102126 @default.