Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287272593> ?p ?o ?g. }
Showing items 1 to 66 of 66 with 100 items per page.
- W4287272593 abstract "We study the statistical theory of offline reinforcement learning (RL) with deep ReLU network function approximation. We analyze a variant of fitted-Q iteration (FQI) algorithm under a new dynamic condition that we call Besov dynamic closure, which encompasses the conditions from prior analyses for deep neural network function approximation. Under Besov dynamic closure, we prove that the FQI-type algorithm enjoys the sample complexity of $\tilde{\mathcal{O}}\left( \kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha} \right)$ where $\kappa$ is a distribution shift measure, $d$ is the dimensionality of the state-action space, $\alpha$ is the (possibly fractional) smoothness parameter of the underlying MDP, and $\epsilon$ is a user-specified precision. This is an improvement over the sample complexity of $\tilde{\mathcal{O}}\left( K \cdot \kappa^{2 + d/\alpha} \cdot \epsilon^{-2 - d/\alpha} \right)$ in the prior result [Yang et al., 2019] where $K$ is an algorithmic iteration number which is arbitrarily large in practice. Importantly, our sample complexity is obtained under the new general dynamic condition and a data-dependent structure where the latter is either ignored in prior algorithms or improperly handled by prior analyses. This is the first comprehensive analysis for offline RL with deep ReLU network function approximation under a general setting." @default.
- W4287272593 created "2022-07-25" @default.
- W4287272593 creator A5045540854 @default.
- W4287272593 creator A5067185856 @default.
- W4287272593 creator A5074017383 @default.
- W4287272593 creator A5086423509 @default.
- W4287272593 date "2021-03-11" @default.
- W4287272593 modified "2023-09-30" @default.
- W4287272593 title "Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks" @default.
- W4287272593 hasPublicationYear "2021" @default.
- W4287272593 type Work @default.
- W4287272593 citedByCount "0" @default.
- W4287272593 crossrefType "posted-content" @default.
- W4287272593 hasAuthorship W4287272593A5045540854 @default.
- W4287272593 hasAuthorship W4287272593A5067185856 @default.
- W4287272593 hasAuthorship W4287272593A5074017383 @default.
- W4287272593 hasAuthorship W4287272593A5086423509 @default.
- W4287272593 hasBestOaLocation W42872725931 @default.
- W4287272593 hasConcept C102634674 @default.
- W4287272593 hasConcept C11413529 @default.
- W4287272593 hasConcept C114614502 @default.
- W4287272593 hasConcept C118615104 @default.
- W4287272593 hasConcept C134306372 @default.
- W4287272593 hasConcept C14036430 @default.
- W4287272593 hasConcept C146834321 @default.
- W4287272593 hasConcept C154945302 @default.
- W4287272593 hasConcept C162324750 @default.
- W4287272593 hasConcept C2778445095 @default.
- W4287272593 hasConcept C33923547 @default.
- W4287272593 hasConcept C34447519 @default.
- W4287272593 hasConcept C41008148 @default.
- W4287272593 hasConcept C78458016 @default.
- W4287272593 hasConcept C86803240 @default.
- W4287272593 hasConcept C97541855 @default.
- W4287272593 hasConceptScore W4287272593C102634674 @default.
- W4287272593 hasConceptScore W4287272593C11413529 @default.
- W4287272593 hasConceptScore W4287272593C114614502 @default.
- W4287272593 hasConceptScore W4287272593C118615104 @default.
- W4287272593 hasConceptScore W4287272593C134306372 @default.
- W4287272593 hasConceptScore W4287272593C14036430 @default.
- W4287272593 hasConceptScore W4287272593C146834321 @default.
- W4287272593 hasConceptScore W4287272593C154945302 @default.
- W4287272593 hasConceptScore W4287272593C162324750 @default.
- W4287272593 hasConceptScore W4287272593C2778445095 @default.
- W4287272593 hasConceptScore W4287272593C33923547 @default.
- W4287272593 hasConceptScore W4287272593C34447519 @default.
- W4287272593 hasConceptScore W4287272593C41008148 @default.
- W4287272593 hasConceptScore W4287272593C78458016 @default.
- W4287272593 hasConceptScore W4287272593C86803240 @default.
- W4287272593 hasConceptScore W4287272593C97541855 @default.
- W4287272593 hasLocation W42872725931 @default.
- W4287272593 hasOpenAccess W4287272593 @default.
- W4287272593 hasPrimaryLocation W42872725931 @default.
- W4287272593 hasRelatedWork W14287752 @default.
- W4287272593 hasRelatedWork W19752829 @default.
- W4287272593 hasRelatedWork W21453999 @default.
- W4287272593 hasRelatedWork W28153534 @default.
- W4287272593 hasRelatedWork W2955827 @default.
- W4287272593 hasRelatedWork W30872231 @default.
- W4287272593 hasRelatedWork W33079676 @default.
- W4287272593 hasRelatedWork W4934480 @default.
- W4287272593 hasRelatedWork W6075650 @default.
- W4287272593 hasRelatedWork W17676366 @default.
- W4287272593 isParatext "false" @default.
- W4287272593 isRetracted "false" @default.
- W4287272593 workType "article" @default.