Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287115511> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W4287115511 abstract "There have been many recent advances on provably efficient Reinforcement Learning (RL) in problems with rich observation spaces. However, all these works share a strong realizability assumption about the optimal value function of the true MDP. Such realizability assumptions are often too strong to hold in practice. In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy. We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP. Specifically, our algorithm enjoys a sample complexity bound of $\widetilde{O}\left((H^{4d} K^{3d} \log |\Pi|)/\epsilon^2\right)$ where $H$ is the length of episodes, $K$ is the number of actions and $\epsilon>0$ is the desired sub-optimality. We also provide a nearly matching lower bound for this agnostic setting that shows that the exponential dependence on rank is unavoidable, without further assumptions." @default.
- W4287115511 created "2022-07-25" @default.
- W4287115511 creator A5014637159 @default.
- W4287115511 creator A5015966835 @default.
- W4287115511 creator A5051488979 @default.
- W4287115511 creator A5058849006 @default.
- W4287115511 creator A5084265089 @default.
- W4287115511 date "2021-06-21" @default.
- W4287115511 modified "2023-09-26" @default.
- W4287115511 title "Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations" @default.
- W4287115511 doi "https://doi.org/10.48550/arxiv.2106.11519" @default.
- W4287115511 hasPublicationYear "2021" @default.
- W4287115511 type Work @default.
- W4287115511 citedByCount "0" @default.
- W4287115511 crossrefType "posted-content" @default.
- W4287115511 hasAuthorship W4287115511A5014637159 @default.
- W4287115511 hasAuthorship W4287115511A5015966835 @default.
- W4287115511 hasAuthorship W4287115511A5051488979 @default.
- W4287115511 hasAuthorship W4287115511A5058849006 @default.
- W4287115511 hasAuthorship W4287115511A5084265089 @default.
- W4287115511 hasBestOaLocation W42871155111 @default.
- W4287115511 hasConcept C105795698 @default.
- W4287115511 hasConcept C106189395 @default.
- W4287115511 hasConcept C11413529 @default.
- W4287115511 hasConcept C114614502 @default.
- W4287115511 hasConcept C118615104 @default.
- W4287115511 hasConcept C126255220 @default.
- W4287115511 hasConcept C134306372 @default.
- W4287115511 hasConcept C14036430 @default.
- W4287115511 hasConcept C14646407 @default.
- W4287115511 hasConcept C154945302 @default.
- W4287115511 hasConcept C159886148 @default.
- W4287115511 hasConcept C164226766 @default.
- W4287115511 hasConcept C165064840 @default.
- W4287115511 hasConcept C2776378722 @default.
- W4287115511 hasConcept C2777212361 @default.
- W4287115511 hasConcept C2778445095 @default.
- W4287115511 hasConcept C33923547 @default.
- W4287115511 hasConcept C34388435 @default.
- W4287115511 hasConcept C41008148 @default.
- W4287115511 hasConcept C77553402 @default.
- W4287115511 hasConcept C78458016 @default.
- W4287115511 hasConcept C86803240 @default.
- W4287115511 hasConcept C97541855 @default.
- W4287115511 hasConceptScore W4287115511C105795698 @default.
- W4287115511 hasConceptScore W4287115511C106189395 @default.
- W4287115511 hasConceptScore W4287115511C11413529 @default.
- W4287115511 hasConceptScore W4287115511C114614502 @default.
- W4287115511 hasConceptScore W4287115511C118615104 @default.
- W4287115511 hasConceptScore W4287115511C126255220 @default.
- W4287115511 hasConceptScore W4287115511C134306372 @default.
- W4287115511 hasConceptScore W4287115511C14036430 @default.
- W4287115511 hasConceptScore W4287115511C14646407 @default.
- W4287115511 hasConceptScore W4287115511C154945302 @default.
- W4287115511 hasConceptScore W4287115511C159886148 @default.
- W4287115511 hasConceptScore W4287115511C164226766 @default.
- W4287115511 hasConceptScore W4287115511C165064840 @default.
- W4287115511 hasConceptScore W4287115511C2776378722 @default.
- W4287115511 hasConceptScore W4287115511C2777212361 @default.
- W4287115511 hasConceptScore W4287115511C2778445095 @default.
- W4287115511 hasConceptScore W4287115511C33923547 @default.
- W4287115511 hasConceptScore W4287115511C34388435 @default.
- W4287115511 hasConceptScore W4287115511C41008148 @default.
- W4287115511 hasConceptScore W4287115511C77553402 @default.
- W4287115511 hasConceptScore W4287115511C78458016 @default.
- W4287115511 hasConceptScore W4287115511C86803240 @default.
- W4287115511 hasConceptScore W4287115511C97541855 @default.
- W4287115511 hasLocation W42871155111 @default.
- W4287115511 hasOpenAccess W4287115511 @default.
- W4287115511 hasPrimaryLocation W42871155111 @default.
- W4287115511 hasRelatedWork W1526654727 @default.
- W4287115511 hasRelatedWork W1953057174 @default.
- W4287115511 hasRelatedWork W2039439610 @default.
- W4287115511 hasRelatedWork W2188619031 @default.
- W4287115511 hasRelatedWork W2964299116 @default.
- W4287115511 hasRelatedWork W3109669325 @default.
- W4287115511 hasRelatedWork W3139405606 @default.
- W4287115511 hasRelatedWork W3177328134 @default.
- W4287115511 hasRelatedWork W3188220908 @default.
- W4287115511 hasRelatedWork W4221153283 @default.
- W4287115511 isParatext "false" @default.
- W4287115511 isRetracted "false" @default.
- W4287115511 workType "article" @default.
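
The sample complexity expression quoted in the abstract above grows exponentially in the rank $d$. The following is a minimal sketch, not taken from the paper, that evaluates only the expression inside the $\widetilde{O}(\cdot)$, namely $H^{4d} K^{3d} \log|\Pi| / \epsilon^2$. It assumes a finite policy class and ignores the constants and polylogarithmic factors that the $\widetilde{O}$ notation hides, so the numbers are useful for comparing growth rates only, not as actual sample counts. The function name `leading_term` and its parameter names are hypothetical.

```python
import math


def leading_term(H, K, d, policy_class_size, epsilon):
    """Evaluate H^(4d) * K^(3d) * log|Pi| / epsilon^2.

    Illustrative only: this is the expression inside the O-tilde from the
    abstract, with hidden constants and polylog factors dropped (assumption).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if policy_class_size < 1:
        raise ValueError("policy class must contain at least one policy")
    return (H ** (4 * d)) * (K ** (3 * d)) * math.log(policy_class_size) / epsilon ** 2


def growth_per_rank_unit(H, K):
    """Factor by which the expression grows when the rank d increases by one."""
    return (H ** 4) * (K ** 3)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only to show the trend in d.
    for d in (1, 2, 3):
        value = leading_term(H=5, K=3, d=d, policy_class_size=1000, epsilon=0.1)
        print(f"d={d}: {value:.3e}")
```

Each unit increase in $d$ multiplies the expression by $H^4 K^3$, which is the exponential dependence on rank that the abstract describes as unavoidable without further assumptions.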