Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387687130> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W4387687130 abstract "We settle the sample complexity of policy learning for the maximization of the long run average reward associated with a uniformly ergodic Markov decision process (MDP), assuming a generative model. In this context, the existing literature provides a sample complexity upper bound of $\widetilde O(|S||A|t_{\text{mix}}^2 \epsilon^{-2})$ and a lower bound of $\Omega(|S||A|t_{\text{mix}} \epsilon^{-2})$. In these expressions, $|S|$ and $|A|$ denote the cardinalities of the state and action spaces respectively, $t_{\text{mix}}$ serves as a uniform upper limit for the total variation mixing times, and $\epsilon$ signifies the error tolerance. Therefore, a notable gap of $t_{\text{mix}}$ still remains to be bridged. Our primary contribution is to establish an estimator for the optimal policy of average reward MDPs with a sample complexity of $\widetilde O(|S||A|t_{\text{mix}}\epsilon^{-2})$, effectively reaching the lower bound in the literature. This is achieved by combining algorithmic ideas in Jin and Sidford (2021) with those of Li et al. (2020)." @default.
- W4387687130 created "2023-10-17" @default.
- W4387687130 creator A5011147039 @default.
- W4387687130 creator A5019114258 @default.
- W4387687130 creator A5076740460 @default.
- W4387687130 date "2023-10-12" @default.
- W4387687130 modified "2023-10-18" @default.
- W4387687130 title "Optimal Sample Complexity for Average Reward Markov Decision Processes" @default.
- W4387687130 doi "https://doi.org/10.48550/arxiv.2310.08833" @default.
- W4387687130 hasPublicationYear "2023" @default.
- W4387687130 type Work @default.
- W4387687130 citedByCount "0" @default.
- W4387687130 crossrefType "posted-content" @default.
- W4387687130 hasAuthorship W4387687130A5011147039 @default.
- W4387687130 hasAuthorship W4387687130A5019114258 @default.
- W4387687130 hasAuthorship W4387687130A5076740460 @default.
- W4387687130 hasBestOaLocation W43876871301 @default.
- W4387687130 hasConcept C105795698 @default.
- W4387687130 hasConcept C106189395 @default.
- W4387687130 hasConcept C114614502 @default.
- W4387687130 hasConcept C121332964 @default.
- W4387687130 hasConcept C122044880 @default.
- W4387687130 hasConcept C126255220 @default.
- W4387687130 hasConcept C134306372 @default.
- W4387687130 hasConcept C138777275 @default.
- W4387687130 hasConcept C138885662 @default.
- W4387687130 hasConcept C144237770 @default.
- W4387687130 hasConcept C151730666 @default.
- W4387687130 hasConcept C154945302 @default.
- W4387687130 hasConcept C159886148 @default.
- W4387687130 hasConcept C185429906 @default.
- W4387687130 hasConcept C202444582 @default.
- W4387687130 hasConcept C2776330181 @default.
- W4387687130 hasConcept C2778445095 @default.
- W4387687130 hasConcept C2779343474 @default.
- W4387687130 hasConcept C2779557605 @default.
- W4387687130 hasConcept C33923547 @default.
- W4387687130 hasConcept C39890363 @default.
- W4387687130 hasConcept C41008148 @default.
- W4387687130 hasConcept C41895202 @default.
- W4387687130 hasConcept C62520636 @default.
- W4387687130 hasConcept C77553402 @default.
- W4387687130 hasConcept C86803240 @default.
- W4387687130 hasConcept C98763669 @default.
- W4387687130 hasConceptScore W4387687130C105795698 @default.
- W4387687130 hasConceptScore W4387687130C106189395 @default.
- W4387687130 hasConceptScore W4387687130C114614502 @default.
- W4387687130 hasConceptScore W4387687130C121332964 @default.
- W4387687130 hasConceptScore W4387687130C122044880 @default.
- W4387687130 hasConceptScore W4387687130C126255220 @default.
- W4387687130 hasConceptScore W4387687130C134306372 @default.
- W4387687130 hasConceptScore W4387687130C138777275 @default.
- W4387687130 hasConceptScore W4387687130C138885662 @default.
- W4387687130 hasConceptScore W4387687130C144237770 @default.
- W4387687130 hasConceptScore W4387687130C151730666 @default.
- W4387687130 hasConceptScore W4387687130C154945302 @default.
- W4387687130 hasConceptScore W4387687130C159886148 @default.
- W4387687130 hasConceptScore W4387687130C185429906 @default.
- W4387687130 hasConceptScore W4387687130C202444582 @default.
- W4387687130 hasConceptScore W4387687130C2776330181 @default.
- W4387687130 hasConceptScore W4387687130C2778445095 @default.
- W4387687130 hasConceptScore W4387687130C2779343474 @default.
- W4387687130 hasConceptScore W4387687130C2779557605 @default.
- W4387687130 hasConceptScore W4387687130C33923547 @default.
- W4387687130 hasConceptScore W4387687130C39890363 @default.
- W4387687130 hasConceptScore W4387687130C41008148 @default.
- W4387687130 hasConceptScore W4387687130C41895202 @default.
- W4387687130 hasConceptScore W4387687130C62520636 @default.
- W4387687130 hasConceptScore W4387687130C77553402 @default.
- W4387687130 hasConceptScore W4387687130C86803240 @default.
- W4387687130 hasConceptScore W4387687130C98763669 @default.
- W4387687130 hasLocation W43876871301 @default.
- W4387687130 hasOpenAccess W4387687130 @default.
- W4387687130 hasPrimaryLocation W43876871301 @default.
- W4387687130 hasRelatedWork W2044004505 @default.
- W4387687130 hasRelatedWork W2097562045 @default.
- W4387687130 hasRelatedWork W2341040961 @default.
- W4387687130 hasRelatedWork W2390585021 @default.
- W4387687130 hasRelatedWork W2779828239 @default.
- W4387687130 hasRelatedWork W2951177262 @default.
- W4387687130 hasRelatedWork W3154976382 @default.
- W4387687130 hasRelatedWork W4287207389 @default.
- W4387687130 hasRelatedWork W4292101436 @default.
- W4387687130 hasRelatedWork W50969306 @default.
- W4387687130 isParatext "false" @default.
- W4387687130 isRetracted "false" @default.
- W4387687130 workType "article" @default.
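
The abstract in the listing above states that the earlier upper bound and the lower bound differ by a factor of t_mix. The following is a minimal sketch of that arithmetic only, assuming all constants and logarithmic factors are dropped. The function name `bound_scaling` and the problem sizes are hypothetical, chosen purely for illustration, and do not come from the paper.

```python
def bound_scaling(num_states, num_actions, t_mix, epsilon, t_mix_power=1):
    """Leading-order scaling |S||A| * t_mix^p * epsilon^-2.

    Constants and log factors are deliberately ignored, so the result
    is a scaling term, not an actual number of samples.
    """
    return num_states * num_actions * (t_mix ** t_mix_power) / (epsilon ** 2)


# Hypothetical problem size (assumption, for illustration only).
S, A, T_MIX, EPS = 10, 5, 20, 0.5

# Earlier upper bound scales with t_mix^2; the lower bound scales with t_mix.
previous_upper = bound_scaling(S, A, T_MIX, EPS, t_mix_power=2)
lower = bound_scaling(S, A, T_MIX, EPS, t_mix_power=1)

# The ratio of the two scalings is exactly t_mix, the gap named in the abstract.
gap = previous_upper / lower
print(previous_upper, lower, gap)
```

Under these assumptions the ratio equals `T_MIX` for any choice of sizes, which is the gap the abstract reports as closed.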