Matches in SemOpenAlex for { <https://semopenalex.org/work/W4288351606> ?p ?o ?g. }
Showing items 1 to 77 of 77 with 100 items per page.
- W4288351606 abstract "Motivated by the study of $Q$-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We prove a general sandwich relation on the iterate error at each time, and use it to derive non-asymptotic bounds on the error in terms of a cone-induced gauge norm. These results are derived within a deterministic framework, requiring no assumptions on the noise. We illustrate these general bounds in application to synchronous $Q$-learning for discounted Markov decision processes with discrete state-action spaces, in particular by deriving non-asymptotic bounds on the $\ell_\infty$-norm for a range of stepsizes. These results are the sharpest known to date, and we show via simulation that the dependence of our bounds cannot be improved in a worst-case sense. These results show that relative to a model-based $Q$-iteration, the $\ell_\infty$-based sample complexity of $Q$-learning is suboptimal in terms of the discount factor $\gamma$." @default.
- W4288351606 created "2022-07-29" @default.
- W4288351606 creator A5038379562 @default.
- W4288351606 date "2019-05-15" @default.
- W4288351606 modified "2023-09-26" @default.
- W4288351606 title "Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning" @default.
- W4288351606 doi "https://doi.org/10.48550/arxiv.1905.06265" @default.
- W4288351606 hasPublicationYear "2019" @default.
- W4288351606 type Work @default.
- W4288351606 citedByCount "0" @default.
- W4288351606 crossrefType "posted-content" @default.
- W4288351606 hasAuthorship W4288351606A5038379562 @default.
- W4288351606 hasBestOaLocation W42883516061 @default.
- W4288351606 hasConcept C105795698 @default.
- W4288351606 hasConcept C106189395 @default.
- W4288351606 hasConcept C11413529 @default.
- W4288351606 hasConcept C118615104 @default.
- W4288351606 hasConcept C134306372 @default.
- W4288351606 hasConcept C154945302 @default.
- W4288351606 hasConcept C159886148 @default.
- W4288351606 hasConcept C159985019 @default.
- W4288351606 hasConcept C17744445 @default.
- W4288351606 hasConcept C188116033 @default.
- W4288351606 hasConcept C191795146 @default.
- W4288351606 hasConcept C192562407 @default.
- W4288351606 hasConcept C199539241 @default.
- W4288351606 hasConcept C202444582 @default.
- W4288351606 hasConcept C204323151 @default.
- W4288351606 hasConcept C2778445095 @default.
- W4288351606 hasConcept C28826006 @default.
- W4288351606 hasConcept C30014739 @default.
- W4288351606 hasConcept C33923547 @default.
- W4288351606 hasConcept C41008148 @default.
- W4288351606 hasConcept C62799726 @default.
- W4288351606 hasConcept C72169020 @default.
- W4288351606 hasConcept C97541855 @default.
- W4288351606 hasConcept C98763669 @default.
- W4288351606 hasConceptScore W4288351606C105795698 @default.
- W4288351606 hasConceptScore W4288351606C106189395 @default.
- W4288351606 hasConceptScore W4288351606C11413529 @default.
- W4288351606 hasConceptScore W4288351606C118615104 @default.
- W4288351606 hasConceptScore W4288351606C134306372 @default.
- W4288351606 hasConceptScore W4288351606C154945302 @default.
- W4288351606 hasConceptScore W4288351606C159886148 @default.
- W4288351606 hasConceptScore W4288351606C159985019 @default.
- W4288351606 hasConceptScore W4288351606C17744445 @default.
- W4288351606 hasConceptScore W4288351606C188116033 @default.
- W4288351606 hasConceptScore W4288351606C191795146 @default.
- W4288351606 hasConceptScore W4288351606C192562407 @default.
- W4288351606 hasConceptScore W4288351606C199539241 @default.
- W4288351606 hasConceptScore W4288351606C202444582 @default.
- W4288351606 hasConceptScore W4288351606C204323151 @default.
- W4288351606 hasConceptScore W4288351606C2778445095 @default.
- W4288351606 hasConceptScore W4288351606C28826006 @default.
- W4288351606 hasConceptScore W4288351606C30014739 @default.
- W4288351606 hasConceptScore W4288351606C33923547 @default.
- W4288351606 hasConceptScore W4288351606C41008148 @default.
- W4288351606 hasConceptScore W4288351606C62799726 @default.
- W4288351606 hasConceptScore W4288351606C72169020 @default.
- W4288351606 hasConceptScore W4288351606C97541855 @default.
- W4288351606 hasConceptScore W4288351606C98763669 @default.
- W4288351606 hasLocation W42883516061 @default.
- W4288351606 hasOpenAccess W4288351606 @default.
- W4288351606 hasPrimaryLocation W42883516061 @default.
- W4288351606 hasRelatedWork W2124144580 @default.
- W4288351606 hasRelatedWork W2146763310 @default.
- W4288351606 hasRelatedWork W2182304831 @default.
- W4288351606 hasRelatedWork W2357975469 @default.
- W4288351606 hasRelatedWork W2808418668 @default.
- W4288351606 hasRelatedWork W2937181779 @default.
- W4288351606 hasRelatedWork W2970347269 @default.
- W4288351606 hasRelatedWork W3089496523 @default.
- W4288351606 hasRelatedWork W3096874164 @default.
- W4288351606 hasRelatedWork W3167472281 @default.
- W4288351606 isParatext "false" @default.
- W4288351606 isRetracted "false" @default.
- W4288351606 workType "article" @default.