Matches in SemOpenAlex for { <https://semopenalex.org/work/W4226501493> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W4226501493 abstract "This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by the recent advances in offline reinforcement learning, we develop an algorithmic framework that incorporates the principle of pessimism into asynchronous Q-learning, which penalizes infrequently-visited state-action pairs based on suitable lower confidence bounds (LCBs). This framework leads to, among other things, improved sample efficiency and enhanced adaptivity in the presence of near-expert data. Our approach permits the observed data in some important scenarios to cover only partial state-action space, which is in stark contrast to prior theory that requires uniform coverage of all state-action pairs. When coupled with the idea of variance reduction, asynchronous Q-learning with LCB penalization achieves near-optimal sample complexity, provided that the target accuracy level is small enough. In comparison, prior works were suboptimal in terms of the dependency on the effective horizon even when i.i.d. sampling is permitted. Our results deliver the first theoretical support for the use of pessimism principle in the presence of Markovian non-i.i.d. data." @default.
- W4226501493 created "2022-05-05" @default.
- W4226501493 creator A5017096532 @default.
- W4226501493 creator A5024124240 @default.
- W4226501493 creator A5031910872 @default.
- W4226501493 creator A5061140388 @default.
- W4226501493 date "2022-03-14" @default.
- W4226501493 modified "2023-10-18" @default.
- W4226501493 title "The Efficacy of Pessimism in Asynchronous Q-Learning" @default.
- W4226501493 doi "https://doi.org/10.48550/arxiv.2203.07368" @default.
- W4226501493 hasPublicationYear "2022" @default.
- W4226501493 type Work @default.
- W4226501493 citedByCount "0" @default.
- W4226501493 crossrefType "posted-content" @default.
- W4226501493 hasAuthorship W4226501493A5017096532 @default.
- W4226501493 hasAuthorship W4226501493A5024124240 @default.
- W4226501493 hasAuthorship W4226501493A5031910872 @default.
- W4226501493 hasAuthorship W4226501493A5061140388 @default.
- W4226501493 hasBestOaLocation W42265014931 @default.
- W4226501493 hasConcept C105795698 @default.
- W4226501493 hasConcept C111472728 @default.
- W4226501493 hasConcept C119857082 @default.
- W4226501493 hasConcept C121332964 @default.
- W4226501493 hasConcept C121955636 @default.
- W4226501493 hasConcept C138885662 @default.
- W4226501493 hasConcept C144133560 @default.
- W4226501493 hasConcept C151319957 @default.
- W4226501493 hasConcept C154945302 @default.
- W4226501493 hasConcept C159886148 @default.
- W4226501493 hasConcept C185592680 @default.
- W4226501493 hasConcept C188116033 @default.
- W4226501493 hasConcept C19499675 @default.
- W4226501493 hasConcept C196083921 @default.
- W4226501493 hasConcept C198531522 @default.
- W4226501493 hasConcept C2780791683 @default.
- W4226501493 hasConcept C31258907 @default.
- W4226501493 hasConcept C33923547 @default.
- W4226501493 hasConcept C41008148 @default.
- W4226501493 hasConcept C43617362 @default.
- W4226501493 hasConcept C62520636 @default.
- W4226501493 hasConcept C62644790 @default.
- W4226501493 hasConcept C97541855 @default.
- W4226501493 hasConcept C9992130 @default.
- W4226501493 hasConceptScore W4226501493C105795698 @default.
- W4226501493 hasConceptScore W4226501493C111472728 @default.
- W4226501493 hasConceptScore W4226501493C119857082 @default.
- W4226501493 hasConceptScore W4226501493C121332964 @default.
- W4226501493 hasConceptScore W4226501493C121955636 @default.
- W4226501493 hasConceptScore W4226501493C138885662 @default.
- W4226501493 hasConceptScore W4226501493C144133560 @default.
- W4226501493 hasConceptScore W4226501493C151319957 @default.
- W4226501493 hasConceptScore W4226501493C154945302 @default.
- W4226501493 hasConceptScore W4226501493C159886148 @default.
- W4226501493 hasConceptScore W4226501493C185592680 @default.
- W4226501493 hasConceptScore W4226501493C188116033 @default.
- W4226501493 hasConceptScore W4226501493C19499675 @default.
- W4226501493 hasConceptScore W4226501493C196083921 @default.
- W4226501493 hasConceptScore W4226501493C198531522 @default.
- W4226501493 hasConceptScore W4226501493C2780791683 @default.
- W4226501493 hasConceptScore W4226501493C31258907 @default.
- W4226501493 hasConceptScore W4226501493C33923547 @default.
- W4226501493 hasConceptScore W4226501493C41008148 @default.
- W4226501493 hasConceptScore W4226501493C43617362 @default.
- W4226501493 hasConceptScore W4226501493C62520636 @default.
- W4226501493 hasConceptScore W4226501493C62644790 @default.
- W4226501493 hasConceptScore W4226501493C97541855 @default.
- W4226501493 hasConceptScore W4226501493C9992130 @default.
- W4226501493 hasLocation W42265014931 @default.
- W4226501493 hasOpenAccess W4226501493 @default.
- W4226501493 hasPrimaryLocation W42265014931 @default.
- W4226501493 hasRelatedWork W1501362825 @default.
- W4226501493 hasRelatedWork W2029278774 @default.
- W4226501493 hasRelatedWork W2120679727 @default.
- W4226501493 hasRelatedWork W2170607316 @default.
- W4226501493 hasRelatedWork W2193091921 @default.
- W4226501493 hasRelatedWork W2205410708 @default.
- W4226501493 hasRelatedWork W2767133776 @default.
- W4226501493 hasRelatedWork W2962798535 @default.
- W4226501493 hasRelatedWork W3022038857 @default.
- W4226501493 hasRelatedWork W4226201616 @default.
- W4226501493 isParatext "false" @default.
- W4226501493 isRetracted "false" @default.
- W4226501493 workType "article" @default.