SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W3207876914> ?p ?o ?g. }

W3207876914 endingPage "473" @default.
W3207876914 startingPage "448" @default.
W3207876914 abstract "Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning (namely, the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function) is at most on the order of $\frac{1}{\mu_{\mathsf{min}}(1-\gamma)^{5}\varepsilon^{2}} + \frac{t_{\mathsf{mix}}}{\mu_{\mathsf{min}}(1-\gamma)}$ up to some logarithmic factor, provided that a proper constant learning rate is adopted.
Here, $t_{\mathsf{mix}}$ and $\mu_{\mathsf{min}}$ denote respectively the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound matches the sample complexity in the synchronous case with independent samples drawn from the stationary distribution of the trajectory. The second term reflects the cost taken for the empirical distribution of the Markovian trajectory to reach a steady state, which is incurred at the very beginning and becomes amortized as the algorithm runs. Encouragingly, the above bound improves upon the state-of-the-art result by a factor of at least $|\mathcal{S}||\mathcal{A}|$ for all scenarios, and by a factor of at least $t_{\mathsf{mix}}|\mathcal{S}||\mathcal{A}|$ for any sufficiently small accuracy level $\varepsilon$. Further, we demonstrate that the scaling on the effective horizon $\frac{1}{1-\gamma}$ can be improved by means of variance reduction." @default.
W3207876914 created "2021-10-25" @default.
W3207876914 creator A5005015806 @default.
W3207876914 creator A5011030324 @default.
W3207876914 creator A5024124240 @default.
W3207876914 creator A5053809095 @default.
W3207876914 creator A5061140388 @default.
W3207876914 date "2022-01-01" @default.
W3207876914 modified "2023-10-16" @default.
W3207876914 title "Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction" @default.
W3207876914 cites W1511694993 @default.
W3207876914 cites W1865368880 @default.
W3207876914 cites W1999254175 @default.
W3207876914 cites W2071983464 @default.
W3207876914 cites W2077343054 @default.
W3207876914 cites W2120678009 @default.
W3207876914 cites W2129670787 @default.
W3207876914 cites W2145339207 @default.
W3207876914 cites W2165131254 @default.
W3207876914 cites W2982113767 @default.
W3207876914 cites W3041202696 @default.
W3207876914 cites W32403112 @default.
W3207876914 cites W4233696721 @default.
W3207876914 cites W4245577611 @default.
W3207876914 doi "https://doi.org/10.1109/tit.2021.3120096" @default.
W3207876914 hasPublicationYear "2022" @default.
W3207876914 type Work @default.
W3207876914 sameAs 3207876914 @default.
W3207876914 citedByCount "7" @default.
W3207876914 countsByYear W32078769142021 @default.
W3207876914 countsByYear W32078769142022 @default.
W3207876914 countsByYear W32078769142023 @default.
W3207876914 crossrefType "journal-article" @default.
W3207876914 hasAuthorship W3207876914A5005015806 @default.
W3207876914 hasAuthorship W3207876914A5011030324 @default.
W3207876914 hasAuthorship W3207876914A5024124240 @default.
W3207876914 hasAuthorship W3207876914A5053809095 @default.
W3207876914 hasAuthorship W3207876914A5061140388 @default.
W3207876914 hasBestOaLocation W32078769141 @default.
W3207876914 hasConcept C11413529 @default.
W3207876914 hasConcept C114614502 @default.
W3207876914 hasConcept C118615104 @default.
W3207876914 hasConcept C136119220 @default.
W3207876914 hasConcept C14036430 @default.
W3207876914 hasConcept C151319957 @default.
W3207876914 hasConcept C202444582 @default.
W3207876914 hasConcept C31258907 @default.
W3207876914 hasConcept C33923547 @default.
W3207876914 hasConcept C41008148 @default.
W3207876914 hasConcept C45357846 @default.
W3207876914 hasConcept C78458016 @default.
W3207876914 hasConcept C86803240 @default.
W3207876914 hasConcept C94375191 @default.
W3207876914 hasConceptScore W3207876914C11413529 @default.
W3207876914 hasConceptScore W3207876914C114614502 @default.
W3207876914 hasConceptScore W3207876914C118615104 @default.
W3207876914 hasConceptScore W3207876914C136119220 @default.
W3207876914 hasConceptScore W3207876914C14036430 @default.
W3207876914 hasConceptScore W3207876914C151319957 @default.
W3207876914 hasConceptScore W3207876914C202444582 @default.
W3207876914 hasConceptScore W3207876914C31258907 @default.
W3207876914 hasConceptScore W3207876914C33923547 @default.
W3207876914 hasConceptScore W3207876914C41008148 @default.
W3207876914 hasConceptScore W3207876914C45357846 @default.
W3207876914 hasConceptScore W3207876914C78458016 @default.
W3207876914 hasConceptScore W3207876914C86803240 @default.
W3207876914 hasConceptScore W3207876914C94375191 @default.
W3207876914 hasFunder F4320306076 @default.
W3207876914 hasFunder F4320321001 @default.
W3207876914 hasFunder F4320337345 @default.
W3207876914 hasFunder F4320338279 @default.
W3207876914 hasFunder F4320338281 @default.
W3207876914 hasIssue "1" @default.
W3207876914 hasLocation W32078769141 @default.
W3207876914 hasLocation W32078769142 @default.
W3207876914 hasOpenAccess W3207876914 @default.
W3207876914 hasPrimaryLocation W32078769141 @default.
W3207876914 hasRelatedWork W1978042415 @default.
W3207876914 hasRelatedWork W2076229434 @default.
W3207876914 hasRelatedWork W2100843947 @default.
W3207876914 hasRelatedWork W2108300532 @default.
W3207876914 hasRelatedWork W2145515856 @default.
W3207876914 hasRelatedWork W2346285483 @default.
W3207876914 hasRelatedWork W2903311473 @default.
W3207876914 hasRelatedWork W3046926816 @default.
W3207876914 hasRelatedWork W347180939 @default.
W3207876914 hasRelatedWork W4205753330 @default.
W3207876914 hasVolume "68" @default.
W3207876914 isParatext "false" @default.
W3207876914 isRetracted "false" @default.
W3207876914 magId "3207876914" @default.
W3207876914 workType "article" @default.
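
The sample-complexity bound quoted in the abstract literal above can be evaluated numerically to see which of its two terms dominates. The following is a minimal sketch, not part of the SemOpenAlex record or the paper: the function name `sample_complexity_order` and its parameter names are hypothetical, and the result drops the constants and logarithmic factors that the abstract leaves unspecified, so it indicates only the order of growth.

```python
def sample_complexity_order(gamma, eps, mu_min, t_mix):
    """Evaluate the two terms of the bound from the abstract.

    Returns (accuracy_term, burn_in_term), where
      accuracy_term = 1 / (mu_min * (1 - gamma)**5 * eps**2)
      burn_in_term  = t_mix / (mu_min * (1 - gamma))
    Constants and logarithmic factors are omitted (assumption: we only
    want the order of growth, not an actual sample count).
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    if not 0.0 < mu_min <= 1.0:
        raise ValueError("mu_min is a probability and must lie in (0, 1]")
    if t_mix <= 0.0:
        raise ValueError("t_mix must be positive")

    horizon = 1.0 / (1.0 - gamma)  # effective horizon 1 / (1 - gamma)
    accuracy_term = horizon ** 5 / (mu_min * eps ** 2)
    burn_in_term = t_mix * horizon / mu_min
    return accuracy_term, burn_in_term


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    accuracy, burn_in = sample_complexity_order(
        gamma=0.9, eps=0.1, mu_min=0.01, t_mix=50.0
    )
    print(f"accuracy term ~ {accuracy:.3g}, burn-in term ~ {burn_in:.3g}")
```

For small `eps` the first term grows as `1 / eps**2` while the second does not depend on `eps` at all, which is consistent with the abstract's remark that the mixing cost is incurred at the beginning and amortized as the algorithm runs.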