Matches in SemOpenAlex for { <https://semopenalex.org/work/W4382619003> ?p ?o ?g. }
Showing items 1 to 56 of 56 with 100 items per page.
- W4382619003 abstract "This paper studies q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2023), for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (2023), the mean-field interaction of agents makes the definition of the q-function more subtle, and we show that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$), the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt through a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed method for searching test policies, we devise several model-free learning algorithms. In two examples, one within the LQ control framework and one beyond it, we obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments." @default.
- W4382619003 created "2023-06-30" @default.
- W4382619003 creator A5007953835 @default.
- W4382619003 creator A5075345849 @default.
- W4382619003 date "2023-06-28" @default.
- W4382619003 modified "2023-10-18" @default.
- W4382619003 title "Continuous Time q-learning for McKean-Vlasov Control Problems" @default.
- W4382619003 doi "https://doi.org/10.48550/arxiv.2306.16208" @default.
- W4382619003 hasPublicationYear "2023" @default.
- W4382619003 type Work @default.
- W4382619003 citedByCount "0" @default.
- W4382619003 crossrefType "posted-content" @default.
- W4382619003 hasAuthorship W4382619003A5007953835 @default.
- W4382619003 hasAuthorship W4382619003A5075345849 @default.
- W4382619003 hasBestOaLocation W43826190031 @default.
- W4382619003 hasConcept C126255220 @default.
- W4382619003 hasConcept C14036430 @default.
- W4382619003 hasConcept C14646407 @default.
- W4382619003 hasConcept C154945302 @default.
- W4382619003 hasConcept C188116033 @default.
- W4382619003 hasConcept C28826006 @default.
- W4382619003 hasConcept C33923547 @default.
- W4382619003 hasConcept C41008148 @default.
- W4382619003 hasConcept C48406656 @default.
- W4382619003 hasConcept C78458016 @default.
- W4382619003 hasConcept C86803240 @default.
- W4382619003 hasConcept C97541855 @default.
- W4382619003 hasConceptScore W4382619003C126255220 @default.
- W4382619003 hasConceptScore W4382619003C14036430 @default.
- W4382619003 hasConceptScore W4382619003C14646407 @default.
- W4382619003 hasConceptScore W4382619003C154945302 @default.
- W4382619003 hasConceptScore W4382619003C188116033 @default.
- W4382619003 hasConceptScore W4382619003C28826006 @default.
- W4382619003 hasConceptScore W4382619003C33923547 @default.
- W4382619003 hasConceptScore W4382619003C41008148 @default.
- W4382619003 hasConceptScore W4382619003C48406656 @default.
- W4382619003 hasConceptScore W4382619003C78458016 @default.
- W4382619003 hasConceptScore W4382619003C86803240 @default.
- W4382619003 hasConceptScore W4382619003C97541855 @default.
- W4382619003 hasLocation W43826190031 @default.
- W4382619003 hasLocation W43826190032 @default.
- W4382619003 hasOpenAccess W4382619003 @default.
- W4382619003 hasPrimaryLocation W43826190031 @default.
- W4382619003 hasRelatedWork W2025663273 @default.
- W4382619003 hasRelatedWork W2030191131 @default.
- W4382619003 hasRelatedWork W2089415692 @default.
- W4382619003 hasRelatedWork W2808418668 @default.
- W4382619003 hasRelatedWork W2947999965 @default.
- W4382619003 hasRelatedWork W2998241503 @default.
- W4382619003 hasRelatedWork W3154082313 @default.
- W4382619003 hasRelatedWork W3202907878 @default.
- W4382619003 hasRelatedWork W4288347224 @default.
- W4382619003 hasRelatedWork W4378771262 @default.
- W4382619003 isParatext "false" @default.
- W4382619003 isRetracted "false" @default.
- W4382619003 workType "article" @default.