Matches in SemOpenAlex for { <https://semopenalex.org/work/W4302013128> ?p ?o ?g. }
Showing items 1 to 73 of 73, with 100 items per page.
- W4302013128 abstract "Multi-Agent Reinforcement Learning (MARL) -- where multiple agents learn to interact in a shared dynamic environment -- permeates across a wide range of critical applications. While there has been substantial progress on understanding the global convergence of policy optimization methods in single-agent RL, designing and analysis of efficient policy optimization algorithms in the MARL setting present significant challenges, which unfortunately, remain highly inadequately addressed by existing theory. In this paper, we focus on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games, and study equilibrium finding algorithms in both the infinite-horizon discounted setting and the finite-horizon episodic setting. We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method and the value is updated on a slower timescale. We show that, in the full-information tabular setting, the proposed method achieves a finite-time last-iterate linear convergence to the quantal response equilibrium of the regularized problem, which translates to a sublinear last-iterate convergence to the Nash equilibrium by controlling the amount of regularization. Our convergence results improve upon the best known iteration complexities, and lead to a better understanding of policy optimization in competitive Markov games." @default.
- W4302013128 created "2022-10-06" @default.
- W4302013128 creator A5033061754 @default.
- W4302013128 creator A5053809095 @default.
- W4302013128 creator A5083014172 @default.
- W4302013128 creator A5091389636 @default.
- W4302013128 date "2022-10-03" @default.
- W4302013128 modified "2023-09-26" @default.
- W4302013128 title "Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games" @default.
- W4302013128 doi "https://doi.org/10.48550/arxiv.2210.01050" @default.
- W4302013128 hasPublicationYear "2022" @default.
- W4302013128 type Work @default.
- W4302013128 citedByCount "0" @default.
- W4302013128 crossrefType "posted-content" @default.
- W4302013128 hasAuthorship W4302013128A5033061754 @default.
- W4302013128 hasAuthorship W4302013128A5053809095 @default.
- W4302013128 hasAuthorship W4302013128A5083014172 @default.
- W4302013128 hasAuthorship W4302013128A5091389636 @default.
- W4302013128 hasBestOaLocation W43020131281 @default.
- W4302013128 hasConcept C105795698 @default.
- W4302013128 hasConcept C106189395 @default.
- W4302013128 hasConcept C117160843 @default.
- W4302013128 hasConcept C119857082 @default.
- W4302013128 hasConcept C126255220 @default.
- W4302013128 hasConcept C134306372 @default.
- W4302013128 hasConcept C137836250 @default.
- W4302013128 hasConcept C144237770 @default.
- W4302013128 hasConcept C154945302 @default.
- W4302013128 hasConcept C159886148 @default.
- W4302013128 hasConcept C162324750 @default.
- W4302013128 hasConcept C2777303404 @default.
- W4302013128 hasConcept C33923547 @default.
- W4302013128 hasConcept C41008148 @default.
- W4302013128 hasConcept C42747912 @default.
- W4302013128 hasConcept C46814582 @default.
- W4302013128 hasConcept C50522688 @default.
- W4302013128 hasConcept C97541855 @default.
- W4302013128 hasConcept C98763669 @default.
- W4302013128 hasConceptScore W4302013128C105795698 @default.
- W4302013128 hasConceptScore W4302013128C106189395 @default.
- W4302013128 hasConceptScore W4302013128C117160843 @default.
- W4302013128 hasConceptScore W4302013128C119857082 @default.
- W4302013128 hasConceptScore W4302013128C126255220 @default.
- W4302013128 hasConceptScore W4302013128C134306372 @default.
- W4302013128 hasConceptScore W4302013128C137836250 @default.
- W4302013128 hasConceptScore W4302013128C144237770 @default.
- W4302013128 hasConceptScore W4302013128C154945302 @default.
- W4302013128 hasConceptScore W4302013128C159886148 @default.
- W4302013128 hasConceptScore W4302013128C162324750 @default.
- W4302013128 hasConceptScore W4302013128C2777303404 @default.
- W4302013128 hasConceptScore W4302013128C33923547 @default.
- W4302013128 hasConceptScore W4302013128C41008148 @default.
- W4302013128 hasConceptScore W4302013128C42747912 @default.
- W4302013128 hasConceptScore W4302013128C46814582 @default.
- W4302013128 hasConceptScore W4302013128C50522688 @default.
- W4302013128 hasConceptScore W4302013128C97541855 @default.
- W4302013128 hasConceptScore W4302013128C98763669 @default.
- W4302013128 hasLocation W43020131281 @default.
- W4302013128 hasOpenAccess W4302013128 @default.
- W4302013128 hasPrimaryLocation W43020131281 @default.
- W4302013128 hasRelatedWork W1626977535 @default.
- W4302013128 hasRelatedWork W2124144580 @default.
- W4302013128 hasRelatedWork W2128670912 @default.
- W4302013128 hasRelatedWork W2128702080 @default.
- W4302013128 hasRelatedWork W2808418668 @default.
- W4302013128 hasRelatedWork W3089496523 @default.
- W4302013128 hasRelatedWork W3140738360 @default.
- W4302013128 hasRelatedWork W3164927689 @default.
- W4302013128 hasRelatedWork W3167472281 @default.
- W4302013128 hasRelatedWork W3198596521 @default.
- W4302013128 isParatext "false" @default.
- W4302013128 isRetracted "false" @default.
- W4302013128 workType "article" @default.