Matches in SemOpenAlex for { <https://semopenalex.org/work/W3197785132> ?p ?o ?g. }
- W3197785132 abstract "Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration. Nonetheless, when applied in multi-agent settings, the guarantee of trust region methods no longer holds because an agent's payoff is also affected by other agents' adaptive behaviors. To tackle this problem, we conduct a game-theoretical analysis in the policy space, and propose a multi-agent trust region learning method (MATRL), which enables trust region optimization for multi-agent learning. Specifically, MATRL finds a stable improvement direction that is guided by the solution concept of Nash equilibrium at the meta-game level. We derive the monotonic improvement guarantee in multi-agent settings and empirically show the local convergence of MATRL to stable fixed points in the two-player rotational differential game. To test our method, we evaluate MATRL in both discrete and continuous multiplayer general-sum games including checker and switch grid worlds, multi-agent MuJoCo, and Atari games. Results suggest that MATRL significantly outperforms strong multi-agent reinforcement learning baselines." @default.
- W3197785132 created "2021-09-13" @default.
- W3197785132 creator A5011090841 @default.
- W3197785132 creator A5012417955 @default.
- W3197785132 creator A5031429003 @default.
- W3197785132 creator A5036563779 @default.
- W3197785132 creator A5042241049 @default.
- W3197785132 creator A5042785211 @default.
- W3197785132 creator A5090073634 @default.
- W3197785132 date "2021-06-12" @default.
- W3197785132 modified "2023-09-25" @default.
- W3197785132 title "A Game-Theoretic Approach to Multi-Agent Trust Region Optimization" @default.
- W3197785132 cites W102212266 @default.
- W3197785132 cites W1496672808 @default.
- W3197785132 cites W1528676759 @default.
- W3197785132 cites W1540725368 @default.
- W3197785132 cites W1542941925 @default.
- W3197785132 cites W1575592356 @default.
- W3197785132 cites W1579184372 @default.
- W3197785132 cites W1605188341 @default.
- W3197785132 cites W1607392272 @default.
- W3197785132 cites W1641379095 @default.
- W3197785132 cites W1757796397 @default.
- W3197785132 cites W1771410628 @default.
- W3197785132 cites W183499311 @default.
- W3197785132 cites W1967250398 @default.
- W3197785132 cites W2002373723 @default.
- W3197785132 cites W2012812921 @default.
- W3197785132 cites W206679605 @default.
- W3197785132 cites W2067018002 @default.
- W3197785132 cites W2067050450 @default.
- W3197785132 cites W2083639782 @default.
- W3197785132 cites W2106654973 @default.
- W3197785132 cites W2113584214 @default.
- W3197785132 cites W2120327309 @default.
- W3197785132 cites W2131967794 @default.
- W3197785132 cites W2138537392 @default.
- W3197785132 cites W2145297839 @default.
- W3197785132 cites W2160150462 @default.
- W3197785132 cites W2565610523 @default.
- W3197785132 cites W2575731723 @default.
- W3197785132 cites W2583713374 @default.
- W3197785132 cites W2736601468 @default.
- W3197785132 cites W2740377041 @default.
- W3197785132 cites W2756196406 @default.
- W3197785132 cites W2807741983 @default.
- W3197785132 cites W2894677249 @default.
- W3197785132 cites W2907669392 @default.
- W3197785132 cites W2911743772 @default.
- W3197785132 cites W2925418831 @default.
- W3197785132 cites W2946606218 @default.
- W3197785132 cites W2949608212 @default.
- W3197785132 cites W2962938168 @default.
- W3197785132 cites W2962979365 @default.
- W3197785132 cites W2963000099 @default.
- W3197785132 cites W2963039558 @default.
- W3197785132 cites W2963048836 @default.
- W3197785132 cites W2963627051 @default.
- W3197785132 cites W2963788414 @default.
- W3197785132 cites W2963836708 @default.
- W3197785132 cites W2963864421 @default.
- W3197785132 cites W2963937357 @default.
- W3197785132 cites W2963981733 @default.
- W3197785132 cites W2964338167 @default.
- W3197785132 cites W2964381205 @default.
- W3197785132 cites W2970537473 @default.
- W3197785132 cites W2982316857 @default.
- W3197785132 cites W2991843419 @default.
- W3197785132 cites W2996037775 @default.
- W3197785132 cites W2996634922 @default.
- W3197785132 cites W3002679839 @default.
- W3197785132 cites W3011672202 @default.
- W3197785132 cites W3030629143 @default.
- W3197785132 cites W3035569762 @default.
- W3197785132 cites W3037180626 @default.
- W3197785132 cites W3087640518 @default.
- W3197785132 cites W3093356339 @default.
- W3197785132 cites W3093963693 @default.
- W3197785132 cites W3100292177 @default.
- W3197785132 cites W3105688066 @default.
- W3197785132 cites W3107615218 @default.
- W3197785132 cites W3130449816 @default.
- W3197785132 doi "https://doi.org/10.48550/arxiv.2106.06828" @default.
- W3197785132 hasPublicationYear "2021" @default.
- W3197785132 type Work @default.
- W3197785132 sameAs 3197785132 @default.
- W3197785132 citedByCount "1" @default.
- W3197785132 countsByYear W31977851322021 @default.
- W3197785132 crossrefType "posted-content" @default.
- W3197785132 hasAuthorship W3197785132A5011090841 @default.
- W3197785132 hasAuthorship W3197785132A5012417955 @default.
- W3197785132 hasAuthorship W3197785132A5031429003 @default.
- W3197785132 hasAuthorship W3197785132A5036563779 @default.
- W3197785132 hasAuthorship W3197785132A5042241049 @default.
- W3197785132 hasAuthorship W3197785132A5042785211 @default.
- W3197785132 hasAuthorship W3197785132A5090073634 @default.
- W3197785132 hasBestOaLocation W31977851321 @default.
- W3197785132 hasConcept C126255220 @default.
- W3197785132 hasConcept C134306372 @default.
- W3197785132 hasConcept C144237770 @default.