Matches in SemOpenAlex for { <https://semopenalex.org/work/W3207697547> ?p ?o ?g. }
- W3207697547 abstract "Many real-world applications of multi-agent reinforcement learning (RL), such as multi-robot navigation and decentralized control of cyber-physical systems, involve the cooperation of agents as a team with aligned objectives. We study multi-agent RL in the most basic cooperative setting -- Markov teams -- a class of Markov games where the cooperating agents share a common reward. We propose an algorithm in which each agent independently runs stage-based V-learning (a Q-learning style algorithm) to efficiently explore the unknown environment, while using a stochastic gradient descent (SGD) subroutine for policy updates. We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\propto\widetilde{O}(1/\epsilon^4)$ episodes. Our results advocate the use of a novel \emph{stage-based} V-learning approach to create a stage-wise stationary environment. We also show that under certain smoothness assumptions of the team, our algorithm can achieve a nearly \emph{team-optimal} Nash equilibrium. Simulation results corroborate our theoretical findings. One key feature of our algorithm is being \emph{decentralized}, in the sense that each agent has access to only the state and its local actions, and is even \emph{oblivious} to the presence of the other agents. Neither communication among teammates nor coordination by a central controller is required during learning. Hence, our algorithm can readily generalize to an arbitrary number of agents, without suffering from the exponential dependence on the number of agents." @default.
- W3207697547 created "2021-10-25" @default.
- W3207697547 creator A5019604570 @default.
- W3207697547 creator A5047410441 @default.
- W3207697547 creator A5069765568 @default.
- W3207697547 creator A5072096775 @default.
- W3207697547 date "2021-10-12" @default.
- W3207697547 modified "2023-09-27" @default.
- W3207697547 title "Decentralized Cooperative Multi-Agent Reinforcement Learning with Exploration" @default.
- W3207697547 cites W1517645949 @default.
- W3207697547 cites W1519783625 @default.
- W3207697547 cites W1521003796 @default.
- W3207697547 cites W1522301498 @default.
- W3207697547 cites W1542941925 @default.
- W3207697547 cites W1560074431 @default.
- W3207697547 cites W1570963478 @default.
- W3207697547 cites W1587973061 @default.
- W3207697547 cites W1639167632 @default.
- W3207697547 cites W1683981004 @default.
- W3207697547 cites W1850488217 @default.
- W3207697547 cites W1882177676 @default.
- W3207697547 cites W1918371733 @default.
- W3207697547 cites W1953936588 @default.
- W3207697547 cites W1967250398 @default.
- W3207697547 cites W1977655452 @default.
- W3207697547 cites W1991888757 @default.
- W3207697547 cites W2061641373 @default.
- W3207697547 cites W2074458475 @default.
- W3207697547 cites W2086887538 @default.
- W3207697547 cites W2092025421 @default.
- W3207697547 cites W2099618002 @default.
- W3207697547 cites W2103151730 @default.
- W3207697547 cites W2103541323 @default.
- W3207697547 cites W2104602264 @default.
- W3207697547 cites W2106887613 @default.
- W3207697547 cites W2107438106 @default.
- W3207697547 cites W2120846115 @default.
- W3207697547 cites W2123827275 @default.
- W3207697547 cites W2124666512 @default.
- W3207697547 cites W2133243407 @default.
- W3207697547 cites W2141076336 @default.
- W3207697547 cites W2145067550 @default.
- W3207697547 cites W2159050893 @default.
- W3207697547 cites W2176451521 @default.
- W3207697547 cites W2257979135 @default.
- W3207697547 cites W2530849036 @default.
- W3207697547 cites W2575731723 @default.
- W3207697547 cites W2752505153 @default.
- W3207697547 cites W2763081248 @default.
- W3207697547 cites W2786313301 @default.
- W3207697547 cites W2793398421 @default.
- W3207697547 cites W2807821938 @default.
- W3207697547 cites W2946606218 @default.
- W3207697547 cites W2962851402 @default.
- W3207697547 cites W2962990479 @default.
- W3207697547 cites W2963000099 @default.
- W3207697547 cites W2963049774 @default.
- W3207697547 cites W2963111827 @default.
- W3207697547 cites W2963297691 @default.
- W3207697547 cites W2963364412 @default.
- W3207697547 cites W2963407617 @default.
- W3207697547 cites W2963470657 @default.
- W3207697547 cites W2963534244 @default.
- W3207697547 cites W2963681631 @default.
- W3207697547 cites W2963965485 @default.
- W3207697547 cites W2964054583 @default.
- W3207697547 cites W2964246930 @default.
- W3207697547 cites W2970623506 @default.
- W3207697547 cites W2970650844 @default.
- W3207697547 cites W2972782056 @default.
- W3207697547 cites W2982316857 @default.
- W3207697547 cites W2991046523 @default.
- W3207697547 cites W2993258424 @default.
- W3207697547 cites W3020325294 @default.
- W3207697547 cites W3035454135 @default.
- W3207697547 cites W3037361954 @default.
- W3207697547 cites W3046141244 @default.
- W3207697547 cites W3046441626 @default.
- W3207697547 cites W3046553904 @default.
- W3207697547 cites W3046692137 @default.
- W3207697547 cites W3092302772 @default.
- W3207697547 cites W3093287223 @default.
- W3207697547 cites W3100292177 @default.
- W3207697547 cites W3100785954 @default.
- W3207697547 cites W3106398159 @default.
- W3207697547 cites W3111119344 @default.
- W3207697547 cites W3119176801 @default.
- W3207697547 cites W3131948096 @default.
- W3207697547 cites W3134847873 @default.
- W3207697547 cites W3143815010 @default.
- W3207697547 cites W3167472281 @default.
- W3207697547 cites W3168075724 @default.
- W3207697547 cites W3169011731 @default.
- W3207697547 cites W3170424010 @default.
- W3207697547 cites W3171210634 @default.
- W3207697547 cites W3172288035 @default.
- W3207697547 cites W3197088826 @default.
- W3207697547 cites W657026121 @default.
- W3207697547 cites W3166490809 @default.
- W3207697547 cites W3173556592 @default.