Matches in SemOpenAlex for { <https://semopenalex.org/work/W4283032009> ?p ?o ?g. }
Showing items 1 to 63 of 63, with 100 items per page.
- W4283032009 abstract "Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition, and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports some recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors and empirically validate our findings on a variety of domains, ranging from the simplified matrix and grid-world games to complex benchmarks such as StarCraft Multi-Agent Challenge and Google Research Football. We hope our insights could benefit the community towards developing more general and more powerful MARL algorithms. Check our project website at https://sites.google.com/view/revisiting-marl." @default.
- W4283032009 created "2022-06-18" @default.
- W4283032009 creator A5009946088 @default.
- W4283032009 creator A5011500179 @default.
- W4283032009 creator A5033638005 @default.
- W4283032009 creator A5033904513 @default.
- W4283032009 creator A5062497242 @default.
- W4283032009 date "2022-06-15" @default.
- W4283032009 modified "2023-09-24" @default.
- W4283032009 title "Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning" @default.
- W4283032009 doi "https://doi.org/10.48550/arxiv.2206.07505" @default.
- W4283032009 hasPublicationYear "2022" @default.
- W4283032009 type Work @default.
- W4283032009 citedByCount "0" @default.
- W4283032009 crossrefType "posted-content" @default.
- W4283032009 hasAuthorship W4283032009A5009946088 @default.
- W4283032009 hasAuthorship W4283032009A5011500179 @default.
- W4283032009 hasAuthorship W4283032009A5033638005 @default.
- W4283032009 hasAuthorship W4283032009A5033904513 @default.
- W4283032009 hasAuthorship W4283032009A5062497242 @default.
- W4283032009 hasBestOaLocation W42830320091 @default.
- W4283032009 hasConcept C109007969 @default.
- W4283032009 hasConcept C119857082 @default.
- W4283032009 hasConcept C124681953 @default.
- W4283032009 hasConcept C136197465 @default.
- W4283032009 hasConcept C14036430 @default.
- W4283032009 hasConcept C151730666 @default.
- W4283032009 hasConcept C154945302 @default.
- W4283032009 hasConcept C18903297 @default.
- W4283032009 hasConcept C41008148 @default.
- W4283032009 hasConcept C78458016 @default.
- W4283032009 hasConcept C86803240 @default.
- W4283032009 hasConcept C92927620 @default.
- W4283032009 hasConcept C97541855 @default.
- W4283032009 hasConceptScore W4283032009C109007969 @default.
- W4283032009 hasConceptScore W4283032009C119857082 @default.
- W4283032009 hasConceptScore W4283032009C124681953 @default.
- W4283032009 hasConceptScore W4283032009C136197465 @default.
- W4283032009 hasConceptScore W4283032009C14036430 @default.
- W4283032009 hasConceptScore W4283032009C151730666 @default.
- W4283032009 hasConceptScore W4283032009C154945302 @default.
- W4283032009 hasConceptScore W4283032009C18903297 @default.
- W4283032009 hasConceptScore W4283032009C41008148 @default.
- W4283032009 hasConceptScore W4283032009C78458016 @default.
- W4283032009 hasConceptScore W4283032009C86803240 @default.
- W4283032009 hasConceptScore W4283032009C92927620 @default.
- W4283032009 hasConceptScore W4283032009C97541855 @default.
- W4283032009 hasLocation W42830320091 @default.
- W4283032009 hasOpenAccess W4283032009 @default.
- W4283032009 hasPrimaryLocation W42830320091 @default.
- W4283032009 hasRelatedWork W1562959674 @default.
- W4283032009 hasRelatedWork W2794820824 @default.
- W4283032009 hasRelatedWork W2961085424 @default.
- W4283032009 hasRelatedWork W3022038857 @default.
- W4283032009 hasRelatedWork W3074294383 @default.
- W4283032009 hasRelatedWork W3173482257 @default.
- W4283032009 hasRelatedWork W3209094908 @default.
- W4283032009 hasRelatedWork W4210912933 @default.
- W4283032009 hasRelatedWork W4226176818 @default.
- W4283032009 hasRelatedWork W4255994452 @default.
- W4283032009 isParatext "false" @default.
- W4283032009 isRetracted "false" @default.
- W4283032009 workType "article" @default.
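
The listing above corresponds to the quad pattern shown in the header, with `?p`, `?o`, and `?g` bound to each predicate, object, and named graph for the work. As a minimal sketch, assuming the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql and that the `@default` markers denote the named graph, an equivalent query could look like this:

```sparql
# Hypothetical query reproducing the listing above; the endpoint URL and
# the use of GRAPH to expose ?g are assumptions, not part of the page.
SELECT ?p ?o ?g
WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W4283032009> ?p ?o .
  }
}
LIMIT 100
```

With 63 matching statements and a page size of 100, all results fit on a single page, so no OFFSET clause is needed here.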