Matches in SemOpenAlex for { <https://semopenalex.org/work/W2036103676> ?p ?o ?g. }
- W2036103676 endingPage "1302" @default.
- W2036103676 startingPage "1289" @default.
- W2036103676 abstract "An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases." @default.
- W2036103676 created "2016-06-24" @default.
- W2036103676 creator A5017743551 @default.
- W2036103676 creator A5036325812 @default.
- W2036103676 creator A5074250521 @default.
- W2036103676 date "2015-07-01" @default.
- W2036103676 modified "2023-10-14" @default.
- W2036103676 title "Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer" @default.
- W2036103676 cites W1502765764 @default.
- W2036103676 cites W1542941925 @default.
- W2036103676 cites W1868540347 @default.
- W2036103676 cites W1968902782 @default.
- W2036103676 cites W1974740629 @default.
- W2036103676 cites W1978375026 @default.
- W2036103676 cites W1980737627 @default.
- W2036103676 cites W1991799203 @default.
- W2036103676 cites W2002373723 @default.
- W2036103676 cites W2008393562 @default.
- W2036103676 cites W2010526786 @default.
- W2036103676 cites W2029250042 @default.
- W2036103676 cites W2072256588 @default.
- W2036103676 cites W2089415692 @default.
- W2036103676 cites W2092710777 @default.
- W2036103676 cites W2097780422 @default.
- W2036103676 cites W2099618002 @default.
- W2036103676 cites W2124152208 @default.
- W2036103676 cites W2130463867 @default.
- W2036103676 cites W2138965998 @default.
- W2036103676 doi "https://doi.org/10.1109/tcyb.2014.2349152" @default.
- W2036103676 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/25181517" @default.
- W2036103676 hasPublicationYear "2015" @default.
- W2036103676 type Work @default.
- W2036103676 sameAs 2036103676 @default.
- W2036103676 citedByCount "39" @default.
- W2036103676 countsByYear W20361036762016 @default.
- W2036103676 countsByYear W20361036762017 @default.
- W2036103676 countsByYear W20361036762018 @default.
- W2036103676 countsByYear W20361036762019 @default.
- W2036103676 countsByYear W20361036762020 @default.
- W2036103676 countsByYear W20361036762021 @default.
- W2036103676 countsByYear W20361036762022 @default.
- W2036103676 countsByYear W20361036762023 @default.
- W2036103676 crossrefType "journal-article" @default.
- W2036103676 hasAuthorship W2036103676A5017743551 @default.
- W2036103676 hasAuthorship W2036103676A5036325812 @default.
- W2036103676 hasAuthorship W2036103676A5074250521 @default.
- W2036103676 hasConcept C109007969 @default.
- W2036103676 hasConcept C121332964 @default.
- W2036103676 hasConcept C126255220 @default.
- W2036103676 hasConcept C127313418 @default.
- W2036103676 hasConcept C131157278 @default.
- W2036103676 hasConcept C134306372 @default.
- W2036103676 hasConcept C144237770 @default.
- W2036103676 hasConcept C150899416 @default.
- W2036103676 hasConcept C151730666 @default.
- W2036103676 hasConcept C154945302 @default.
- W2036103676 hasConcept C163630976 @default.
- W2036103676 hasConcept C164407509 @default.
- W2036103676 hasConcept C177142836 @default.
- W2036103676 hasConcept C201364048 @default.
- W2036103676 hasConcept C202556891 @default.
- W2036103676 hasConcept C33923547 @default.
- W2036103676 hasConcept C41008148 @default.
- W2036103676 hasConcept C46814582 @default.
- W2036103676 hasConcept C78045399 @default.
- W2036103676 hasConcept C92927620 @default.
- W2036103676 hasConcept C94766913 @default.
- W2036103676 hasConcept C97355855 @default.
- W2036103676 hasConcept C97541855 @default.
- W2036103676 hasConceptScore W2036103676C109007969 @default.
- W2036103676 hasConceptScore W2036103676C121332964 @default.
- W2036103676 hasConceptScore W2036103676C126255220 @default.
- W2036103676 hasConceptScore W2036103676C127313418 @default.
- W2036103676 hasConceptScore W2036103676C131157278 @default.
- W2036103676 hasConceptScore W2036103676C134306372 @default.
- W2036103676 hasConceptScore W2036103676C144237770 @default.
- W2036103676 hasConceptScore W2036103676C150899416 @default.
- W2036103676 hasConceptScore W2036103676C151730666 @default.
- W2036103676 hasConceptScore W2036103676C154945302 @default.
- W2036103676 hasConceptScore W2036103676C163630976 @default.
- W2036103676 hasConceptScore W2036103676C164407509 @default.
- W2036103676 hasConceptScore W2036103676C177142836 @default.
- W2036103676 hasConceptScore W2036103676C201364048 @default.
- W2036103676 hasConceptScore W2036103676C202556891 @default.
- W2036103676 hasConceptScore W2036103676C33923547 @default.
- W2036103676 hasConceptScore W2036103676C41008148 @default.
- W2036103676 hasConceptScore W2036103676C46814582 @default.
- W2036103676 hasConceptScore W2036103676C78045399 @default.
- W2036103676 hasConceptScore W2036103676C92927620 @default.
- W2036103676 hasConceptScore W2036103676C94766913 @default.
- W2036103676 hasConceptScore W2036103676C97355855 @default.
- W2036103676 hasConceptScore W2036103676C97541855 @default.
- W2036103676 hasFunder F4320321001 @default.
- W2036103676 hasFunder F4320334924 @default.
- W2036103676 hasIssue "7" @default.
- W2036103676 hasLocation W20361036761 @default.
- W2036103676 hasLocation W20361036762 @default.
- W2036103676 hasOpenAccess W2036103676 @default.
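
The abstract above describes the key idea of equilibrium transfer as reusing a previously computed equilibrium when each agent has only a small incentive to deviate. The following is a minimal sketch of that check for a two-player one-shot game, not the authors' implementation. It assumes the transfer loss can be read as the largest payoff gain any agent could obtain by unilaterally deviating from the reused strategy profile, and that the transfer condition compares this loss to a threshold. The function names (`expected_payoff`, `deviation_gains`, `can_transfer`), the `threshold` parameter, and the payoff numbers are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Strategy = Sequence[float]


def expected_payoff(payoff: Matrix, row: Strategy, col: Strategy) -> float:
    """Expected payoff of one player under mixed strategies `row` and `col`."""
    return sum(
        row[i] * col[j] * payoff[i][j]
        for i in range(len(row))
        for j in range(len(col))
    )


def deviation_gains(
    payoff_row: Matrix, payoff_col: Matrix, row: Strategy, col: Strategy
) -> Tuple[float, float]:
    """Gain each player could obtain by a best unilateral pure deviation."""
    n_rows, n_cols = len(payoff_row), len(payoff_row[0])
    current_row = expected_payoff(payoff_row, row, col)
    current_col = expected_payoff(payoff_col, row, col)
    # Best pure response of the row player against the column strategy.
    best_row = max(
        sum(col[j] * payoff_row[i][j] for j in range(n_cols))
        for i in range(n_rows)
    )
    # Best pure response of the column player against the row strategy.
    best_col = max(
        sum(row[i] * payoff_col[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )
    return best_row - current_row, best_col - current_col


def can_transfer(
    payoff_row: Matrix,
    payoff_col: Matrix,
    row: Strategy,
    col: Strategy,
    threshold: float,
) -> bool:
    """Assumed transfer condition: reuse the old profile if no player gains
    more than `threshold` by deviating in the new game."""
    return max(deviation_gains(payoff_row, payoff_col, row, col)) <= threshold


# Equilibrium computed at an earlier visit to the state (hypothetical):
# both players choose their first action with probability 1.
old_row, old_col = [1.0, 0.0], [1.0, 0.0]

# One-shot game at the next visit, with slightly changed payoffs.
similar_row = [[9.5, 0.2], [0.4, 8.1]]
similar_col = [[9.6, 0.1], [0.3, 8.0]]

# A game where the row player now strongly prefers the second action.
changed_row = [[1.0, 0.0], [3.0, 0.0]]
changed_col = [[1.0, 0.0], [0.0, 0.0]]

reuse_similar = can_transfer(similar_row, similar_col, old_row, old_col, 0.5)
reuse_changed = can_transfer(changed_row, changed_col, old_row, old_col, 0.5)
```

In this sketch the old profile is reused for the first game, where neither player can gain by deviating, and rejected for the second, where the row player's gain from deviating exceeds the threshold, so a fresh equilibrium computation would be needed there.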