Matches in SemOpenAlex for { <https://semopenalex.org/work/W4225364098> ?p ?o ?g. }
Showing items 1 to 83 of 83, with 100 items per page.
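For reference, a listing like the one below can be reproduced with a standard SPARQL query. This is a minimal sketch, assuming the public SemOpenAlex endpoint at https://semopenalex.org/sparql; the quad pattern in the header above corresponds to a GRAPH-scoped triple pattern in standard SPARQL:

```sparql
# Minimal sketch: fetch every predicate/object pair (and its graph) for the
# work, matching the quad pattern { <work> ?p ?o ?g . } shown in the header.
# Assumes the public SemOpenAlex endpoint at https://semopenalex.org/sparql.
SELECT ?p ?o ?g
WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W4225364098> ?p ?o .
  }
}
LIMIT 100
```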
- W4225364098 abstract "The main challenge of multiagent reinforcement learning is the difficulty of learning useful policies in the presence of other simultaneously learning agents whose changing behaviors jointly affect the environment's transition and reward dynamics. An effective approach that has recently emerged for addressing this non-stationarity is for each agent to anticipate the learning of other agents and influence the evolution of future policies towards desirable behavior for its own benefit. Unfortunately, previous approaches for achieving this suffer from myopic evaluation, considering only a finite number of policy updates. As such, these methods can only influence transient future policies rather than achieving the promise of scalable equilibrium selection approaches that influence the behavior at convergence. In this paper, we propose a principled framework for considering the limiting policies of other agents as time approaches infinity. Specifically, we develop a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will converge to. Our paper characterizes desirable solution concepts within this problem setting and provides practical approaches for optimizing over possible outcomes. As a result of our farsighted objective, we demonstrate better long-term performance than state-of-the-art baselines across a suite of diverse multiagent benchmark domains." @default.
- W4225364098 created "2022-05-05" @default.
- W4225364098 creator A5011665886 @default.
- W4225364098 creator A5013046903 @default.
- W4225364098 creator A5019685128 @default.
- W4225364098 creator A5033449275 @default.
- W4225364098 creator A5037927240 @default.
- W4225364098 creator A5048827613 @default.
- W4225364098 creator A5051533554 @default.
- W4225364098 creator A5059094093 @default.
- W4225364098 date "2022-03-07" @default.
- W4225364098 modified "2023-10-02" @default.
- W4225364098 title "Influencing Long-Term Behavior in Multiagent Reinforcement Learning" @default.
- W4225364098 doi "https://doi.org/10.48550/arxiv.2203.03535" @default.
- W4225364098 hasPublicationYear "2022" @default.
- W4225364098 type Work @default.
- W4225364098 citedByCount "0" @default.
- W4225364098 crossrefType "posted-content" @default.
- W4225364098 hasAuthorship W4225364098A5011665886 @default.
- W4225364098 hasAuthorship W4225364098A5013046903 @default.
- W4225364098 hasAuthorship W4225364098A5019685128 @default.
- W4225364098 hasAuthorship W4225364098A5033449275 @default.
- W4225364098 hasAuthorship W4225364098A5037927240 @default.
- W4225364098 hasAuthorship W4225364098A5048827613 @default.
- W4225364098 hasAuthorship W4225364098A5051533554 @default.
- W4225364098 hasAuthorship W4225364098A5059094093 @default.
- W4225364098 hasBestOaLocation W42253640981 @default.
- W4225364098 hasConcept C119857082 @default.
- W4225364098 hasConcept C121332964 @default.
- W4225364098 hasConcept C127413603 @default.
- W4225364098 hasConcept C13280743 @default.
- W4225364098 hasConcept C154945302 @default.
- W4225364098 hasConcept C162324750 @default.
- W4225364098 hasConcept C177264268 @default.
- W4225364098 hasConcept C185798385 @default.
- W4225364098 hasConcept C188198153 @default.
- W4225364098 hasConcept C199360897 @default.
- W4225364098 hasConcept C205649164 @default.
- W4225364098 hasConcept C2777303404 @default.
- W4225364098 hasConcept C41008148 @default.
- W4225364098 hasConcept C48044578 @default.
- W4225364098 hasConcept C50522688 @default.
- W4225364098 hasConcept C61797465 @default.
- W4225364098 hasConcept C62520636 @default.
- W4225364098 hasConcept C77088390 @default.
- W4225364098 hasConcept C78519656 @default.
- W4225364098 hasConcept C97541855 @default.
- W4225364098 hasConceptScore W4225364098C119857082 @default.
- W4225364098 hasConceptScore W4225364098C121332964 @default.
- W4225364098 hasConceptScore W4225364098C127413603 @default.
- W4225364098 hasConceptScore W4225364098C13280743 @default.
- W4225364098 hasConceptScore W4225364098C154945302 @default.
- W4225364098 hasConceptScore W4225364098C162324750 @default.
- W4225364098 hasConceptScore W4225364098C177264268 @default.
- W4225364098 hasConceptScore W4225364098C185798385 @default.
- W4225364098 hasConceptScore W4225364098C188198153 @default.
- W4225364098 hasConceptScore W4225364098C199360897 @default.
- W4225364098 hasConceptScore W4225364098C205649164 @default.
- W4225364098 hasConceptScore W4225364098C2777303404 @default.
- W4225364098 hasConceptScore W4225364098C41008148 @default.
- W4225364098 hasConceptScore W4225364098C48044578 @default.
- W4225364098 hasConceptScore W4225364098C50522688 @default.
- W4225364098 hasConceptScore W4225364098C61797465 @default.
- W4225364098 hasConceptScore W4225364098C62520636 @default.
- W4225364098 hasConceptScore W4225364098C77088390 @default.
- W4225364098 hasConceptScore W4225364098C78519656 @default.
- W4225364098 hasConceptScore W4225364098C97541855 @default.
- W4225364098 hasLocation W42253640981 @default.
- W4225364098 hasOpenAccess W4225364098 @default.
- W4225364098 hasPrimaryLocation W42253640981 @default.
- W4225364098 hasRelatedWork W112744582 @default.
- W4225364098 hasRelatedWork W1992807924 @default.
- W4225364098 hasRelatedWork W2789601449 @default.
- W4225364098 hasRelatedWork W3022038857 @default.
- W4225364098 hasRelatedWork W3136744003 @default.
- W4225364098 hasRelatedWork W4210620793 @default.
- W4225364098 hasRelatedWork W4286893825 @default.
- W4225364098 hasRelatedWork W4292218736 @default.
- W4225364098 hasRelatedWork W4319083788 @default.
- W4225364098 hasRelatedWork W4320031300 @default.
- W4225364098 isParatext "false" @default.
- W4225364098 isRetracted "false" @default.
- W4225364098 workType "article" @default.
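The creator and hasAuthorship entries above carry only opaque author IDs. A follow-up query can resolve them to names; this is a hedged sketch that assumes SemOpenAlex links works to authors via dcterms:creator and labels authors with foaf:name (verify both prefixes against the endpoint's actual schema before relying on them):

```sparql
# Hedged sketch: resolve the creator IDs listed above to author names.
# Assumes dcterms:creator and foaf:name; check against the live schema.
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>

SELECT ?author ?name
WHERE {
  <https://semopenalex.org/work/W4225364098> dcterms:creator ?author .
  ?author foaf:name ?name .
}
```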
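The abstract recorded above centers on an average-reward objective taken over the limiting policies of the other agents. As a rough illustration only (my own notation and aggregation choice, not the paper's symbols), such an objective for agent i might be written as:

```latex
% Hedged sketch of the farsighted objective the abstract describes.
% Illustrative notation: \pi^i is agent i's policy, r_t^i its reward, and
% \Pi^{-i}_\infty(\pi^i) the limiting set of policies the other agents
% converge to under agent i's influence. The worst-case min over that set
% is one plausible way to "optimize over possible outcomes", not
% necessarily the paper's.
\max_{\pi^i} \;
  \min_{\pi^{-i} \in \Pi^{-i}_\infty(\pi^i)} \;
  \lim_{T \to \infty} \frac{1}{T} \,
  \mathbb{E}\!\left[ \sum_{t=0}^{T-1} r_t^i \;\middle|\; \pi^i, \pi^{-i} \right]
```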