Matches in SemOpenAlex for { <https://semopenalex.org/work/W3167041099> ?p ?o ?g. }
- W3167041099 abstract "We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team's long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Critic (DSAC) in which agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the shadow reward. DSAC converges to $\epsilon$-stationarity in $\mathcal{O}(1/\epsilon^{2.5})$ (Theorem \ref{theorem:final}) or faster $\mathcal{O}(1/\epsilon^{2})$ (Corollary \ref{corollary:communication}) steps with high probability, depending on the amount of communications. We further establish the non-existence of spurious stationary points for this problem, that is, DSAC finds the globally optimal policy (Corollary \ref{corollary:global}). Experiments demonstrate the merits of goals beyond the cumulative return in cooperative MARL." @default.
- W3167041099 created "2021-06-22" @default.
- W3167041099 creator A5025896653 @default.
- W3167041099 creator A5039563144 @default.
- W3167041099 creator A5052683643 @default.
- W3167041099 creator A5073029812 @default.
- W3167041099 date "2021-05-29" @default.
- W3167041099 modified "2023-09-27" @default.
- W3167041099 title "MARL with General Utilities via Decentralized Shadow Reward Actor-Critic." @default.
- W3167041099 cites W1486653676 @default.
- W3167041099 cites W1541527977 @default.
- W3167041099 cites W1542941925 @default.
- W3167041099 cites W1571416372 @default.
- W3167041099 cites W1578099820 @default.
- W3167041099 cites W1578630563 @default.
- W3167041099 cites W1643746074 @default.
- W3167041099 cites W1918371733 @default.
- W3167041099 cites W1977655452 @default.
- W3167041099 cites W1982243808 @default.
- W3167041099 cites W1982713086 @default.
- W3167041099 cites W1986014385 @default.
- W3167041099 cites W1991888757 @default.
- W3167041099 cites W1995985801 @default.
- W3167041099 cites W2029080014 @default.
- W3167041099 cites W2041367235 @default.
- W3167041099 cites W2044212084 @default.
- W3167041099 cites W2082261506 @default.
- W3167041099 cites W2086304253 @default.
- W3167041099 cites W2088956500 @default.
- W3167041099 cites W2098441552 @default.
- W3167041099 cites W2099618002 @default.
- W3167041099 cites W2104602264 @default.
- W3167041099 cites W2117905067 @default.
- W3167041099 cites W2119567691 @default.
- W3167041099 cites W2119717200 @default.
- W3167041099 cites W2121863487 @default.
- W3167041099 cites W2128928412 @default.
- W3167041099 cites W2148112459 @default.
- W3167041099 cites W2155027007 @default.
- W3167041099 cites W2155791599 @default.
- W3167041099 cites W2156737235 @default.
- W3167041099 cites W2164278908 @default.
- W3167041099 cites W2257979135 @default.
- W3167041099 cites W2414837766 @default.
- W3167041099 cites W2575731723 @default.
- W3167041099 cites W2592798481 @default.
- W3167041099 cites W2601465345 @default.
- W3167041099 cites W2614389101 @default.
- W3167041099 cites W2803863885 @default.
- W3167041099 cites W2902298341 @default.
- W3167041099 cites W2904435756 @default.
- W3167041099 cites W2914351253 @default.
- W3167041099 cites W2915306724 @default.
- W3167041099 cites W2916924555 @default.
- W3167041099 cites W2944956041 @default.
- W3167041099 cites W2945466622 @default.
- W3167041099 cites W2963000099 @default.
- W3167041099 cites W2963407617 @default.
- W3167041099 cites W2963747324 @default.
- W3167041099 cites W2963856199 @default.
- W3167041099 cites W2964005211 @default.
- W3167041099 cites W2964345382 @default.
- W3167041099 cites W2971094937 @default.
- W3167041099 cites W2971233598 @default.
- W3167041099 cites W2972441603 @default.
- W3167041099 cites W2982316857 @default.
- W3167041099 cites W3007455034 @default.
- W3167041099 cites W3023701936 @default.
- W3167041099 cites W3034741641 @default.
- W3167041099 cites W3090204380 @default.
- W3167041099 cites W3091819112 @default.
- W3167041099 cites W3093287223 @default.
- W3167041099 cites W3105702366 @default.
- W3167041099 cites W3110309042 @default.
- W3167041099 cites W3099303842 @default.
- W3167041099 hasPublicationYear "2021" @default.
- W3167041099 type Work @default.
- W3167041099 sameAs 3167041099 @default.
- W3167041099 citedByCount "0" @default.
- W3167041099 crossrefType "posted-content" @default.
- W3167041099 hasAuthorship W3167041099A5025896653 @default.
- W3167041099 hasAuthorship W3167041099A5039563144 @default.
- W3167041099 hasAuthorship W3167041099A5052683643 @default.
- W3167041099 hasAuthorship W3167041099A5073029812 @default.
- W3167041099 hasConcept C10138342 @default.
- W3167041099 hasConcept C105795698 @default.
- W3167041099 hasConcept C114614502 @default.
- W3167041099 hasConcept C117797892 @default.
- W3167041099 hasConcept C118615104 @default.
- W3167041099 hasConcept C126255220 @default.
- W3167041099 hasConcept C127413603 @default.
- W3167041099 hasConcept C144237770 @default.
- W3167041099 hasConcept C154945302 @default.
- W3167041099 hasConcept C15744967 @default.
- W3167041099 hasConcept C160331591 @default.
- W3167041099 hasConcept C162324750 @default.
- W3167041099 hasConcept C170154142 @default.
- W3167041099 hasConcept C182306322 @default.
- W3167041099 hasConcept C21031990 @default.
- W3167041099 hasConcept C2780009758 @default.