Matches in SemOpenAlex for { <https://semopenalex.org/work/W2902709966> ?p ?o ?g. }
- W2902709966 abstract "Many real-world tasks, such as robot coordination, can be naturally modelled as multi-agent cooperative systems in which rewards are sparse. This paper focuses on learning decentralized policies for such tasks from sub-optimal demonstrations. To learn multi-agent cooperation effectively and to cope with the sub-optimality of the demonstrations, a self-improving learning method is proposed. On the one hand, the centralized state-action values are initialized from the demonstrations and updated by the learned decentralized policy to reduce their sub-optimality. On the other hand, Nash equilibria are found from the current state-action values and used as a guide for learning the policy. The proposed method is evaluated on combat RTS games, which require a high level of multi-agent cooperation. Extensive experimental results on various combat scenarios demonstrate that the proposed method learns multi-agent cooperation effectively and significantly outperforms many state-of-the-art demonstration-based approaches." @default.
- W2902709966 created "2018-12-11" @default.
- W2902709966 creator A5019045873 @default.
- W2902709966 creator A5070755370 @default.
- W2902709966 creator A5076090670 @default.
- W2902709966 date "2018-12-05" @default.
- W2902709966 modified "2023-09-27" @default.
- W2902709966 title "Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration." @default.
- W2902709966 cites W1518858799 @default.
- W2902709966 cites W1641379095 @default.
- W2902709966 cites W1750339092 @default.
- W2902709966 cites W1931877416 @default.
- W2902709966 cites W2022504108 @default.
- W2902709966 cites W2061562262 @default.
- W2902709966 cites W2069195348 @default.
- W2902709966 cites W2099618002 @default.
- W2902709966 cites W2101934420 @default.
- W2902709966 cites W2102847492 @default.
- W2902709966 cites W2110906765 @default.
- W2902709966 cites W2113023245 @default.
- W2902709966 cites W2122763142 @default.
- W2902709966 cites W2134779831 @default.
- W2902709966 cites W2145339207 @default.
- W2902709966 cites W2147492008 @default.
- W2902709966 cites W2148112459 @default.
- W2902709966 cites W2257979135 @default.
- W2902709966 cites W2402402867 @default.
- W2902709966 cites W2434014514 @default.
- W2902709966 cites W2518713116 @default.
- W2902709966 cites W2565610523 @default.
- W2902709966 cites W2604283518 @default.
- W2902709966 cites W2623431351 @default.
- W2902709966 cites W2741122588 @default.
- W2902709966 cites W2756196406 @default.
- W2902709966 cites W2766447205 @default.
- W2902709966 cites W2776126823 @default.
- W2902709966 cites W2788862220 @default.
- W2902709966 cites W2789386227 @default.
- W2902709966 cites W2800978878 @default.
- W2902709966 cites W2803616302 @default.
- W2902709966 cites W2808602506 @default.
- W2902709966 cites W2919115771 @default.
- W2902709966 cites W2949201811 @default.
- W2902709966 cites W2950735232 @default.
- W2902709966 cites W2962938168 @default.
- W2902709966 cites W2963039558 @default.
- W2902709966 cites W2963658727 @default.
- W2902709966 cites W2963937357 @default.
- W2902709966 cites W2964098908 @default.
- W2902709966 cites W2964161785 @default.
- W2902709966 cites W3093287223 @default.
- W2902709966 cites W2524264252 @default.
- W2902709966 hasPublicationYear "2018" @default.
- W2902709966 type Work @default.
- W2902709966 sameAs 2902709966 @default.
- W2902709966 citedByCount "0" @default.
- W2902709966 crossrefType "posted-content" @default.
- W2902709966 hasAuthorship W2902709966A5019045873 @default.
- W2902709966 hasAuthorship W2902709966A5070755370 @default.
- W2902709966 hasAuthorship W2902709966A5076090670 @default.
- W2902709966 hasConcept C11413529 @default.
- W2902709966 hasConcept C121332964 @default.
- W2902709966 hasConcept C126255220 @default.
- W2902709966 hasConcept C154945302 @default.
- W2902709966 hasConcept C188116033 @default.
- W2902709966 hasConcept C2780791683 @default.
- W2902709966 hasConcept C33923547 @default.
- W2902709966 hasConcept C41008148 @default.
- W2902709966 hasConcept C41550386 @default.
- W2902709966 hasConcept C46814582 @default.
- W2902709966 hasConcept C48103436 @default.
- W2902709966 hasConcept C62520636 @default.
- W2902709966 hasConcept C97541855 @default.
- W2902709966 hasConceptScore W2902709966C11413529 @default.
- W2902709966 hasConceptScore W2902709966C121332964 @default.
- W2902709966 hasConceptScore W2902709966C126255220 @default.
- W2902709966 hasConceptScore W2902709966C154945302 @default.
- W2902709966 hasConceptScore W2902709966C188116033 @default.
- W2902709966 hasConceptScore W2902709966C2780791683 @default.
- W2902709966 hasConceptScore W2902709966C33923547 @default.
- W2902709966 hasConceptScore W2902709966C41008148 @default.
- W2902709966 hasConceptScore W2902709966C41550386 @default.
- W2902709966 hasConceptScore W2902709966C46814582 @default.
- W2902709966 hasConceptScore W2902709966C48103436 @default.
- W2902709966 hasConceptScore W2902709966C62520636 @default.
- W2902709966 hasConceptScore W2902709966C97541855 @default.
- W2902709966 hasLocation W29027099661 @default.
- W2902709966 hasOpenAccess W2902709966 @default.
- W2902709966 hasPrimaryLocation W29027099661 @default.
- W2902709966 hasRelatedWork W1602079996 @default.
- W2902709966 hasRelatedWork W2130126738 @default.
- W2902709966 hasRelatedWork W2259575993 @default.
- W2902709966 hasRelatedWork W2889157317 @default.
- W2902709966 hasRelatedWork W2937587379 @default.
- W2902709966 hasRelatedWork W2953318161 @default.
- W2902709966 hasRelatedWork W2970727184 @default.
- W2902709966 hasRelatedWork W2973408519 @default.
- W2902709966 hasRelatedWork W2973843138 @default.
- W2902709966 hasRelatedWork W2989068617 @default.
- W2902709966 hasRelatedWork W3003470745 @default.