Matches in SemOpenAlex for { <https://semopenalex.org/work/W3085581910> ?p ?o ?g. }
- W3085581910 abstract "Reinforcement learning from self-play has recently reported many successes. Self-play, where the agents compete with themselves, is often used to generate training data for iterative policy improvement. In previous work, heuristic rules are designed to choose an opponent for the current learner. Typical rules include choosing the latest agent, the best agent, or a random historical agent. However, these rules may be inefficient in practice and sometimes do not guarantee convergence even in the simplest matrix games. This paper proposes a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games. We recognize the fact that the Nash equilibrium coincides with the saddle point of the stochastic payoff function, which motivates us to borrow ideas from classical saddle point optimization literature. Our method simultaneously trains several agents and intelligently takes each other as opponents based on a simple adversarial rule derived from a principled perturbation-based saddle optimization method. We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games under standard assumptions. Beyond the theory, we further show the empirical superiority of our method over baseline methods relying on the aforementioned opponent-selection heuristics in matrix games, grid-world soccer, Gomoku, and simulated robot sumo, with neural net policy function approximators." @default.
- W3085581910 created "2020-09-21" @default.
- W3085581910 creator A5018998702 @default.
- W3085581910 creator A5061177999 @default.
- W3085581910 creator A5061853202 @default.
- W3085581910 date "2021-05-04" @default.
- W3085581910 modified "2023-09-27" @default.
- W3085581910 title "Efficient Competitive Self-Play Policy Optimization" @default.
- W3085581910 cites W103885025 @default.
- W3085581910 cites W1519983590 @default.
- W3085581910 cites W1542941925 @default.
- W3085581910 cites W1570906179 @default.
- W3085581910 cites W1597864774 @default.
- W3085581910 cites W1603538908 @default.
- W3085581910 cites W1607392272 @default.
- W3085581910 cites W2020123437 @default.
- W3085581910 cites W2103315867 @default.
- W3085581910 cites W2119717200 @default.
- W3085581910 cites W2120327309 @default.
- W3085581910 cites W2120846115 @default.
- W3085581910 cites W2121863487 @default.
- W3085581910 cites W2143343660 @default.
- W3085581910 cites W2158782408 @default.
- W3085581910 cites W2257979135 @default.
- W3085581910 cites W2291986326 @default.
- W3085581910 cites W2330024298 @default.
- W3085581910 cites W2575731723 @default.
- W3085581910 cites W2736601468 @default.
- W3085581910 cites W2766447205 @default.
- W3085581910 cites W2894677249 @default.
- W3085581910 cites W2902907165 @default.
- W3085581910 cites W2911616846 @default.
- W3085581910 cites W2945871487 @default.
- W3085581910 cites W2960876848 @default.
- W3085581910 cites W2962904119 @default.
- W3085581910 cites W2962941327 @default.
- W3085581910 cites W2963184621 @default.
- W3085581910 cites W2963887494 @default.
- W3085581910 cites W2963937357 @default.
- W3085581910 cites W2964043796 @default.
- W3085581910 cites W2982316857 @default.
- W3085581910 cites W2995520132 @default.
- W3085581910 cites W2996037775 @default.
- W3085581910 cites W2996343955 @default.
- W3085581910 cites W3005199613 @default.
- W3085581910 cites W3035060314 @default.
- W3085581910 cites W2131600418 @default.
- W3085581910 hasPublicationYear "2021" @default.
- W3085581910 type Work @default.
- W3085581910 sameAs 3085581910 @default.
- W3085581910 citedByCount "0" @default.
- W3085581910 crossrefType "journal-article" @default.
- W3085581910 hasAuthorship W3085581910A5018998702 @default.
- W3085581910 hasAuthorship W3085581910A5061177999 @default.
- W3085581910 hasAuthorship W3085581910A5061853202 @default.
- W3085581910 hasConcept C126255220 @default.
- W3085581910 hasConcept C127705205 @default.
- W3085581910 hasConcept C144237770 @default.
- W3085581910 hasConcept C154945302 @default.
- W3085581910 hasConcept C22171661 @default.
- W3085581910 hasConcept C32407928 @default.
- W3085581910 hasConcept C33923547 @default.
- W3085581910 hasConcept C41008148 @default.
- W3085581910 hasConcept C46814582 @default.
- W3085581910 hasConcept C97541855 @default.
- W3085581910 hasConceptScore W3085581910C126255220 @default.
- W3085581910 hasConceptScore W3085581910C127705205 @default.
- W3085581910 hasConceptScore W3085581910C144237770 @default.
- W3085581910 hasConceptScore W3085581910C154945302 @default.
- W3085581910 hasConceptScore W3085581910C22171661 @default.
- W3085581910 hasConceptScore W3085581910C32407928 @default.
- W3085581910 hasConceptScore W3085581910C33923547 @default.
- W3085581910 hasConceptScore W3085581910C41008148 @default.
- W3085581910 hasConceptScore W3085581910C46814582 @default.
- W3085581910 hasConceptScore W3085581910C97541855 @default.
- W3085581910 hasLocation W30855819101 @default.
- W3085581910 hasOpenAccess W3085581910 @default.
- W3085581910 hasPrimaryLocation W30855819101 @default.
- W3085581910 hasRelatedWork W1608293404 @default.
- W3085581910 hasRelatedWork W1980813607 @default.
- W3085581910 hasRelatedWork W2047918528 @default.
- W3085581910 hasRelatedWork W2057550885 @default.
- W3085581910 hasRelatedWork W2114630608 @default.
- W3085581910 hasRelatedWork W2149519648 @default.
- W3085581910 hasRelatedWork W2151327297 @default.
- W3085581910 hasRelatedWork W2188793289 @default.
- W3085581910 hasRelatedWork W2288502617 @default.
- W3085581910 hasRelatedWork W2750605955 @default.
- W3085581910 hasRelatedWork W2892013712 @default.
- W3085581910 hasRelatedWork W2949381094 @default.
- W3085581910 hasRelatedWork W2993335844 @default.
- W3085581910 hasRelatedWork W3090662669 @default.
- W3085581910 hasRelatedWork W3118463161 @default.
- W3085581910 hasRelatedWork W3135853125 @default.
- W3085581910 hasRelatedWork W3151813343 @default.
- W3085581910 hasRelatedWork W3172715706 @default.
- W3085581910 hasRelatedWork W3184804348 @default.
- W3085581910 hasRelatedWork W3033370016 @default.
- W3085581910 isParatext "false" @default.
- W3085581910 isRetracted "false" @default.
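
The abstract above notes that, in a two-player zero-sum game, the Nash equilibrium coincides with the saddle point of the payoff function. A minimal sketch of that idea on a matrix game follows. It is not the paper's algorithm: the rock-paper-scissors payoff matrix and the helper names `expected_payoff` and `duality_gap` are assumptions chosen only for illustration. The duality gap measures how much either player could gain by deviating, and it is zero exactly at a saddle point.

```python
# Illustrative only: rock-paper-scissors payoffs for the row player.
# Rows and columns are ordered rock, paper, scissors (an assumed example game,
# not taken from the record above).
A = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def expected_payoff(x, y, payoff=A):
    """Row player's expected payoff x^T A y for mixed strategies x and y."""
    return sum(
        x[i] * payoff[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def duality_gap(x, y, payoff=A):
    """Row player's best-response value against y, minus the lowest value
    the column player can hold x to. Zero exactly at a saddle point."""
    row_best = max(
        sum(payoff[i][j] * y[j] for j in range(len(y)))
        for i in range(len(payoff))
    )
    col_best = min(
        sum(x[i] * payoff[i][j] for i in range(len(x)))
        for j in range(len(payoff[0]))
    )
    return row_best - col_best


uniform = [1 / 3, 1 / 3, 1 / 3]   # the equilibrium strategy of this game
always_rock = [1.0, 0.0, 0.0]     # a pure strategy that can be exploited

print(duality_gap(uniform, uniform))          # gap at the saddle point
print(duality_gap(always_rock, always_rock))  # gap for an exploitable pair
```

The uniform pair has no profitable deviation for either side, while the pure-strategy pair leaves both players room to improve, which is the kind of exploitable state that a poorly chosen opponent-selection rule can leave unresolved.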