Matches in SemOpenAlex for { <https://semopenalex.org/work/W3005199613> ?p ?o ?g. }
- W3005199613 abstract "Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where the agent plays against a fixed environment; it remains largely open whether self-play algorithms can be provably effective, especially when it is necessary to manage the exploration/exploitation tradeoff. We study self-play in competitive reinforcement learning under the setting of Markov games, a generalization of Markov decision processes to the two-player case. We introduce a self-play algorithm---Value Iteration with Upper/Lower Confidence Bound (VI-ULCB)---and show that it achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game, where the regret is measured by the agent's performance against a \emph{fully adversarial} opponent who can exploit the agent's strategy at \emph{any} step. We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case. To the best of our knowledge, our work presents the first line of provably sample-efficient self-play algorithms for competitive reinforcement learning." @default.
- W3005199613 created "2020-02-14" @default.
- W3005199613 creator A5051934623 @default.
- W3005199613 creator A5082311500 @default.
- W3005199613 date "2020-02-10" @default.
- W3005199613 modified "2023-09-27" @default.
- W3005199613 title "Provable Self-Play Algorithms for Competitive Reinforcement Learning" @default.
- W3005199613 cites W1496590343 @default.
- W3005199613 cites W1505937442 @default.
- W3005199613 cites W1519783625 @default.
- W3005199613 cites W1542595278 @default.
- W3005199613 cites W1542941925 @default.
- W3005199613 cites W1850488217 @default.
- W3005199613 cites W206679605 @default.
- W3005199613 cites W2075567596 @default.
- W3005199613 cites W2089649903 @default.
- W3005199613 cites W2120846115 @default.
- W3005199613 cites W2126211987 @default.
- W3005199613 cites W2133096155 @default.
- W3005199613 cites W2141076336 @default.
- W3005199613 cites W2150234726 @default.
- W3005199613 cites W2162926979 @default.
- W3005199613 cites W2168405694 @default.
- W3005199613 cites W2489939061 @default.
- W3005199613 cites W2518564545 @default.
- W3005199613 cites W2766447205 @default.
- W3005199613 cites W2907502549 @default.
- W3005199613 cites W2918364912 @default.
- W3005199613 cites W2946912408 @default.
- W3005199613 cites W2948345763 @default.
- W3005199613 cites W2951831808 @default.
- W3005199613 cites W2963049774 @default.
- W3005199613 cites W2963582321 @default.
- W3005199613 cites W2964054583 @default.
- W3005199613 cites W2968526727 @default.
- W3005199613 cites W2970884920 @default.
- W3005199613 cites W2991046523 @default.
- W3005199613 cites W2991935368 @default.
- W3005199613 cites W3004977066 @default.
- W3005199613 cites W3025093529 @default.
- W3005199613 cites W3035759338 @default.
- W3005199613 hasPublicationYear "2020" @default.
- W3005199613 type Work @default.
- W3005199613 sameAs 3005199613 @default.
- W3005199613 citedByCount "18" @default.
- W3005199613 countsByYear W30051996132020 @default.
- W3005199613 countsByYear W30051996132021 @default.
- W3005199613 crossrefType "posted-content" @default.
- W3005199613 hasAuthorship W3005199613A5051934623 @default.
- W3005199613 hasAuthorship W3005199613A5082311500 @default.
- W3005199613 hasConcept C102408133 @default.
- W3005199613 hasConcept C105795698 @default.
- W3005199613 hasConcept C106189395 @default.
- W3005199613 hasConcept C11413529 @default.
- W3005199613 hasConcept C118615104 @default.
- W3005199613 hasConcept C119857082 @default.
- W3005199613 hasConcept C134306372 @default.
- W3005199613 hasConcept C154945302 @default.
- W3005199613 hasConcept C159886148 @default.
- W3005199613 hasConcept C165696696 @default.
- W3005199613 hasConcept C177148314 @default.
- W3005199613 hasConcept C188116033 @default.
- W3005199613 hasConcept C33923547 @default.
- W3005199613 hasConcept C36686422 @default.
- W3005199613 hasConcept C38652104 @default.
- W3005199613 hasConcept C41008148 @default.
- W3005199613 hasConcept C50817715 @default.
- W3005199613 hasConcept C77553402 @default.
- W3005199613 hasConcept C80444323 @default.
- W3005199613 hasConcept C97541855 @default.
- W3005199613 hasConcept C98763669 @default.
- W3005199613 hasConceptScore W3005199613C102408133 @default.
- W3005199613 hasConceptScore W3005199613C105795698 @default.
- W3005199613 hasConceptScore W3005199613C106189395 @default.
- W3005199613 hasConceptScore W3005199613C11413529 @default.
- W3005199613 hasConceptScore W3005199613C118615104 @default.
- W3005199613 hasConceptScore W3005199613C119857082 @default.
- W3005199613 hasConceptScore W3005199613C134306372 @default.
- W3005199613 hasConceptScore W3005199613C154945302 @default.
- W3005199613 hasConceptScore W3005199613C159886148 @default.
- W3005199613 hasConceptScore W3005199613C165696696 @default.
- W3005199613 hasConceptScore W3005199613C177148314 @default.
- W3005199613 hasConceptScore W3005199613C188116033 @default.
- W3005199613 hasConceptScore W3005199613C33923547 @default.
- W3005199613 hasConceptScore W3005199613C36686422 @default.
- W3005199613 hasConceptScore W3005199613C38652104 @default.
- W3005199613 hasConceptScore W3005199613C41008148 @default.
- W3005199613 hasConceptScore W3005199613C50817715 @default.
- W3005199613 hasConceptScore W3005199613C77553402 @default.
- W3005199613 hasConceptScore W3005199613C80444323 @default.
- W3005199613 hasConceptScore W3005199613C97541855 @default.
- W3005199613 hasConceptScore W3005199613C98763669 @default.
- W3005199613 hasLocation W30051996131 @default.
- W3005199613 hasOpenAccess W3005199613 @default.
- W3005199613 hasPrimaryLocation W30051996131 @default.
- W3005199613 hasRelatedWork W1505937442 @default.
- W3005199613 hasRelatedWork W1519783625 @default.
- W3005199613 hasRelatedWork W1542941925 @default.
- W3005199613 hasRelatedWork W1850488217 @default.
- W3005199613 hasRelatedWork W2120846115 @default.