Matches in SemOpenAlex for { <https://semopenalex.org/work/W3183382229> ?p ?o ?g. }
- W3183382229 abstract "Reinforcement learning problems with multiple agents pose the challenge of efficiently adapting to nonstationary dynamics arising from other agents’ strategic behavior. Although several algorithms exist for these problems with promising empirical results, regret analysis and efficient use of other-agent models in general-sum games is very limited. We propose an algorithm (TSMG) for general-sum Markov games against agents that switch between several stationary policies, combining change detection with Thompson sampling to learn parametric models of these policies. Under standard assumptions for parametric Markov decision process (MDP) learning, the expected regret of TSMG in the worst case over policy parameters and switch schedules is near-optimal in time and number of switches, up to logarithmic factors. Our experiments on simulated games show that TSMG can outperform standard Thompson sampling and a version of Thompson sampling with a static reset schedule, despite the violation of an assumption that the MDPs induced by the other player are ergodic." @default.
- W3183382229 created "2021-08-02" @default.
- W3183382229 creator A5019731677 @default.
- W3183382229 creator A5051918150 @default.
- W3183382229 date "2021-07-27" @default.
- W3183382229 modified "2023-09-24" @default.
- W3183382229 title "Thompson Sampling for Markov Games with Piecewise Stationary Opponent Policies" @default.
- W3183382229 cites W1662803991 @default.
- W3183382229 cites W1865368880 @default.
- W3183382229 cites W2039522160 @default.
- W3183382229 cites W2062663664 @default.
- W3183382229 cites W2119567691 @default.
- W3183382229 cites W2120846115 @default.
- W3183382229 cites W2128394571 @default.
- W3183382229 cites W2148496763 @default.
- W3183382229 cites W223308864 @default.
- W3183382229 cites W2242236508 @default.
- W3183382229 cites W2263562440 @default.
- W3183382229 cites W2769648743 @default.
- W3183382229 cites W2892024042 @default.
- W3183382229 cites W2907626093 @default.
- W3183382229 cites W2911079225 @default.
- W3183382229 cites W2912793366 @default.
- W3183382229 cites W2914115734 @default.
- W3183382229 cites W2963043258 @default.
- W3183382229 cites W2963111827 @default.
- W3183382229 cites W2963158178 @default.
- W3183382229 cites W2963627051 @default.
- W3183382229 cites W2964254877 @default.
- W3183382229 cites W2996545258 @default.
- W3183382229 cites W3005199613 @default.
- W3183382229 cites W3005850366 @default.
- W3183382229 cites W3007673166 @default.
- W3183382229 cites W3029753614 @default.
- W3183382229 cites W3098765862 @default.
- W3183382229 hasPublicationYear "2021" @default.
- W3183382229 type Work @default.
- W3183382229 sameAs 3183382229 @default.
- W3183382229 citedByCount "1" @default.
- W3183382229 countsByYear W31833822292021 @default.
- W3183382229 crossrefType "proceedings-article" @default.
- W3183382229 hasAuthorship W3183382229A5019731677 @default.
- W3183382229 hasAuthorship W3183382229A5051918150 @default.
- W3183382229 hasConcept C105795698 @default.
- W3183382229 hasConcept C106131492 @default.
- W3183382229 hasConcept C106189395 @default.
- W3183382229 hasConcept C111919701 @default.
- W3183382229 hasConcept C117251300 @default.
- W3183382229 hasConcept C119857082 @default.
- W3183382229 hasConcept C122044880 @default.
- W3183382229 hasConcept C126255220 @default.
- W3183382229 hasConcept C134306372 @default.
- W3183382229 hasConcept C140779682 @default.
- W3183382229 hasConcept C145071142 @default.
- W3183382229 hasConcept C154945302 @default.
- W3183382229 hasConcept C159886148 @default.
- W3183382229 hasConcept C164660894 @default.
- W3183382229 hasConcept C188116033 @default.
- W3183382229 hasConcept C31972630 @default.
- W3183382229 hasConcept C33923547 @default.
- W3183382229 hasConcept C41008148 @default.
- W3183382229 hasConcept C46814582 @default.
- W3183382229 hasConcept C50817715 @default.
- W3183382229 hasConcept C68387754 @default.
- W3183382229 hasConcept C97541855 @default.
- W3183382229 hasConcept C98763669 @default.
- W3183382229 hasConceptScore W3183382229C105795698 @default.
- W3183382229 hasConceptScore W3183382229C106131492 @default.
- W3183382229 hasConceptScore W3183382229C106189395 @default.
- W3183382229 hasConceptScore W3183382229C111919701 @default.
- W3183382229 hasConceptScore W3183382229C117251300 @default.
- W3183382229 hasConceptScore W3183382229C119857082 @default.
- W3183382229 hasConceptScore W3183382229C122044880 @default.
- W3183382229 hasConceptScore W3183382229C126255220 @default.
- W3183382229 hasConceptScore W3183382229C134306372 @default.
- W3183382229 hasConceptScore W3183382229C140779682 @default.
- W3183382229 hasConceptScore W3183382229C145071142 @default.
- W3183382229 hasConceptScore W3183382229C154945302 @default.
- W3183382229 hasConceptScore W3183382229C159886148 @default.
- W3183382229 hasConceptScore W3183382229C164660894 @default.
- W3183382229 hasConceptScore W3183382229C188116033 @default.
- W3183382229 hasConceptScore W3183382229C31972630 @default.
- W3183382229 hasConceptScore W3183382229C33923547 @default.
- W3183382229 hasConceptScore W3183382229C41008148 @default.
- W3183382229 hasConceptScore W3183382229C46814582 @default.
- W3183382229 hasConceptScore W3183382229C50817715 @default.
- W3183382229 hasConceptScore W3183382229C68387754 @default.
- W3183382229 hasConceptScore W3183382229C97541855 @default.
- W3183382229 hasConceptScore W3183382229C98763669 @default.
- W3183382229 hasLocation W31833822291 @default.
- W3183382229 hasOpenAccess W3183382229 @default.
- W3183382229 hasPrimaryLocation W31833822291 @default.
- W3183382229 hasRelatedWork W1544822727 @default.
- W3183382229 hasRelatedWork W1584182316 @default.
- W3183382229 hasRelatedWork W202381653 @default.
- W3183382229 hasRelatedWork W2113618211 @default.
- W3183382229 hasRelatedWork W2341066496 @default.
- W3183382229 hasRelatedWork W2493063923 @default.
- W3183382229 hasRelatedWork W2493386564 @default.
- W3183382229 hasRelatedWork W2569183985 @default.