Matches in SemOpenAlex for { <https://semopenalex.org/work/W3005731428> ?p ?o ?g. }
- W3005731428 abstract "We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an $\tilde{O}(\sqrt{d^3 H^3 T})$ upper bound on the duality gap and regret, where $d$ is the linear dimension, $H$ the horizon and $T$ the total number of timesteps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash Equilibrium of a general-sum game is computationally hard, our algorithm instead solves for a Coarse Correlated Equilibrium (CCE), which can be obtained efficiently. To our best knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right." @default.
- W3005731428 created "2020-02-24" @default.
- W3005731428 creator A5008882694 @default.
- W3005731428 creator A5048272675 @default.
- W3005731428 creator A5060774236 @default.
- W3005731428 creator A5078210646 @default.
- W3005731428 date "2020-02-17" @default.
- W3005731428 modified "2023-09-26" @default.
- W3005731428 title "Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium" @default.
- W3005731428 cites W107583932 @default.
- W3005731428 cites W1519783625 @default.
- W3005731428 cites W1542595278 @default.
- W3005731428 cites W1542941925 @default.
- W3005731428 cites W1626977535 @default.
- W3005731428 cites W1788877992 @default.
- W3005731428 cites W1850488217 @default.
- W3005731428 cites W1973039793 @default.
- W3005731428 cites W1988716172 @default.
- W3005731428 cites W2002373723 @default.
- W3005731428 cites W2003832486 @default.
- W3005731428 cites W2034184818 @default.
- W3005731428 cites W2057913812 @default.
- W3005731428 cites W2058557460 @default.
- W3005731428 cites W2072931156 @default.
- W3005731428 cites W2099618002 @default.
- W3005731428 cites W2117355432 @default.
- W3005731428 cites W2118929276 @default.
- W3005731428 cites W2119567691 @default.
- W3005731428 cites W2119738618 @default.
- W3005731428 cites W2120846115 @default.
- W3005731428 cites W2121863487 @default.
- W3005731428 cites W2129670787 @default.
- W3005731428 cites W2136503687 @default.
- W3005731428 cites W2141076336 @default.
- W3005731428 cites W2156737235 @default.
- W3005731428 cites W2164637474 @default.
- W3005731428 cites W2257979135 @default.
- W3005731428 cites W2295179707 @default.
- W3005731428 cites W2395575420 @default.
- W3005731428 cites W2413079700 @default.
- W3005731428 cites W2462023919 @default.
- W3005731428 cites W2463221887 @default.
- W3005731428 cites W2518564545 @default.
- W3005731428 cites W2530849036 @default.
- W3005731428 cites W2545659366 @default.
- W3005731428 cites W2557283755 @default.
- W3005731428 cites W2574978968 @default.
- W3005731428 cites W2575731723 @default.
- W3005731428 cites W2736601468 @default.
- W3005731428 cites W2762117857 @default.
- W3005731428 cites W2766447205 @default.
- W3005731428 cites W2773381986 @default.
- W3005731428 cites W2787270134 @default.
- W3005731428 cites W2799183551 @default.
- W3005731428 cites W2890189120 @default.
- W3005731428 cites W2892013712 @default.
- W3005731428 cites W2907626093 @default.
- W3005731428 cites W2911793117 @default.
- W3005731428 cites W2911931139 @default.
- W3005731428 cites W2914351253 @default.
- W3005731428 cites W2919115771 @default.
- W3005731428 cites W2945496654 @default.
- W3005731428 cites W2948677277 @default.
- W3005731428 cites W2949510923 @default.
- W3005731428 cites W2951713757 @default.
- W3005731428 cites W2951831808 @default.
- W3005731428 cites W2960876848 @default.
- W3005731428 cites W2962723383 @default.
- W3005731428 cites W2963000099 @default.
- W3005731428 cites W2963049774 @default.
- W3005731428 cites W2963111827 @default.
- W3005731428 cites W2963407617 @default.
- W3005731428 cites W2963434013 @default.
- W3005731428 cites W2963582321 @default.
- W3005731428 cites W2964054583 @default.
- W3005731428 cites W2965497096 @default.
- W3005731428 cites W2970355847 @default.
- W3005731428 cites W2970770768 @default.
- W3005731428 cites W2970884920 @default.
- W3005731428 cites W2971085818 @default.
- W3005731428 cites W2971249033 @default.
- W3005731428 cites W2971936494 @default.
- W3005731428 cites W2973525135 @default.
- W3005731428 cites W2982316857 @default.
- W3005731428 cites W2990210896 @default.
- W3005731428 cites W2991046523 @default.
- W3005731428 cites W2991929641 @default.
- W3005731428 cites W2991935368 @default.
- W3005731428 cites W2995638039 @default.
- W3005731428 cites W3014860839 @default.
- W3005731428 cites W3022125780 @default.
- W3005731428 cites W3035273634 @default.
- W3005731428 cites W3037341018 @default.
- W3005731428 cites W3046395471 @default.
- W3005731428 cites W3100426776 @default.
- W3005731428 cites W2152790647 @default.
- W3005731428 doi "https://doi.org/10.48550/arxiv.2002.07066" @default.
- W3005731428 hasPublicationYear "2020" @default.
- W3005731428 type Work @default.
- W3005731428 sameAs 3005731428 @default.
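
The abstract above refers to solving a general-sum matrix game for a Coarse Correlated Equilibrium (CCE). As a rough illustration of what that condition means, the sketch below checks whether a joint distribution over action pairs is a CCE of a two-player matrix game: neither player can gain by ignoring the joint recommendation and committing to one fixed action. This is not the paper's algorithm. The function names, the convention that both players maximize their own payoff matrix, and the example payoffs (a textbook game of chicken) are assumptions made here for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def expected_payoff(sigma: Matrix, payoff: Matrix) -> float:
    """Expected payoff when the action pair (a, b) is drawn from sigma."""
    return sum(
        sigma[a][b] * payoff[a][b]
        for a in range(len(sigma))
        for b in range(len(sigma[0]))
    )


def is_cce(sigma: Matrix, u_row: Matrix, u_col: Matrix, tol: float = 1e-9) -> bool:
    """Check the CCE condition for a joint distribution sigma.

    Assumption (hypothetical convention): each player maximizes their own
    payoff matrix, u_row for the row player and u_col for the column player.
    """
    n_rows, n_cols = len(sigma), len(sigma[0])

    # Marginal distribution of each player's action under sigma.
    row_marginal = [sum(sigma[a]) for a in range(n_rows)]
    col_marginal = [sum(sigma[a][b] for a in range(n_rows)) for b in range(n_cols)]

    # Value of following the joint recommendation.
    v_row = expected_payoff(sigma, u_row)
    v_col = expected_payoff(sigma, u_col)

    # Best value from committing to one fixed action before seeing the
    # recommendation; the opponent still plays from their marginal.
    best_row_dev = max(
        sum(col_marginal[b] * u_row[a2][b] for b in range(n_cols))
        for a2 in range(n_rows)
    )
    best_col_dev = max(
        sum(row_marginal[a] * u_col[a][b2] for a in range(n_rows))
        for b2 in range(n_cols)
    )

    return v_row >= best_row_dev - tol and v_col >= best_col_dev - tol


# Illustrative game of chicken; action 0 = dare, action 1 = chicken.
U_ROW: Matrix = [[0.0, 7.0], [2.0, 6.0]]
U_COL: Matrix = [[0.0, 2.0], [7.0, 6.0]]

# Equal mass on every outcome except (dare, dare).
SIGMA_CORRELATED: Matrix = [[0.0, 1 / 3], [1 / 3, 1 / 3]]

# All mass on (dare, dare): either player gains by switching to chicken.
SIGMA_CRASH: Matrix = [[1.0, 0.0], [0.0, 0.0]]
```

Calling `is_cce(SIGMA_CORRELATED, U_ROW, U_COL)` tests the correlated distribution, and `is_cce(SIGMA_CRASH, U_ROW, U_COL)` tests the degenerate one. The CCE constraints are linear in the entries of `sigma`, which is why such an equilibrium can be found by linear programming, whereas the abstract notes that finding a Nash Equilibrium of a general-sum game is computationally hard.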