Matches in SemOpenAlex for { <https://semopenalex.org/work/W2994818021> ?p ?o ?g. }
- W2994818021 abstract "While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge such a gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an 'optimistic version' of the policy gradient direction. This paper proves that, in the problem of episodic Markov decision process with linear function approximation, unknown transition, and adversarial reward with full-information feedback, OPPO achieves $\tilde{O}(\sqrt{d^2 H^3 T})$ regret. Here $d$ is the feature dimension, $H$ is the episode horizon, and $T$ is the total number of steps. To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores." @default.
- W2994818021 created "2019-12-26" @default.
- W2994818021 creator A5021536336 @default.
- W2994818021 creator A5048272675 @default.
- W2994818021 creator A5078210646 @default.
- W2994818021 creator A5082311500 @default.
- W2994818021 date "2019-12-12" @default.
- W2994818021 modified "2023-09-27" @default.
- W2994818021 title "Provably Efficient Exploration in Policy Optimization" @default.
- W2994818021 cites W107583932 @default.
- W2994818021 cites W1541730457 @default.
- W2994818021 cites W1597303641 @default.
- W2994818021 cites W1771410628 @default.
- W2994818021 cites W183249136 @default.
- W2994818021 cites W1850488217 @default.
- W2994818021 cites W2049934117 @default.
- W2994818021 cites W2061753713 @default.
- W2994818021 cites W2072931156 @default.
- W2994818021 cites W2074680702 @default.
- W2994818021 cites W2096199223 @default.
- W2994818021 cites W2119717200 @default.
- W2994818021 cites W2119738618 @default.
- W2994818021 cites W2121863487 @default.
- W2994818021 cites W2128477394 @default.
- W2994818021 cites W2128812357 @default.
- W2994818021 cites W2129670787 @default.
- W2994818021 cites W2130801532 @default.
- W2994818021 cites W2149166950 @default.
- W2994818021 cites W2150234726 @default.
- W2994818021 cites W2155027007 @default.
- W2994818021 cites W2156211713 @default.
- W2994818021 cites W2156737235 @default.
- W2994818021 cites W2168405694 @default.
- W2994818021 cites W2257979135 @default.
- W2994818021 cites W2545659366 @default.
- W2994818021 cites W2736601468 @default.
- W2994818021 cites W2739559388 @default.
- W2994818021 cites W2766447205 @default.
- W2994818021 cites W2883364792 @default.
- W2994818021 cites W2886474253 @default.
- W2994818021 cites W2890347272 @default.
- W2994818021 cites W2907626093 @default.
- W2994818021 cites W2945496654 @default.
- W2994818021 cites W2948432982 @default.
- W2994818021 cites W2949578685 @default.
- W2994818021 cites W2951326042 @default.
- W2994818021 cites W2952500758 @default.
- W2994818021 cites W2956123884 @default.
- W2994818021 cites W2962901215 @default.
- W2994818021 cites W2963049774 @default.
- W2994818021 cites W2963434013 @default.
- W2994818021 cites W2963641140 @default.
- W2994818021 cites W2963872309 @default.
- W2994818021 cites W2964054583 @default.
- W2994818021 cites W2970161765 @default.
- W2994818021 cites W2970355847 @default.
- W2994818021 cites W2971085818 @default.
- W2994818021 cites W2971587637 @default.
- W2994818021 cites W2990210896 @default.
- W2994818021 cites W2990830025 @default.
- W2994818021 cites W3029753614 @default.
- W2994818021 cites W3034360859 @default.
- W2994818021 cites W3035273634 @default.
- W2994818021 cites W3036466821 @default.
- W2994818021 cites W3036849408 @default.
- W2994818021 cites W3046395471 @default.
- W2994818021 cites W3046626913 @default.
- W2994818021 cites W3117137507 @default.
- W2994818021 hasPublicationYear "2019" @default.
- W2994818021 type Work @default.
- W2994818021 sameAs 2994818021 @default.
- W2994818021 citedByCount "4" @default.
- W2994818021 countsByYear W29948180212020 @default.
- W2994818021 countsByYear W29948180212021 @default.
- W2994818021 crossrefType "posted-content" @default.
- W2994818021 hasAuthorship W2994818021A5021536336 @default.
- W2994818021 hasAuthorship W2994818021A5048272675 @default.
- W2994818021 hasAuthorship W2994818021A5078210646 @default.
- W2994818021 hasAuthorship W2994818021A5082311500 @default.
- W2994818021 hasConcept C105795698 @default.
- W2994818021 hasConcept C106189395 @default.
- W2994818021 hasConcept C114614502 @default.
- W2994818021 hasConcept C118615104 @default.
- W2994818021 hasConcept C119857082 @default.
- W2994818021 hasConcept C126255220 @default.
- W2994818021 hasConcept C137836250 @default.
- W2994818021 hasConcept C14036430 @default.
- W2994818021 hasConcept C14646407 @default.
- W2994818021 hasConcept C154945302 @default.
- W2994818021 hasConcept C159886148 @default.
- W2994818021 hasConcept C178635117 @default.
- W2994818021 hasConcept C33676613 @default.
- W2994818021 hasConcept C33923547 @default.
- W2994818021 hasConcept C36686422 @default.
- W2994818021 hasConcept C37736160 @default.
- W2994818021 hasConcept C38652104 @default.
- W2994818021 hasConcept C41008148 @default.
- W2994818021 hasConcept C50817715 @default.
- W2994818021 hasConcept C78458016 @default.
- W2994818021 hasConcept C86803240 @default.