Matches in SemOpenAlex for { <https://semopenalex.org/work/W3034871777> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W3034871777 endingPage "1294" @default.
- W3034871777 startingPage "1283" @default.
- W3034871777 abstract "While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge such a gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an ``optimistic version'' of the policy gradient direction. This paper proves that, in the problem of episodic Markov decision process with linear function approximation, unknown transition, and adversarial reward with full-information feedback, OPPO achieves $\tilde{O}(\sqrt{d^2 H^3 T})$ regret. Here $d$ is the feature dimension, $H$ is the episode horizon, and $T$ is the total number of steps. To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores." @default.
- W3034871777 created "2020-06-19" @default.
- W3034871777 creator A5021536336 @default.
- W3034871777 creator A5048272675 @default.
- W3034871777 creator A5078210646 @default.
- W3034871777 creator A5082311500 @default.
- W3034871777 date "2020-07-12" @default.
- W3034871777 modified "2023-09-24" @default.
- W3034871777 title "Provably Efficient Exploration in Policy Optimization" @default.
- W3034871777 hasPublicationYear "2020" @default.
- W3034871777 type Work @default.
- W3034871777 sameAs 3034871777 @default.
- W3034871777 citedByCount "76" @default.
- W3034871777 countsByYear W30348717772020 @default.
- W3034871777 countsByYear W30348717772021 @default.
- W3034871777 countsByYear W30348717772022 @default.
- W3034871777 crossrefType "proceedings-article" @default.
- W3034871777 hasAuthorship W3034871777A5021536336 @default.
- W3034871777 hasAuthorship W3034871777A5048272675 @default.
- W3034871777 hasAuthorship W3034871777A5078210646 @default.
- W3034871777 hasAuthorship W3034871777A5082311500 @default.
- W3034871777 hasConcept C105795698 @default.
- W3034871777 hasConcept C106189395 @default.
- W3034871777 hasConcept C114614502 @default.
- W3034871777 hasConcept C118615104 @default.
- W3034871777 hasConcept C119857082 @default.
- W3034871777 hasConcept C126255220 @default.
- W3034871777 hasConcept C137836250 @default.
- W3034871777 hasConcept C14036430 @default.
- W3034871777 hasConcept C14646407 @default.
- W3034871777 hasConcept C154945302 @default.
- W3034871777 hasConcept C159886148 @default.
- W3034871777 hasConcept C178635117 @default.
- W3034871777 hasConcept C33676613 @default.
- W3034871777 hasConcept C33923547 @default.
- W3034871777 hasConcept C36686422 @default.
- W3034871777 hasConcept C37736160 @default.
- W3034871777 hasConcept C38652104 @default.
- W3034871777 hasConcept C41008148 @default.
- W3034871777 hasConcept C50817715 @default.
- W3034871777 hasConcept C78458016 @default.
- W3034871777 hasConcept C86803240 @default.
- W3034871777 hasConcept C89109886 @default.
- W3034871777 hasConcept C97541855 @default.
- W3034871777 hasConceptScore W3034871777C105795698 @default.
- W3034871777 hasConceptScore W3034871777C106189395 @default.
- W3034871777 hasConceptScore W3034871777C114614502 @default.
- W3034871777 hasConceptScore W3034871777C118615104 @default.
- W3034871777 hasConceptScore W3034871777C119857082 @default.
- W3034871777 hasConceptScore W3034871777C126255220 @default.
- W3034871777 hasConceptScore W3034871777C137836250 @default.
- W3034871777 hasConceptScore W3034871777C14036430 @default.
- W3034871777 hasConceptScore W3034871777C14646407 @default.
- W3034871777 hasConceptScore W3034871777C154945302 @default.
- W3034871777 hasConceptScore W3034871777C159886148 @default.
- W3034871777 hasConceptScore W3034871777C178635117 @default.
- W3034871777 hasConceptScore W3034871777C33676613 @default.
- W3034871777 hasConceptScore W3034871777C33923547 @default.
- W3034871777 hasConceptScore W3034871777C36686422 @default.
- W3034871777 hasConceptScore W3034871777C37736160 @default.
- W3034871777 hasConceptScore W3034871777C38652104 @default.
- W3034871777 hasConceptScore W3034871777C41008148 @default.
- W3034871777 hasConceptScore W3034871777C50817715 @default.
- W3034871777 hasConceptScore W3034871777C78458016 @default.
- W3034871777 hasConceptScore W3034871777C86803240 @default.
- W3034871777 hasConceptScore W3034871777C89109886 @default.
- W3034871777 hasConceptScore W3034871777C97541855 @default.
- W3034871777 hasLocation W30348717771 @default.
- W3034871777 hasOpenAccess W3034871777 @default.
- W3034871777 hasPrimaryLocation W30348717771 @default.
- W3034871777 hasRelatedWork W1575592356 @default.
- W3034871777 hasRelatedWork W1771410628 @default.
- W3034871777 hasRelatedWork W1850488217 @default.
- W3034871777 hasRelatedWork W2119567691 @default.
- W3034871777 hasRelatedWork W2119738618 @default.
- W3034871777 hasRelatedWork W2121863487 @default.
- W3034871777 hasRelatedWork W2130801532 @default.
- W3034871777 hasRelatedWork W2545659366 @default.
- W3034871777 hasRelatedWork W2736601468 @default.
- W3034871777 hasRelatedWork W2945496654 @default.
- W3034871777 hasRelatedWork W2956123884 @default.
- W3034871777 hasRelatedWork W2963049774 @default.
- W3034871777 hasRelatedWork W2964054583 @default.
- W3034871777 hasRelatedWork W2991929641 @default.
- W3034871777 hasRelatedWork W2994818021 @default.
- W3034871777 hasRelatedWork W3029753614 @default.
- W3034871777 hasRelatedWork W3035273634 @default.
- W3034871777 hasRelatedWork W3036849408 @default.
- W3034871777 hasRelatedWork W3037341018 @default.
- W3034871777 hasRelatedWork W3046395471 @default.
- W3034871777 hasVolume "1" @default.
- W3034871777 isParatext "false" @default.
- W3034871777 isRetracted "false" @default.
- W3034871777 magId "3034871777" @default.
- W3034871777 workType "article" @default.