Matches in SemOpenAlex for { <https://semopenalex.org/work/W3106393551> ?p ?o ?g. }
Showing items 1 to 91 of 91 with 100 items per page.
- W3106393551 endingPage "1403" @default.
- W3106393551 startingPage "1392" @default.
- W3106393551 abstract "The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. Typically, it was thought that these two classes of algorithms were distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis while value-optimistic algorithms are easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods." @default.
- W3106393551 created "2020-11-23" @default.
- W3106393551 creator A5070339884 @default.
- W3106393551 creator A5077167635 @default.
- W3106393551 date "2020-01-01" @default.
- W3106393551 modified "2023-09-29" @default.
- W3106393551 title "A Unifying View of Optimism in Episodic Reinforcement Learning" @default.
- W3106393551 hasPublicationYear "2020" @default.
- W3106393551 type Work @default.
- W3106393551 sameAs 3106393551 @default.
- W3106393551 citedByCount "20" @default.
- W3106393551 countsByYear W31063935512019 @default.
- W3106393551 countsByYear W31063935512020 @default.
- W3106393551 countsByYear W31063935512021 @default.
- W3106393551 crossrefType "proceedings-article" @default.
- W3106393551 hasAuthorship W3106393551A5070339884 @default.
- W3106393551 hasAuthorship W3106393551A5077167635 @default.
- W3106393551 hasConcept C111472728 @default.
- W3106393551 hasConcept C11413529 @default.
- W3106393551 hasConcept C119857082 @default.
- W3106393551 hasConcept C126255220 @default.
- W3106393551 hasConcept C138885662 @default.
- W3106393551 hasConcept C14036430 @default.
- W3106393551 hasConcept C14646407 @default.
- W3106393551 hasConcept C154945302 @default.
- W3106393551 hasConcept C17744445 @default.
- W3106393551 hasConcept C199539241 @default.
- W3106393551 hasConcept C2776359362 @default.
- W3106393551 hasConcept C2777212361 @default.
- W3106393551 hasConcept C2780586882 @default.
- W3106393551 hasConcept C33923547 @default.
- W3106393551 hasConcept C37404715 @default.
- W3106393551 hasConcept C41008148 @default.
- W3106393551 hasConcept C49937458 @default.
- W3106393551 hasConcept C78458016 @default.
- W3106393551 hasConcept C80444323 @default.
- W3106393551 hasConcept C86803240 @default.
- W3106393551 hasConcept C94625758 @default.
- W3106393551 hasConcept C97541855 @default.
- W3106393551 hasConceptScore W3106393551C111472728 @default.
- W3106393551 hasConceptScore W3106393551C11413529 @default.
- W3106393551 hasConceptScore W3106393551C119857082 @default.
- W3106393551 hasConceptScore W3106393551C126255220 @default.
- W3106393551 hasConceptScore W3106393551C138885662 @default.
- W3106393551 hasConceptScore W3106393551C14036430 @default.
- W3106393551 hasConceptScore W3106393551C14646407 @default.
- W3106393551 hasConceptScore W3106393551C154945302 @default.
- W3106393551 hasConceptScore W3106393551C17744445 @default.
- W3106393551 hasConceptScore W3106393551C199539241 @default.
- W3106393551 hasConceptScore W3106393551C2776359362 @default.
- W3106393551 hasConceptScore W3106393551C2777212361 @default.
- W3106393551 hasConceptScore W3106393551C2780586882 @default.
- W3106393551 hasConceptScore W3106393551C33923547 @default.
- W3106393551 hasConceptScore W3106393551C37404715 @default.
- W3106393551 hasConceptScore W3106393551C41008148 @default.
- W3106393551 hasConceptScore W3106393551C49937458 @default.
- W3106393551 hasConceptScore W3106393551C78458016 @default.
- W3106393551 hasConceptScore W3106393551C80444323 @default.
- W3106393551 hasConceptScore W3106393551C86803240 @default.
- W3106393551 hasConceptScore W3106393551C94625758 @default.
- W3106393551 hasConceptScore W3106393551C97541855 @default.
- W3106393551 hasLocation W31063935511 @default.
- W3106393551 hasOpenAccess W3106393551 @default.
- W3106393551 hasPrimaryLocation W31063935511 @default.
- W3106393551 hasRelatedWork W107583932 @default.
- W3106393551 hasRelatedWork W1505937442 @default.
- W3106393551 hasRelatedWork W1662803991 @default.
- W3106393551 hasRelatedWork W1850488217 @default.
- W3106393551 hasRelatedWork W1867103660 @default.
- W3106393551 hasRelatedWork W2119567691 @default.
- W3106393551 hasRelatedWork W2129670787 @default.
- W3106393551 hasRelatedWork W2907502549 @default.
- W3106393551 hasRelatedWork W2963049774 @default.
- W3106393551 hasRelatedWork W2963158178 @default.
- W3106393551 hasRelatedWork W2963582321 @default.
- W3106393551 hasRelatedWork W2964054583 @default.
- W3106393551 hasRelatedWork W2964299116 @default.
- W3106393551 hasRelatedWork W2965004202 @default.
- W3106393551 hasRelatedWork W2971249033 @default.
- W3106393551 hasRelatedWork W3020325294 @default.
- W3106393551 hasRelatedWork W3033522400 @default.
- W3106393551 hasRelatedWork W3034871777 @default.
- W3106393551 hasRelatedWork W3046395471 @default.
- W3106393551 hasRelatedWork W3088243654 @default.
- W3106393551 hasVolume "33" @default.
- W3106393551 isParatext "false" @default.
- W3106393551 isRetracted "false" @default.
- W3106393551 magId "3106393551" @default.
- W3106393551 workType "article" @default.