Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313648272> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W4313648272 endingPage "15" @default.
- W4313648272 startingPage "12" @default.
- W4313648272 abstract "Reinforcement learning (RL) is a paradigm where an agent learns to accomplish tasks by interacting with the environment, similar to how humans learn. RL is therefore viewed as a promising approach to achieving artificial intelligence, as evidenced by its remarkable empirical successes. However, many RL algorithms are not well understood theoretically, especially in the setting where function approximation and off-policy sampling are employed. My thesis [1] aims at developing a thorough theoretical understanding of the performance of various RL algorithms through finite-sample analysis. Since most RL algorithms are essentially stochastic approximation (SA) algorithms for solving variants of the Bellman equation, the first part of the thesis is dedicated to the analysis of general SA involving a contraction operator under Markovian noise. We develop a Lyapunov approach in which we construct a novel Lyapunov function called the generalized Moreau envelope. The results on SA enable us to establish finite-sample bounds for various RL algorithms in the tabular setting (cf. Part II of the thesis) and when using function approximation (cf. Part III of the thesis), which in turn provide theoretical insights into several important problems in the RL community, such as the efficiency of bootstrapping, the bias-variance trade-off in off-policy learning, and the stability of off-policy control. The main body of this document provides an overview of the contributions of my thesis." @default.
- W4313648272 created "2023-01-07" @default.
- W4313648272 creator A5058269077 @default.
- W4313648272 date "2022-12-30" @default.
- W4313648272 modified "2023-09-28" @default.
- W4313648272 title "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms" @default.
- W4313648272 cites W1977655452 @default.
- W4313648272 cites W3164660575 @default.
- W4313648272 cites W4213377513 @default.
- W4313648272 cites W4214535770 @default.
- W4313648272 cites W4297329999 @default.
- W4313648272 cites W4313648272 @default.
- W4313648272 doi "https://doi.org/10.1145/3579342.3579346" @default.
- W4313648272 hasPublicationYear "2022" @default.
- W4313648272 type Work @default.
- W4313648272 citedByCount "1" @default.
- W4313648272 countsByYear W43136482722022 @default.
- W4313648272 crossrefType "journal-article" @default.
- W4313648272 hasAuthorship W4313648272A5058269077 @default.
- W4313648272 hasBestOaLocation W43136482722 @default.
- W4313648272 hasConcept C112972136 @default.
- W4313648272 hasConcept C11413529 @default.
- W4313648272 hasConcept C119857082 @default.
- W4313648272 hasConcept C121332964 @default.
- W4313648272 hasConcept C126255220 @default.
- W4313648272 hasConcept C14036430 @default.
- W4313648272 hasConcept C154945302 @default.
- W4313648272 hasConcept C158622935 @default.
- W4313648272 hasConcept C196340769 @default.
- W4313648272 hasConcept C26517878 @default.
- W4313648272 hasConcept C33923547 @default.
- W4313648272 hasConcept C38652104 @default.
- W4313648272 hasConcept C41008148 @default.
- W4313648272 hasConcept C50644808 @default.
- W4313648272 hasConcept C55479107 @default.
- W4313648272 hasConcept C60640748 @default.
- W4313648272 hasConcept C62520636 @default.
- W4313648272 hasConcept C78458016 @default.
- W4313648272 hasConcept C86803240 @default.
- W4313648272 hasConcept C91873725 @default.
- W4313648272 hasConcept C97541855 @default.
- W4313648272 hasConceptScore W4313648272C112972136 @default.
- W4313648272 hasConceptScore W4313648272C11413529 @default.
- W4313648272 hasConceptScore W4313648272C119857082 @default.
- W4313648272 hasConceptScore W4313648272C121332964 @default.
- W4313648272 hasConceptScore W4313648272C126255220 @default.
- W4313648272 hasConceptScore W4313648272C14036430 @default.
- W4313648272 hasConceptScore W4313648272C154945302 @default.
- W4313648272 hasConceptScore W4313648272C158622935 @default.
- W4313648272 hasConceptScore W4313648272C196340769 @default.
- W4313648272 hasConceptScore W4313648272C26517878 @default.
- W4313648272 hasConceptScore W4313648272C33923547 @default.
- W4313648272 hasConceptScore W4313648272C38652104 @default.
- W4313648272 hasConceptScore W4313648272C41008148 @default.
- W4313648272 hasConceptScore W4313648272C50644808 @default.
- W4313648272 hasConceptScore W4313648272C55479107 @default.
- W4313648272 hasConceptScore W4313648272C60640748 @default.
- W4313648272 hasConceptScore W4313648272C62520636 @default.
- W4313648272 hasConceptScore W4313648272C78458016 @default.
- W4313648272 hasConceptScore W4313648272C86803240 @default.
- W4313648272 hasConceptScore W4313648272C91873725 @default.
- W4313648272 hasConceptScore W4313648272C97541855 @default.
- W4313648272 hasIssue "3" @default.
- W4313648272 hasLocation W43136482721 @default.
- W4313648272 hasLocation W43136482722 @default.
- W4313648272 hasOpenAccess W4313648272 @default.
- W4313648272 hasPrimaryLocation W43136482721 @default.
- W4313648272 hasRelatedWork W1561685851 @default.
- W4313648272 hasRelatedWork W2625757775 @default.
- W4313648272 hasRelatedWork W2964182728 @default.
- W4313648272 hasRelatedWork W2981237928 @default.
- W4313648272 hasRelatedWork W2983986640 @default.
- W4313648272 hasRelatedWork W3206576350 @default.
- W4313648272 hasRelatedWork W4285484150 @default.
- W4313648272 hasRelatedWork W4297729153 @default.
- W4313648272 hasRelatedWork W4313648272 @default.
- W4313648272 hasRelatedWork W4318718437 @default.
- W4313648272 hasVolume "50" @default.
- W4313648272 isParatext "false" @default.
- W4313648272 isRetracted "false" @default.
- W4313648272 workType "article" @default.
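The abstract of this work frames most RL algorithms as stochastic approximation (SA) for a Bellman equation whose operator is a contraction, driven by Markovian rather than i.i.d. noise. A minimal sketch of that viewpoint, assuming the simplest tabular policy-evaluation case: TD(0) run along a single trajectory of a two-state Markov chain, compared against the exact fixed point of T(V) = r + gamma * P V. The transition matrix, rewards, discount factor, and step-size schedule below are hypothetical illustrative choices, not taken from the thesis, and the Lyapunov (generalized Moreau envelope) analysis itself is not reproduced here.

```python
import numpy as np


def bellman_solution(P, r, gamma):
    """Exact fixed point of the Bellman operator T(V) = r + gamma * P @ V."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def td0_markov(P, r, gamma, num_steps, seed=0):
    """Tabular TD(0) viewed as stochastic approximation for V = T(V).

    Samples come from one trajectory of the Markov chain P, so the noise in
    each update is Markovian (correlated over time), not i.i.d.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(r)
    V = np.zeros(n)
    s = 0  # hypothetical initial state
    for k in range(num_steps):
        s_next = rng.choice(n, p=P[s])
        # Diminishing step size alpha_k = 10 / (k + 100): an illustrative choice.
        alpha = 10.0 / (k + 100.0)
        # Noisy evaluation of (T(V) - V) at the visited state only.
        td_error = r[s] + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        s = s_next
    return V


# Hypothetical two-state example (values chosen only for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.5

V_true = bellman_solution(P, r, gamma)
V_td = td0_markov(P, r, gamma, num_steps=100_000)
print("exact:", V_true)
print("TD(0):", V_td)
print("max error:", np.max(np.abs(V_td - V_true)))
```

Because T is a gamma-contraction in the sup-norm, its fixed point is unique, and the SA iterates are pulled toward it despite the correlated samples; finite-sample analyses of the kind the abstract describes quantify how fast that error shrinks as a function of the step sizes and the chain's mixing behavior.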