Matches in SemOpenAlex for { <https://semopenalex.org/work/W2806985155> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W2806985155 abstract "When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov's smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm's sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems." @default.
- W2806985155 created "2018-06-13" @default.
- W2806985155 creator A5002701522 @default.
- W2806985155 creator A5030589527 @default.
- W2806985155 creator A5054850777 @default.
- W2806985155 creator A5058479146 @default.
- W2806985155 creator A5071683073 @default.
- W2806985155 creator A5081758689 @default.
- W2806985155 creator A5086484914 @default.
- W2806985155 creator A5088339142 @default.
- W2806985155 date "2017-12-29" @default.
- W2806985155 modified "2023-10-11" @default.
- W2806985155 title "SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation" @default.
- W2806985155 doi "https://doi.org/10.48550/arxiv.1712.10285" @default.
- W2806985155 hasPublicationYear "2017" @default.
- W2806985155 type Work @default.
- W2806985155 sameAs 2806985155 @default.
- W2806985155 citedByCount "103" @default.
- W2806985155 countsByYear W28069851552017 @default.
- W2806985155 countsByYear W28069851552018 @default.
- W2806985155 countsByYear W28069851552019 @default.
- W2806985155 countsByYear W28069851552020 @default.
- W2806985155 countsByYear W28069851552021 @default.
- W2806985155 countsByYear W28069851552022 @default.
- W2806985155 crossrefType "posted-content" @default.
- W2806985155 hasAuthorship W2806985155A5002701522 @default.
- W2806985155 hasAuthorship W2806985155A5030589527 @default.
- W2806985155 hasAuthorship W2806985155A5054850777 @default.
- W2806985155 hasAuthorship W2806985155A5058479146 @default.
- W2806985155 hasAuthorship W2806985155A5071683073 @default.
- W2806985155 hasAuthorship W2806985155A5081758689 @default.
- W2806985155 hasAuthorship W2806985155A5086484914 @default.
- W2806985155 hasAuthorship W2806985155A5088339142 @default.
- W2806985155 hasBestOaLocation W28069851551 @default.
- W2806985155 hasConcept C104317684 @default.
- W2806985155 hasConcept C126255220 @default.
- W2806985155 hasConcept C13280743 @default.
- W2806985155 hasConcept C137836250 @default.
- W2806985155 hasConcept C14646407 @default.
- W2806985155 hasConcept C154945302 @default.
- W2806985155 hasConcept C158448853 @default.
- W2806985155 hasConcept C17020691 @default.
- W2806985155 hasConcept C185592680 @default.
- W2806985155 hasConcept C185798385 @default.
- W2806985155 hasConcept C205649164 @default.
- W2806985155 hasConcept C33923547 @default.
- W2806985155 hasConcept C41008148 @default.
- W2806985155 hasConcept C50644808 @default.
- W2806985155 hasConcept C55493867 @default.
- W2806985155 hasConcept C86339819 @default.
- W2806985155 hasConcept C91873725 @default.
- W2806985155 hasConcept C97541855 @default.
- W2806985155 hasConceptScore W2806985155C104317684 @default.
- W2806985155 hasConceptScore W2806985155C126255220 @default.
- W2806985155 hasConceptScore W2806985155C13280743 @default.
- W2806985155 hasConceptScore W2806985155C137836250 @default.
- W2806985155 hasConceptScore W2806985155C14646407 @default.
- W2806985155 hasConceptScore W2806985155C154945302 @default.
- W2806985155 hasConceptScore W2806985155C158448853 @default.
- W2806985155 hasConceptScore W2806985155C17020691 @default.
- W2806985155 hasConceptScore W2806985155C185592680 @default.
- W2806985155 hasConceptScore W2806985155C185798385 @default.
- W2806985155 hasConceptScore W2806985155C205649164 @default.
- W2806985155 hasConceptScore W2806985155C33923547 @default.
- W2806985155 hasConceptScore W2806985155C41008148 @default.
- W2806985155 hasConceptScore W2806985155C50644808 @default.
- W2806985155 hasConceptScore W2806985155C55493867 @default.
- W2806985155 hasConceptScore W2806985155C86339819 @default.
- W2806985155 hasConceptScore W2806985155C91873725 @default.
- W2806985155 hasConceptScore W2806985155C97541855 @default.
- W2806985155 hasLocation W28069851551 @default.
- W2806985155 hasOpenAccess W2806985155 @default.
- W2806985155 hasPrimaryLocation W28069851551 @default.
- W2806985155 hasRelatedWork W2230869072 @default.
- W2806985155 hasRelatedWork W2775408020 @default.
- W2806985155 hasRelatedWork W3040891685 @default.
- W2806985155 hasRelatedWork W3115682199 @default.
- W2806985155 hasRelatedWork W3163781174 @default.
- W2806985155 hasRelatedWork W3214094365 @default.
- W2806985155 hasRelatedWork W4226345898 @default.
- W2806985155 hasRelatedWork W4285484150 @default.
- W2806985155 hasRelatedWork W4298336974 @default.
- W2806985155 hasRelatedWork W4303494752 @default.
- W2806985155 isParatext "false" @default.
- W2806985155 isRetracted "false" @default.
- W2806985155 magId "2806985155" @default.
- W2806985155 workType "article" @default.
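
The abstract in this record names two tools, Nesterov's smoothing technique and the Legendre-Fenchel transformation. The sketch below is not the paper's algorithm. It only checks, numerically, two standard identities commonly associated with those tools: log-sum-exp as an entropy-smoothed maximum, and the conjugate form of the square, x^2 = max over nu of (2*nu*x - nu^2). All function names, the action values in `q_values`, and the smoothing parameter `lam` are hypothetical and chosen for illustration.

```python
import math


def smoothed_max(values, lam):
    """Log-sum-exp smoothing of max: lam * log(sum(exp(v / lam)))."""
    m = max(values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def softmax(values, lam):
    """Distribution that attains the entropy-regularized maximum."""
    m = max(values)
    weights = [math.exp((v - m) / lam) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_regularized_value(values, probs, lam):
    """sum_i p_i * v_i + lam * H(p), with H the Shannon entropy."""
    expected = sum(p * v for p, v in zip(probs, values))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return expected + lam * entropy


def fenchel_square(x, nu):
    """Lower bound 2*nu*x - nu**2 on x**2; it is tight when nu == x."""
    return 2.0 * nu * x - nu * nu


# Hypothetical inputs, not taken from the paper.
q_values = [1.0, 2.5, 0.3]
lam = 0.1

hard = max(q_values)
soft = smoothed_max(q_values, lam)
pi = softmax(q_values, lam)
```

The smoothed value lies between the hard maximum and the hard maximum plus `lam * log(n)`, so shrinking `lam` tightens the approximation at the cost of a less smooth objective. The dual form of the square replaces a squared expectation with a maximization over an auxiliary variable, which is the general idea behind primal-dual reformulations of this kind.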