Matches in SemOpenAlex for { <https://semopenalex.org/work/W4295832433> ?p ?o ?g. }
Showing items 1 to 61 of 61 with 100 items per page.
- W4295832433 abstract "Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q($\sigma$) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q($\sigma$) algorithm to an online multi-step algorithm Q($\sigma, \lambda$) using eligibility traces and introduces Double Q($\sigma$) as the extension of Q($\sigma$) to double learning. Experiments suggest that the new Q($\sigma, \lambda$) algorithm can outperform the classical TD control methods Sarsa($\lambda$), Q($\lambda$) and Q($\sigma$)." @default.
- W4295832433 created "2022-09-15" @default.
- W4295832433 creator A5009940993 @default.
- W4295832433 date "2017-11-05" @default.
- W4295832433 modified "2023-10-18" @default.
- W4295832433 title "Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms" @default.
- W4295832433 doi "https://doi.org/10.48550/arxiv.1711.01569" @default.
- W4295832433 hasPublicationYear "2017" @default.
- W4295832433 type Work @default.
- W4295832433 citedByCount "0" @default.
- W4295832433 crossrefType "posted-content" @default.
- W4295832433 hasAuthorship W4295832433A5009940993 @default.
- W4295832433 hasBestOaLocation W42958324331 @default.
- W4295832433 hasConcept C11413529 @default.
- W4295832433 hasConcept C114614502 @default.
- W4295832433 hasConcept C121332964 @default.
- W4295832433 hasConcept C127413603 @default.
- W4295832433 hasConcept C154945302 @default.
- W4295832433 hasConcept C202444582 @default.
- W4295832433 hasConcept C23119410 @default.
- W4295832433 hasConcept C2778049214 @default.
- W4295832433 hasConcept C2778113609 @default.
- W4295832433 hasConcept C33923547 @default.
- W4295832433 hasConcept C34146451 @default.
- W4295832433 hasConcept C41008148 @default.
- W4295832433 hasConcept C42360764 @default.
- W4295832433 hasConcept C62520636 @default.
- W4295832433 hasConcept C9652623 @default.
- W4295832433 hasConcept C97541855 @default.
- W4295832433 hasConceptScore W4295832433C11413529 @default.
- W4295832433 hasConceptScore W4295832433C114614502 @default.
- W4295832433 hasConceptScore W4295832433C121332964 @default.
- W4295832433 hasConceptScore W4295832433C127413603 @default.
- W4295832433 hasConceptScore W4295832433C154945302 @default.
- W4295832433 hasConceptScore W4295832433C202444582 @default.
- W4295832433 hasConceptScore W4295832433C23119410 @default.
- W4295832433 hasConceptScore W4295832433C2778049214 @default.
- W4295832433 hasConceptScore W4295832433C2778113609 @default.
- W4295832433 hasConceptScore W4295832433C33923547 @default.
- W4295832433 hasConceptScore W4295832433C34146451 @default.
- W4295832433 hasConceptScore W4295832433C41008148 @default.
- W4295832433 hasConceptScore W4295832433C42360764 @default.
- W4295832433 hasConceptScore W4295832433C62520636 @default.
- W4295832433 hasConceptScore W4295832433C9652623 @default.
- W4295832433 hasConceptScore W4295832433C97541855 @default.
- W4295832433 hasLocation W42958324331 @default.
- W4295832433 hasOpenAccess W4295832433 @default.
- W4295832433 hasPrimaryLocation W42958324331 @default.
- W4295832433 hasRelatedWork W2041475568 @default.
- W4295832433 hasRelatedWork W2057145714 @default.
- W4295832433 hasRelatedWork W2058591591 @default.
- W4295832433 hasRelatedWork W2061090826 @default.
- W4295832433 hasRelatedWork W2334230563 @default.
- W4295832433 hasRelatedWork W2485727175 @default.
- W4295832433 hasRelatedWork W3019115378 @default.
- W4295832433 hasRelatedWork W3100791588 @default.
- W4295832433 hasRelatedWork W3104405637 @default.
- W4295832433 hasRelatedWork W4295267532 @default.
- W4295832433 isParatext "false" @default.
- W4295832433 isRetracted "false" @default.
- W4295832433 workType "article" @default.
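
The abstract above says that Q(sigma) unifies Sarsa and Q-Learning. As a rough illustration only, the following sketch computes a one-step Q(sigma) target in the textbook form, mixing a sampled next-action value with an expectation under the target policy. It is not taken from the paper or from this record: the function names `q_sigma_target` and `greedy_probs`, the argument layout, and the numbers in the usage note are all assumptions made for the example, and the multi-step Q(sigma, lambda) and Double Q(sigma) variants the paper introduces are not shown.

```python
from typing import List, Sequence


def q_sigma_target(
    reward: float,
    gamma: float,
    sigma: float,
    next_q: Sequence[float],
    next_action: int,
    policy_probs: Sequence[float],
) -> float:
    """One-step Q(sigma) target (illustrative sketch, hypothetical names).

    sigma = 1 uses only the sampled next action (Sarsa-style target);
    sigma = 0 uses only the expectation under the target policy
    (Expected-Sarsa-style target, which equals the Q-Learning target
    when the target policy is greedy).
    """
    sampled = next_q[next_action]
    expected = sum(p * q for p, q in zip(policy_probs, next_q))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def greedy_probs(q_values: Sequence[float]) -> List[float]:
    """Probabilities of a greedy policy: all mass on the first maximal action."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]
```

With made-up next-state values `[1.0, 3.0, 2.0]`, reward 1.0, discount 0.5, and sampled next action 0, `sigma=1.0` reproduces the Sarsa target, while `sigma=0.0` with `greedy_probs` reproduces the Q-Learning target; intermediate values of sigma interpolate linearly between the two.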