Matches in SemOpenAlex for { <https://semopenalex.org/work/W75509695> ?p ?o ?g. }
Showing items 1 to 100 of 100 with 100 items per page.
- W75509695 endingPage "576" @default.
- W75509695 startingPage "568" @default.
- W75509695 abstract "Q-learning, the most popular of reinforcement learning algorithms, has always included an extension to eligibility traces to enable more rapid learning and improved asymptotic performance on non-Markov problems. The λ parameter smoothly shifts on-policy algorithms such as TD(λ) and Sarsa(λ) from a pure bootstrapping form (λ=0) to a pure Monte Carlo form (λ=1). In off-policy algorithms, including Q(λ), GQ(λ), and off-policy LSTD(λ), the λ parameter is intended to play the same role, but does not; on every exploratory action these algorithms bootstrap regardless of the value of λ, and as a result they fail to approximate Monte Carlo learning when λ = 1. It may seem that this is inevitable for any online off-policy algorithm; if updates are made on each step on which the target policy is followed, then how could just the right updates be 'un-made' upon deviation from the target policy? In this paper, we introduce a new version of Q(λ) that does exactly that, without significantly increased algorithmic complexity. En route to our new Q(λ), we introduce a new derivation technique based on the forward-view/backward-view analysis familiar from TD(λ) but extended to apply at every time step rather than only at the end of episodes. We apply this technique to derive first a new off-policy version of TD(λ), called PTD(λ), and then our new Q(λ), called PQ(λ)." @default.
- W75509695 created "2016-06-24" @default.
- W75509695 creator A5004923102 @default.
- W75509695 creator A5017492994 @default.
- W75509695 creator A5033135596 @default.
- W75509695 creator A5065836447 @default.
- W75509695 date "2014-06-21" @default.
- W75509695 modified "2023-10-03" @default.
- W75509695 title "A new Q(lambda) with interim forward view and Monte Carlo equivalence" @default.
- W75509695 cites W1514587017 @default.
- W75509695 cites W1515851193 @default.
- W75509695 cites W1583330603 @default.
- W75509695 cites W1594216983 @default.
- W75509695 cites W1600046456 @default.
- W75509695 cites W1716849269 @default.
- W75509695 cites W2027648864 @default.
- W75509695 cites W2100677568 @default.
- W75509695 cites W2100752967 @default.
- W75509695 cites W2102863375 @default.
- W75509695 cites W2113913482 @default.
- W75509695 cites W2114901408 @default.
- W75509695 cites W2121863487 @default.
- W75509695 cites W2132622533 @default.
- W75509695 cites W2154761920 @default.
- W75509695 cites W2159752377 @default.
- W75509695 cites W2188892596 @default.
- W75509695 cites W2401533533 @default.
- W75509695 cites W2473364827 @default.
- W75509695 cites W3011120880 @default.
- W75509695 hasPublicationYear "2014" @default.
- W75509695 type Work @default.
- W75509695 sameAs 75509695 @default.
- W75509695 citedByCount "26" @default.
- W75509695 countsByYear W755096952014 @default.
- W75509695 countsByYear W755096952015 @default.
- W75509695 countsByYear W755096952016 @default.
- W75509695 countsByYear W755096952017 @default.
- W75509695 countsByYear W755096952018 @default.
- W75509695 countsByYear W755096952019 @default.
- W75509695 countsByYear W755096952020 @default.
- W75509695 crossrefType "proceedings-article" @default.
- W75509695 hasAuthorship W75509695A5004923102 @default.
- W75509695 hasAuthorship W75509695A5017492994 @default.
- W75509695 hasAuthorship W75509695A5033135596 @default.
- W75509695 hasAuthorship W75509695A5065836447 @default.
- W75509695 hasConcept C105795698 @default.
- W75509695 hasConcept C106189395 @default.
- W75509695 hasConcept C11413529 @default.
- W75509695 hasConcept C118615104 @default.
- W75509695 hasConcept C126255220 @default.
- W75509695 hasConcept C154945302 @default.
- W75509695 hasConcept C159886148 @default.
- W75509695 hasConcept C188116033 @default.
- W75509695 hasConcept C19499675 @default.
- W75509695 hasConcept C2780069185 @default.
- W75509695 hasConcept C33923547 @default.
- W75509695 hasConcept C41008148 @default.
- W75509695 hasConcept C97541855 @default.
- W75509695 hasConceptScore W75509695C105795698 @default.
- W75509695 hasConceptScore W75509695C106189395 @default.
- W75509695 hasConceptScore W75509695C11413529 @default.
- W75509695 hasConceptScore W75509695C118615104 @default.
- W75509695 hasConceptScore W75509695C126255220 @default.
- W75509695 hasConceptScore W75509695C154945302 @default.
- W75509695 hasConceptScore W75509695C159886148 @default.
- W75509695 hasConceptScore W75509695C188116033 @default.
- W75509695 hasConceptScore W75509695C19499675 @default.
- W75509695 hasConceptScore W75509695C2780069185 @default.
- W75509695 hasConceptScore W75509695C33923547 @default.
- W75509695 hasConceptScore W75509695C41008148 @default.
- W75509695 hasConceptScore W75509695C97541855 @default.
- W75509695 hasLocation W755096951 @default.
- W75509695 hasOpenAccess W75509695 @default.
- W75509695 hasPrimaryLocation W755096951 @default.
- W75509695 hasRelatedWork W1514587017 @default.
- W75509695 hasRelatedWork W1515851193 @default.
- W75509695 hasRelatedWork W1547925194 @default.
- W75509695 hasRelatedWork W1576452626 @default.
- W75509695 hasRelatedWork W1583330603 @default.
- W75509695 hasRelatedWork W1594216983 @default.
- W75509695 hasRelatedWork W1600046456 @default.
- W75509695 hasRelatedWork W1646707810 @default.
- W75509695 hasRelatedWork W2027648864 @default.
- W75509695 hasRelatedWork W2072931156 @default.
- W75509695 hasRelatedWork W2075268401 @default.
- W75509695 hasRelatedWork W2100677568 @default.
- W75509695 hasRelatedWork W2109910161 @default.
- W75509695 hasRelatedWork W2121863487 @default.
- W75509695 hasRelatedWork W2132622533 @default.
- W75509695 hasRelatedWork W2145339207 @default.
- W75509695 hasRelatedWork W2188892596 @default.
- W75509695 hasRelatedWork W2898155524 @default.
- W75509695 hasRelatedWork W3011120880 @default.
- W75509695 hasRelatedWork W3103780890 @default.
- W75509695 isParatext "false" @default.
- W75509695 isRetracted "false" @default.
- W75509695 magId "75509695" @default.
- W75509695 workType "article" @default.
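
The abstract above says that the λ parameter shifts on-policy algorithms such as TD(λ) from pure bootstrapping (λ=0) to pure Monte Carlo (λ=1). A minimal sketch of that interpolation follows, using the standard recursive form of the episodic λ-return. It is not the paper's PTD(λ) or PQ(λ). The function name `lambda_returns`, its argument layout, and the sample numbers are assumptions made for illustration only.

```python
def lambda_returns(rewards, values, gamma, lam):
    """Compute the lambda-return for every step of one finished episode.

    Assumed layout (hypothetical, for illustration):
      rewards[t] is the reward received after acting in state s_t,
      values[t]  is the current estimate V(s_t),
      the state after the last reward is terminal, with value 0.
    """
    T = len(rewards)
    returns = [0.0] * T
    next_return = 0.0  # lambda-return from the terminal state
    next_value = 0.0   # value estimate of the terminal state
    # Work backwards: G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
    for t in reversed(range(T)):
        returns[t] = rewards[t] + gamma * ((1.0 - lam) * next_value + lam * next_return)
        next_return = returns[t]
        next_value = values[t]
    return returns


# Illustrative inputs, not taken from the paper.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3]

td_targets = lambda_returns(rewards, values, gamma=0.9, lam=0.0)  # one-step bootstrapping
mc_returns = lambda_returns(rewards, values, gamma=0.9, lam=1.0)  # full Monte Carlo returns
```

With `lam=0.0` each target uses only the next reward and the next state's value estimate. With `lam=1.0` the value estimates drop out entirely and each target is the discounted sum of the remaining rewards. According to the abstract, this is the limit that off-policy methods such as Q(λ) fail to approximate.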