Matches in SemOpenAlex for { <https://semopenalex.org/work/W3042051570> ?p ?o ?g. }
- W3042051570 abstract "Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates. In deep reinforcement learning, these issues have been dealt with empirically by adapting and regularizing the representation, in particular with auxiliary tasks. This suggests that representation learning may provide a means to guarantee stability. In this paper, we formally show that there are indeed nontrivial state representations under which the canonical TD algorithm is stable, even when learning off-policy. We analyze representation learning schemes that are based on the transition matrix of a policy, such as proto-value functions, along three axes: approximation error, stability, and ease of estimation. In the most general case, we show that a Schur basis provides convergence guarantees, but is difficult to estimate from samples. For a fixed reward function, we find that an orthogonal basis of the corresponding Krylov subspace is an even better choice. We conclude by empirically demonstrating that these stable representations can be learned using stochastic gradient descent, opening the door to improved techniques for representation learning with deep networks." @default.
- W3042051570 created "2020-07-16" @default.
- W3042051570 creator A5001087292 @default.
- W3042051570 creator A5052979358 @default.
- W3042051570 date "2020-07-10" @default.
- W3042051570 modified "2023-09-27" @default.
- W3042051570 title "Representations for Stable Off-Policy Reinforcement Learning" @default.
- W3042051570 cites W115446000 @default.
- W3042051570 cites W149522324 @default.
- W3042051570 cites W1534520802 @default.
- W3042051570 cites W1547105496 @default.
- W3042051570 cites W1568229137 @default.
- W3042051570 cites W1646707810 @default.
- W3042051570 cites W2040551015 @default.
- W3042051570 cites W2056354534 @default.
- W3042051570 cites W2068778050 @default.
- W3042051570 cites W2071983464 @default.
- W3042051570 cites W2072931156 @default.
- W3042051570 cites W2109910161 @default.
- W3042051570 cites W2119567691 @default.
- W3042051570 cites W2121703796 @default.
- W3042051570 cites W2121863487 @default.
- W3042051570 cites W2123979492 @default.
- W3042051570 cites W2124477018 @default.
- W3042051570 cites W2130005627 @default.
- W3042051570 cites W2134042548 @default.
- W3042051570 cites W2138326839 @default.
- W3042051570 cites W2145339207 @default.
- W3042051570 cites W2153267861 @default.
- W3042051570 cites W2161795906 @default.
- W3042051570 cites W2163176541 @default.
- W3042051570 cites W2614839826 @default.
- W3042051570 cites W2799045927 @default.
- W3042051570 cites W2897328500 @default.
- W3042051570 cites W2902098903 @default.
- W3042051570 cites W2910907894 @default.
- W3042051570 cites W2950872548 @default.
- W3042051570 cites W2963423916 @default.
- W3042051570 cites W2963472011 @default.
- W3042051570 cites W2963488340 @default.
- W3042051570 cites W2963521487 @default.
- W3042051570 cites W2963600139 @default.
- W3042051570 cites W2964123095 @default.
- W3042051570 cites W2964190622 @default.
- W3042051570 cites W2966367827 @default.
- W3042051570 cites W2967355195 @default.
- W3042051570 cites W2970667219 @default.
- W3042051570 hasPublicationYear "2020" @default.
- W3042051570 type Work @default.
- W3042051570 sameAs 3042051570 @default.
- W3042051570 citedByCount "5" @default.
- W3042051570 countsByYear W30420515702020 @default.
- W3042051570 countsByYear W30420515702021 @default.
- W3042051570 crossrefType "posted-content" @default.
- W3042051570 hasAuthorship W3042051570A5001087292 @default.
- W3042051570 hasAuthorship W3042051570A5052979358 @default.
- W3042051570 hasConcept C112972136 @default.
- W3042051570 hasConcept C119857082 @default.
- W3042051570 hasConcept C12426560 @default.
- W3042051570 hasConcept C126255220 @default.
- W3042051570 hasConcept C14036430 @default.
- W3042051570 hasConcept C14646407 @default.
- W3042051570 hasConcept C153258448 @default.
- W3042051570 hasConcept C154945302 @default.
- W3042051570 hasConcept C162324750 @default.
- W3042051570 hasConcept C17744445 @default.
- W3042051570 hasConcept C199539241 @default.
- W3042051570 hasConcept C2524010 @default.
- W3042051570 hasConcept C2776359362 @default.
- W3042051570 hasConcept C2777303404 @default.
- W3042051570 hasConcept C2779436431 @default.
- W3042051570 hasConcept C32834561 @default.
- W3042051570 hasConcept C33923547 @default.
- W3042051570 hasConcept C41008148 @default.
- W3042051570 hasConcept C50522688 @default.
- W3042051570 hasConcept C50644808 @default.
- W3042051570 hasConcept C78458016 @default.
- W3042051570 hasConcept C86803240 @default.
- W3042051570 hasConcept C94625758 @default.
- W3042051570 hasConcept C97541855 @default.
- W3042051570 hasConceptScore W3042051570C112972136 @default.
- W3042051570 hasConceptScore W3042051570C119857082 @default.
- W3042051570 hasConceptScore W3042051570C12426560 @default.
- W3042051570 hasConceptScore W3042051570C126255220 @default.
- W3042051570 hasConceptScore W3042051570C14036430 @default.
- W3042051570 hasConceptScore W3042051570C14646407 @default.
- W3042051570 hasConceptScore W3042051570C153258448 @default.
- W3042051570 hasConceptScore W3042051570C154945302 @default.
- W3042051570 hasConceptScore W3042051570C162324750 @default.
- W3042051570 hasConceptScore W3042051570C17744445 @default.
- W3042051570 hasConceptScore W3042051570C199539241 @default.
- W3042051570 hasConceptScore W3042051570C2524010 @default.
- W3042051570 hasConceptScore W3042051570C2776359362 @default.
- W3042051570 hasConceptScore W3042051570C2777303404 @default.
- W3042051570 hasConceptScore W3042051570C2779436431 @default.
- W3042051570 hasConceptScore W3042051570C32834561 @default.
- W3042051570 hasConceptScore W3042051570C33923547 @default.
- W3042051570 hasConceptScore W3042051570C41008148 @default.
- W3042051570 hasConceptScore W3042051570C50522688 @default.
- W3042051570 hasConceptScore W3042051570C50644808 @default.