Matches in SemOpenAlex for { <https://semopenalex.org/work/W2895961679> ?p ?o ?g. }
- W2895961679 abstract "Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those." @default.
- W2895961679 created "2018-10-26" @default.
- W2895961679 creator A5027441885 @default.
- W2895961679 creator A5029396252 @default.
- W2895961679 creator A5042967044 @default.
- W2895961679 creator A5051794439 @default.
- W2895961679 creator A5070455161 @default.
- W2895961679 creator A5081096634 @default.
- W2895961679 date "2018-10-15" @default.
- W2895961679 modified "2023-09-26" @default.
- W2895961679 title "Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning" @default.
- W2895961679 cites W1514587017 @default.
- W2895961679 cites W1582436621 @default.
- W2895961679 cites W1591803298 @default.
- W2895961679 cites W1626155273 @default.
- W2895961679 cites W2039522160 @default.
- W2895961679 cites W2056354534 @default.
- W2895961679 cites W2107726111 @default.
- W2895961679 cites W2108114251 @default.
- W2895961679 cites W2111764152 @default.
- W2895961679 cites W2121863487 @default.
- W2895961679 cites W2145339207 @default.
- W2895961679 cites W2155968351 @default.
- W2895961679 cites W2173564293 @default.
- W2895961679 cites W2417089653 @default.
- W2895961679 cites W2518564545 @default.
- W2895961679 cites W2765308067 @default.
- W2895961679 cites W2774915412 @default.
- W2895961679 cites W2807151440 @default.
- W2895961679 cites W2914261249 @default.
- W2895961679 cites W2962717849 @default.
- W2895961679 cites W2962723954 @default.
- W2895961679 cites W2962767126 @default.
- W2895961679 cites W2962996309 @default.
- W2895961679 cites W2963024489 @default.
- W2895961679 cites W2963472011 @default.
- W2895961679 cites W2963674921 @default.
- W2895961679 cites W2963751259 @default.
- W2895961679 cites W2963797557 @default.
- W2895961679 cites W2963938771 @default.
- W2895961679 cites W2964121744 @default.
- W2895961679 cites W2964291307 @default.
- W2895961679 cites W2997289589 @default.
- W2895961679 hasPublicationYear "2018" @default.
- W2895961679 type Work @default.
- W2895961679 sameAs 2895961679 @default.
- W2895961679 citedByCount "0" @default.
- W2895961679 crossrefType "posted-content" @default.
- W2895961679 hasAuthorship W2895961679A5027441885 @default.
- W2895961679 hasAuthorship W2895961679A5029396252 @default.
- W2895961679 hasAuthorship W2895961679A5042967044 @default.
- W2895961679 hasAuthorship W2895961679A5051794439 @default.
- W2895961679 hasAuthorship W2895961679A5070455161 @default.
- W2895961679 hasAuthorship W2895961679A5081096634 @default.
- W2895961679 hasConcept C111472728 @default.
- W2895961679 hasConcept C119857082 @default.
- W2895961679 hasConcept C134306372 @default.
- W2895961679 hasConcept C138885662 @default.
- W2895961679 hasConcept C14036430 @default.
- W2895961679 hasConcept C154945302 @default.
- W2895961679 hasConcept C189950617 @default.
- W2895961679 hasConcept C26517878 @default.
- W2895961679 hasConcept C33923547 @default.
- W2895961679 hasConcept C36503486 @default.
- W2895961679 hasConcept C38652104 @default.
- W2895961679 hasConcept C41008148 @default.
- W2895961679 hasConcept C50644808 @default.
- W2895961679 hasConcept C75306776 @default.
- W2895961679 hasConcept C78458016 @default.
- W2895961679 hasConcept C86803240 @default.
- W2895961679 hasConcept C97541855 @default.
- W2895961679 hasConceptScore W2895961679C111472728 @default.
- W2895961679 hasConceptScore W2895961679C119857082 @default.
- W2895961679 hasConceptScore W2895961679C134306372 @default.
- W2895961679 hasConceptScore W2895961679C138885662 @default.
- W2895961679 hasConceptScore W2895961679C14036430 @default.
- W2895961679 hasConceptScore W2895961679C154945302 @default.
- W2895961679 hasConceptScore W2895961679C189950617 @default.
- W2895961679 hasConceptScore W2895961679C26517878 @default.
- W2895961679 hasConceptScore W2895961679C33923547 @default.
- W2895961679 hasConceptScore W2895961679C36503486 @default.
- W2895961679 hasConceptScore W2895961679C38652104 @default.
- W2895961679 hasConceptScore W2895961679C41008148 @default.
- W2895961679 hasConceptScore W2895961679C50644808 @default.
- W2895961679 hasConceptScore W2895961679C75306776 @default.
- W2895961679 hasConceptScore W2895961679C78458016 @default.
- W2895961679 hasConceptScore W2895961679C86803240 @default.
- W2895961679 hasConceptScore W2895961679C97541855 @default.
- W2895961679 hasLocation W28959616791 @default.
- W2895961679 hasOpenAccess W2895961679 @default.
- W2895961679 hasPrimaryLocation W28959616791 @default.
- W2895961679 hasRelatedWork W1792865576 @default.
- W2895961679 hasRelatedWork W2143203060 @default.
- W2895961679 hasRelatedWork W2764311431 @default.
- W2895961679 hasRelatedWork W2789525339 @default.
- W2895961679 hasRelatedWork W2804075420 @default.
- W2895961679 hasRelatedWork W2891921153 @default.
- W2895961679 hasRelatedWork W2949475445 @default.
- W2895961679 hasRelatedWork W2963359646 @default.
- W2895961679 hasRelatedWork W2963737811 @default.
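
The listing above is the result of matching the pattern `<https://semopenalex.org/work/W2895961679> ?p ?o ?g`. As a minimal sketch, a comparable lookup could be phrased as a SPARQL `SELECT` and encoded as a GET request. The endpoint URL and the helper names `build_query` and `build_request_url` are assumptions made for illustration, not taken from the listing. The graph variable `?g` is left out, so the sketch returns only predicate/object pairs.

```python
from urllib.parse import urlencode

# Assumption: the public SPARQL endpoint is served at this URL.
# Check the SemOpenAlex documentation before relying on it.
ENDPOINT = "https://semopenalex.org/sparql"
WORK_IRI = "https://semopenalex.org/work/W2895961679"


def build_query(work_iri: str) -> str:
    """Return a SPARQL SELECT listing every predicate/object pair of the work."""
    return f"SELECT ?p ?o WHERE {{ <{work_iri}> ?p ?o . }}"


def build_request_url(endpoint: str, work_iri: str) -> str:
    """Encode the query as a GET URL using the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": build_query(work_iri)})


if __name__ == "__main__":
    # Only builds and prints the request; no network call is made here.
    print(build_query(WORK_IRI))
    print(build_request_url(ENDPOINT, WORK_IRI))
```

The sketch stops at building the URL so it runs offline. Fetching the URL with any HTTP client, and asking for a JSON results format via the `Accept` header, would be the next step if the endpoint assumption holds.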