Matches in SemOpenAlex for { <https://semopenalex.org/work/W2441569124> ?p ?o ?g. }
- W2441569124 endingPage "1525" @default.
- W2441569124 startingPage "1519" @default.
- W2441569124 abstract "Monte Carlo Tree Search (MCTS) methods have proven powerful in planning for sequential decision-making problems such as Go and video games, but their performance can be poor when the planning depth and sampling trajectories are limited or when the rewards are sparse. We present an adaptation of PGRD (policy-gradient for reward-design) for learning a reward-bonus function to improve UCT (an MCTS algorithm). Unlike previous applications of PGRD in which the space of reward-bonus functions was limited to linear functions of hand-coded state-action-features, we use PGRD with a multi-layer convolutional neural network to automatically learn features from raw perception as well as to adapt the non-linear reward-bonus function parameters. We also adopt a variance-reducing gradient method to improve PGRD's performance. The new method improves UCT's performance on multiple ATARI games compared to UCT without the reward bonus. Combining PGRD and Deep Learning in this way should make adapting rewards for MCTS algorithms far more widely and practically applicable than before." @default.
- W2441569124 created "2016-06-24" @default.
- W2441569124 creator A5051951021 @default.
- W2441569124 creator A5064419280 @default.
- W2441569124 creator A5065366930 @default.
- W2441569124 creator A5077109450 @default.
- W2441569124 date "2016-07-09" @default.
- W2441569124 modified "2023-09-24" @default.
- W2441569124 title "Deep learning for reward design to improve Monte Carlo tree search in ATARI games" @default.
- W2441569124 cites W1512866498 @default.
- W2441569124 cites W1625390266 @default.
- W2441569124 cites W1777239053 @default.
- W2441569124 cites W1997840820 @default.
- W2441569124 cites W2041367235 @default.
- W2441569124 cites W2061562262 @default.
- W2441569124 cites W2064675550 @default.
- W2441569124 cites W2076063813 @default.
- W2441569124 cites W2084920657 @default.
- W2441569124 cites W2118688707 @default.
- W2441569124 cites W2122372591 @default.
- W2441569124 cites W2127842795 @default.
- W2441569124 cites W2130325614 @default.
- W2441569124 cites W2135995480 @default.
- W2441569124 cites W2140365369 @default.
- W2441569124 cites W2145339207 @default.
- W2441569124 cites W2151210636 @default.
- W2441569124 cites W2156718681 @default.
- W2441569124 cites W2159600763 @default.
- W2441569124 cites W2163602945 @default.
- W2441569124 cites W2164424353 @default.
- W2441569124 cites W2184714326 @default.
- W2441569124 cites W2296576527 @default.
- W2441569124 cites W2396836028 @default.
- W2441569124 cites W2401523698 @default.
- W2441569124 cites W2949608212 @default.
- W2441569124 cites W2964121744 @default.
- W2441569124 cites W64134055 @default.
- W2441569124 cites W779494576 @default.
- W2441569124 hasPublicationYear "2016" @default.
- W2441569124 type Work @default.
- W2441569124 sameAs 2441569124 @default.
- W2441569124 citedByCount "9" @default.
- W2441569124 countsByYear W24415691242016 @default.
- W2441569124 countsByYear W24415691242017 @default.
- W2441569124 countsByYear W24415691242019 @default.
- W2441569124 countsByYear W24415691242021 @default.
- W2441569124 crossrefType "proceedings-article" @default.
- W2441569124 hasAuthorship W2441569124A5051951021 @default.
- W2441569124 hasAuthorship W2441569124A5064419280 @default.
- W2441569124 hasAuthorship W2441569124A5065366930 @default.
- W2441569124 hasAuthorship W2441569124A5077109450 @default.
- W2441569124 hasConcept C105795698 @default.
- W2441569124 hasConcept C113174947 @default.
- W2441569124 hasConcept C119857082 @default.
- W2441569124 hasConcept C120665830 @default.
- W2441569124 hasConcept C121332964 @default.
- W2441569124 hasConcept C121955636 @default.
- W2441569124 hasConcept C126255220 @default.
- W2441569124 hasConcept C134306372 @default.
- W2441569124 hasConcept C139807058 @default.
- W2441569124 hasConcept C14036430 @default.
- W2441569124 hasConcept C144133560 @default.
- W2441569124 hasConcept C154945302 @default.
- W2441569124 hasConcept C19499675 @default.
- W2441569124 hasConcept C196083921 @default.
- W2441569124 hasConcept C33923547 @default.
- W2441569124 hasConcept C41008148 @default.
- W2441569124 hasConcept C46149586 @default.
- W2441569124 hasConcept C78458016 @default.
- W2441569124 hasConcept C81363708 @default.
- W2441569124 hasConcept C86803240 @default.
- W2441569124 hasConcept C97541855 @default.
- W2441569124 hasConceptScore W2441569124C105795698 @default.
- W2441569124 hasConceptScore W2441569124C113174947 @default.
- W2441569124 hasConceptScore W2441569124C119857082 @default.
- W2441569124 hasConceptScore W2441569124C120665830 @default.
- W2441569124 hasConceptScore W2441569124C121332964 @default.
- W2441569124 hasConceptScore W2441569124C121955636 @default.
- W2441569124 hasConceptScore W2441569124C126255220 @default.
- W2441569124 hasConceptScore W2441569124C134306372 @default.
- W2441569124 hasConceptScore W2441569124C139807058 @default.
- W2441569124 hasConceptScore W2441569124C14036430 @default.
- W2441569124 hasConceptScore W2441569124C144133560 @default.
- W2441569124 hasConceptScore W2441569124C154945302 @default.
- W2441569124 hasConceptScore W2441569124C19499675 @default.
- W2441569124 hasConceptScore W2441569124C196083921 @default.
- W2441569124 hasConceptScore W2441569124C33923547 @default.
- W2441569124 hasConceptScore W2441569124C41008148 @default.
- W2441569124 hasConceptScore W2441569124C46149586 @default.
- W2441569124 hasConceptScore W2441569124C78458016 @default.
- W2441569124 hasConceptScore W2441569124C81363708 @default.
- W2441569124 hasConceptScore W2441569124C86803240 @default.
- W2441569124 hasConceptScore W2441569124C97541855 @default.
- W2441569124 hasLocation W24415691241 @default.
- W2441569124 hasOpenAccess W2441569124 @default.
- W2441569124 hasPrimaryLocation W24415691241 @default.
- W2441569124 hasRelatedWork W1777239053 @default.
- W2441569124 hasRelatedWork W1969302761 @default.