Matches in SemOpenAlex for { <https://semopenalex.org/work/W2950892788> ?p ?o ?g. }
- W2950892788 abstract "One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy-problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance." @default.
- W2950892788 created "2019-06-27" @default.
- W2950892788 creator A5001309574 @default.
- W2950892788 creator A5024475685 @default.
- W2950892788 creator A5049195732 @default.
- W2950892788 creator A5068980118 @default.
- W2950892788 creator A5070690406 @default.
- W2950892788 creator A5089214987 @default.
- W2950892788 date "2017-06-13" @default.
- W2950892788 modified "2023-09-27" @default.
- W2950892788 title "Hybrid Reward Architecture for Reinforcement Learning" @default.
- W2950892788 cites W1538762724 @default.
- W2950892788 cites W1590744975 @default.
- W2950892788 cites W1592847719 @default.
- W2950892788 cites W1658008008 @default.
- W2950892788 cites W1677182931 @default.
- W2950892788 cites W1777239053 @default.
- W2950892788 cites W2034806191 @default.
- W2950892788 cites W2073384958 @default.
- W2950892788 cites W2100752967 @default.
- W2950892788 cites W2109910161 @default.
- W2950892788 cites W2121863487 @default.
- W2950892788 cites W2132622533 @default.
- W2950892788 cites W2136202932 @default.
- W2950892788 cites W2145339207 @default.
- W2950892788 cites W2168405694 @default.
- W2950892788 cites W2173564293 @default.
- W2950892788 cites W2335959470 @default.
- W2950892788 cites W2442341664 @default.
- W2950892788 cites W2509374375 @default.
- W2950892788 cites W2523728418 @default.
- W2950892788 cites W2950872548 @default.
- W2950892788 cites W2952523895 @default.
- W2950892788 cites W2963477884 @default.
- W2950892788 cites W2964043796 @default.
- W2950892788 cites W567721252 @default.
- W2950892788 cites W72400652 @default.
- W2950892788 hasPublicationYear "2017" @default.
- W2950892788 type Work @default.
- W2950892788 sameAs 2950892788 @default.
- W2950892788 citedByCount "9" @default.
- W2950892788 countsByYear W29508927882017 @default.
- W2950892788 countsByYear W29508927882018 @default.
- W2950892788 countsByYear W29508927882020 @default.
- W2950892788 countsByYear W29508927882021 @default.
- W2950892788 crossrefType "posted-content" @default.
- W2950892788 hasAuthorship W2950892788A5001309574 @default.
- W2950892788 hasAuthorship W2950892788A5024475685 @default.
- W2950892788 hasAuthorship W2950892788A5049195732 @default.
- W2950892788 hasAuthorship W2950892788A5068980118 @default.
- W2950892788 hasAuthorship W2950892788A5070690406 @default.
- W2950892788 hasAuthorship W2950892788A5089214987 @default.
- W2950892788 hasConcept C119857082 @default.
- W2950892788 hasConcept C121332964 @default.
- W2950892788 hasConcept C123657996 @default.
- W2950892788 hasConcept C126255220 @default.
- W2950892788 hasConcept C127413603 @default.
- W2950892788 hasConcept C14036430 @default.
- W2950892788 hasConcept C142362112 @default.
- W2950892788 hasConcept C14646407 @default.
- W2950892788 hasConcept C153349607 @default.
- W2950892788 hasConcept C154945302 @default.
- W2950892788 hasConcept C168167062 @default.
- W2950892788 hasConcept C17744445 @default.
- W2950892788 hasConcept C199539241 @default.
- W2950892788 hasConcept C2776291640 @default.
- W2950892788 hasConcept C2776359362 @default.
- W2950892788 hasConcept C33923547 @default.
- W2950892788 hasConcept C41008148 @default.
- W2950892788 hasConcept C66938386 @default.
- W2950892788 hasConcept C67203356 @default.
- W2950892788 hasConcept C78458016 @default.
- W2950892788 hasConcept C86803240 @default.
- W2950892788 hasConcept C94625758 @default.
- W2950892788 hasConcept C97355855 @default.
- W2950892788 hasConcept C97541855 @default.
- W2950892788 hasConceptScore W2950892788C119857082 @default.
- W2950892788 hasConceptScore W2950892788C121332964 @default.
- W2950892788 hasConceptScore W2950892788C123657996 @default.
- W2950892788 hasConceptScore W2950892788C126255220 @default.
- W2950892788 hasConceptScore W2950892788C127413603 @default.
- W2950892788 hasConceptScore W2950892788C14036430 @default.
- W2950892788 hasConceptScore W2950892788C142362112 @default.
- W2950892788 hasConceptScore W2950892788C14646407 @default.
- W2950892788 hasConceptScore W2950892788C153349607 @default.
- W2950892788 hasConceptScore W2950892788C154945302 @default.
- W2950892788 hasConceptScore W2950892788C168167062 @default.
- W2950892788 hasConceptScore W2950892788C17744445 @default.
- W2950892788 hasConceptScore W2950892788C199539241 @default.
- W2950892788 hasConceptScore W2950892788C2776291640 @default.
- W2950892788 hasConceptScore W2950892788C2776359362 @default.
- W2950892788 hasConceptScore W2950892788C33923547 @default.
- W2950892788 hasConceptScore W2950892788C41008148 @default.
- W2950892788 hasConceptScore W2950892788C66938386 @default.
- W2950892788 hasConceptScore W2950892788C67203356 @default.
- W2950892788 hasConceptScore W2950892788C78458016 @default.
- W2950892788 hasConceptScore W2950892788C86803240 @default.
- W2950892788 hasConceptScore W2950892788C94625758 @default.
- W2950892788 hasConceptScore W2950892788C97355855 @default.
- W2950892788 hasConceptScore W2950892788C97541855 @default.
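
Each entry in the listing above has the shape `- <subject> <predicate> <object> @<graph>.`, where the object is either a quoted literal (for example the title or `citedByCount`) or a bare identifier (for example a cited work). The following is a minimal sketch of how such lines could be read into records and grouped by predicate. It assumes the line shape stays exactly as shown here; the helper names `parse_line` and `group_by_predicate` are hypothetical and are not part of any SemOpenAlex tooling.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred only from the listing itself:
#   - <subject> <predicate> <object> @<graph>.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\S+?)\.$')


def parse_line(line):
    """Parse one listing line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. the "Matches in SemOpenAlex for ..." header line
    subject, predicate, obj, graph = match.groups()
    # Quoted objects are treated as literals; bare tokens as identifiers.
    is_literal = len(obj) >= 2 and obj.startswith('"') and obj.endswith('"')
    if is_literal:
        obj = obj[1:-1]
    return {
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        "literal": is_literal,
        "graph": graph,
    }


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that do not parse."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            grouped[parsed["predicate"]].append(parsed["object"])
    return dict(grouped)


if __name__ == "__main__":
    # A few lines copied verbatim from the listing above.
    sample = [
        '- W2950892788 title "Hybrid Reward Architecture for Reinforcement Learning" @default.',
        '- W2950892788 cites W1538762724 @default.',
        '- W2950892788 cites W1590744975 @default.',
        '- W2950892788 citedByCount "9" @default.',
    ]
    for predicate, objects in group_by_predicate(sample).items():
        print(predicate, objects)
```

Grouping by predicate keeps multi-valued properties such as `cites`, `creator` and `hasConcept` together as lists, while single-valued ones such as `title` end up as one-element lists.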