Matches in SemOpenAlex for { <https://semopenalex.org/work/W2172264025> ?p ?o ?g. }
- W2172264025 endingPage "50" @default.
- W2172264025 startingPage "33" @default.
- W2172264025 abstract "To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. In this article we aim to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together. First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to the performance of each method. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone." @default.
- W2172264025 created "2016-06-24" @default.
- W2172264025 creator A5001594330 @default.
- W2172264025 creator A5001808269 @default.
- W2172264025 creator A5056879203 @default.
- W2172264025 date "2007-03-01" @default.
- W2172264025 modified "2023-10-16" @default.
- W2172264025 title "Empirical Studies in Action Selection with Reinforcement Learning" @default.
- W2172264025 cites W1543341872 @default.
- W2172264025 cites W1568161011 @default.
- W2172264025 cites W1574700590 @default.
- W2172264025 cites W1914583973 @default.
- W2172264025 cites W1931792391 @default.
- W2172264025 cites W1979356863 @default.
- W2172264025 cites W1990911977 @default.
- W2172264025 cites W2027475699 @default.
- W2172264025 cites W2089052147 @default.
- W2172264025 cites W2104641222 @default.
- W2172264025 cites W2111935653 @default.
- W2172264025 cites W2113913482 @default.
- W2172264025 cites W2124290836 @default.
- W2172264025 cites W2128262600 @default.
- W2172264025 cites W2141559645 @default.
- W2172264025 cites W2162813238 @default.
- W2172264025 cites W2165713838 @default.
- W2172264025 cites W2168405694 @default.
- W2172264025 cites W2169659168 @default.
- W2172264025 cites W2169803171 @default.
- W2172264025 cites W3041202696 @default.
- W2172264025 cites W4232335189 @default.
- W2172264025 doi "https://doi.org/10.1177/1059712306076253" @default.
- W2172264025 hasPublicationYear "2007" @default.
- W2172264025 type Work @default.
- W2172264025 sameAs 2172264025 @default.
- W2172264025 citedByCount "48" @default.
- W2172264025 countsByYear W21722640252012 @default.
- W2172264025 countsByYear W21722640252013 @default.
- W2172264025 countsByYear W21722640252014 @default.
- W2172264025 countsByYear W21722640252015 @default.
- W2172264025 countsByYear W21722640252017 @default.
- W2172264025 countsByYear W21722640252018 @default.
- W2172264025 countsByYear W21722640252019 @default.
- W2172264025 countsByYear W21722640252022 @default.
- W2172264025 crossrefType "journal-article" @default.
- W2172264025 hasAuthorship W2172264025A5001594330 @default.
- W2172264025 hasAuthorship W2172264025A5001808269 @default.
- W2172264025 hasAuthorship W2172264025A5056879203 @default.
- W2172264025 hasConcept C105795698 @default.
- W2172264025 hasConcept C105902424 @default.
- W2172264025 hasConcept C119857082 @default.
- W2172264025 hasConcept C120936955 @default.
- W2172264025 hasConcept C121332964 @default.
- W2172264025 hasConcept C13280743 @default.
- W2172264025 hasConcept C14036430 @default.
- W2172264025 hasConcept C154945302 @default.
- W2172264025 hasConcept C159149176 @default.
- W2172264025 hasConcept C162324750 @default.
- W2172264025 hasConcept C166109690 @default.
- W2172264025 hasConcept C169760540 @default.
- W2172264025 hasConcept C185798385 @default.
- W2172264025 hasConcept C187736073 @default.
- W2172264025 hasConcept C196340769 @default.
- W2172264025 hasConcept C205649164 @default.
- W2172264025 hasConcept C26760741 @default.
- W2172264025 hasConcept C2780451532 @default.
- W2172264025 hasConcept C2780791683 @default.
- W2172264025 hasConcept C33923547 @default.
- W2172264025 hasConcept C41008148 @default.
- W2172264025 hasConcept C62520636 @default.
- W2172264025 hasConcept C78458016 @default.
- W2172264025 hasConcept C81917197 @default.
- W2172264025 hasConcept C86803240 @default.
- W2172264025 hasConcept C97541855 @default.
- W2172264025 hasConceptScore W2172264025C105795698 @default.
- W2172264025 hasConceptScore W2172264025C105902424 @default.
- W2172264025 hasConceptScore W2172264025C119857082 @default.
- W2172264025 hasConceptScore W2172264025C120936955 @default.
- W2172264025 hasConceptScore W2172264025C121332964 @default.
- W2172264025 hasConceptScore W2172264025C13280743 @default.
- W2172264025 hasConceptScore W2172264025C14036430 @default.
- W2172264025 hasConceptScore W2172264025C154945302 @default.
- W2172264025 hasConceptScore W2172264025C159149176 @default.
- W2172264025 hasConceptScore W2172264025C162324750 @default.
- W2172264025 hasConceptScore W2172264025C166109690 @default.
- W2172264025 hasConceptScore W2172264025C169760540 @default.
- W2172264025 hasConceptScore W2172264025C185798385 @default.
- W2172264025 hasConceptScore W2172264025C187736073 @default.
- W2172264025 hasConceptScore W2172264025C196340769 @default.
- W2172264025 hasConceptScore W2172264025C205649164 @default.
- W2172264025 hasConceptScore W2172264025C26760741 @default.
- W2172264025 hasConceptScore W2172264025C2780451532 @default.
- W2172264025 hasConceptScore W2172264025C2780791683 @default.
- W2172264025 hasConceptScore W2172264025C33923547 @default.
- W2172264025 hasConceptScore W2172264025C41008148 @default.
- W2172264025 hasConceptScore W2172264025C62520636 @default.
- W2172264025 hasConceptScore W2172264025C78458016 @default.
- W2172264025 hasConceptScore W2172264025C81917197 @default.
- W2172264025 hasConceptScore W2172264025C86803240 @default.