Matches in SemOpenAlex for { <https://semopenalex.org/work/W2041176007> ?p ?o ?g. }
- W2041176007 endingPage "2193" @default.
- W2041176007 startingPage "2184" @default.
- W2041176007 abstract "Reinforcement learning (RL) has been applied in many fields, but the trade-off between exploration and exploitation in the action selection policy remains a dilemma. The best-known reinforcement learning algorithms are Q-learning and Sarsa, which have different characteristics. Generally speaking, the Sarsa algorithm converges faster, while the Q-learning algorithm achieves better final performance. However, the Sarsa algorithm easily gets stuck in a local minimum, and Q-learning needs a longer time to learn. Most of the literature has investigated the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be implemented in both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, and the Q-values then indirectly affect the action selection policy. Therefore, the proposed RL algorithms can enhance learning speed and improve final performance. Finally, three experiments, namely cliff walk, mountain car, and cart–pole balancing control, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations illustrate that the backward Q-learning based RL algorithm outperforms the well-known Q-learning and Sarsa algorithms." @default.
- W2041176007 created "2016-06-24" @default.
- W2041176007 creator A5012807007 @default.
- W2041176007 creator A5051865874 @default.
- W2041176007 creator A5076584828 @default.
- W2041176007 date "2013-10-01" @default.
- W2041176007 modified "2023-10-16" @default.
- W2041176007 title "Backward Q-learning: The combination of Sarsa algorithm and Q-learning" @default.
- W2041176007 cites W1497046610 @default.
- W2041176007 cites W1930027090 @default.
- W2041176007 cites W1965295747 @default.
- W2041176007 cites W1979939118 @default.
- W2041176007 cites W1981707771 @default.
- W2041176007 cites W1995991457 @default.
- W2041176007 cites W1997641400 @default.
- W2041176007 cites W2000218760 @default.
- W2041176007 cites W2001180646 @default.
- W2041176007 cites W2006115215 @default.
- W2041176007 cites W2015700124 @default.
- W2041176007 cites W2026536622 @default.
- W2041176007 cites W2030432932 @default.
- W2041176007 cites W2043806097 @default.
- W2041176007 cites W2072170388 @default.
- W2041176007 cites W2074458475 @default.
- W2041176007 cites W2074669169 @default.
- W2041176007 cites W2078872647 @default.
- W2041176007 cites W2091565802 @default.
- W2041176007 cites W2102568724 @default.
- W2041176007 cites W2106155860 @default.
- W2041176007 cites W2107726111 @default.
- W2041176007 cites W2110422826 @default.
- W2041176007 cites W2114032071 @default.
- W2041176007 cites W2116422023 @default.
- W2041176007 cites W2116770962 @default.
- W2041176007 cites W2117941808 @default.
- W2041176007 cites W2123824069 @default.
- W2041176007 cites W2127290018 @default.
- W2041176007 cites W2145417063 @default.
- W2041176007 cites W2151340488 @default.
- W2041176007 cites W2158634442 @default.
- W2041176007 cites W2160362462 @default.
- W2041176007 cites W3041202696 @default.
- W2041176007 cites W32403112 @default.
- W2041176007 doi "https://doi.org/10.1016/j.engappai.2013.06.016" @default.
- W2041176007 hasPublicationYear "2013" @default.
- W2041176007 type Work @default.
- W2041176007 sameAs 2041176007 @default.
- W2041176007 citedByCount "93" @default.
- W2041176007 countsByYear W20411760072015 @default.
- W2041176007 countsByYear W20411760072016 @default.
- W2041176007 countsByYear W20411760072017 @default.
- W2041176007 countsByYear W20411760072018 @default.
- W2041176007 countsByYear W20411760072019 @default.
- W2041176007 countsByYear W20411760072020 @default.
- W2041176007 countsByYear W20411760072021 @default.
- W2041176007 countsByYear W20411760072022 @default.
- W2041176007 countsByYear W20411760072023 @default.
- W2041176007 crossrefType "journal-article" @default.
- W2041176007 hasAuthorship W2041176007A5012807007 @default.
- W2041176007 hasAuthorship W2041176007A5051865874 @default.
- W2041176007 hasAuthorship W2041176007A5076584828 @default.
- W2041176007 hasConcept C11413529 @default.
- W2041176007 hasConcept C119857082 @default.
- W2041176007 hasConcept C154945302 @default.
- W2041176007 hasConcept C162324750 @default.
- W2041176007 hasConcept C166109690 @default.
- W2041176007 hasConcept C169760540 @default.
- W2041176007 hasConcept C188116033 @default.
- W2041176007 hasConcept C26760741 @default.
- W2041176007 hasConcept C2777303404 @default.
- W2041176007 hasConcept C41008148 @default.
- W2041176007 hasConcept C50522688 @default.
- W2041176007 hasConcept C81917197 @default.
- W2041176007 hasConcept C86803240 @default.
- W2041176007 hasConcept C97541855 @default.
- W2041176007 hasConceptScore W2041176007C11413529 @default.
- W2041176007 hasConceptScore W2041176007C119857082 @default.
- W2041176007 hasConceptScore W2041176007C154945302 @default.
- W2041176007 hasConceptScore W2041176007C162324750 @default.
- W2041176007 hasConceptScore W2041176007C166109690 @default.
- W2041176007 hasConceptScore W2041176007C169760540 @default.
- W2041176007 hasConceptScore W2041176007C188116033 @default.
- W2041176007 hasConceptScore W2041176007C26760741 @default.
- W2041176007 hasConceptScore W2041176007C2777303404 @default.
- W2041176007 hasConceptScore W2041176007C41008148 @default.
- W2041176007 hasConceptScore W2041176007C50522688 @default.
- W2041176007 hasConceptScore W2041176007C81917197 @default.
- W2041176007 hasConceptScore W2041176007C86803240 @default.
- W2041176007 hasConceptScore W2041176007C97541855 @default.
- W2041176007 hasFunder F4320311687 @default.
- W2041176007 hasFunder F4320321040 @default.
- W2041176007 hasFunder F4320324663 @default.
- W2041176007 hasIssue "9" @default.
- W2041176007 hasLocation W20411760071 @default.
- W2041176007 hasOpenAccess W2041176007 @default.
- W2041176007 hasPrimaryLocation W20411760071 @default.
- W2041176007 hasRelatedWork W1587318060 @default.
- W2041176007 hasRelatedWork W1968902782 @default.