Matches in SemOpenAlex for { <https://semopenalex.org/work/W2289410116> ?p ?o ?g. }
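The triples listed below can also be fetched programmatically. As a minimal sketch, assuming SemOpenAlex's public SPARQL endpoint lives at https://semopenalex.org/sparql (an assumption, not stated in this listing), the pattern above could be issued like this; the `?g` slot names the graph each triple comes from and is omitted here for simplicity (it could be bound with a `GRAPH ?g { ... }` clause):

```python
import requests

# Assumed public SPARQL endpoint of SemOpenAlex; not part of the listing below.
ENDPOINT = "https://semopenalex.org/sparql"

# Same subject as in the header pattern; ?g (named graph) is left out for brevity.
QUERY = """
SELECT ?p ?o
WHERE { <https://semopenalex.org/work/W2289410116> ?p ?o . }
"""

# Standard SPARQL 1.1 Protocol request asking for JSON results.
response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

for binding in response.json()["results"]["bindings"]:
    print(binding["p"]["value"], binding["o"]["value"])
```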
- W2289410116 abstract "Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique to improve the quality of Sequential Decision Making (SDM). SDM is common and important in many realistic applications, especially in automatic control problems. The quality of an SDM depends on (discounted) long-term rewards rather than instant rewards. Due to delayed feedback, SDM tasks are much more difficult to handle than classification problems. Moreover, in many SDM tasks, the feedback about a decision is evaluative rather than instructive, so supervised learning techniques are not suitable for these tasks. To tackle these difficulties, RL methods are investigated. Although many RL algorithms have been developed, none is consistently better than the others. In addition, the parameters of RL algorithms significantly influence learning performance. Successful RL applications depend on suitable learning algorithms and carefully selected learning parameters, but there is no universal rule to guide the choice of algorithms and the setting of parameters. To handle this difficulty, a new multiple RL system, the Aggregated Multiple Reinforcement Learning System (AMRLS), is developed. In this proposed system, each RL algorithm (learner) learns individually in a learning module and provides its output to an intelligent aggregation module. The aggregation module dynamically aggregates these outputs using intelligent aggregation methods and provides an action decision. Then, all learners take the action and update their policies individually. The two processes are performed alternately in each learning episode. Because of the intelligent and dynamic aggregation, AMRLS can deal with dynamic learning problems without the need to search for the optimal learning algorithm or the optimal values of learning parameters. It is claimed that several complementary learning algorithms can be integrated into AMRLS to improve learning performance in terms of success rate, robustness, confidence, redundancy, and complementarity. There are two strategies for learning an optimal policy with RL methods. One is the Value Function Learning (VFL) strategy, which learns an optimal policy expressed as a value function. Temporal Difference (TD) methods are examples of this strategy and are called TDRL in this dissertation. The other is the Direct Policy Search (DPS) strategy, which searches for the optimal policy directly in the space of potential policies. Genetic Algorithm (GA)-based search algorithms are instances of this strategy and are named GARL. Both strategies have advantages and disadvantages. A hybrid learning architecture of GARL and TDRL, HGATDRL, is proposed to combine them. HGATDRL first uses an off-line GARL approach to learn an initial policy and then updates the policy on-line with a TDRL approach; this new learning method enhances the learning ability of the RL learners in AMRLS. The AMRLS framework and the HGATDRL method are tested on several SDM problems, including the maze world problem, the pursuit domain problem, the cart-pole balancing system, the mountain car problem, and a flight control system. The experimental results show that the proposed framework and method can enhance the learning ability and improve the learning performance of a multiple-RL system." @default.
- W2289410116 created "2016-06-24" @default.
- W2289410116 creator A5057515922 @default.
- W2289410116 date "2007-01-01" @default.
- W2289410116 modified "2023-10-03" @default.
- W2289410116 title "A framework for aggregation of multiple reinforcement learning algorithms" @default.
- W2289410116 cites W142858861 @default.
- W2289410116 cites W145683767 @default.
- W2289410116 cites W1495978126 @default.
- W2289410116 cites W1508229132 @default.
- W2289410116 cites W1512947503 @default.
- W2289410116 cites W1519614353 @default.
- W2289410116 cites W1523950537 @default.
- W2289410116 cites W1534175858 @default.
- W2289410116 cites W1534355532 @default.
- W2289410116 cites W1538023783 @default.
- W2289410116 cites W1544439962 @default.
- W2289410116 cites W1549353711 @default.
- W2289410116 cites W1564534945 @default.
- W2289410116 cites W1576818901 @default.
- W2289410116 cites W1590984588 @default.
- W2289410116 cites W1600795850 @default.
- W2289410116 cites W162989097 @default.
- W2289410116 cites W1641379095 @default.
- W2289410116 cites W1646752922 @default.
- W2289410116 cites W1659842140 @default.
- W2289410116 cites W1689445748 @default.
- W2289410116 cites W1705501381 @default.
- W2289410116 cites W1889531669 @default.
- W2289410116 cites W1901460 @default.
- W2289410116 cites W1914583973 @default.
- W2289410116 cites W1916228951 @default.
- W2289410116 cites W1963754118 @default.
- W2289410116 cites W1978181253 @default.
- W2289410116 cites W1987187457 @default.
- W2289410116 cites W1993587028 @default.
- W2289410116 cites W1993740947 @default.
- W2289410116 cites W2003323955 @default.
- W2289410116 cites W2009533501 @default.
- W2289410116 cites W2009767855 @default.
- W2289410116 cites W2012391952 @default.
- W2289410116 cites W2016648380 @default.
- W2289410116 cites W2027945080 @default.
- W2289410116 cites W2044676734 @default.
- W2289410116 cites W2049287437 @default.
- W2289410116 cites W2050417267 @default.
- W2289410116 cites W2055178087 @default.
- W2289410116 cites W2062122188 @default.
- W2289410116 cites W2067619284 @default.
- W2289410116 cites W2071311198 @default.
- W2289410116 cites W2072164538 @default.
- W2289410116 cites W2076337359 @default.
- W2289410116 cites W2078226168 @default.
- W2289410116 cites W2085130511 @default.
- W2289410116 cites W2091565802 @default.
- W2289410116 cites W2097571405 @default.
- W2289410116 cites W2100370041 @default.
- W2289410116 cites W2100677568 @default.
- W2289410116 cites W2101915445 @default.
- W2289410116 cites W2102734279 @default.
- W2289410116 cites W2106451198 @default.
- W2289410116 cites W2107726111 @default.
- W2289410116 cites W2109330238 @default.
- W2289410116 cites W2113204985 @default.
- W2289410116 cites W2113501460 @default.
- W2289410116 cites W2113998737 @default.
- W2289410116 cites W2116339921 @default.
- W2289410116 cites W2117341272 @default.
- W2289410116 cites W2119053738 @default.
- W2289410116 cites W2121863487 @default.
- W2289410116 cites W2124175081 @default.
- W2289410116 cites W2124868070 @default.
- W2289410116 cites W2125801896 @default.
- W2289410116 cites W2126297944 @default.
- W2289410116 cites W2128643385 @default.
- W2289410116 cites W2141650360 @default.
- W2289410116 cites W2143680741 @default.
- W2289410116 cites W2149054115 @default.
- W2289410116 cites W2149162789 @default.
- W2289410116 cites W2151676833 @default.
- W2289410116 cites W2152610444 @default.
- W2289410116 cites W2162813238 @default.
- W2289410116 cites W2164568552 @default.
- W2289410116 cites W2166843422 @default.
- W2289410116 cites W2169022337 @default.
- W2289410116 cites W2312609093 @default.
- W2289410116 cites W2378066718 @default.
- W2289410116 cites W2412433624 @default.
- W2289410116 cites W3139377883 @default.
- W2289410116 cites W3198160809 @default.
- W2289410116 cites W54221576 @default.
- W2289410116 hasPublicationYear "2007" @default.
- W2289410116 type Work @default.
- W2289410116 sameAs 2289410116 @default.
- W2289410116 citedByCount "3" @default.
- W2289410116 countsByYear W22894101162013 @default.
- W2289410116 countsByYear W22894101162014 @default.
- W2289410116 countsByYear W22894101162015 @default.
- W2289410116 crossrefType "dissertation" @default.
- W2289410116 hasAuthorship W2289410116A5057515922 @default.
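The abstract above describes the AMRLS loop: several RL learners each propose an action, an aggregation module combines the proposals into a single action, and then every learner executes that action and updates its own policy. Below is a minimal sketch of that loop, assuming tabular Q-learning learners and simple majority voting as the aggregation rule; the names (QLearner, aggregate, ChainEnv) and the toy environment are illustrative and not taken from the dissertation.

```python
import random
from collections import Counter, defaultdict

class ChainEnv:
    """Toy 10-state chain: actions 0/1 move left/right, reward 1 at the right end."""
    def __init__(self, n_states=10):
        self.n_states = n_states

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + move))
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done

class QLearner:
    """One learner in the ensemble: tabular Q-learning with its own parameters."""
    def __init__(self, n_actions, alpha, gamma, epsilon):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.alpha, self.gamma, self.epsilon = n_actions, alpha, gamma, epsilon

    def propose(self, state):
        # Each learner proposes its own (epsilon-greedy) action for the current state.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        qs = self.q[state]
        return qs.index(max(qs))

    def update(self, state, action, reward, next_state):
        # Every learner updates its own policy with the jointly chosen action.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

def aggregate(proposals):
    # Aggregation module: majority vote, one of many possible aggregation rules.
    return Counter(proposals).most_common(1)[0][0]

def run_episode(env, learners, max_steps=100):
    state = env.reset()
    for _ in range(max_steps):
        action = aggregate([learner.propose(state) for learner in learners])
        next_state, reward, done = env.step(action)
        for learner in learners:
            learner.update(state, action, reward, next_state)
        state = next_state
        if done:
            break

if __name__ == "__main__":
    env = ChainEnv()
    # Learners differ only in their parameter settings, matching the abstract's point
    # that no single algorithm/parameter choice is best in all situations.
    learners = [QLearner(2, a, 0.95, e) for a, e in [(0.1, 0.1), (0.3, 0.05), (0.5, 0.2)]]
    for _ in range(200):
        run_episode(env, learners)
```

The HGATDRL idea described in the abstract (an off-line GA search for an initial policy, followed by on-line TD updates) would, in this sketch, amount to pre-filling each learner's Q-table from the GA-found policy before the episode loop runs.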