Matches in SemOpenAlex for { <https://semopenalex.org/work/W1501627097> ?p ?o ?g. }
- W1501627097 abstract "In applying reinforcement learning to agents acting in the real world, we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately, these algorithms, which attempt to simultaneously learn a Markov model of the world and how to act, have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents, we prefer simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states; instead, they aim to learn successful deterministic reactive policies such that agents' actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process, and in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents with alternative actions and viewpoints on the task through the addition of active perception can provide a practical solution in such circumstances. We demonstrate empirically that: (i) adding arbitrary active perception actions to agents which can only learn deterministic reactive policies can allow the learning of satisficing policies where none were originally possible; (ii) active perception actions allow the learning of better satisficing policies than those that existed previously; and (iii) our approach converges more reliably to satisficing solutions than existing state estimation algorithms such as U-Tree and the Lion Algorithm. Our other contributions focus on issues which affect the reliability with which deterministic reactive satisficing policies can be learnt in non-Markovian environments. We show that greedy action selection may be a necessary condition for the existence of stable deterministic reactive policies on partially observable Markov decision processes (POMDPs). We also set out the concept of Consistent Exploration. This is the idea of estimating state-action values by acting as though the policy has been changed to incorporate the action being explored. We demonstrate that this concept can be used to develop better algorithms for learning reactive policies to POMDPs by presenting a new reinforcement learning algorithm: the Consistent Exploration Q(λ) algorithm (CEQ(λ)). We demonstrate on a significant number of problems that CEQ(λ) is more reliable at learning satisficing solutions than SARSA(λ), the algorithm currently regarded as the best for learning deterministic reactive policies." @default.
- W1501627097 created "2016-06-24" @default.
- W1501627097 creator A5083733165 @default.
- W1501627097 date "2007-06-01" @default.
- W1501627097 modified "2023-09-23" @default.
- W1501627097 title "Learning in a state of confusion: employing active perception and reinforcement learning in partially observable worlds" @default.
- W1501627097 cites W11162148 @default.
- W1501627097 cites W116417374 @default.
- W1501627097 cites W1491843047 @default.
- W1501627097 cites W1500024457 @default.
- W1501627097 cites W1502178199 @default.
- W1501627097 cites W1508269516 @default.
- W1501627097 cites W1523873368 @default.
- W1501627097 cites W1529602137 @default.
- W1501627097 cites W1541084404 @default.
- W1501627097 cites W1550098664 @default.
- W1501627097 cites W1555801537 @default.
- W1501627097 cites W1565134770 @default.
- W1501627097 cites W1570690983 @default.
- W1501627097 cites W1582783138 @default.
- W1501627097 cites W1583380718 @default.
- W1501627097 cites W1584182316 @default.
- W1501627097 cites W1585546346 @default.
- W1501627097 cites W1593772383 @default.
- W1501627097 cites W1640646391 @default.
- W1501627097 cites W1646707810 @default.
- W1501627097 cites W1652032257 @default.
- W1501627097 cites W1657542410 @default.
- W1501627097 cites W1657674574 @default.
- W1501627097 cites W1687873425 @default.
- W1501627097 cites W183068496 @default.
- W1501627097 cites W1977088429 @default.
- W1501627097 cites W1989388297 @default.
- W1501627097 cites W1994297893 @default.
- W1501627097 cites W1995544955 @default.
- W1501627097 cites W1996284680 @default.
- W1501627097 cites W2000659199 @default.
- W1501627097 cites W2006038922 @default.
- W1501627097 cites W2024060531 @default.
- W1501627097 cites W2031355581 @default.
- W1501627097 cites W2032100464 @default.
- W1501627097 cites W2037445741 @default.
- W1501627097 cites W2048984163 @default.
- W1501627097 cites W205184011 @default.
- W1501627097 cites W2052117683 @default.
- W1501627097 cites W2056760934 @default.
- W1501627097 cites W2064675550 @default.
- W1501627097 cites W2067483065 @default.
- W1501627097 cites W2087946919 @default.
- W1501627097 cites W2097856935 @default.
- W1501627097 cites W2098645324 @default.
- W1501627097 cites W2102322810 @default.
- W1501627097 cites W2106354437 @default.
- W1501627097 cites W2107726111 @default.
- W1501627097 cites W2110014716 @default.
- W1501627097 cites W2113913482 @default.
- W1501627097 cites W2115887268 @default.
- W1501627097 cites W2121863487 @default.
- W1501627097 cites W2122410182 @default.
- W1501627097 cites W2123542217 @default.
- W1501627097 cites W2123663688 @default.
- W1501627097 cites W2125074935 @default.
- W1501627097 cites W2125838338 @default.
- W1501627097 cites W2142925453 @default.
- W1501627097 cites W2143256202 @default.
- W1501627097 cites W2148962857 @default.
- W1501627097 cites W2149960632 @default.
- W1501627097 cites W2150339816 @default.
- W1501627097 cites W2151040408 @default.
- W1501627097 cites W2152136853 @default.
- W1501627097 cites W2158091072 @default.
- W1501627097 cites W2158282517 @default.
- W1501627097 cites W2160067530 @default.
- W1501627097 cites W2164056559 @default.
- W1501627097 cites W2166610875 @default.
- W1501627097 cites W2172246523 @default.
- W1501627097 cites W2296039540 @default.
- W1501627097 cites W2768155694 @default.
- W1501627097 cites W2912079318 @default.
- W1501627097 cites W2912185451 @default.
- W1501627097 cites W2913703059 @default.
- W1501627097 cites W2914331897 @default.
- W1501627097 cites W2914605006 @default.
- W1501627097 cites W3011120880 @default.
- W1501627097 cites W3139460557 @default.
- W1501627097 cites W41069182 @default.
- W1501627097 cites W42497557 @default.
- W1501627097 cites W51508254 @default.
- W1501627097 cites W54841420 @default.
- W1501627097 cites W6242441 @default.
- W1501627097 cites W80823096 @default.
- W1501627097 cites W84772334 @default.
- W1501627097 hasPublicationYear "2007" @default.
- W1501627097 type Work @default.
- W1501627097 sameAs 1501627097 @default.
- W1501627097 citedByCount "5" @default.
- W1501627097 countsByYear W15016270972022 @default.
- W1501627097 crossrefType "dissertation" @default.
- W1501627097 hasAuthorship W1501627097A5083733165 @default.
- W1501627097 hasConcept C105795698 @default.
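
The abstract above describes Consistent Exploration concretely enough to sketch: when an exploratory action is taken, the agent acts as though its deterministic reactive policy had been changed to include that action, so the values being estimated stay consistent with the behaviour actually followed. The Python below is a minimal, hedged reading of that idea as a tabular Q(λ)-style learner with accumulating eligibility traces; the toy environment, hyperparameters, and all identifiers are illustrative assumptions, not the thesis's actual CEQ(λ) implementation.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumed, not taken from the thesis).
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.9, 0.1

class ToyChain:
    """Tiny fully observable chain task, purely illustrative."""
    def __init__(self, length=4):
        self.length = length
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):  # action is -1 or +1
        self.pos = max(0, min(self.length, self.pos + action))
        done = self.pos == self.length
        return self.pos, (1.0 if done else 0.0), done

def greedy(Q, obs, actions):
    """Greedy action selection, which the abstract suggests may be
    necessary for stable deterministic reactive policies on POMDPs."""
    return max(actions, key=lambda a: Q[(obs, a)])

def ceq_episode(env, Q, policy, actions, max_steps=100):
    """One episode of a Consistent-Exploration Q(lambda)-style update."""
    traces = defaultdict(float)  # accumulating eligibility traces
    obs = env.reset()
    for _ in range(max_steps):
        if random.random() < EPSILON:
            # Consistent Exploration: commit the exploratory action into
            # the deterministic reactive policy, rather than treating it
            # as a one-off deviation from the policy being evaluated.
            policy[obs] = random.choice(actions)
        else:
            policy[obs] = greedy(Q, obs, actions)
        action = policy[obs]
        next_obs, reward, done = env.step(action)
        next_action = policy.get(next_obs, greedy(Q, next_obs, actions))
        # Bootstrap against the (possibly just modified) policy, so the
        # target is consistent with how the agent will actually act.
        target = reward if done else reward + GAMMA * Q[(next_obs, next_action)]
        delta = target - Q[(obs, action)]
        traces[(obs, action)] += 1.0
        for key in list(traces):
            Q[key] += ALPHA * delta * traces[key]
            traces[key] *= GAMMA * LAMBDA
        if done:
            break
        obs = next_obs

env = ToyChain()
Q = defaultdict(float)
policy = {}
for _ in range(200):
    ceq_episode(env, Q, policy, actions=[-1, 1])
print({s: policy.get(s) for s in range(env.length)})  # learned reactive policy
```

Under ordinary ε-greedy SARSA(λ), an exploratory step is a one-off deviation, so the state-action values being learnt mix two behaviours; committing the explored action into the policy, as in the sketch, is what the abstract frames as Consistent Exploration.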