Matches in SemOpenAlex for { <https://semopenalex.org/work/W1176136657> ?p ?o ?g. }
- W1176136657 abstract "Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms is the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs). In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting: directly when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember. Monte-Carlo policy-gradient approaches tend to produce gradient estimates with high variance. Two novel methods for reducing variance are introduced. The first uses high-order filters to replace the eligibility trace of the gradient estimator. The second uses a low-variance value-function method to learn a subset of the parameters and a policy-gradient method to learn the remainder. The algorithms are applied to large domains including a simulated robot navigation scenario, a multi-agent scenario with 21,000 states, and the complex real-world task of large vocabulary continuous speech recognition.
To the best of the author’s knowledge, no other policy-gradient algorithms have performed well at such tasks. The high variance of Monte-Carlo methods requires lengthy simulation and hence a super-computer to train agents within a reasonable time. The ANU “Bunyip” Linux cluster was built with such tasks in mind. It was used for several of the experimental results presented here. One chapter of this thesis describes an application written for the Bunyip cluster that won the international Gordon Bell Prize for price/performance in 2001." @default.
- W1176136657 created "2016-06-24" @default.
- W1176136657 creator A5015132137 @default.
- W1176136657 date "2003-04-01" @default.
- W1176136657 modified "2023-09-26" @default.
- W1176136657 title "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes" @default.
- W1176136657 cites W108912041 @default.
- W1176136657 cites W11162148 @default.
- W1176136657 cites W112327418 @default.
- W1176136657 cites W131709709 @default.
- W1176136657 cites W138253549 @default.
- W1176136657 cites W139620917 @default.
- W1176136657 cites W1487408605 @default.
- W1176136657 cites W1489670076 @default.
- W1176136657 cites W1490038383 @default.
- W1176136657 cites W1494689917 @default.
- W1176136657 cites W1499982184 @default.
- W1176136657 cites W1500894465 @default.
- W1176136657 cites W1502178199 @default.
- W1176136657 cites W1502893368 @default.
- W1176136657 cites W1504820690 @default.
- W1176136657 cites W1505888104 @default.
- W1176136657 cites W1508753734 @default.
- W1176136657 cites W1527719492 @default.
- W1176136657 cites W1530444831 @default.
- W1176136657 cites W1534355532 @default.
- W1176136657 cites W1535702289 @default.
- W1176136657 cites W1539054658 @default.
- W1176136657 cites W1539216098 @default.
- W1176136657 cites W1541084404 @default.
- W1176136657 cites W1542886316 @default.
- W1176136657 cites W1543270850 @default.
- W1176136657 cites W1549519846 @default.
- W1176136657 cites W1553004968 @default.
- W1176136657 cites W1555477527 @default.
- W1176136657 cites W1556274146 @default.
- W1176136657 cites W1557073320 @default.
- W1176136657 cites W1563317173 @default.
- W1176136657 cites W1565453839 @default.
- W1176136657 cites W1567923445 @default.
- W1176136657 cites W1574530145 @default.
- W1176136657 cites W1575388622 @default.
- W1176136657 cites W1581055761 @default.
- W1176136657 cites W1584530380 @default.
- W1176136657 cites W1585398001 @default.
- W1176136657 cites W1585861384 @default.
- W1176136657 cites W1586162706 @default.
- W1176136657 cites W1590970202 @default.
- W1176136657 cites W159191692 @default.
- W1176136657 cites W1593223881 @default.
- W1176136657 cites W1594297126 @default.
- W1176136657 cites W1594871463 @default.
- W1176136657 cites W1596364083 @default.
- W1176136657 cites W1601974704 @default.
- W1176136657 cites W1602007439 @default.
- W1176136657 cites W1606274310 @default.
- W1176136657 cites W1617610651 @default.
- W1176136657 cites W1640774615 @default.
- W1176136657 cites W1656336092 @default.
- W1176136657 cites W1657542410 @default.
- W1176136657 cites W1679945064 @default.
- W1176136657 cites W1701684472 @default.
- W1176136657 cites W1702462424 @default.
- W1176136657 cites W1794176462 @default.
- W1176136657 cites W1814308503 @default.
- W1176136657 cites W1880549478 @default.
- W1176136657 cites W1914583973 @default.
- W1176136657 cites W1920503989 @default.
- W1176136657 cites W1934019294 @default.
- W1176136657 cites W1964031104 @default.
- W1176136657 cites W1965537434 @default.
- W1176136657 cites W1965786092 @default.
- W1176136657 cites W1970602736 @default.
- W1176136657 cites W1980501707 @default.
- W1176136657 cites W1983016559 @default.
- W1176136657 cites W1989226853 @default.
- W1176136657 cites W1991133427 @default.
- W1176136657 cites W1996652034 @default.
- W1176136657 cites W1997449813 @default.
- W1176136657 cites W1997696513 @default.
- W1176136657 cites W2007857129 @default.
- W1176136657 cites W2009111612 @default.
- W1176136657 cites W2016589492 @default.
- W1176136657 cites W2017427968 @default.
- W1176136657 cites W2019125718 @default.
- W1176136657 cites W2020294948 @default.
- W1176136657 cites W2028145673 @default.
- W1176136657 cites W2032100464 @default.
- W1176136657 cites W2033976720 @default.
- W1176136657 cites W2034725503 @default.
- W1176136657 cites W2035476608 @default.
- W1176136657 cites W2036317923 @default.
- W1176136657 cites W2041367235 @default.
- W1176136657 cites W2044375425 @default.
- W1176136657 cites W2046765929 @default.
- W1176136657 cites W2049633694 @default.
- W1176136657 cites W2050941272 @default.
- W1176136657 cites W2051195188 @default.
- W1176136657 cites W2064369482 @default.
- W1176136657 cites W2064675550 @default.