Matches in SemOpenAlex for { <https://semopenalex.org/work/W6842097> ?p ?o ?g. }
Showing items 1 to 54 of 54, with 100 items per page.
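A minimal Python sketch of how a listing like the one below could be reproduced programmatically is given here. The SPARQL endpoint URL and the use of the `requests` library are assumptions about the public SemOpenAlex service, not something stated on this page.

```python
import requests

# Assumed public SPARQL endpoint for SemOpenAlex (not stated on this page).
ENDPOINT = "https://semopenalex.org/sparql"

# Ask for every (property, value) pair attached to the work W6842097,
# mirroring the pattern { <.../work/W6842097> ?p ?o ?g. } shown above.
QUERY = """
SELECT ?p ?o WHERE {
  <https://semopenalex.org/work/W6842097> ?p ?o .
}
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# Standard SPARQL JSON results: one binding per (property, value) pair,
# corresponding to the 54 rows listed below.
for row in resp.json()["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```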
- W6842097 abstract "Partially observable Markov decision processes (POMDPs) can be used as a model for planning in stochastic domains. This paper considers the problem of computing an optimal policy for a finite-horizon POMDP. The task is difficult because the decision at any time point depends upon information from all previous time points. We propose to filter out inconsistencies and insignificant details in the collection of information being passed from one time point to the next. This reduces the number of possible information states and hence speeds up computation. A bound on the sacrifice of optimality due to information filtering is given, which provides a way of trading off between computational complexity and optimality.

Introduction. There is growing interest in using Markov decision processes as a model for planning in stochastic domains (Dean and Wellman; Provan and Clarke; Dean et al.). In this model there is a state variable that represents the environment of an agent. The environment evolves stochastically over time. At each time point the agent obtains some observations about its environment and takes an action. The agent receives a reward or penalty at each time point, depending on whether the planning goal is achieved and on the costs of actions. A plan, or policy, specifies for each time point the appropriate action to take in response to each possible contingency. An optimal policy is one that maximizes the expected total reward, i.e., a plan that achieves the goal with minimum cost.

The environment can be either fully observable or only partially observable by the agent. The fully observable case has been studied extensively in the dynamic programming literature (e.g., Bertsekas; White). Dean et al. propose a search algorithm to deal with applications whose environments have a large number of possible states. This paper is concerned with the partially observable case, which is much harder than the fully observable case. When the agent knows exactly which state the environment is in, information from the past, i.e., past observations and actions, is irrelevant to the current decision; this is the so-called Markov property. On the other hand, when the agent does not fully observe the environment, past information becomes relevant because it can help the agent to better estimate the true current state of the environment. The problem is that the number of possible states of past information increases exponentially with time. This problem is usually referred to as the curse of dimensionality. White and Scherer propose an algorithm that uses information from only the m most recent time points to approximate the entire past information. Apparently this algorithm is suitable only for applications where past information diminishes rapidly in importance relative to recent events.

The effects of past information can be captured by a probability distribution over the set of all possible current states of the environment (Bertsekas). This probability distribution is called a belief state. In contrast, we shall use the term information state to refer to the state of past information. Most algorithms for solving POMDPs are stated in terms of belief states (Lovejoy; Cassandra). The number of possible belief states is infinite. The algorithms rely on the so-called piecewise linearity of the value functions, or on finite-grid approximations, to partition the space of belief states into a finite number of equivalence classes (Lovejoy). The performance of those algorithms is poor except on small examples. One source of inefficiency is that the algorithms overlook the fact that the belief states induced by possible information states are not distributed uniformly over the set of all possible belief states. As an example, consider path planning for an agent travelling over a finite grid. At each time point the agent observes its own location on the grid and decides on an action to take. Randomness comes in because of uncertainty in the environment and noise in the sensors and actuators. It might be reasonable to assume here that the belief states about the current location of the agent induced by past information are all mountain-shaped distributions with one peak. Still, the algorithms will spend a lot of resources trying to classify belief states with two peaks, three peaks, one hundred peaks, and so on.

This paper proposes a new way to approach POMDPs, in which both information states and belief states have a role to play. We detect and prune inconsistent information states, and we filter out the insignificant details in consistent information states by clustering information states that induce similar belief states. This way there is no need to assume a priori that the importance of past information diminishes rapidly, since the significant aspects are sifted out automatically. Also, belief states that are not induced by any information state never come into the picture. Information filtering cuts down the number of possible information states and hence speeds up computation. At the same time, the sacrifice in optimality can be expected to be acceptable, since only inconsistencies and insignificant details are filtered out. In practice one can start with a coarse filter that allows only the most significant information to pass through, and then gradually refine the filter to allow more and more details through. This way an appropriate trade-off between computational complexity and optimality can be achieved.

The rest of the paper is organized as follows. The next section reviews finite-horizon POMDPs by considering a path planning problem. A reformulation of POMDPs is then presented, which leads to the idea of information filtering. After that, an inductive method for carrying out information filtering is described, and a bound on the sacrifice of optimality due to information filtering is provided. The paper then discusses the effectiveness of information filtering, and conclusions are given in the final section.

POMDPs and planning in stochastic domains. This section reviews the concept of POMDPs by considering the path planning problem for an agent that travels over a finite grid. A figure in the paper shows the POMDP model for this problem. It consists of three types of variables: random, decision and value variables, which are respectively drawn as ellipses, rectangles and diamonds. The random variable s_t represents the location of the agent at time t and is called the state variable. The random variable o_t stands for the location of the agent as observed by itself. The decision variable d_t represents the action the agent takes at time t, which can be one of stay, go-east, go-south, go-west and go-north; here go-east means to move one step eastward. The value variable r_t encodes the planning goal and the criteria for good plans.

The observed location o_t depends on the true location s_t, as indicated by the arc from s_t to o_t. Due to noise in observation, this dependency is probabilistic in nature. The observation o_t also depends probabilistically on the action d_{t-1} of the previous time point, because the observation could be noisier when the agent is moving than when it stays still. The dependency of o_t upon d_{t-1} and s_t is numerically characterized by a conditional probability P(o_t | d_{t-1}, s_t). At the initial time point there is no previous action, and P(o_t | d_{t-1}, s_t) is to be understood as P(o_t | s_t).

The location s_{t+1} of the agent at the next time point depends on its current location s_t and the current action d_t, as indicated by the arcs from s_t and d_t to s_{t+1}. This dependency is again probabilistic, because an action might not have the intended effects for a couple of reasons. First, the agent might not be able to carry out actions accurately: when executing go-north, for instance, there might be some chance of overshooting or sliding sideways. Second, a link in the grid might sometimes be broken; in such a case, the action to move from one end of the link to the other will fail. The dependency of s_{t+1} upon s_t and d_t is numerically characterized by a conditional probability P(s_{t+1} | s_t, d_t). The agent's initial location start is encoded by letting P(s) be 1 when s = start and 0 otherwise.

The value, or reward, variable depends on the agent's current location s_t, the current action d_t, and the agent's location s_{t+1} at the next time point. It is characterized by a value or reward function r_t(s_t, d_t, s_{t+1}). The goal of reaching the location goal and the preference for short plans can be encoded by setting r_t(s_t, d_t, s_{t+1}) as follows:

r_t(s_t, d_t, s_{t+1}) =
  -cost(d_t)             if s_t ≠ goal and s_{t+1} ≠ goal,
  reward - cost(d_t)     if s_t ≠ goal and s_{t+1} = goal,
  -reward - cost(d_t)    if s_t = goal and s_{t+1} ≠ goal,
  -cost(d_t)             if s_t = goal and s_{t+1} = goal,

where reward is the reward for achieving the goal. Usually reward should be much larger than the costs of actions. The number of time steps N considered in a POMDP model is called the horizon of the model." @default.
- W6842097 created "2016-06-24" @default.
- W6842097 creator A5010274354 @default.
- W6842097 creator A5070840566 @default.
- W6842097 date "1995-01-24" @default.
- W6842097 modified "2023-09-27" @default.
- W6842097 title "Information filtering for planning in partially observable stochastic domains" @default.
- W6842097 hasPublicationYear "1995" @default.
- W6842097 type Work @default.
- W6842097 sameAs 6842097 @default.
- W6842097 citedByCount "2" @default.
- W6842097 countsByYear W68420972013 @default.
- W6842097 crossrefType "journal-article" @default.
- W6842097 hasAuthorship W6842097A5010274354 @default.
- W6842097 hasAuthorship W6842097A5070840566 @default.
- W6842097 hasConcept C121332964 @default.
- W6842097 hasConcept C154945302 @default.
- W6842097 hasConcept C32848918 @default.
- W6842097 hasConcept C33923547 @default.
- W6842097 hasConcept C41008148 @default.
- W6842097 hasConcept C62520636 @default.
- W6842097 hasConceptScore W6842097C121332964 @default.
- W6842097 hasConceptScore W6842097C154945302 @default.
- W6842097 hasConceptScore W6842097C32848918 @default.
- W6842097 hasConceptScore W6842097C33923547 @default.
- W6842097 hasConceptScore W6842097C41008148 @default.
- W6842097 hasConceptScore W6842097C62520636 @default.
- W6842097 hasLocation W68420971 @default.
- W6842097 hasOpenAccess W6842097 @default.
- W6842097 hasPrimaryLocation W68420971 @default.
- W6842097 hasRelatedWork W1880292461 @default.
- W6842097 hasRelatedWork W2001761119 @default.
- W6842097 hasRelatedWork W2158610616 @default.
- W6842097 hasRelatedWork W2548718026 @default.
- W6842097 hasRelatedWork W2556899708 @default.
- W6842097 hasRelatedWork W2758385771 @default.
- W6842097 hasRelatedWork W2788986099 @default.
- W6842097 hasRelatedWork W2795942711 @default.
- W6842097 hasRelatedWork W2893699802 @default.
- W6842097 hasRelatedWork W2900468595 @default.
- W6842097 hasRelatedWork W2967434507 @default.
- W6842097 hasRelatedWork W3000275637 @default.
- W6842097 hasRelatedWork W3010497646 @default.
- W6842097 hasRelatedWork W3014510663 @default.
- W6842097 hasRelatedWork W3099448484 @default.
- W6842097 hasRelatedWork W3130386886 @default.
- W6842097 hasRelatedWork W3161020276 @default.
- W6842097 hasRelatedWork W3164790233 @default.
- W6842097 hasRelatedWork W3197240729 @default.
- W6842097 hasRelatedWork W3208170749 @default.
- W6842097 isParatext "false" @default.
- W6842097 isRetracted "false" @default.
- W6842097 magId "6842097" @default.
- W6842097 workType "article" @default.
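The abstract above walks through the ingredients of the grid path-planning POMDP: a transition model P(s_{t+1} | s_t, d_t), an observation model P(o_t | d_{t-1}, s_t), a reward that trades a goal bonus against action costs, and a belief state that summarizes past observations and actions. The sketch below illustrates those pieces on a hypothetical one-dimensional corridor; all numbers, names, and the slip/noise models are illustrative assumptions, not the paper's model or its information-filtering algorithm.

```python
import numpy as np

N_STATES = 5                      # cells 0..4 on a small corridor (assumed toy world)
GOAL = 4                          # the goal cell
ACTIONS = ("stay", "go_east", "go_west")

def trans_prob(s_next, s, a):
    """P(s_{t+1} | s_t, d_t): the intended move succeeds with probability 0.8,
    otherwise the agent stays put (a crude stand-in for slipping or a broken link)."""
    intended = {"stay": s,
                "go_east": min(s + 1, N_STATES - 1),
                "go_west": max(s - 1, 0)}[a]
    if intended == s:                          # staying (or bumping a wall) is deterministic
        return 1.0 if s_next == s else 0.0
    if s_next == intended:
        return 0.8
    return 0.2 if s_next == s else 0.0

def obs_prob(o, s):
    """P(o_t | s_t): the sensor reports the true cell with probability 0.85,
    otherwise any other cell uniformly."""
    return 0.85 if o == s else 0.15 / (N_STATES - 1)

def update_belief(belief, action, obs):
    """One Bayes-filter step: push the belief through the transition model,
    weight by the observation likelihood, and renormalise.  This is the
    belief-state recursion the abstract refers to, not the paper's
    information-filtering procedure itself."""
    predicted = np.array([sum(trans_prob(s2, s, action) * belief[s]
                              for s in range(N_STATES))
                          for s2 in range(N_STATES)])
    weighted = np.array([obs_prob(obs, s) * predicted[s] for s in range(N_STATES)])
    return weighted / weighted.sum()

def reward(s, action, s_next, step_cost=1.0, goal_reward=10.0):
    """Reward of the shape sketched in the text: every step pays the action
    cost, and the step that moves onto the goal collects a large bonus."""
    r = -step_cost
    if s != GOAL and s_next == GOAL:
        r += goal_reward
    return r

# Start certain the agent is at cell 0, then fold in one action/observation pair.
belief = np.zeros(N_STATES)
belief[0] = 1.0
belief = update_belief(belief, "go_east", obs=1)
print(np.round(belief, 3))        # most of the probability mass ends up on cell 1
```

In this formulation the belief state is the finite-dimensional summary of the (exponentially growing) information state; the paper's contribution, per the abstract, is to prune inconsistent information states and cluster those that induce similar beliefs, rather than to enumerate the belief space itself.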