Matches in SemOpenAlex for { <https://semopenalex.org/work/W40658051> ?p ?o ?g. }
- W40658051 endingPage "1859" @default.
- W40658051 startingPage "1858" @default.
- W40658051 abstract "We consider the problem of policy learning in a Markov Decision Process (MDP) where only a restricted, limited subset of the full policy space can be used. An MDP consists of a state space S, a set of actions A, a transition probability function t(s, a, s′) and a reward function R : S → ℝ, together with a discount factor γ. The problem is to find a policy, a mapping from states to actions π : S → A, which gives the highest expected discounted return $\mathbb{E}\left[\sum_{i=1}^{\infty} \gamma^{i} R(s_i)\right]$ (where $s_i$ represents the state encountered at time step i) for every possible start state. However, we are not interested in any possible policy, only in a restricted, limited subset Π of the full policy space. The assumption will be made that there exists a policy π which is best for every state s ∈ S, compared to the other policies in Π. It is not required that the true optimal policy for the MDP belongs to Π. In some settings we can also consider stochastic policies, which map states to a probability distribution over the action set. This greatly increases the size of the policy search space." @default.
- W40658051 created "2016-06-24" @default.
- W40658051 creator A5075585811 @default.
- W40658051 creator A5077665746 @default.
- W40658051 date "2007-07-22" @default.
- W40658051 modified "2023-09-24" @default.
- W40658051 title "On policy learning in restricted policy spaces" @default.
- W40658051 cites W1515851193 @default.
- W40658051 cites W1914583973 @default.
- W40658051 cites W1970041563 @default.
- W40658051 cites W2012728920 @default.
- W40658051 cites W203338875 @default.
- W40658051 cites W2105507006 @default.
- W40658051 cites W2121863487 @default.
- W40658051 cites W2126049034 @default.
- W40658051 cites W2130801532 @default.
- W40658051 cites W2524179627 @default.
- W40658051 hasPublicationYear "2007" @default.
- W40658051 type Work @default.
- W40658051 sameAs 40658051 @default.
- W40658051 citedByCount "0" @default.
- W40658051 crossrefType "proceedings-article" @default.
- W40658051 hasAuthorship W40658051A5075585811 @default.
- W40658051 hasAuthorship W40658051A5077665746 @default.
- W40658051 hasConcept C10138342 @default.
- W40658051 hasConcept C105795698 @default.
- W40658051 hasConcept C106189395 @default.
- W40658051 hasConcept C111919701 @default.
- W40658051 hasConcept C11413529 @default.
- W40658051 hasConcept C121332964 @default.
- W40658051 hasConcept C126255220 @default.
- W40658051 hasConcept C14036430 @default.
- W40658051 hasConcept C144237770 @default.
- W40658051 hasConcept C154945302 @default.
- W40658051 hasConcept C159886148 @default.
- W40658051 hasConcept C162324750 @default.
- W40658051 hasConcept C177264268 @default.
- W40658051 hasConcept C188116033 @default.
- W40658051 hasConcept C199360897 @default.
- W40658051 hasConcept C2778572836 @default.
- W40658051 hasConcept C2780791683 @default.
- W40658051 hasConcept C33923547 @default.
- W40658051 hasConcept C41008148 @default.
- W40658051 hasConcept C48103436 @default.
- W40658051 hasConcept C6177178 @default.
- W40658051 hasConcept C62520636 @default.
- W40658051 hasConcept C72434380 @default.
- W40658051 hasConcept C78458016 @default.
- W40658051 hasConcept C8272713 @default.
- W40658051 hasConcept C86803240 @default.
- W40658051 hasConcept C97541855 @default.
- W40658051 hasConceptScore W40658051C10138342 @default.
- W40658051 hasConceptScore W40658051C105795698 @default.
- W40658051 hasConceptScore W40658051C106189395 @default.
- W40658051 hasConceptScore W40658051C111919701 @default.
- W40658051 hasConceptScore W40658051C11413529 @default.
- W40658051 hasConceptScore W40658051C121332964 @default.
- W40658051 hasConceptScore W40658051C126255220 @default.
- W40658051 hasConceptScore W40658051C14036430 @default.
- W40658051 hasConceptScore W40658051C144237770 @default.
- W40658051 hasConceptScore W40658051C154945302 @default.
- W40658051 hasConceptScore W40658051C159886148 @default.
- W40658051 hasConceptScore W40658051C162324750 @default.
- W40658051 hasConceptScore W40658051C177264268 @default.
- W40658051 hasConceptScore W40658051C188116033 @default.
- W40658051 hasConceptScore W40658051C199360897 @default.
- W40658051 hasConceptScore W40658051C2778572836 @default.
- W40658051 hasConceptScore W40658051C2780791683 @default.
- W40658051 hasConceptScore W40658051C33923547 @default.
- W40658051 hasConceptScore W40658051C41008148 @default.
- W40658051 hasConceptScore W40658051C48103436 @default.
- W40658051 hasConceptScore W40658051C6177178 @default.
- W40658051 hasConceptScore W40658051C62520636 @default.
- W40658051 hasConceptScore W40658051C72434380 @default.
- W40658051 hasConceptScore W40658051C78458016 @default.
- W40658051 hasConceptScore W40658051C8272713 @default.
- W40658051 hasConceptScore W40658051C86803240 @default.
- W40658051 hasConceptScore W40658051C97541855 @default.
- W40658051 hasLocation W406580511 @default.
- W40658051 hasOpenAccess W40658051 @default.
- W40658051 hasPrimaryLocation W406580511 @default.
- W40658051 hasRelatedWork W13173889 @default.
- W40658051 hasRelatedWork W1557688158 @default.
- W40658051 hasRelatedWork W191780540 @default.
- W40658051 hasRelatedWork W2011074416 @default.
- W40658051 hasRelatedWork W2012045703 @default.
- W40658051 hasRelatedWork W2050323767 @default.
- W40658051 hasRelatedWork W2064637452 @default.
- W40658051 hasRelatedWork W2109693562 @default.
- W40658051 hasRelatedWork W2111003075 @default.
- W40658051 hasRelatedWork W2145559552 @default.
- W40658051 hasRelatedWork W2147505826 @default.
- W40658051 hasRelatedWork W2188623979 @default.
- W40658051 hasRelatedWork W2209913494 @default.
- W40658051 hasRelatedWork W2321325624 @default.
- W40658051 hasRelatedWork W2389489515 @default.
- W40658051 hasRelatedWork W2973166068 @default.
- W40658051 hasRelatedWork W3010332861 @default.