Matches in SemOpenAlex for { <https://semopenalex.org/work/W2964252020> ?p ?o ?g. }
Showing items 1 to 94 of 94 with 100 items per page.
- W2964252020 endingPage "1958" @default.
- W2964252020 startingPage "1949" @default.
- W2964252020 abstract "We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz and Gordon, 2004; Sun et al., 2016) by modeling predictive state: a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2018) to initialize the recursive filter. Predictive state serves as an equivalent representation of a belief state. Therefore, the policy component of the RPSP-network can be purely reactive, simplifying training while still allowing optimal behaviour. Moreover, we use the PSR interpretation during training as well, by incorporating prediction error in the loss function. The entire network (recursive filter and reactive policy) is still differentiable and can be trained using gradient-based methods. We optimize our policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error. We show the efficacy of RPSP-networks under partial observability on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSP-networks perform well compared with memory-preserving networks such as GRUs, as well as finite memory models, being the overall best performing method." @default.
- W2964252020 created "2019-07-30" @default.
- W2964252020 creator A5003652981 @default.
- W2964252020 creator A5012830032 @default.
- W2964252020 creator A5032266950 @default.
- W2964252020 creator A5077719529 @default.
- W2964252020 creator A5081932185 @default.
- W2964252020 date "2018-03-05" @default.
- W2964252020 modified "2023-09-27" @default.
- W2964252020 title "Recurrent Predictive State Policy Networks" @default.
- W2964252020 hasPublicationYear "2018" @default.
- W2964252020 type Work @default.
- W2964252020 sameAs 2964252020 @default.
- W2964252020 citedByCount "1" @default.
- W2964252020 countsByYear W29642520202021 @default.
- W2964252020 crossrefType "proceedings-article" @default.
- W2964252020 hasAuthorship W2964252020A5003652981 @default.
- W2964252020 hasAuthorship W2964252020A5012830032 @default.
- W2964252020 hasAuthorship W2964252020A5032266950 @default.
- W2964252020 hasAuthorship W2964252020A5077719529 @default.
- W2964252020 hasAuthorship W2964252020A5081932185 @default.
- W2964252020 hasConcept C106131492 @default.
- W2964252020 hasConcept C11413529 @default.
- W2964252020 hasConcept C119857082 @default.
- W2964252020 hasConcept C134306372 @default.
- W2964252020 hasConcept C147168706 @default.
- W2964252020 hasConcept C153258448 @default.
- W2964252020 hasConcept C154945302 @default.
- W2964252020 hasConcept C172205157 @default.
- W2964252020 hasConcept C17744445 @default.
- W2964252020 hasConcept C199539241 @default.
- W2964252020 hasConcept C202615002 @default.
- W2964252020 hasConcept C2775924081 @default.
- W2964252020 hasConcept C2776359362 @default.
- W2964252020 hasConcept C28826006 @default.
- W2964252020 hasConcept C31972630 @default.
- W2964252020 hasConcept C33923547 @default.
- W2964252020 hasConcept C36299963 @default.
- W2964252020 hasConcept C41008148 @default.
- W2964252020 hasConcept C48103436 @default.
- W2964252020 hasConcept C50644808 @default.
- W2964252020 hasConcept C94625758 @default.
- W2964252020 hasConcept C97541855 @default.
- W2964252020 hasConceptScore W2964252020C106131492 @default.
- W2964252020 hasConceptScore W2964252020C11413529 @default.
- W2964252020 hasConceptScore W2964252020C119857082 @default.
- W2964252020 hasConceptScore W2964252020C134306372 @default.
- W2964252020 hasConceptScore W2964252020C147168706 @default.
- W2964252020 hasConceptScore W2964252020C153258448 @default.
- W2964252020 hasConceptScore W2964252020C154945302 @default.
- W2964252020 hasConceptScore W2964252020C172205157 @default.
- W2964252020 hasConceptScore W2964252020C17744445 @default.
- W2964252020 hasConceptScore W2964252020C199539241 @default.
- W2964252020 hasConceptScore W2964252020C202615002 @default.
- W2964252020 hasConceptScore W2964252020C2775924081 @default.
- W2964252020 hasConceptScore W2964252020C2776359362 @default.
- W2964252020 hasConceptScore W2964252020C28826006 @default.
- W2964252020 hasConceptScore W2964252020C31972630 @default.
- W2964252020 hasConceptScore W2964252020C33923547 @default.
- W2964252020 hasConceptScore W2964252020C36299963 @default.
- W2964252020 hasConceptScore W2964252020C41008148 @default.
- W2964252020 hasConceptScore W2964252020C48103436 @default.
- W2964252020 hasConceptScore W2964252020C50644808 @default.
- W2964252020 hasConceptScore W2964252020C94625758 @default.
- W2964252020 hasConceptScore W2964252020C97541855 @default.
- W2964252020 hasLocation W29642520201 @default.
- W2964252020 hasOpenAccess W2964252020 @default.
- W2964252020 hasPrimaryLocation W29642520201 @default.
- W2964252020 hasRelatedWork W1486341833 @default.
- W2964252020 hasRelatedWork W2144655553 @default.
- W2964252020 hasRelatedWork W2563705050 @default.
- W2964252020 hasRelatedWork W2573393487 @default.
- W2964252020 hasRelatedWork W2752954743 @default.
- W2964252020 hasRelatedWork W2780045768 @default.
- W2964252020 hasRelatedWork W2786711334 @default.
- W2964252020 hasRelatedWork W2793100723 @default.
- W2964252020 hasRelatedWork W2794683864 @default.
- W2964252020 hasRelatedWork W2910124058 @default.
- W2964252020 hasRelatedWork W2950141689 @default.
- W2964252020 hasRelatedWork W3002765113 @default.
- W2964252020 hasRelatedWork W3024178066 @default.
- W2964252020 hasRelatedWork W3025794913 @default.
- W2964252020 hasRelatedWork W3039664027 @default.
- W2964252020 hasRelatedWork W3101007912 @default.
- W2964252020 hasRelatedWork W3166968915 @default.
- W2964252020 hasRelatedWork W3202097587 @default.
- W2964252020 hasRelatedWork W3204058460 @default.
- W2964252020 hasRelatedWork W3205794883 @default.
- W2964252020 isParatext "false" @default.
- W2964252020 isRetracted "false" @default.
- W2964252020 magId "2964252020" @default.
- W2964252020 workType "article" @default.