Matches in SemOpenAlex for { <https://semopenalex.org/work/W2904881196> ?p ?o ?g. }
- W2904881196 abstract "We present on-line policy gradient algorithms for computing the locally optimal policy of a constrained, average cost, finite state Markov Decision Process. The stochastic approximation algorithms require estimation of the gradient of the cost function with respect to the parameter that characterizes the randomized policy. We propose a spherical coordinate parametrization and present a novel simulation-based gradient estimation scheme involving weak derivatives (measure-valued differentiation). Such methods have substantially reduced variance compared to the widely used score function method. Similar to neuro-dynamic programming algorithms (e.g. Q-learning or Temporal Difference methods), the algorithms proposed in this paper are simulation-based and do not require explicit knowledge of the underlying parameters such as transition probabilities. However, unlike neuro-dynamic programming methods, the algorithms proposed here can handle constraints and time-varying parameters. Numerical examples are given to illustrate the performance of the algorithms. This paper was originally written in 2004. One reason we are putting this on arXiv now is that the score function gradient estimator continues to be used in the online reinforcement learning literature even though its variance grows as $O(n)$ given $n$ data points (for a Markov process). In comparison, the weak derivative estimator has significantly smaller variance of $O(1)$, as reported in this paper (and elsewhere)." @default.
- W2904881196 created "2018-12-22" @default.
- W2904881196 creator A5043584689 @default.
- W2904881196 creator A5068804090 @default.
- W2904881196 date "2011-10-22" @default.
- W2904881196 modified "2023-09-27" @default.
- W2904881196 title "Real-Time Reinforcement Learning of Constrained Markov Decision Processes with Weak Derivatives" @default.
- W2904881196 cites W1518931405 @default.
- W2904881196 cites W1550574544 @default.
- W2904881196 cites W1557287189 @default.
- W2904881196 cites W1599444075 @default.
- W2904881196 cites W1983016559 @default.
- W2904881196 cites W1997020095 @default.
- W2904881196 cites W203276351 @default.
- W2904881196 cites W2035446426 @default.
- W2904881196 cites W2061412229 @default.
- W2904881196 cites W2066986122 @default.
- W2904881196 cites W2095077541 @default.
- W2904881196 cites W2095962350 @default.
- W2904881196 cites W2098046554 @default.
- W2904881196 cites W2102134363 @default.
- W2904881196 cites W2107606503 @default.
- W2904881196 cites W2108293203 @default.
- W2904881196 cites W2114757210 @default.
- W2904881196 cites W2115679377 @default.
- W2904881196 cites W2119940423 @default.
- W2904881196 cites W2120198850 @default.
- W2904881196 cites W2124715093 @default.
- W2904881196 cites W2133626316 @default.
- W2904881196 cites W2138410336 @default.
- W2904881196 cites W2138515733 @default.
- W2904881196 cites W2139656913 @default.
- W2904881196 cites W2142929170 @default.
- W2904881196 cites W2157298719 @default.
- W2904881196 cites W2164599584 @default.
- W2904881196 cites W2235056388 @default.
- W2904881196 cites W2266946488 @default.
- W2904881196 cites W2334782222 @default.
- W2904881196 cites W2531891978 @default.
- W2904881196 cites W389907844 @default.
- W2904881196 hasPublicationYear "2011" @default.
- W2904881196 type Work @default.
- W2904881196 sameAs 2904881196 @default.
- W2904881196 citedByCount "2" @default.
- W2904881196 countsByYear W29048811962020 @default.
- W2904881196 countsByYear W29048811962021 @default.
- W2904881196 crossrefType "posted-content" @default.
- W2904881196 hasAuthorship W2904881196A5043584689 @default.
- W2904881196 hasAuthorship W2904881196A5068804090 @default.
- W2904881196 hasConcept C105795698 @default.
- W2904881196 hasConcept C106189395 @default.
- W2904881196 hasConcept C11413529 @default.
- W2904881196 hasConcept C119857082 @default.
- W2904881196 hasConcept C121332964 @default.
- W2904881196 hasConcept C121955636 @default.
- W2904881196 hasConcept C126255220 @default.
- W2904881196 hasConcept C14036430 @default.
- W2904881196 hasConcept C144133560 @default.
- W2904881196 hasConcept C14646407 @default.
- W2904881196 hasConcept C154945302 @default.
- W2904881196 hasConcept C159886148 @default.
- W2904881196 hasConcept C185429906 @default.
- W2904881196 hasConcept C196083921 @default.
- W2904881196 hasConcept C196340769 @default.
- W2904881196 hasConcept C202887219 @default.
- W2904881196 hasConcept C28826006 @default.
- W2904881196 hasConcept C33923547 @default.
- W2904881196 hasConcept C37404715 @default.
- W2904881196 hasConcept C41008148 @default.
- W2904881196 hasConcept C62520636 @default.
- W2904881196 hasConcept C74902906 @default.
- W2904881196 hasConcept C78458016 @default.
- W2904881196 hasConcept C86803240 @default.
- W2904881196 hasConcept C97541855 @default.
- W2904881196 hasConcept C98763669 @default.
- W2904881196 hasConceptScore W2904881196C105795698 @default.
- W2904881196 hasConceptScore W2904881196C106189395 @default.
- W2904881196 hasConceptScore W2904881196C11413529 @default.
- W2904881196 hasConceptScore W2904881196C119857082 @default.
- W2904881196 hasConceptScore W2904881196C121332964 @default.
- W2904881196 hasConceptScore W2904881196C121955636 @default.
- W2904881196 hasConceptScore W2904881196C126255220 @default.
- W2904881196 hasConceptScore W2904881196C14036430 @default.
- W2904881196 hasConceptScore W2904881196C144133560 @default.
- W2904881196 hasConceptScore W2904881196C14646407 @default.
- W2904881196 hasConceptScore W2904881196C154945302 @default.
- W2904881196 hasConceptScore W2904881196C159886148 @default.
- W2904881196 hasConceptScore W2904881196C185429906 @default.
- W2904881196 hasConceptScore W2904881196C196083921 @default.
- W2904881196 hasConceptScore W2904881196C196340769 @default.
- W2904881196 hasConceptScore W2904881196C202887219 @default.
- W2904881196 hasConceptScore W2904881196C28826006 @default.
- W2904881196 hasConceptScore W2904881196C33923547 @default.
- W2904881196 hasConceptScore W2904881196C37404715 @default.
- W2904881196 hasConceptScore W2904881196C41008148 @default.
- W2904881196 hasConceptScore W2904881196C62520636 @default.
- W2904881196 hasConceptScore W2904881196C74902906 @default.
- W2904881196 hasConceptScore W2904881196C78458016 @default.
- W2904881196 hasConceptScore W2904881196C86803240 @default.
- W2904881196 hasConceptScore W2904881196C97541855 @default.
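
The abstract above contrasts the score function gradient estimator with a weak derivative (measure-valued) estimator under a spherical coordinate parametrization. The following is a minimal sketch of that contrast, not the paper's algorithm: it assumes a single decision step with two actions rather than a Markov chain, a policy of the form (cos^2 alpha, sin^2 alpha), and hypothetical values for `COSTS`, `NOISE_STD`, and `alpha`. All function names are illustrative.

```python
import math
import random
import statistics

COSTS = (1.0, 3.0)   # hypothetical mean costs of action 0 and action 1
NOISE_STD = 0.5      # hypothetical std of zero-mean Gaussian cost noise


def policy(alpha):
    """Two-action policy on the unit circle: probabilities (cos^2, sin^2)."""
    return (math.cos(alpha) ** 2, math.sin(alpha) ** 2)


def noisy_cost(action, rng):
    """Simulated cost of an action: its mean cost plus Gaussian noise."""
    return COSTS[action] + rng.gauss(0.0, NOISE_STD)


def true_gradient(alpha):
    """Exact d/d(alpha) of cos^2(alpha)*c0 + sin^2(alpha)*c1."""
    return math.sin(2.0 * alpha) * (COSTS[1] - COSTS[0])


def score_function_sample(alpha, rng):
    """One sample of cost * d/d(alpha) log p(action); needs 0 < alpha < pi/2."""
    p0, _ = policy(alpha)
    action = 0 if rng.random() < p0 else 1
    # d/d(alpha) log cos^2 = -2 tan(alpha); d/d(alpha) log sin^2 = 2 / tan(alpha)
    score = -2.0 * math.tan(alpha) if action == 0 else 2.0 / math.tan(alpha)
    return noisy_cost(action, rng) * score


def weak_derivative_sample(alpha, rng):
    """One sample of the weak derivative estimator.

    The derivative of the policy is sin(2*alpha) * (delta_1 - delta_0), so the
    estimator is that constant times a difference of two simulated costs, one
    drawn under each component measure. The sign is folded into the constant.
    """
    g = math.sin(2.0 * alpha)
    return g * (noisy_cost(1, rng) - noisy_cost(0, rng))


def compare(alpha=0.4, n=20000, seed=0):
    """Return sample means and variances of both estimators."""
    rng = random.Random(seed)
    sf = [score_function_sample(alpha, rng) for _ in range(n)]
    wd = [weak_derivative_sample(alpha, rng) for _ in range(n)]
    return {
        "true": true_gradient(alpha),
        "sf_mean": statistics.mean(sf),
        "sf_var": statistics.variance(sf),
        "wd_mean": statistics.mean(wd),
        "wd_var": statistics.variance(wd),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.4f}")
```

Both estimators are unbiased in this toy setting, and the weak derivative one has the smaller sample variance because it never multiplies the cost by a likelihood ratio. The $O(n)$ versus $O(1)$ variance growth stated in the abstract concerns Markov processes and is not reproduced by this one-step sketch.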