Matches in SemOpenAlex for { <https://semopenalex.org/work/W2119792915> ?p ?o ?g. }
- W2119792915 abstract "We present stochastic approximation algorithms for computing the locally optimal policy of a constrained, average-cost, finite-state Markov decision process. Because the optimal control strategy is known to be a randomized policy, we parameterize the action probabilities to formulate the optimization problem. The stochastic approximation algorithms require the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This gradient is computed by novel simulation-based gradient estimation schemes involving weak derivatives. Like neuro-dynamic programming algorithms (e.g. Q-learning or temporal difference methods), the algorithms proposed in this paper are simulation based and do not require explicit knowledge of the underlying parameters, such as the transition probabilities. However, unlike neuro-dynamic programming methods, the algorithms proposed here can handle constraints and time-varying parameters. Numerical examples are given to illustrate the performance of the algorithms. Résumé (translated from French): We consider the problem of optimally controlling controlled Markov chains (MDPs) subject to constraints. By randomizing the actions, the problem is parameterized and can be rewritten as a nonlinear optimization problem with nonlinear constraints, to which stochastic approximation is applied. This requires computing certain gradients with respect to the control parameters. We propose a new method for estimating such gradients using weak derivatives. Our method is robust (like Q-learning and temporal difference algorithms) and does not assume that the transition probabilities are known. Moreover, unlike the other methods mentioned, our method can be applied when constraints are present.
We also present numerical examples to illustrate the performance of our algorithms. Acknowledgments: This work was done while the first author was on leave at the Department of Electrical and Electronic Engineering at the University of Melbourne. The research was supported by the Australian Research Council and by research grants from NSERC, Canada, and FCAR, Quebec. Les Cahiers du GERAD G–2003–51" @default.
- W2119792915 created "2016-06-24" @default.
- W2119792915 creator A5064028141 @default.
- W2119792915 creator A5068804090 @default.
- W2119792915 date "2003-01-01" @default.
- W2119792915 modified "2023-09-30" @default.
- W2119792915 title "Self Learning Control of Constrained Markov Decision Processes - A Gradient Approach" @default.
- W2119792915 cites W1512866443 @default.
- W2119792915 cites W1518931405 @default.
- W2119792915 cites W1535136082 @default.
- W2119792915 cites W1557287189 @default.
- W2119792915 cites W1576452626 @default.
- W2119792915 cites W1593494339 @default.
- W2119792915 cites W1599444075 @default.
- W2119792915 cites W1669104078 @default.
- W2119792915 cites W1981232797 @default.
- W2119792915 cites W2075167161 @default.
- W2119792915 cites W2095962350 @default.
- W2119792915 cites W2098432798 @default.
- W2119792915 cites W2114364126 @default.
- W2119792915 cites W2114757210 @default.
- W2119792915 cites W2118309135 @default.
- W2119792915 cites W2118943752 @default.
- W2119792915 cites W2128477394 @default.
- W2119792915 cites W2133626316 @default.
- W2119792915 cites W2142929170 @default.
- W2119792915 cites W2161142726 @default.
- W2119792915 cites W2235056388 @default.
- W2119792915 cites W2266946488 @default.
- W2119792915 cites W2334782222 @default.
- W2119792915 cites W2531891978 @default.
- W2119792915 cites W2798766386 @default.
- W2119792915 hasPublicationYear "2003" @default.
- W2119792915 type Work @default.
- W2119792915 sameAs 2119792915 @default.
- W2119792915 citedByCount "2" @default.
- W2119792915 countsByYear W21197929152020 @default.
- W2119792915 crossrefType "journal-article" @default.
- W2119792915 hasAuthorship W2119792915A5064028141 @default.
- W2119792915 hasAuthorship W2119792915A5068804090 @default.
- W2119792915 hasConcept C105795698 @default.
- W2119792915 hasConcept C106189395 @default.
- W2119792915 hasConcept C11413529 @default.
- W2119792915 hasConcept C126255220 @default.
- W2119792915 hasConcept C154945302 @default.
- W2119792915 hasConcept C159886148 @default.
- W2119792915 hasConcept C196340769 @default.
- W2119792915 hasConcept C26517878 @default.
- W2119792915 hasConcept C28826006 @default.
- W2119792915 hasConcept C33923547 @default.
- W2119792915 hasConcept C37404715 @default.
- W2119792915 hasConcept C38652104 @default.
- W2119792915 hasConcept C41008148 @default.
- W2119792915 hasConcept C45374587 @default.
- W2119792915 hasConcept C55479107 @default.
- W2119792915 hasConcept C91575142 @default.
- W2119792915 hasConcept C97541855 @default.
- W2119792915 hasConceptScore W2119792915C105795698 @default.
- W2119792915 hasConceptScore W2119792915C106189395 @default.
- W2119792915 hasConceptScore W2119792915C11413529 @default.
- W2119792915 hasConceptScore W2119792915C126255220 @default.
- W2119792915 hasConceptScore W2119792915C154945302 @default.
- W2119792915 hasConceptScore W2119792915C159886148 @default.
- W2119792915 hasConceptScore W2119792915C196340769 @default.
- W2119792915 hasConceptScore W2119792915C26517878 @default.
- W2119792915 hasConceptScore W2119792915C28826006 @default.
- W2119792915 hasConceptScore W2119792915C33923547 @default.
- W2119792915 hasConceptScore W2119792915C37404715 @default.
- W2119792915 hasConceptScore W2119792915C38652104 @default.
- W2119792915 hasConceptScore W2119792915C41008148 @default.
- W2119792915 hasConceptScore W2119792915C45374587 @default.
- W2119792915 hasConceptScore W2119792915C55479107 @default.
- W2119792915 hasConceptScore W2119792915C91575142 @default.
- W2119792915 hasConceptScore W2119792915C97541855 @default.
- W2119792915 hasLocation W21197929151 @default.
- W2119792915 hasOpenAccess W2119792915 @default.
- W2119792915 hasPrimaryLocation W21197929151 @default.
- W2119792915 hasRelatedWork W1497451517 @default.
- W2119792915 hasRelatedWork W1574991376 @default.
- W2119792915 hasRelatedWork W1604075136 @default.
- W2119792915 hasRelatedWork W1606999389 @default.
- W2119792915 hasRelatedWork W1978111124 @default.
- W2119792915 hasRelatedWork W1998148366 @default.
- W2119792915 hasRelatedWork W2003889205 @default.
- W2119792915 hasRelatedWork W2030547865 @default.
- W2119792915 hasRelatedWork W2039412751 @default.
- W2119792915 hasRelatedWork W2049472152 @default.
- W2119792915 hasRelatedWork W2053507979 @default.
- W2119792915 hasRelatedWork W2070570138 @default.
- W2119792915 hasRelatedWork W2084256858 @default.
- W2119792915 hasRelatedWork W2124217195 @default.
- W2119792915 hasRelatedWork W2156021013 @default.
- W2119792915 hasRelatedWork W2170492303 @default.
- W2119792915 hasRelatedWork W2393849047 @default.
- W2119792915 hasRelatedWork W2561625472 @default.
- W2119792915 hasRelatedWork W429516588 @default.
- W2119792915 hasRelatedWork W2160343662 @default.
- W2119792915 isParatext "false" @default.
- W2119792915 isRetracted "false" @default.
- W2119792915 magId "2119792915" @default.
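The abstract above names two ingredients: a parameterized randomized policy whose gradient is estimated by simulation with weak derivatives, and stochastic approximation that handles a constraint. The Python sketch below illustrates both on a deliberately simplified problem, and it is not the authors' algorithm. It assumes a softmax parameterization of the action probabilities (the paper's own parameterization may differ), a single-stage cost instead of the average cost of a full Markov chain, and a primal-dual Lagrangian update for the constraint. All function names (`softmax`, `weak_derivative_grad`, `primal_dual_sa`) are hypothetical. The weak-derivative step uses the standard decomposition of the signed measure dp/dθ_j into c_j (p⁺ − p⁻), where p⁺ and p⁻ are probability vectors, so that c_j (cost(a⁺) − cost(a⁻)) with a⁺ ~ p⁺ and a⁻ ~ p⁻ is an unbiased gradient estimate.

```python
import math
import random

# Hypothetical sketch: the softmax parameterization, the one-step cost model,
# and the primal-dual constraint handling are illustrative assumptions.


def softmax(theta):
    """Randomized-policy action probabilities p_k = exp(theta_k) / sum_i exp(theta_i)."""
    m = max(theta)  # shift by the max for numerical stability
    w = [math.exp(t - m) for t in theta]
    s = sum(w)
    return [x / s for x in w]


def weak_derivative_grad(theta, sample_cost, rng):
    """One weak-derivative estimate of d/dtheta_j E[cost(a)], a ~ softmax(theta).

    The signed vector dp/dtheta_j (entries p_k * (delta_kj - p_j)) sums to zero,
    so it splits as c_j * (p_plus - p_minus), where c_j is its total positive
    mass and p_plus, p_minus are probability vectors. The estimate for
    coordinate j is c_j * (cost(a_plus) - cost(a_minus)).
    """
    p = softmax(theta)
    actions = range(len(p))
    grad = []
    for j in actions:
        dp = [pk * ((1.0 if k == j else 0.0) - p[j]) for k, pk in enumerate(p)]
        pos = [max(d, 0.0) for d in dp]   # positive part, unnormalized p_plus
        neg = [max(-d, 0.0) for d in dp]  # negative part, unnormalized p_minus
        c = sum(pos)
        if c == 0.0 or sum(neg) == 0.0:  # degenerate policy: nothing to perturb
            grad.append(0.0)
            continue
        a_plus = rng.choices(actions, weights=pos)[0]
        a_minus = rng.choices(actions, weights=neg)[0]
        grad.append(c * (sample_cost(a_plus, rng) - sample_cost(a_minus, rng)))
    return grad


def primal_dual_sa(cost, constraint, bound, n_actions, steps, step_size, rng):
    """Minimize E[cost(a)] subject to E[constraint(a)] <= bound over softmax(theta).

    Stochastic gradient descent on theta for the Lagrangian
    E[cost] + lam * (E[constraint] - bound), and projected stochastic
    gradient ascent on the multiplier lam >= 0.
    """
    theta = [0.0] * n_actions
    lam = 0.0
    for _ in range(steps):
        g_cost = weak_derivative_grad(theta, cost, rng)
        g_con = weak_derivative_grad(theta, constraint, rng)
        theta = [t - step_size * (gc + lam * gd)
                 for t, gc, gd in zip(theta, g_cost, g_con)]
        # One-sample unbiased estimate of E[constraint] - bound under the current policy.
        a = rng.choices(range(n_actions), weights=softmax(theta))[0]
        lam = max(0.0, lam + step_size * (constraint(a, rng) - bound))
    return softmax(theta), lam


# Toy problem: action 0 costs 1; action 1 costs 0 but consumes a resource
# whose expected use must stay at or below 0.3.
probs, lam = primal_dual_sa(
    cost=lambda a, rng: [1.0, 0.0][a],
    constraint=lambda a, rng: [0.0, 1.0][a],
    bound=0.3,
    n_actions=2,
    steps=2000,
    step_size=0.05,
    rng=random.Random(1),
)
print(probs, lam)
```

The constrained optimum of this toy problem is the randomized policy (0.7, 0.3), which mirrors the abstract's remark that optimal constrained policies are randomized. Plain primal-dual iterations with a constant step size can oscillate around such a saddle point, so in practice one might average the iterates or use decreasing step sizes.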