Matches in SemOpenAlex for { <https://semopenalex.org/work/W3036846812> ?p ?o ?g. }
- W3036846812 abstract "We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modelled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon Entropy of the trajectories under the MDP and determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. This resulting policy enhances the quality of exploration early on in the learning process, and consequently allows faster convergence rates and robust solutions even in the presence of noisy data as demonstrated in our comparisons to popular algorithms such as Q-learning, Double Q-learning and entropy regularized Soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent, and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can possibly be non-convex with multiple poor local minima. Simulation results applied to a 5G small cell network problem demonstrate successful determination of communication routes and the small cell locations. We also obtain sensitivity measures to problem parameters and robustness to noisy environment data." @default.
- W3036846812 created "2020-06-25" @default.
- W3036846812 creator A5051680600 @default.
- W3036846812 creator A5069205144 @default.
- W3036846812 date "2020-06-17" @default.
- W3036846812 modified "2023-09-27" @default.
- W3036846812 title "Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework" @default.
- W3036846812 cites W1499669280 @default.
- W3036846812 cites W1509562192 @default.
- W3036846812 cites W1516418405 @default.
- W3036846812 cites W1522099992 @default.
- W3036846812 cites W1576452626 @default.
- W3036846812 cites W1745373831 @default.
- W3036846812 cites W1771410628 @default.
- W3036846812 cites W1951602908 @default.
- W3036846812 cites W1974307281 @default.
- W3036846812 cites W1980648727 @default.
- W3036846812 cites W1998783985 @default.
- W3036846812 cites W2019064215 @default.
- W3036846812 cites W2022918053 @default.
- W3036846812 cites W2032558547 @default.
- W3036846812 cites W2057898911 @default.
- W3036846812 cites W2070469928 @default.
- W3036846812 cites W2077085434 @default.
- W3036846812 cites W2098774185 @default.
- W3036846812 cites W2103718624 @default.
- W3036846812 cites W2108711094 @default.
- W3036846812 cites W2112492957 @default.
- W3036846812 cites W2115211925 @default.
- W3036846812 cites W2121110499 @default.
- W3036846812 cites W2142477416 @default.
- W3036846812 cites W2143072483 @default.
- W3036846812 cites W2157174816 @default.
- W3036846812 cites W2168839459 @default.
- W3036846812 cites W2169324952 @default.
- W3036846812 cites W2253991908 @default.
- W3036846812 cites W2400318990 @default.
- W3036846812 cites W2402108766 @default.
- W3036846812 cites W2560615388 @default.
- W3036846812 cites W2593044849 @default.
- W3036846812 cites W2619268125 @default.
- W3036846812 cites W2752501126 @default.
- W3036846812 cites W2781726626 @default.
- W3036846812 cites W2787933113 @default.
- W3036846812 cites W2897417898 @default.
- W3036846812 cites W2912924301 @default.
- W3036846812 cites W2949078258 @default.
- W3036846812 cites W2949996623 @default.
- W3036846812 cites W2952171869 @default.
- W3036846812 cites W2962847657 @default.
- W3036846812 cites W2963169817 @default.
- W3036846812 cites W2963257680 @default.
- W3036846812 cites W2963312729 @default.
- W3036846812 cites W2971463999 @default.
- W3036846812 cites W2986925736 @default.
- W3036846812 cites W3027719445 @default.
- W3036846812 cites W76530103 @default.
- W3036846812 hasPublicationYear "2020" @default.
- W3036846812 type Work @default.
- W3036846812 sameAs 3036846812 @default.
- W3036846812 citedByCount "0" @default.
- W3036846812 crossrefType "posted-content" @default.
- W3036846812 hasAuthorship W3036846812A5051680600 @default.
- W3036846812 hasAuthorship W3036846812A5069205144 @default.
- W3036846812 hasConcept C104317684 @default.
- W3036846812 hasConcept C105795698 @default.
- W3036846812 hasConcept C106189395 @default.
- W3036846812 hasConcept C106301342 @default.
- W3036846812 hasConcept C11413529 @default.
- W3036846812 hasConcept C121332964 @default.
- W3036846812 hasConcept C126255220 @default.
- W3036846812 hasConcept C134306372 @default.
- W3036846812 hasConcept C14646407 @default.
- W3036846812 hasConcept C154945302 @default.
- W3036846812 hasConcept C159886148 @default.
- W3036846812 hasConcept C165464430 @default.
- W3036846812 hasConcept C185592680 @default.
- W3036846812 hasConcept C186633575 @default.
- W3036846812 hasConcept C188116033 @default.
- W3036846812 hasConcept C33923547 @default.
- W3036846812 hasConcept C41008148 @default.
- W3036846812 hasConcept C55493867 @default.
- W3036846812 hasConcept C62520636 @default.
- W3036846812 hasConcept C63479239 @default.
- W3036846812 hasConcept C9679016 @default.
- W3036846812 hasConcept C97541855 @default.
- W3036846812 hasConceptScore W3036846812C104317684 @default.
- W3036846812 hasConceptScore W3036846812C105795698 @default.
- W3036846812 hasConceptScore W3036846812C106189395 @default.
- W3036846812 hasConceptScore W3036846812C106301342 @default.
- W3036846812 hasConceptScore W3036846812C11413529 @default.
- W3036846812 hasConceptScore W3036846812C121332964 @default.
- W3036846812 hasConceptScore W3036846812C126255220 @default.
- W3036846812 hasConceptScore W3036846812C134306372 @default.
- W3036846812 hasConceptScore W3036846812C14646407 @default.
- W3036846812 hasConceptScore W3036846812C154945302 @default.
- W3036846812 hasConceptScore W3036846812C159886148 @default.
- W3036846812 hasConceptScore W3036846812C165464430 @default.
- W3036846812 hasConceptScore W3036846812C185592680 @default.
- W3036846812 hasConceptScore W3036846812C186633575 @default.