SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W2120968583> ?p ?o ?g. }

W2120968583 abstract "The majority of learning algorithms available today focus on approximating the state (V) or state-action (Q) value function and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major obstacle in successfully applying reinforcement learning to real-world problems. In this paper we present an effective approach to learning and acting in domains with multidimensional and/or continuous control variables where efficient action selection is embedded in the learning process. Instead of learning and representing the state or state-action value function of the MDP, we learn a value function over an implied augmented MDP, where states represent collections of actions in the original MDP and transitions represent choices eliminating parts of the action space at each step. Action selection in the original MDP is reduced to a binary search by the agent in the transformed MDP, with computational complexity logarithmic in the number of actions, or equivalently linear in the number of action dimensions. Our method can be combined with any discrete-action reinforcement learning algorithm for learning multidimensional continuous-action policies using a state value approximator in the transformed MDP. Our preliminary results with two well-known reinforcement learning algorithms (Least-Squares Policy Iteration and Fitted Q-Iteration) on two continuous action domains (1-dimensional inverted pendulum regulator, 2-dimensional bicycle balancing) demonstrate the viability and the potential of the proposed approach." @default.
W2120968583 created "2016-06-24" @default.
W2120968583 creator A5031267271 @default.
W2120968583 creator A5084693234 @default.
W2120968583 date "2011-04-01" @default.
W2120968583 modified "2023-10-14" @default.
W2120968583 title "Reinforcement learning in multidimensional continuous action spaces" @default.
W2120968583 cites W1531900088 @default.
W2120968583 cites W2068052921 @default.
W2120968583 cites W2097451572 @default.
W2120968583 cites W2127107099 @default.
W2120968583 cites W2127251115 @default.
W2120968583 cites W2146199577 @default.
W2120968583 cites W2154549708 @default.
W2120968583 cites W2158479468 @default.
W2120968583 doi "https://doi.org/10.1109/adprl.2011.5967381" @default.
W2120968583 hasPublicationYear "2011" @default.
W2120968583 type Work @default.
W2120968583 sameAs 2120968583 @default.
W2120968583 citedByCount "24" @default.
W2120968583 countsByYear W21209685832012 @default.
W2120968583 countsByYear W21209685832013 @default.
W2120968583 countsByYear W21209685832014 @default.
W2120968583 countsByYear W21209685832015 @default.
W2120968583 countsByYear W21209685832016 @default.
W2120968583 countsByYear W21209685832017 @default.
W2120968583 countsByYear W21209685832018 @default.
W2120968583 countsByYear W21209685832020 @default.
W2120968583 countsByYear W21209685832022 @default.
W2120968583 crossrefType "proceedings-article" @default.
W2120968583 hasAuthorship W2120968583A5031267271 @default.
W2120968583 hasAuthorship W2120968583A5084693234 @default.
W2120968583 hasConcept C105795698 @default.
W2120968583 hasConcept C106189395 @default.
W2120968583 hasConcept C119857082 @default.
W2120968583 hasConcept C121332964 @default.
W2120968583 hasConcept C126255220 @default.
W2120968583 hasConcept C14036430 @default.
W2120968583 hasConcept C14646407 @default.
W2120968583 hasConcept C154945302 @default.
W2120968583 hasConcept C158622935 @default.
W2120968583 hasConcept C159886148 @default.
W2120968583 hasConcept C166109690 @default.
W2120968583 hasConcept C169760540 @default.
W2120968583 hasConcept C188116033 @default.
W2120968583 hasConcept C192921069 @default.
W2120968583 hasConcept C26760741 @default.
W2120968583 hasConcept C2780791683 @default.
W2120968583 hasConcept C33923547 @default.
W2120968583 hasConcept C41008148 @default.
W2120968583 hasConcept C62520636 @default.
W2120968583 hasConcept C72434380 @default.
W2120968583 hasConcept C78458016 @default.
W2120968583 hasConcept C86803240 @default.
W2120968583 hasConcept C97541855 @default.
W2120968583 hasConceptScore W2120968583C105795698 @default.
W2120968583 hasConceptScore W2120968583C106189395 @default.
W2120968583 hasConceptScore W2120968583C119857082 @default.
W2120968583 hasConceptScore W2120968583C121332964 @default.
W2120968583 hasConceptScore W2120968583C126255220 @default.
W2120968583 hasConceptScore W2120968583C14036430 @default.
W2120968583 hasConceptScore W2120968583C14646407 @default.
W2120968583 hasConceptScore W2120968583C154945302 @default.
W2120968583 hasConceptScore W2120968583C158622935 @default.
W2120968583 hasConceptScore W2120968583C159886148 @default.
W2120968583 hasConceptScore W2120968583C166109690 @default.
W2120968583 hasConceptScore W2120968583C169760540 @default.
W2120968583 hasConceptScore W2120968583C188116033 @default.
W2120968583 hasConceptScore W2120968583C192921069 @default.
W2120968583 hasConceptScore W2120968583C26760741 @default.
W2120968583 hasConceptScore W2120968583C2780791683 @default.
W2120968583 hasConceptScore W2120968583C33923547 @default.
W2120968583 hasConceptScore W2120968583C41008148 @default.
W2120968583 hasConceptScore W2120968583C62520636 @default.
W2120968583 hasConceptScore W2120968583C72434380 @default.
W2120968583 hasConceptScore W2120968583C78458016 @default.
W2120968583 hasConceptScore W2120968583C86803240 @default.
W2120968583 hasConceptScore W2120968583C97541855 @default.
W2120968583 hasLocation W21209685831 @default.
W2120968583 hasOpenAccess W2120968583 @default.
W2120968583 hasPrimaryLocation W21209685831 @default.
W2120968583 hasRelatedWork W1583080569 @default.
W2120968583 hasRelatedWork W2059069309 @default.
W2120968583 hasRelatedWork W2120968583 @default.
W2120968583 hasRelatedWork W2145043286 @default.
W2120968583 hasRelatedWork W2151416233 @default.
W2120968583 hasRelatedWork W2170607316 @default.
W2120968583 hasRelatedWork W2353483528 @default.
W2120968583 hasRelatedWork W2383312578 @default.
W2120968583 hasRelatedWork W2937181779 @default.
W2120968583 hasRelatedWork W616059226 @default.
W2120968583 isParatext "false" @default.
W2120968583 isRetracted "false" @default.
W2120968583 magId "2120968583" @default.
W2120968583 workType "article" @default.
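
The abstract above describes action selection as a binary search that discards part of the action space at each step. The sketch below is one possible reading of that idea, not the authors' implementation: the record does not give the transformed MDP or the learned value function, so `prefer_upper` is a hypothetical stand-in for whatever learned rule decides which half of the current interval to keep, and `depth` is an assumed resolution parameter. Each dimension is searched in turn, so the number of decisions is `depth` times the number of action dimensions.

```python
from typing import Callable, List, Sequence, Tuple

Bounds = Tuple[float, float]
# Hypothetical decision rule: (state, dimension, low, high) -> keep the upper half?
Chooser = Callable[[object, int, float, float], bool]


def binary_action_search(
    state: object,
    bounds: Sequence[Bounds],
    prefer_upper: Chooser,
    depth: int = 10,
) -> List[float]:
    """Pick a continuous action by halving each dimension's interval `depth` times."""
    action: List[float] = []
    for dim, (low, high) in enumerate(bounds):
        for _ in range(depth):
            mid = (low + high) / 2.0
            # Each decision eliminates half of the remaining interval.
            if prefer_upper(state, dim, low, high):
                low = mid
            else:
                high = mid
        # Report the centre of the interval that survived the search.
        action.append((low + high) / 2.0)
    return action


if __name__ == "__main__":
    # Toy chooser (an assumption for illustration): it knows a target action
    # and always keeps the half of the interval that contains it.
    target = [0.3, -0.7]

    def toy_chooser(state: object, dim: int, low: float, high: float) -> bool:
        return target[dim] > (low + high) / 2.0

    print(binary_action_search(None, [(-1.0, 1.0), (-1.0, 1.0)], toy_chooser))
```

With `depth` halvings per dimension the search distinguishes 2^depth actions along that dimension, which is where the logarithmic cost in the number of actions comes from; in a learned setting the toy chooser would be replaced by a comparison of value estimates in the transformed MDP.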