Matches in SemOpenAlex for { <https://semopenalex.org/work/W298069310> ?p ?o ?g. }
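This listing can be reproduced programmatically. Below is a minimal sketch using Python's SPARQLWrapper library, assuming the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql; since every match below carries the @default graph tag, the ?g position is dropped from the query.

```python
# Sketch: re-run the lookup above against the (assumed) public SemOpenAlex
# SPARQL endpoint. The ?g (graph) variable is omitted because all matches
# in this listing sit in the default graph (@default).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://semopenalex.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE {
        <https://semopenalex.org/work/W298069310> ?p ?o .
    }
""")
sparql.setReturnFormat(JSON)

# Print one "predicate object" pair per match, mirroring the rows below.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```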
- W298069310 abstract "Autonomous systems are often difficult to program. Reinforcement learning (RL) is an attractive alternative, as it allows the agent to learn behavior on the basis of sparse, delayed reward signals provided only when the agent reaches desired goals. Recent attempts to address the dimensionality of RL have turned to principled ways of exploiting temporal abstraction where decisions are not required at each step but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. This dissertation reviews several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed and presents a new method for the autonomous construction of hierarchical action and state representations in reinforcement learning, aimed at accelerating learning and extending the scope of such systems. In this approach, the agent uses information acquired while learning one task to discover iv subgoals for similar tasks. The agent is able to transfer knowledge to subsequent tasks and to accelerate learning by creating useful new subgoals and by off-line learning of corresponding subtask policies as abstract actions (options). At the same time, the subgoal actions are used to construct a more abstract state representation using action-dependent state space partitioning. This representation forms a new level in the state space hierarchy and serves as the initial representation for new learning tasks (the decision layer). In order to ensure that tasks are learnable, value functions are built simultaneously at different levels of the hierarchy and inconsistencies are used to identify actions to be used to refine relevant portions of the abstract state space. This representation serves as a first layer of the hierarchy. In order to estimate the structure of the state space for learning future tasks, the decision layer is constructed based on an estimate of the expected time to learn a new task and the system's experience with previously learned tasks. Together, these techniques permit the agent to form more abstract action and state representations over time. Experiments in deterministic and stochastic domains show that the presented method can significantly outperform learning on a flat state space representation." @default.
- W298069310 created "2016-06-24" @default.
- W298069310 creator A5044329397 @default.
- W298069310 creator A5047174917 @default.
- W298069310 date "2006-01-01" @default.
- W298069310 modified "2023-09-24" @default.
- W298069310 title "Learning state and action space hierarchies for reinforcement learning using action-dependent partitioning" @default.
- W298069310 cites W102863053 @default.
- W298069310 cites W142462678 @default.
- W298069310 cites W1503515926 @default.
- W298069310 cites W1506742697 @default.
- W298069310 cites W1553182805 @default.
- W298069310 cites W1555801537 @default.
- W298069310 cites W1557415102 @default.
- W298069310 cites W1568042657 @default.
- W298069310 cites W1574700590 @default.
- W298069310 cites W1576452626 @default.
- W298069310 cites W1586162706 @default.
- W298069310 cites W1592847719 @default.
- W298069310 cites W1815493548 @default.
- W298069310 cites W2001729196 @default.
- W298069310 cites W2020149918 @default.
- W298069310 cites W2035446426 @default.
- W298069310 cites W2038694949 @default.
- W298069310 cites W2058735307 @default.
- W298069310 cites W2099529102 @default.
- W298069310 cites W2102000945 @default.
- W298069310 cites W2103626435 @default.
- W298069310 cites W2111625828 @default.
- W298069310 cites W2114451917 @default.
- W298069310 cites W2121517924 @default.
- W298069310 cites W2121863487 @default.
- W298069310 cites W2139418546 @default.
- W298069310 cites W2143435603 @default.
- W298069310 cites W2150339816 @default.
- W298069310 cites W2158548602 @default.
- W298069310 cites W2169022337 @default.
- W298069310 cites W2397253692 @default.
- W298069310 cites W2548352678 @default.
- W298069310 cites W3011120880 @default.
- W298069310 cites W3139377883 @default.
- W298069310 cites W3139460557 @default.
- W298069310 cites W46670808 @default.
- W298069310 hasPublicationYear "2006" @default.
- W298069310 type Work @default.
- W298069310 sameAs 298069310 @default.
- W298069310 citedByCount "2" @default.
- W298069310 countsByYear W2980693102013 @default.
- W298069310 countsByYear W2980693102014 @default.
- W298069310 crossrefType "dissertation" @default.
- W298069310 hasAuthorship W298069310A5044329397 @default.
- W298069310 hasAuthorship W298069310A5047174917 @default.
- W298069310 hasConcept C105795698 @default.
- W298069310 hasConcept C111030470 @default.
- W298069310 hasConcept C111472728 @default.
- W298069310 hasConcept C119857082 @default.
- W298069310 hasConcept C121332964 @default.
- W298069310 hasConcept C124304363 @default.
- W298069310 hasConcept C127413603 @default.
- W298069310 hasConcept C138885662 @default.
- W298069310 hasConcept C154945302 @default.
- W298069310 hasConcept C162324750 @default.
- W298069310 hasConcept C17744445 @default.
- W298069310 hasConcept C199539241 @default.
- W298069310 hasConcept C201995342 @default.
- W298069310 hasConcept C2776359362 @default.
- W298069310 hasConcept C2780451532 @default.
- W298069310 hasConcept C2780791683 @default.
- W298069310 hasConcept C31170391 @default.
- W298069310 hasConcept C33923547 @default.
- W298069310 hasConcept C34447519 @default.
- W298069310 hasConcept C41008148 @default.
- W298069310 hasConcept C62520636 @default.
- W298069310 hasConcept C72434380 @default.
- W298069310 hasConcept C94625758 @default.
- W298069310 hasConcept C97541855 @default.
- W298069310 hasConceptScore W298069310C105795698 @default.
- W298069310 hasConceptScore W298069310C111030470 @default.
- W298069310 hasConceptScore W298069310C111472728 @default.
- W298069310 hasConceptScore W298069310C119857082 @default.
- W298069310 hasConceptScore W298069310C121332964 @default.
- W298069310 hasConceptScore W298069310C124304363 @default.
- W298069310 hasConceptScore W298069310C127413603 @default.
- W298069310 hasConceptScore W298069310C138885662 @default.
- W298069310 hasConceptScore W298069310C154945302 @default.
- W298069310 hasConceptScore W298069310C162324750 @default.
- W298069310 hasConceptScore W298069310C17744445 @default.
- W298069310 hasConceptScore W298069310C199539241 @default.
- W298069310 hasConceptScore W298069310C201995342 @default.
- W298069310 hasConceptScore W298069310C2776359362 @default.
- W298069310 hasConceptScore W298069310C2780451532 @default.
- W298069310 hasConceptScore W298069310C2780791683 @default.
- W298069310 hasConceptScore W298069310C31170391 @default.
- W298069310 hasConceptScore W298069310C33923547 @default.
- W298069310 hasConceptScore W298069310C34447519 @default.
- W298069310 hasConceptScore W298069310C41008148 @default.
- W298069310 hasConceptScore W298069310C62520636 @default.
- W298069310 hasConceptScore W298069310C72434380 @default.
- W298069310 hasConceptScore W298069310C94625758 @default.
- W298069310 hasConceptScore W298069310C97541855 @default.
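The abstract above leans on the options framework: temporally-extended actions that follow their own policy until a termination condition (here, a discovered subgoal) is met, and that are then treated as single abstract actions by a higher-level learner. The sketch below shows the core mechanism via SMDP Q-learning with one hand-coded subgoal option. The corridor environment, the fixed SUBGOAL state, and all hyperparameters are invented for illustration; the dissertation's contribution is discovering such subgoals automatically and partitioning the state space around them, which this sketch does not attempt.

```python
import random

# Minimal SMDP Q-learning with options: a 1-D corridor where the agent can
# use two primitive actions (left/right) or one temporally-extended option
# that runs its own policy until it reaches a fixed subgoal state.
N, GOAL, SUBGOAL = 10, 9, 5          # corridor length, goal, hand-coded subgoal
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1   # discount, learning rate, exploration

def step(s, a):
    """One primitive step: a = 0 moves left, a = 1 moves right."""
    s2 = max(0, min(N - 1, s + (1 if a else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def run_option(s, o):
    """Execute option o from s; return (s', discounted reward, duration, done).
    Options 0 and 1 are the primitives; option 2 moves right under its own
    policy until it terminates at SUBGOAL (or the episode ends)."""
    if o < 2:
        s2, r, done = step(s, o)
        return s2, r, 1, done
    total, k, done = 0.0, 0, False
    while s != SUBGOAL and not done:
        s, r, done = step(s, 1)
        total += GAMMA ** k * r      # rewards are discounted inside the option
        k += 1
    return s, total, max(k, 1), done

Q = [[0.0] * 3 for _ in range(N)]    # Q-values over (state, option) pairs
for _ in range(500):
    s, done = 0, False
    while not done:
        o = (random.randrange(3) if random.random() < EPS
             else max(range(3), key=lambda i: Q[s][i]))
        s2, r, k, done = run_option(s, o)
        # SMDP backup: the bootstrap is discounted by the option's duration k,
        # which is what lets a multi-step option act as one abstract action.
        target = r + (0.0 if done else GAMMA ** k * max(Q[s2]))
        Q[s][o] += ALPHA * (target - Q[s][o])
        s = s2

print("greedy option per state:",
      [max(range(3), key=lambda i: Q[s][i]) for s in range(N)])
```

The GAMMA ** k factor in the backup is the defining SMDP detail: without it, a long-running option would be valued as if it took a single step.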