Matches in SemOpenAlex for { <https://semopenalex.org/work/W2741177894> ?p ?o ?g. }
Showing items 1 to 69 of 69 with 100 items per page.
- W2741177894 abstract "! IntroductionOne of the main research questions in developmental robotics is how an embodied agent can learn from its environment. In this work, a developmental architecture will be implemented for the mobile robot Corvid [1]. The architecture is expected to enable the robot to learn how to control its movements in order to reach desired sensor states. In order to accomplish this, a reinforcement approach will be used. The architecture will be evaluated in experimental setups, in which the robot will have to learn different motor sequences.! Challenges for autonomous learningFor a mobile robot, the exploration space is vast as it can around and observe a potentially very big environment. This poses a big problem for reinforcement approaches in general. In the beginning, the robot will not have any information about the environment and how its actions influence its surroundings. Equipped only with a algorithm, the robot has to spend its time either exploring its environment, or actively pursuing goals using previously learned sequences. This problem is known as the exploration versus exploitation tradeoff.How motor sequences can be learned is another open question that will be addressed in this thesis. In many reinforcement implementations, only the current state is considered. However, a robot can experience the same or similar sensor and motor states in very different situations, leading to what is known as the perceptual aliasing problem.! Learning by trial and errorReinforcement is a supervised technique which has been applied in a wide variety of problems. Robotics presents new challenges for reinforcement as it deals with continuous sensor and motor spaces, which are only partially observable. A main part of this work will be to overcome these obstacles and to enable a robot to learn a set of basic motor sequences, which can be trained by providing a reward signal. The architecture will be implemented on Corvid, a mobile robot that uses two tracks to move, and has eight ultra-sound distance sensors as well as a camera.Reward signals come in a wide variety. In some previous experiments, intrinsic motivation is considered, which means that the reward signal is dependent on an internal measure, such as learning progress [2]. Here, we will consider extrinsic motivation, as the reward signal will be given by the distance to the desired goal state. The system should be able to cope with different reward signals, and therefore be able to learn to reach different goal states. An example for a goal state is move forward as fast as possible, but keep a minimum distance to obstacles. This goal state can be translated to a real-valued reward, depending on the current motor and sensor values. Formulated in another way, the robot would be punished by slowly or driving into objects. Another example that will be considered is the goal state of moving as slow as possible but maintaining a minimum distance to obstacles. Here, the robot is expected to deal with objects that towards it, and react by away from them.! Creating models of sequential dataIn order to implement reinforcement learning, the system needs to be capable of predicting the reward signal, so that it can generate appropriate actions. Ideally, these actions maximize the future reward. In related work, neural networks and other classifiers are often used to predict the reward signal from the agents current state. 
Due to the perceptual aliasing problem however, using only the current state may not be enough to get a good forward model. Therefore, many approaches take into account previous sensor states of the agent, resulting in time series models. This is also the approach that will be taken in this work. However, it is still very hard for a classifier to directly learn a good model of the reward signal from the agents states.Recently, it has been shown that it might be useful to decouple the supervised process into several different processes, of which only the last one tries to directly predict the reward signal. The idea is to first utilize unsupervised in order to build a good model of how the agents state changes over time. On top of this model, another classifier should then predict the reward signal from the newly learned internal representation.Restricted Boltzmann Machines (RBMs) are neural networks that have properties which make this kind of feasible. They have been shown to outperform other neural networks on several classical machine tasks, when using this layered architecture. Additionally, a variant of RBM is capable of time series models [3]. For this project, the robot will use the forward sequence model generated by the RBMs to sample possible future states, and select the actions with the highest predicted reward.! Outlook While goal-oriented autonomous is an open question in developmental robotics, previous research suggests that some limitations of reinforcement can be overcome. In particular, simple motor sequences by trial and failure should be possible using the proposed architecture. In the upcoming experiments, it will be evaluated how well the neural networks model the sensor space, and whether the mobile robot is able to reach several different goal states.[1] Michael Zillich, Michael Baumann, Wolfgang Knefel, and Christoph Langauer. Corvid: A Versatile Platform for Exploring Mobile Manipulation. 2010.[2] Adrien Baranes and Pierre-Yves Oudeyer. R-IAC: Robust Intrinsically Motivated Exploration and Active Learning. 2009.[3] Ilya Sutskever and Geoffrey Hinton. Learning Multilevel Distributed Representations for High-dimensional Sequences. 2006." @default.
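The abstract describes translating a goal state into a real-valued reward over the current motor and sensor values, and selecting actions by sampling possible future states from a learned forward model while balancing exploration against exploitation. The sketch below is a minimal illustration of that idea, not the thesis implementation: the interface names (track speeds, eight ultrasound readings, `model.sample_future`), the constants, and the epsilon-greedy exploration scheme are all assumptions introduced here for clarity.

```python
import random

# Assumed safety distance to obstacles, in metres (illustrative constant).
MIN_CLEARANCE = 0.3


def reward(track_speeds, ultrasound_readings):
    """Real-valued reward for "move forward fast, keep clear of obstacles".

    track_speeds: (left, right) normalised track velocities in [-1, 1].
    ultrasound_readings: eight distance readings in metres, one per sensor.
    """
    forward_speed = (track_speeds[0] + track_speeds[1]) / 2.0
    closest = min(ultrasound_readings)
    # Punish the robot for getting closer than the minimum clearance;
    # driving slowly is punished implicitly by a low forward_speed term.
    obstacle_penalty = max(0.0, MIN_CLEARANCE - closest)
    return forward_speed - 10.0 * obstacle_penalty


def select_action(model, history, actions, epsilon=0.1, n_samples=20):
    """Epsilon-greedy action selection over a learned forward model.

    `model.sample_future(history, action)` is a hypothetical stand-in for the
    RBM-based sequence model: given the recent sensor/motor history and a
    candidate action, it samples one predicted next (track_speeds, readings)
    state. With probability epsilon the robot explores a random action;
    otherwise it exploits the action with the highest mean predicted reward.
    """
    if random.random() < epsilon:
        return random.choice(actions)  # exploration
    best_action, best_value = None, float("-inf")
    for action in actions:
        samples = [model.sample_future(history, action) for _ in range(n_samples)]
        value = sum(reward(speeds, readings) for speeds, readings in samples) / n_samples
        if value > best_value:
            best_action, best_value = action, value
    return best_action  # exploitation
```

Under this sketch, the second goal state mentioned in the abstract (move as slowly as possible while keeping a minimum distance to obstacles) would only require swapping in a different reward function; the sampling-based action selection would stay unchanged, which matches the abstract's claim that the system should cope with different reward signals.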
- W2741177894 created "2017-08-08" @default.
- W2741177894 creator A5054372564 @default.
- W2741177894 date "2011-06-08" @default.
- W2741177894 modified "2023-09-27" @default.
- W2741177894 title "Goal-driven developmental learning on a mobile robot" @default.
- W2741177894 hasPublicationYear "2011" @default.
- W2741177894 type Work @default.
- W2741177894 sameAs 2741177894 @default.
- W2741177894 citedByCount "0" @default.
- W2741177894 crossrefType "journal-article" @default.
- W2741177894 hasAuthorship W2741177894A5054372564 @default.
- W2741177894 hasConcept C107457646 @default.
- W2741177894 hasConcept C136197465 @default.
- W2741177894 hasConcept C136536468 @default.
- W2741177894 hasConcept C154945302 @default.
- W2741177894 hasConcept C162947575 @default.
- W2741177894 hasConcept C188888258 @default.
- W2741177894 hasConcept C192327766 @default.
- W2741177894 hasConcept C19966478 @default.
- W2741177894 hasConcept C2778835581 @default.
- W2741177894 hasConcept C34413123 @default.
- W2741177894 hasConcept C4069607 @default.
- W2741177894 hasConcept C41008148 @default.
- W2741177894 hasConcept C65401140 @default.
- W2741177894 hasConcept C90509273 @default.
- W2741177894 hasConcept C97541855 @default.
- W2741177894 hasConceptScore W2741177894C107457646 @default.
- W2741177894 hasConceptScore W2741177894C136197465 @default.
- W2741177894 hasConceptScore W2741177894C136536468 @default.
- W2741177894 hasConceptScore W2741177894C154945302 @default.
- W2741177894 hasConceptScore W2741177894C162947575 @default.
- W2741177894 hasConceptScore W2741177894C188888258 @default.
- W2741177894 hasConceptScore W2741177894C192327766 @default.
- W2741177894 hasConceptScore W2741177894C19966478 @default.
- W2741177894 hasConceptScore W2741177894C2778835581 @default.
- W2741177894 hasConceptScore W2741177894C34413123 @default.
- W2741177894 hasConceptScore W2741177894C4069607 @default.
- W2741177894 hasConceptScore W2741177894C41008148 @default.
- W2741177894 hasConceptScore W2741177894C65401140 @default.
- W2741177894 hasConceptScore W2741177894C90509273 @default.
- W2741177894 hasConceptScore W2741177894C97541855 @default.
- W2741177894 hasLocation W27411778941 @default.
- W2741177894 hasOpenAccess W2741177894 @default.
- W2741177894 hasPrimaryLocation W27411778941 @default.
- W2741177894 hasRelatedWork W140788646 @default.
- W2741177894 hasRelatedWork W1553023101 @default.
- W2741177894 hasRelatedWork W164072472 @default.
- W2741177894 hasRelatedWork W1670925118 @default.
- W2741177894 hasRelatedWork W1820328889 @default.
- W2741177894 hasRelatedWork W1884601587 @default.
- W2741177894 hasRelatedWork W1960231662 @default.
- W2741177894 hasRelatedWork W1991618009 @default.
- W2741177894 hasRelatedWork W2005436834 @default.
- W2741177894 hasRelatedWork W2008386664 @default.
- W2741177894 hasRelatedWork W2100370041 @default.
- W2741177894 hasRelatedWork W2114882146 @default.
- W2741177894 hasRelatedWork W2120982521 @default.
- W2741177894 hasRelatedWork W2623219220 @default.
- W2741177894 hasRelatedWork W3024138969 @default.
- W2741177894 hasRelatedWork W3032077725 @default.
- W2741177894 hasRelatedWork W3083092137 @default.
- W2741177894 hasRelatedWork W3090775013 @default.
- W2741177894 hasRelatedWork W904535523 @default.
- W2741177894 hasRelatedWork W2856242472 @default.
- W2741177894 isParatext "false" @default.
- W2741177894 isRetracted "false" @default.
- W2741177894 magId "2741177894" @default.
- W2741177894 workType "article" @default.