Matches in SemOpenAlex for { <https://semopenalex.org/work/W2890169813> ?p ?o ?g. }
Showing items 1 to 96 of 96 with 100 items per page.
- W2890169813 abstract "Q-learning is one of the most popular methods in Reinforcement Learning (RL). Transfer Learning aims to use the knowledge learned from source tasks to improve the sample complexity of new tasks. Because data collection in RL is more time-consuming and costly, and Q-learning converges slowly compared to supervised learning, various transfer RL algorithms have been designed. However, most of them are heuristic, with no theoretical guarantee on the convergence rate. It is therefore important to understand clearly when and how transfer learning helps RL methods, and to provide a theoretical guarantee for the improvement in sample complexity. In this paper, we propose to transfer the Q-function learned in the source task to the target of Q-learning in the new task when certain safe conditions are satisfied. We call this new transfer Q-learning method target transfer Q-learning. The safe conditions are necessary to avoid harming the new task and thus to ensure the convergence of the algorithm. We study the convergence rate of target transfer Q-learning. We prove that if the two tasks are similar with respect to their MDPs, the optimal Q-functions in the source and new RL tasks are similar, which means the error of the transferred target Q-function in the new MDP is small. The convergence rate analysis also shows that target transfer Q-learning converges faster than Q-learning if the error of the transferred target Q-function is smaller than that of the current Q-function in the new task. Based on our theoretical results, we design the safe condition as follows: the Bellman error of the transferred target Q-function is less than that of the current Q-function. Our experiments are consistent with our theoretical findings and verify the effectiveness of the proposed target transfer Q-learning method." @default.
- W2890169813 created "2018-09-27" @default.
- W2890169813 creator A5000006377 @default.
- W2890169813 creator A5030666098 @default.
- W2890169813 creator A5033632697 @default.
- W2890169813 creator A5044802273 @default.
- W2890169813 creator A5070990160 @default.
- W2890169813 creator A5078600050 @default.
- W2890169813 date "2018-09-21" @default.
- W2890169813 modified "2023-09-26" @default.
- W2890169813 title "Target Transfer Q-Learning and Its Convergence Analysis" @default.
- W2890169813 doi "https://doi.org/10.48550/arxiv.1809.08923" @default.
- W2890169813 hasPublicationYear "2018" @default.
- W2890169813 type Work @default.
- W2890169813 sameAs 2890169813 @default.
- W2890169813 citedByCount "2" @default.
- W2890169813 countsByYear W28901698132018 @default.
- W2890169813 countsByYear W28901698132019 @default.
- W2890169813 crossrefType "posted-content" @default.
- W2890169813 hasAuthorship W2890169813A5000006377 @default.
- W2890169813 hasAuthorship W2890169813A5030666098 @default.
- W2890169813 hasAuthorship W2890169813A5033632697 @default.
- W2890169813 hasAuthorship W2890169813A5044802273 @default.
- W2890169813 hasAuthorship W2890169813A5070990160 @default.
- W2890169813 hasAuthorship W2890169813A5078600050 @default.
- W2890169813 hasBestOaLocation W28901698131 @default.
- W2890169813 hasConcept C11413529 @default.
- W2890169813 hasConcept C119857082 @default.
- W2890169813 hasConcept C127162648 @default.
- W2890169813 hasConcept C14036430 @default.
- W2890169813 hasConcept C150899416 @default.
- W2890169813 hasConcept C154945302 @default.
- W2890169813 hasConcept C162324750 @default.
- W2890169813 hasConcept C173608175 @default.
- W2890169813 hasConcept C173801870 @default.
- W2890169813 hasConcept C185592680 @default.
- W2890169813 hasConcept C187736073 @default.
- W2890169813 hasConcept C188116033 @default.
- W2890169813 hasConcept C198531522 @default.
- W2890169813 hasConcept C2776175482 @default.
- W2890169813 hasConcept C2777303404 @default.
- W2890169813 hasConcept C2778445095 @default.
- W2890169813 hasConcept C2780451532 @default.
- W2890169813 hasConcept C28006648 @default.
- W2890169813 hasConcept C31258907 @default.
- W2890169813 hasConcept C41008148 @default.
- W2890169813 hasConcept C43617362 @default.
- W2890169813 hasConcept C50522688 @default.
- W2890169813 hasConcept C57869625 @default.
- W2890169813 hasConcept C78458016 @default.
- W2890169813 hasConcept C86803240 @default.
- W2890169813 hasConcept C97541855 @default.
- W2890169813 hasConceptScore W2890169813C11413529 @default.
- W2890169813 hasConceptScore W2890169813C119857082 @default.
- W2890169813 hasConceptScore W2890169813C127162648 @default.
- W2890169813 hasConceptScore W2890169813C14036430 @default.
- W2890169813 hasConceptScore W2890169813C150899416 @default.
- W2890169813 hasConceptScore W2890169813C154945302 @default.
- W2890169813 hasConceptScore W2890169813C162324750 @default.
- W2890169813 hasConceptScore W2890169813C173608175 @default.
- W2890169813 hasConceptScore W2890169813C173801870 @default.
- W2890169813 hasConceptScore W2890169813C185592680 @default.
- W2890169813 hasConceptScore W2890169813C187736073 @default.
- W2890169813 hasConceptScore W2890169813C188116033 @default.
- W2890169813 hasConceptScore W2890169813C198531522 @default.
- W2890169813 hasConceptScore W2890169813C2776175482 @default.
- W2890169813 hasConceptScore W2890169813C2777303404 @default.
- W2890169813 hasConceptScore W2890169813C2778445095 @default.
- W2890169813 hasConceptScore W2890169813C2780451532 @default.
- W2890169813 hasConceptScore W2890169813C28006648 @default.
- W2890169813 hasConceptScore W2890169813C31258907 @default.
- W2890169813 hasConceptScore W2890169813C41008148 @default.
- W2890169813 hasConceptScore W2890169813C43617362 @default.
- W2890169813 hasConceptScore W2890169813C50522688 @default.
- W2890169813 hasConceptScore W2890169813C57869625 @default.
- W2890169813 hasConceptScore W2890169813C78458016 @default.
- W2890169813 hasConceptScore W2890169813C86803240 @default.
- W2890169813 hasConceptScore W2890169813C97541855 @default.
- W2890169813 hasLocation W28901698131 @default.
- W2890169813 hasLocation W28901698132 @default.
- W2890169813 hasOpenAccess W2890169813 @default.
- W2890169813 hasPrimaryLocation W28901698131 @default.
- W2890169813 hasRelatedWork W101901138 @default.
- W2890169813 hasRelatedWork W1517383877 @default.
- W2890169813 hasRelatedWork W2344556769 @default.
- W2890169813 hasRelatedWork W2784831250 @default.
- W2890169813 hasRelatedWork W2890169813 @default.
- W2890169813 hasRelatedWork W2952448454 @default.
- W2890169813 hasRelatedWork W2962817122 @default.
- W2890169813 hasRelatedWork W4300383067 @default.
- W2890169813 hasRelatedWork W4379662533 @default.
- W2890169813 hasRelatedWork W4379983844 @default.
- W2890169813 isParatext "false" @default.
- W2890169813 isRetracted "false" @default.
- W2890169813 magId "2890169813" @default.
- W2890169813 workType "article" @default.
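
The abstract above describes the method only in outline: use the source task's Q-function inside the Q-learning target whenever its Bellman error is smaller than that of the current Q-function. The following is a minimal tabular sketch of that idea, not the authors' implementation. Several details are assumptions made for illustration: the function names `bellman_error` and `target_transfer_q_learning`, the use of a known transition model `P` and reward table `R` to compute the Bellman error exactly in the max norm, uniform random exploration, and re-checking the safe condition every `check_every` steps.

```python
import random


def bellman_error(Q, P, R, gamma):
    """Max-norm Bellman residual max_{s,a} |(TQ)(s,a) - Q(s,a)|.

    Assumption: the tabular model is known, with P[s][a][s2] a transition
    probability and R[s][a] a deterministic reward.
    """
    n_states, n_actions = len(R), len(R[0])
    err = 0.0
    for s in range(n_states):
        for a in range(n_actions):
            backup = R[s][a] + gamma * sum(
                P[s][a][s2] * max(Q[s2]) for s2 in range(n_states)
            )
            err = max(err, abs(backup - Q[s][a]))
    return err


def target_transfer_q_learning(P, R, Q_source, gamma=0.9, alpha=0.1,
                               steps=5000, check_every=100, seed=0):
    """Tabular Q-learning whose target may use a transferred Q-function.

    Returns the learned Q-table and the number of updates that used the
    transferred Q-function in the target.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    source_error = bellman_error(Q_source, P, R, gamma)
    use_source = False
    transfer_updates = 0
    s = 0
    for t in range(steps):
        if t % check_every == 0:
            # Safe condition from the abstract: transfer only while the
            # source Q has a smaller Bellman error than the current Q.
            use_source = source_error < bellman_error(Q, P, R, gamma)
        a = rng.randrange(n_actions)  # uniform exploration (assumption)
        s2 = rng.choices(range(n_states), weights=P[s][a])[0]
        target_Q = Q_source if use_source else Q
        target = R[s][a] + gamma * max(target_Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        if use_source:
            transfer_updates += 1
        s = s2
    return Q, transfer_updates
```

In this sketch a poorly matched source Q-function fails the check and the loop falls back to ordinary Q-learning, which is the role the abstract assigns to the safe condition. A sample-based estimate of the Bellman error would be needed when the model is not available; the abstract does not say how the authors estimate it.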