Matches in SemOpenAlex for { <https://semopenalex.org/work/W2735995851> ?p ?o ?g. }
- W2735995851 abstract "Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a distilled policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, and the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning." @default.
- W2735995851 created "2017-07-21" @default.
- W2735995851 creator A5018191427 @default.
- W2735995851 creator A5031378020 @default.
- W2735995851 creator A5039448321 @default.
- W2735995851 creator A5043910056 @default.
- W2735995851 creator A5062951341 @default.
- W2735995851 creator A5064372346 @default.
- W2735995851 creator A5064373793 @default.
- W2735995851 creator A5079415139 @default.
- W2735995851 date "2017-07-13" @default.
- W2735995851 modified "2023-09-27" @default.
- W2735995851 title "Distral: Robust Multitask Reinforcement Learning" @default.
- W2735995851 cites W181022050 @default.
- W2735995851 cites W1821462560 @default.
- W2735995851 cites W2049633694 @default.
- W2735995851 cites W2107662876 @default.
- W2735995851 cites W2114580749 @default.
- W2735995851 cites W2155027007 @default.
- W2735995851 cites W2155968351 @default.
- W2735995851 cites W2164278908 @default.
- W2735995851 cites W2201581102 @default.
- W2735995851 cites W2295582178 @default.
- W2735995851 cites W2593044849 @default.
- W2735995851 cites W2609650878 @default.
- W2735995851 cites W2913340405 @default.
- W2735995851 cites W2949608212 @default.
- W2735995851 cites W2950872548 @default.
- W2735995851 cites W2963267001 @default.
- W2735995851 cites W2963804082 @default.
- W2735995851 cites W2964043796 @default.
- W2735995851 hasPublicationYear "2017" @default.
- W2735995851 type Work @default.
- W2735995851 sameAs 2735995851 @default.
- W2735995851 citedByCount "49" @default.
- W2735995851 countsByYear W27359958512017 @default.
- W2735995851 countsByYear W27359958512018 @default.
- W2735995851 countsByYear W27359958512019 @default.
- W2735995851 countsByYear W27359958512020 @default.
- W2735995851 countsByYear W27359958512021 @default.
- W2735995851 crossrefType "posted-content" @default.
- W2735995851 hasAuthorship W2735995851A5018191427 @default.
- W2735995851 hasAuthorship W2735995851A5031378020 @default.
- W2735995851 hasAuthorship W2735995851A5039448321 @default.
- W2735995851 hasAuthorship W2735995851A5043910056 @default.
- W2735995851 hasAuthorship W2735995851A5062951341 @default.
- W2735995851 hasAuthorship W2735995851A5064372346 @default.
- W2735995851 hasAuthorship W2735995851A5064373793 @default.
- W2735995851 hasAuthorship W2735995851A5079415139 @default.
- W2735995851 hasConcept C111919701 @default.
- W2735995851 hasConcept C119857082 @default.
- W2735995851 hasConcept C127413603 @default.
- W2735995851 hasConcept C14036430 @default.
- W2735995851 hasConcept C150899416 @default.
- W2735995851 hasConcept C154945302 @default.
- W2735995851 hasConcept C188198153 @default.
- W2735995851 hasConcept C201995342 @default.
- W2735995851 hasConcept C2779436431 @default.
- W2735995851 hasConcept C2780451532 @default.
- W2735995851 hasConcept C28006648 @default.
- W2735995851 hasConcept C41008148 @default.
- W2735995851 hasConcept C50644808 @default.
- W2735995851 hasConcept C78458016 @default.
- W2735995851 hasConcept C78519656 @default.
- W2735995851 hasConcept C86803240 @default.
- W2735995851 hasConcept C97541855 @default.
- W2735995851 hasConcept C98045186 @default.
- W2735995851 hasConceptScore W2735995851C111919701 @default.
- W2735995851 hasConceptScore W2735995851C119857082 @default.
- W2735995851 hasConceptScore W2735995851C127413603 @default.
- W2735995851 hasConceptScore W2735995851C14036430 @default.
- W2735995851 hasConceptScore W2735995851C150899416 @default.
- W2735995851 hasConceptScore W2735995851C154945302 @default.
- W2735995851 hasConceptScore W2735995851C188198153 @default.
- W2735995851 hasConceptScore W2735995851C201995342 @default.
- W2735995851 hasConceptScore W2735995851C2779436431 @default.
- W2735995851 hasConceptScore W2735995851C2780451532 @default.
- W2735995851 hasConceptScore W2735995851C28006648 @default.
- W2735995851 hasConceptScore W2735995851C41008148 @default.
- W2735995851 hasConceptScore W2735995851C50644808 @default.
- W2735995851 hasConceptScore W2735995851C78458016 @default.
- W2735995851 hasConceptScore W2735995851C78519656 @default.
- W2735995851 hasConceptScore W2735995851C86803240 @default.
- W2735995851 hasConceptScore W2735995851C97541855 @default.
- W2735995851 hasConceptScore W2735995851C98045186 @default.
- W2735995851 hasLocation W27359958511 @default.
- W2735995851 hasOpenAccess W2735995851 @default.
- W2735995851 hasPrimaryLocation W27359958511 @default.
- W2735995851 hasRelatedWork W1191599655 @default.
- W2735995851 hasRelatedWork W1757796397 @default.
- W2735995851 hasRelatedWork W1821462560 @default.
- W2735995851 hasRelatedWork W2097381042 @default.
- W2735995851 hasRelatedWork W2119717200 @default.
- W2735995851 hasRelatedWork W2121863487 @default.
- W2735995851 hasRelatedWork W2145339207 @default.
- W2735995851 hasRelatedWork W2155027007 @default.
- W2735995851 hasRelatedWork W2169743339 @default.
- W2735995851 hasRelatedWork W2173248099 @default.
- W2735995851 hasRelatedWork W2174786457 @default.
- W2735995851 hasRelatedWork W2257979135 @default.
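
The abstract above describes two coupled updates: each task policy stays close to a shared policy, and the shared policy is distilled to be the centroid of the task policies. The sketch below is a toy illustration of that idea only, not the authors' implementation. It assumes one-step tasks with discrete actions, a KL penalty with a hypothetical inverse temperature `beta`, and no neural networks or entropy bonus. All function names are made up for this example.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def centroid(policies):
    """Distribution pi_0 minimizing sum_i KL(pi_i || pi_0): the elementwise mean."""
    n = len(policies)
    return [sum(column) / n for column in zip(*policies)]


def regularized_objective(reward, policy, shared, beta):
    """Expected reward minus (1 / beta) * KL(policy || shared) for a one-step task."""
    expected = sum(p * r for p, r in zip(policy, reward))
    return expected - kl(policy, shared) / beta


def best_response(reward, shared, beta):
    """Maximizer of regularized_objective: proportional to shared * exp(beta * reward)."""
    weights = [s * math.exp(beta * r) for s, r in zip(shared, reward)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical rewards: three tasks, three actions; action 0 is good in two tasks.
    rewards = [[1.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
    shared = [1.0 / 3] * 3  # start from a uniform shared policy
    for _ in range(20):
        # Each task policy maximizes its own reward while staying near the shared policy.
        task_policies = [best_response(r, shared, beta=2.0) for r in rewards]
        # The shared policy is re-fit as the centroid of the task policies.
        shared = centroid(task_policies)
    print("shared policy:", [round(p, 3) for p in shared])
```

In this simplified setting both steps have closed forms, so the alternation needs no gradients. The method described in the abstract instead optimizes a joint objective with learned policies.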