Matches in SemOpenAlex for { <https://semopenalex.org/work/W3187116107> ?p ?o ?g. }
- W3187116107 abstract "The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of catastrophic interference (a.k.a. catastrophic forgetting) and a collapse in performance, as later training is likely to overwrite and interfere with previously learned good policies. In this paper, we introduce the concept of context into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, to divide all states experienced during training into a series of contexts. Its motivation is to mitigate the aforementioned catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural network feature extractor shared across all contexts, and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers for learned contexts. In addition, to effectively obtain the context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is fixed throughout training. Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead." @default.
- W3187116107 created "2021-08-16" @default.
- W3187116107 creator A5022263689 @default.
- W3187116107 creator A5033786585 @default.
- W3187116107 creator A5047483989 @default.
- W3187116107 creator A5067322004 @default.
- W3187116107 date "2021-08-09" @default.
- W3187116107 modified "2023-09-23" @default.
- W3187116107 title "Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation" @default.
- W3187116107 cites W1492713221 @default.
- W3187116107 cites W1682403713 @default.
- W3187116107 cites W1757796397 @default.
- W3187116107 cites W1821462560 @default.
- W3187116107 cites W2121863487 @default.
- W3187116107 cites W2145339207 @default.
- W3187116107 cites W2155968351 @default.
- W3187116107 cites W2553665199 @default.
- W3187116107 cites W2560647685 @default.
- W3187116107 cites W2583761661 @default.
- W3187116107 cites W2584377191 @default.
- W3187116107 cites W2737492962 @default.
- W3187116107 cites W2794487566 @default.
- W3187116107 cites W2905342215 @default.
- W3187116107 cites W2912063360 @default.
- W3187116107 cites W2922466325 @default.
- W3187116107 cites W2940545298 @default.
- W3187116107 cites W2962515681 @default.
- W3187116107 cites W2962724315 @default.
- W3187116107 cites W2962917939 @default.
- W3187116107 cites W2963072899 @default.
- W3187116107 cites W2963097726 @default.
- W3187116107 cites W2963390791 @default.
- W3187116107 cites W2963423916 @default.
- W3187116107 cites W2963477884 @default.
- W3187116107 cites W2963985863 @default.
- W3187116107 cites W2964043796 @default.
- W3187116107 cites W2964048876 @default.
- W3187116107 cites W2964189064 @default.
- W3187116107 cites W2964291307 @default.
- W3187116107 cites W2970502342 @default.
- W3187116107 cites W2970586779 @default.
- W3187116107 cites W2982388063 @default.
- W3187116107 cites W2996514457 @default.
- W3187116107 cites W2997475283 @default.
- W3187116107 cites W3007132638 @default.
- W3187116107 cites W3016507688 @default.
- W3187116107 cites W3030364939 @default.
- W3187116107 cites W3034848825 @default.
- W3187116107 cites W3037179286 @default.
- W3187116107 cites W3040897095 @default.
- W3187116107 cites W3046814132 @default.
- W3187116107 cites W3097816393 @default.
- W3187116107 cites W3100201759 @default.
- W3187116107 cites W3103780890 @default.
- W3187116107 cites W3117215073 @default.
- W3187116107 cites W3154064177 @default.
- W3187116107 cites W3131871335 @default.
- W3187116107 doi "https://doi.org/10.36227/techrxiv.15105492.v1" @default.
- W3187116107 hasPublicationYear "2021" @default.
- W3187116107 type Work @default.
- W3187116107 sameAs 3187116107 @default.
- W3187116107 citedByCount "0" @default.
- W3187116107 crossrefType "posted-content" @default.
- W3187116107 hasAuthorship W3187116107A5022263689 @default.
- W3187116107 hasAuthorship W3187116107A5033786585 @default.
- W3187116107 hasAuthorship W3187116107A5047483989 @default.
- W3187116107 hasAuthorship W3187116107A5067322004 @default.
- W3187116107 hasBestOaLocation W31871161071 @default.
- W3187116107 hasConcept C119857082 @default.
- W3187116107 hasConcept C151730666 @default.
- W3187116107 hasConcept C154945302 @default.
- W3187116107 hasConcept C2779343474 @default.
- W3187116107 hasConcept C41008148 @default.
- W3187116107 hasConcept C50644808 @default.
- W3187116107 hasConcept C73555534 @default.
- W3187116107 hasConcept C86803240 @default.
- W3187116107 hasConcept C97541855 @default.
- W3187116107 hasConceptScore W3187116107C119857082 @default.
- W3187116107 hasConceptScore W3187116107C151730666 @default.
- W3187116107 hasConceptScore W3187116107C154945302 @default.
- W3187116107 hasConceptScore W3187116107C2779343474 @default.
- W3187116107 hasConceptScore W3187116107C41008148 @default.
- W3187116107 hasConceptScore W3187116107C50644808 @default.
- W3187116107 hasConceptScore W3187116107C73555534 @default.
- W3187116107 hasConceptScore W3187116107C86803240 @default.
- W3187116107 hasConceptScore W3187116107C97541855 @default.
- W3187116107 hasLocation W31871161071 @default.
- W3187116107 hasLocation W31871161072 @default.
- W3187116107 hasLocation W31871161073 @default.
- W3187116107 hasOpenAccess W3187116107 @default.
- W3187116107 hasPrimaryLocation W31871161071 @default.
- W3187116107 hasRelatedWork W2923653485 @default.
- W3187116107 hasRelatedWork W2952472710 @default.
- W3187116107 hasRelatedWork W2957776456 @default.
- W3187116107 hasRelatedWork W3022038857 @default.
- W3187116107 hasRelatedWork W4224287422 @default.
- W3187116107 hasRelatedWork W4255994452 @default.
- W3187116107 hasRelatedWork W4319083788 @default.
- W3187116107 hasRelatedWork W4319773215 @default.
- W3187116107 hasRelatedWork W4361026739 @default.