Matches in SemOpenAlex for { <https://semopenalex.org/work/W3019631794> ?p ?o ?g. }
- W3019631794 abstract "Deep pretrained language models have achieved great success in the way of pretraining first and then fine-tuning. But such a sequential transfer learning paradigm often confronts the catastrophic forgetting problem and leads to sub-optimal performance. To fine-tune with less forgetting, we propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks. Specifically, we propose a Pretraining Simulation mechanism to recall the knowledge from pretraining tasks without data, and an Objective Shifting mechanism to focus the learning on downstream tasks gradually. Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark. Our method also enables BERT-base to achieve better performance than direct fine-tuning of BERT-large. Further, we provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into the Adam optimizer, to facilitate the NLP community." @default.
- W3019631794 created "2020-05-01" @default.
- W3019631794 creator A5005918579 @default.
- W3019631794 creator A5018432470 @default.
- W3019631794 creator A5019108029 @default.
- W3019631794 creator A5051107002 @default.
- W3019631794 creator A5072540013 @default.
- W3019631794 creator A5079533447 @default.
- W3019631794 date "2020-04-27" @default.
- W3019631794 modified "2023-09-27" @default.
- W3019631794 title "Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting" @default.
- W3019631794 cites W131533222 @default.
- W3019631794 cites W1581755290 @default.
- W3019631794 cites W1599016936 @default.
- W3019631794 cites W1682403713 @default.
- W3019631794 cites W1821462560 @default.
- W3019631794 cites W2060277733 @default.
- W3019631794 cites W2113839990 @default.
- W3019631794 cites W2130158090 @default.
- W3019631794 cites W2251939518 @default.
- W3019631794 cites W2396767181 @default.
- W3019631794 cites W2473930607 @default.
- W3019631794 cites W2474280151 @default.
- W3019631794 cites W2560647685 @default.
- W3019631794 cites W2605043629 @default.
- W3019631794 cites W2624871570 @default.
- W3019631794 cites W2737492962 @default.
- W3019631794 cites W2774373350 @default.
- W3019631794 cites W2784121710 @default.
- W3019631794 cites W2786446225 @default.
- W3019631794 cites W2787560479 @default.
- W3019631794 cites W2898700502 @default.
- W3019631794 cites W2908510526 @default.
- W3019631794 cites W2913340405 @default.
- W3019631794 cites W2924984511 @default.
- W3019631794 cites W2927746189 @default.
- W3019631794 cites W2939911019 @default.
- W3019631794 cites W2945383715 @default.
- W3019631794 cites W2962707369 @default.
- W3019631794 cites W2962724315 @default.
- W3019631794 cites W2962863357 @default.
- W3019631794 cites W2963026768 @default.
- W3019631794 cites W2963072899 @default.
- W3019631794 cites W2963223306 @default.
- W3019631794 cites W2963310665 @default.
- W3019631794 cites W2963341956 @default.
- W3019631794 cites W2963559848 @default.
- W3019631794 cites W2963588172 @default.
- W3019631794 cites W2963748441 @default.
- W3019631794 cites W2963788399 @default.
- W3019631794 cites W2963813679 @default.
- W3019631794 cites W2963846996 @default.
- W3019631794 cites W2963850662 @default.
- W3019631794 cites W2964088867 @default.
- W3019631794 cites W2964121744 @default.
- W3019631794 cites W2964186069 @default.
- W3019631794 cites W2964189064 @default.
- W3019631794 cites W2964352358 @default.
- W3019631794 cites W2965373594 @default.
- W3019631794 cites W2970352191 @default.
- W3019631794 cites W2970597249 @default.
- W3019631794 cites W2974317861 @default.
- W3019631794 cites W2974731134 @default.
- W3019631794 cites W2975185270 @default.
- W3019631794 cites W2978670439 @default.
- W3019631794 cites W2979736514 @default.
- W3019631794 cites W2989430018 @default.
- W3019631794 cites W2994415862 @default.
- W3019631794 cites W2996428491 @default.
- W3019631794 cites W3003289092 @default.
- W3019631794 cites W3013325675 @default.
- W3019631794 cites W3103800629 @default.
- W3019631794 cites W3104033643 @default.
- W3019631794 cites W3106003309 @default.
- W3019631794 cites W3127504686 @default.
- W3019631794 cites W3140968660 @default.
- W3019631794 cites W2525127255 @default.
- W3019631794 doi "https://doi.org/10.48550/arxiv.2004.12651" @default.
- W3019631794 hasPublicationYear "2020" @default.
- W3019631794 type Work @default.
- W3019631794 sameAs 3019631794 @default.
- W3019631794 citedByCount "4" @default.
- W3019631794 countsByYear W30196317942020 @default.
- W3019631794 countsByYear W30196317942021 @default.
- W3019631794 crossrefType "posted-content" @default.
- W3019631794 hasAuthorship W3019631794A5005918579 @default.
- W3019631794 hasAuthorship W3019631794A5018432470 @default.
- W3019631794 hasAuthorship W3019631794A5019108029 @default.
- W3019631794 hasAuthorship W3019631794A5051107002 @default.
- W3019631794 hasAuthorship W3019631794A5072540013 @default.
- W3019631794 hasAuthorship W3019631794A5079533447 @default.
- W3019631794 hasBestOaLocation W30196317941 @default.
- W3019631794 hasConcept C100660578 @default.
- W3019631794 hasConcept C108583219 @default.
- W3019631794 hasConcept C111472728 @default.
- W3019631794 hasConcept C119857082 @default.
- W3019631794 hasConcept C120665830 @default.
- W3019631794 hasConcept C121332964 @default.
- W3019631794 hasConcept C13280743 @default.
- W3019631794 hasConcept C137293760 @default.