Matches in SemOpenAlex for { <https://semopenalex.org/work/W2967659330> ?p ?o ?g. }
- W2967659330 abstract "Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different tasks. In this paper, we propose to visualize loss landscapes and optimization trajectories of fine-tuning BERT on specific datasets. First, we find that pre-training reaches a good initial point across downstream tasks, which leads to wider optima and easier optimization compared with training from scratch. We also demonstrate that the fine-tuning procedure is robust to overfitting, even though BERT is highly over-parameterized for downstream tasks. Second, the visualization results indicate that fine-tuning BERT tends to generalize better because of the flat and wide optima, and the consistency between the training loss surface and the generalization error surface. Third, the lower layers of BERT are more invariant during fine-tuning, which suggests that the layers that are close to input learn more transferable representations of language." @default.
- W2967659330 created "2019-08-22" @default.
- W2967659330 creator A5002980588 @default.
- W2967659330 creator A5014662947 @default.
- W2967659330 creator A5032588791 @default.
- W2967659330 creator A5051688016 @default.
- W2967659330 date "2019-08-15" @default.
- W2967659330 modified "2023-09-26" @default.
- W2967659330 title "Visualizing and Understanding the Effectiveness of BERT" @default.
- W2967659330 cites W131533222 @default.
- W2967659330 cites W2130158090 @default.
- W2967659330 cites W2251939518 @default.
- W2967659330 cites W2396767181 @default.
- W2967659330 cites W2549835527 @default.
- W2967659330 cites W2593267444 @default.
- W2967659330 cites W2792287754 @default.
- W2967659330 cites W2798727047 @default.
- W2967659330 cites W2910243263 @default.
- W2967659330 cites W2912811302 @default.
- W2967659330 cites W2913190747 @default.
- W2967659330 cites W2920812691 @default.
- W2967659330 cites W2945260553 @default.
- W2967659330 cites W2946417913 @default.
- W2967659330 cites W2953369973 @default.
- W2967659330 cites W2962739339 @default.
- W2967659330 cites W2962933129 @default.
- W2967659330 cites W2963026768 @default.
- W2967659330 cites W2963310665 @default.
- W2967659330 cites W2963341956 @default.
- W2967659330 cites W2963403868 @default.
- W2967659330 cites W2963756346 @default.
- W2967659330 cites W2963846996 @default.
- W2967659330 cites W2963854351 @default.
- W2967659330 cites W2963959597 @default.
- W2967659330 cites W2964121744 @default.
- W2967659330 cites W2964160102 @default.
- W2967659330 cites W2964303116 @default.
- W2967659330 cites W2970352191 @default.
- W2967659330 cites W3093329015 @default.
- W2967659330 cites W2525127255 @default.
- W2967659330 doi "https://doi.org/10.48550/arxiv.1908.05620" @default.
- W2967659330 hasPublicationYear "2019" @default.
- W2967659330 type Work @default.
- W2967659330 sameAs 2967659330 @default.
- W2967659330 citedByCount "13" @default.
- W2967659330 countsByYear W29676593302019 @default.
- W2967659330 countsByYear W29676593302020 @default.
- W2967659330 countsByYear W29676593302021 @default.
- W2967659330 crossrefType "posted-content" @default.
- W2967659330 hasAuthorship W2967659330A5002980588 @default.
- W2967659330 hasAuthorship W2967659330A5014662947 @default.
- W2967659330 hasAuthorship W2967659330A5032588791 @default.
- W2967659330 hasAuthorship W2967659330A5051688016 @default.
- W2967659330 hasBestOaLocation W29676593301 @default.
- W2967659330 hasConcept C11413529 @default.
- W2967659330 hasConcept C119857082 @default.
- W2967659330 hasConcept C121332964 @default.
- W2967659330 hasConcept C134306372 @default.
- W2967659330 hasConcept C137293760 @default.
- W2967659330 hasConcept C154945302 @default.
- W2967659330 hasConcept C157524613 @default.
- W2967659330 hasConcept C165464430 @default.
- W2967659330 hasConcept C177148314 @default.
- W2967659330 hasConcept C190470478 @default.
- W2967659330 hasConcept C199360897 @default.
- W2967659330 hasConcept C22019652 @default.
- W2967659330 hasConcept C2524010 @default.
- W2967659330 hasConcept C2776135515 @default.
- W2967659330 hasConcept C2776436953 @default.
- W2967659330 hasConcept C2781235140 @default.
- W2967659330 hasConcept C28719098 @default.
- W2967659330 hasConcept C33923547 @default.
- W2967659330 hasConcept C37914503 @default.
- W2967659330 hasConcept C41008148 @default.
- W2967659330 hasConcept C50644808 @default.
- W2967659330 hasConcept C62520636 @default.
- W2967659330 hasConceptScore W2967659330C11413529 @default.
- W2967659330 hasConceptScore W2967659330C119857082 @default.
- W2967659330 hasConceptScore W2967659330C121332964 @default.
- W2967659330 hasConceptScore W2967659330C134306372 @default.
- W2967659330 hasConceptScore W2967659330C137293760 @default.
- W2967659330 hasConceptScore W2967659330C154945302 @default.
- W2967659330 hasConceptScore W2967659330C157524613 @default.
- W2967659330 hasConceptScore W2967659330C165464430 @default.
- W2967659330 hasConceptScore W2967659330C177148314 @default.
- W2967659330 hasConceptScore W2967659330C190470478 @default.
- W2967659330 hasConceptScore W2967659330C199360897 @default.
- W2967659330 hasConceptScore W2967659330C22019652 @default.
- W2967659330 hasConceptScore W2967659330C2524010 @default.
- W2967659330 hasConceptScore W2967659330C2776135515 @default.
- W2967659330 hasConceptScore W2967659330C2776436953 @default.
- W2967659330 hasConceptScore W2967659330C2781235140 @default.
- W2967659330 hasConceptScore W2967659330C28719098 @default.
- W2967659330 hasConceptScore W2967659330C33923547 @default.
- W2967659330 hasConceptScore W2967659330C37914503 @default.
- W2967659330 hasConceptScore W2967659330C41008148 @default.
- W2967659330 hasConceptScore W2967659330C50644808 @default.
- W2967659330 hasConceptScore W2967659330C62520636 @default.
- W2967659330 hasLocation W29676593301 @default.
- W2967659330 hasOpenAccess W2967659330 @default.
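The triple pattern at the top of this listing can be executed against SemOpenAlex's public SPARQL service. A minimal sketch in Python, assuming the endpoint `https://semopenalex.org/sparql` and the standard SPARQL 1.1 JSON results format; the predicate and graph IRIs in the sample response are illustrative placeholders, not verified SemOpenAlex vocabulary:

```python
# Sketch: build the query shown above and parse a SPARQL 1.1 JSON result.
# The endpoint URL is an assumption; sending the query over HTTP (e.g. with
# urllib or requests) is left out so the example stays self-contained.

SPARQL_ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint

def build_query(work_uri: str) -> str:
    """Reproduce the { <work> ?p ?o ?g. } pattern as a SELECT query."""
    return (
        "SELECT ?p ?o ?g WHERE { "
        f"GRAPH ?g {{ <{work_uri}> ?p ?o . }} "
        "}"
    )

def parse_bindings(response_json: dict) -> list[tuple[str, str]]:
    """Flatten SPARQL JSON bindings into (predicate, object) pairs."""
    return [
        (b["p"]["value"], b["o"]["value"])
        for b in response_json["results"]["bindings"]
    ]

# Hypothetical response for one of the triples in the listing above.
sample = {
    "results": {"bindings": [
        {"p": {"value": "http://purl.org/dc/terms/title"},
         "o": {"value": "Visualizing and Understanding the Effectiveness of BERT"},
         "g": {"value": "https://semopenalex.org/graph/default"}}
    ]}
}

query = build_query("https://semopenalex.org/work/W2967659330")
pairs = parse_bindings(sample)
print(pairs)
```

The `?g` variable in the original pattern names the graph each match comes from (rendered as `@default` throughout the listing), which is why the sketch wraps the pattern in a `GRAPH ?g { ... }` clause.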