Matches in SemOpenAlex for { <https://semopenalex.org/work/W4316830022> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4316830022 abstract "Standard fine-tuning of language models typically performs well on in-distribution data, but suffers with generalization to distribution shifts. In this work, we aim to improve generalization of adapter-based cross-lingual task transfer where such cross-language distribution shifts are imminent. We investigate scheduled unfreezing algorithms -- originally proposed to mitigate catastrophic forgetting in transfer learning -- for fine-tuning task adapters in cross-lingual transfer. Our experiments show that scheduled unfreezing methods close the gap to full fine-tuning and achieve state-of-the-art transfer performance, suggesting that these methods can go beyond just mitigating catastrophic forgetting. Next, aiming to delve deeper into those empirical findings, we investigate the learning dynamics of scheduled unfreezing using Fisher Information. Our in-depth experiments reveal that scheduled unfreezing induces different learning dynamics compared to standard fine-tuning, and provide evidence that the dynamics of Fisher Information during training correlate with cross-lingual generalization performance. We additionally propose a general scheduled unfreezing algorithm that achieves an average of 2 points improvement over four datasets compared to standard fine-tuning and provides strong empirical evidence for a theory-based justification of the heuristic unfreezing schedule (i.e., the heuristic schedule is implicitly maximizing Fisher Information). Our code will be publicly available." @default.
- W4316830022 created "2023-01-17" @default.
- W4316830022 creator A5014866912 @default.
- W4316830022 creator A5024983536 @default.
- W4316830022 creator A5027450194 @default.
- W4316830022 creator A5029433946 @default.
- W4316830022 date "2023-01-13" @default.
- W4316830022 modified "2023-09-30" @default.
- W4316830022 title "Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing" @default.
- W4316830022 doi "https://doi.org/10.48550/arxiv.2301.05487" @default.
- W4316830022 hasPublicationYear "2023" @default.
- W4316830022 type Work @default.
- W4316830022 citedByCount "0" @default.
- W4316830022 crossrefType "posted-content" @default.
- W4316830022 hasAuthorship W4316830022A5014866912 @default.
- W4316830022 hasAuthorship W4316830022A5024983536 @default.
- W4316830022 hasAuthorship W4316830022A5027450194 @default.
- W4316830022 hasAuthorship W4316830022A5029433946 @default.
- W4316830022 hasBestOaLocation W43168300221 @default.
- W4316830022 hasConcept C111919701 @default.
- W4316830022 hasConcept C11413529 @default.
- W4316830022 hasConcept C134306372 @default.
- W4316830022 hasConcept C138885662 @default.
- W4316830022 hasConcept C150899416 @default.
- W4316830022 hasConcept C154945302 @default.
- W4316830022 hasConcept C162324750 @default.
- W4316830022 hasConcept C173608175 @default.
- W4316830022 hasConcept C173801870 @default.
- W4316830022 hasConcept C177148314 @default.
- W4316830022 hasConcept C177284502 @default.
- W4316830022 hasConcept C187736073 @default.
- W4316830022 hasConcept C2776175482 @default.
- W4316830022 hasConcept C2780451532 @default.
- W4316830022 hasConcept C33923547 @default.
- W4316830022 hasConcept C41008148 @default.
- W4316830022 hasConcept C41895202 @default.
- W4316830022 hasConcept C7149132 @default.
- W4316830022 hasConceptScore W4316830022C111919701 @default.
- W4316830022 hasConceptScore W4316830022C11413529 @default.
- W4316830022 hasConceptScore W4316830022C134306372 @default.
- W4316830022 hasConceptScore W4316830022C138885662 @default.
- W4316830022 hasConceptScore W4316830022C150899416 @default.
- W4316830022 hasConceptScore W4316830022C154945302 @default.
- W4316830022 hasConceptScore W4316830022C162324750 @default.
- W4316830022 hasConceptScore W4316830022C173608175 @default.
- W4316830022 hasConceptScore W4316830022C173801870 @default.
- W4316830022 hasConceptScore W4316830022C177148314 @default.
- W4316830022 hasConceptScore W4316830022C177284502 @default.
- W4316830022 hasConceptScore W4316830022C187736073 @default.
- W4316830022 hasConceptScore W4316830022C2776175482 @default.
- W4316830022 hasConceptScore W4316830022C2780451532 @default.
- W4316830022 hasConceptScore W4316830022C33923547 @default.
- W4316830022 hasConceptScore W4316830022C41008148 @default.
- W4316830022 hasConceptScore W4316830022C41895202 @default.
- W4316830022 hasConceptScore W4316830022C7149132 @default.
- W4316830022 hasLocation W43168300221 @default.
- W4316830022 hasOpenAccess W4316830022 @default.
- W4316830022 hasPrimaryLocation W43168300221 @default.
- W4316830022 hasRelatedWork W1964624709 @default.
- W4316830022 hasRelatedWork W2068195715 @default.
- W4316830022 hasRelatedWork W2896374939 @default.
- W4316830022 hasRelatedWork W3170264434 @default.
- W4316830022 hasRelatedWork W3182098395 @default.
- W4316830022 hasRelatedWork W3203700738 @default.
- W4316830022 hasRelatedWork W4206887880 @default.
- W4316830022 hasRelatedWork W4226393342 @default.
- W4316830022 hasRelatedWork W4286909327 @default.
- W4316830022 hasRelatedWork W4307205255 @default.
- W4316830022 isParatext "false" @default.
- W4316830022 isRetracted "false" @default.
- W4316830022 workType "article" @default.