Matches in SemOpenAlex for { <https://semopenalex.org/work/W4286904992> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4286904992 abstract "We consider language modelling (LM) as a multi-label structured prediction task by re-framing training from solely predicting a single ground-truth word to ranking a set of words which could continue a given context. To avoid annotating top-$k$ ranks, we generate them using pre-trained LMs: GPT-2, BERT, and Born-Again models. This leads to a rank-based form of knowledge distillation (KD). We also develop a method using $N$-grams to create a non-probabilistic teacher which generates the ranks without the need of a pre-trained LM. We confirm the hypotheses that we can treat LMing as a ranking task and that we can do so without the use of a pre-trained LM. We show that rank-based KD generally improves perplexity (PPL), often with statistical significance, when compared to Kullback-Leibler-based KD. Surprisingly, given the simplicity of the method, $N$-grams act as competitive teachers and achieve similar performance as using either BERT or a Born-Again model teachers. GPT-2 always acts as the best teacher, though, and using it and a Transformer-XL student on Wiki-02, rank-based KD reduces a cross-entropy baseline from 65.27 to 55.94 and against a KL-based KD of 56.70." @default.
- W4286904992 created "2022-07-25" @default.
- W4286904992 creator A5005830504 @default.
- W4286904992 creator A5056256317 @default.
- W4286904992 creator A5079129529 @default.
- W4286904992 date "2021-10-13" @default.
- W4286904992 modified "2023-09-24" @default.
- W4286904992 title "Language Modelling via Learning to Rank" @default.
- W4286904992 doi "https://doi.org/10.48550/arxiv.2110.06961" @default.
- W4286904992 hasPublicationYear "2021" @default.
- W4286904992 type Work @default.
- W4286904992 citedByCount "0" @default.
- W4286904992 crossrefType "posted-content" @default.
- W4286904992 hasAuthorship W4286904992A5005830504 @default.
- W4286904992 hasAuthorship W4286904992A5056256317 @default.
- W4286904992 hasAuthorship W4286904992A5079129529 @default.
- W4286904992 hasBestOaLocation W42869049921 @default.
- W4286904992 hasConcept C100279451 @default.
- W4286904992 hasConcept C106301342 @default.
- W4286904992 hasConcept C111368507 @default.
- W4286904992 hasConcept C114614502 @default.
- W4286904992 hasConcept C119857082 @default.
- W4286904992 hasConcept C121332964 @default.
- W4286904992 hasConcept C12725497 @default.
- W4286904992 hasConcept C127313418 @default.
- W4286904992 hasConcept C137293760 @default.
- W4286904992 hasConcept C154945302 @default.
- W4286904992 hasConcept C164226766 @default.
- W4286904992 hasConcept C165801399 @default.
- W4286904992 hasConcept C189430467 @default.
- W4286904992 hasConcept C204321447 @default.
- W4286904992 hasConcept C33923547 @default.
- W4286904992 hasConcept C41008148 @default.
- W4286904992 hasConcept C49937458 @default.
- W4286904992 hasConcept C62520636 @default.
- W4286904992 hasConcept C66322947 @default.
- W4286904992 hasConceptScore W4286904992C100279451 @default.
- W4286904992 hasConceptScore W4286904992C106301342 @default.
- W4286904992 hasConceptScore W4286904992C111368507 @default.
- W4286904992 hasConceptScore W4286904992C114614502 @default.
- W4286904992 hasConceptScore W4286904992C119857082 @default.
- W4286904992 hasConceptScore W4286904992C121332964 @default.
- W4286904992 hasConceptScore W4286904992C12725497 @default.
- W4286904992 hasConceptScore W4286904992C127313418 @default.
- W4286904992 hasConceptScore W4286904992C137293760 @default.
- W4286904992 hasConceptScore W4286904992C154945302 @default.
- W4286904992 hasConceptScore W4286904992C164226766 @default.
- W4286904992 hasConceptScore W4286904992C165801399 @default.
- W4286904992 hasConceptScore W4286904992C189430467 @default.
- W4286904992 hasConceptScore W4286904992C204321447 @default.
- W4286904992 hasConceptScore W4286904992C33923547 @default.
- W4286904992 hasConceptScore W4286904992C41008148 @default.
- W4286904992 hasConceptScore W4286904992C49937458 @default.
- W4286904992 hasConceptScore W4286904992C62520636 @default.
- W4286904992 hasConceptScore W4286904992C66322947 @default.
- W4286904992 hasLocation W42869049921 @default.
- W4286904992 hasOpenAccess W4286904992 @default.
- W4286904992 hasPrimaryLocation W42869049921 @default.
- W4286904992 hasRelatedWork W1989705153 @default.
- W4286904992 hasRelatedWork W2107734859 @default.
- W4286904992 hasRelatedWork W2169518243 @default.
- W4286904992 hasRelatedWork W2496228846 @default.
- W4286904992 hasRelatedWork W2936497627 @default.
- W4286904992 hasRelatedWork W2975649594 @default.
- W4286904992 hasRelatedWork W2996568036 @default.
- W4286904992 hasRelatedWork W3013624417 @default.
- W4286904992 hasRelatedWork W3049463507 @default.
- W4286904992 hasRelatedWork W4287826556 @default.
- W4286904992 isParatext "false" @default.
- W4286904992 isRetracted "false" @default.
- W4286904992 workType "article" @default.