Matches in SemOpenAlex for { <https://semopenalex.org/work/W4288804596> ?p ?o ?g. }
Showing items 1 to 88 of 88 with 100 items per page.
- W4288804596 abstract "We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill-in-the-middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices to train FIM models. We have released our best infilling model trained with best practices in our API, and release our infilling benchmarks to aid future research." @default.
- W4288804596 created "2022-07-30" @default.
- W4288804596 creator A5013248082 @default.
- W4288804596 creator A5030935953 @default.
- W4288804596 creator A5042623426 @default.
- W4288804596 creator A5043772703 @default.
- W4288804596 creator A5044086066 @default.
- W4288804596 creator A5058738098 @default.
- W4288804596 creator A5065707162 @default.
- W4288804596 date "2022-07-28" @default.
- W4288804596 modified "2023-09-27" @default.
- W4288804596 title "Efficient Training of Language Models to Fill in the Middle" @default.
- W4288804596 doi "https://doi.org/10.48550/arxiv.2207.14255" @default.
- W4288804596 hasPublicationYear "2022" @default.
- W4288804596 type Work @default.
- W4288804596 citedByCount "3" @default.
- W4288804596 countsByYear W42888045962023 @default.
- W4288804596 crossrefType "posted-content" @default.
- W4288804596 hasAuthorship W4288804596A5013248082 @default.
- W4288804596 hasAuthorship W4288804596A5030935953 @default.
- W4288804596 hasAuthorship W4288804596A5042623426 @default.
- W4288804596 hasAuthorship W4288804596A5043772703 @default.
- W4288804596 hasAuthorship W4288804596A5044086066 @default.
- W4288804596 hasAuthorship W4288804596A5058738098 @default.
- W4288804596 hasAuthorship W4288804596A5065707162 @default.
- W4288804596 hasBestOaLocation W42888045961 @default.
- W4288804596 hasConcept C100279451 @default.
- W4288804596 hasConcept C104317684 @default.
- W4288804596 hasConcept C107673813 @default.
- W4288804596 hasConcept C111472728 @default.
- W4288804596 hasConcept C119857082 @default.
- W4288804596 hasConcept C127413603 @default.
- W4288804596 hasConcept C137293760 @default.
- W4288804596 hasConcept C138885662 @default.
- W4288804596 hasConcept C149782125 @default.
- W4288804596 hasConcept C154945302 @default.
- W4288804596 hasConcept C159877910 @default.
- W4288804596 hasConcept C177769412 @default.
- W4288804596 hasConcept C185592680 @default.
- W4288804596 hasConcept C204241405 @default.
- W4288804596 hasConcept C26517878 @default.
- W4288804596 hasConcept C2776372474 @default.
- W4288804596 hasConcept C2781219549 @default.
- W4288804596 hasConcept C33923547 @default.
- W4288804596 hasConcept C38652104 @default.
- W4288804596 hasConcept C39890363 @default.
- W4288804596 hasConcept C41008148 @default.
- W4288804596 hasConcept C55493867 @default.
- W4288804596 hasConcept C66938386 @default.
- W4288804596 hasConceptScore W4288804596C100279451 @default.
- W4288804596 hasConceptScore W4288804596C104317684 @default.
- W4288804596 hasConceptScore W4288804596C107673813 @default.
- W4288804596 hasConceptScore W4288804596C111472728 @default.
- W4288804596 hasConceptScore W4288804596C119857082 @default.
- W4288804596 hasConceptScore W4288804596C127413603 @default.
- W4288804596 hasConceptScore W4288804596C137293760 @default.
- W4288804596 hasConceptScore W4288804596C138885662 @default.
- W4288804596 hasConceptScore W4288804596C149782125 @default.
- W4288804596 hasConceptScore W4288804596C154945302 @default.
- W4288804596 hasConceptScore W4288804596C159877910 @default.
- W4288804596 hasConceptScore W4288804596C177769412 @default.
- W4288804596 hasConceptScore W4288804596C185592680 @default.
- W4288804596 hasConceptScore W4288804596C204241405 @default.
- W4288804596 hasConceptScore W4288804596C26517878 @default.
- W4288804596 hasConceptScore W4288804596C2776372474 @default.
- W4288804596 hasConceptScore W4288804596C2781219549 @default.
- W4288804596 hasConceptScore W4288804596C33923547 @default.
- W4288804596 hasConceptScore W4288804596C38652104 @default.
- W4288804596 hasConceptScore W4288804596C39890363 @default.
- W4288804596 hasConceptScore W4288804596C41008148 @default.
- W4288804596 hasConceptScore W4288804596C55493867 @default.
- W4288804596 hasConceptScore W4288804596C66938386 @default.
- W4288804596 hasLocation W42888045961 @default.
- W4288804596 hasOpenAccess W4288804596 @default.
- W4288804596 hasPrimaryLocation W42888045961 @default.
- W4288804596 hasRelatedWork W10379689 @default.
- W4288804596 hasRelatedWork W10491538 @default.
- W4288804596 hasRelatedWork W12168553 @default.
- W4288804596 hasRelatedWork W2145140 @default.
- W4288804596 hasRelatedWork W4615079 @default.
- W4288804596 hasRelatedWork W5470710 @default.
- W4288804596 hasRelatedWork W5568260 @default.
- W4288804596 hasRelatedWork W6143937 @default.
- W4288804596 hasRelatedWork W8053366 @default.
- W4288804596 hasRelatedWork W8218506 @default.
- W4288804596 isParatext "false" @default.
- W4288804596 isRetracted "false" @default.
- W4288804596 workType "article" @default.
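
The abstract recorded above describes the fill-in-the-middle (FIM) transformation as moving a span of text from the middle of a document to its end, applied to some fraction of the training data. The following is only a minimal sketch of that idea, not the authors' implementation. The sentinel strings, the function names `fim_transform` and `fim_inverse`, the character-level choice of cut points, and the `fim_rate` default are all assumptions made for illustration; none of them are stated in this record.

```python
import random

# Hypothetical sentinel strings marking the three segments (assumed, not from the record).
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(document: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, move a random middle span to the end of the document."""
    if rng.random() >= fim_rate:
        # Leave the example untouched so ordinary left-to-right data is still seen.
        return document
    # Pick two cut points uniformly at random (character level, an assumption).
    i, j = sorted(rng.randrange(len(document) + 1) for _ in range(2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Reorder as prefix, suffix, middle so the middle is generated last.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def fim_inverse(example: str) -> str:
    """Reassemble the original text from a transformed example (illustration only)."""
    if not example.startswith(PRE):
        return example
    body = example[len(PRE):]
    prefix, rest = body.split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix


if __name__ == "__main__":
    rng = random.Random(0)
    doc = "def add(a, b):\n    return a + b\n"
    transformed = fim_transform(doc, rng, fim_rate=1.0)
    print(transformed)
    print(fim_inverse(transformed) == doc)
```

Because the model still predicts tokens left to right over the reordered text, no architectural change is needed in this sketch; at inference time one would supply the prefix and suffix and let the model continue after the middle sentinel. The inverse helper assumes the document itself does not contain the sentinel strings.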