Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313304472> ?p ?o ?g. }
Showing items 1 to 81 of 81 with 100 items per page.
- W4313304472 abstract "Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting." @default.
- W4313304472 created "2023-01-06" @default.
- W4313304472 creator A5049400969 @default.
- W4313304472 creator A5060687985 @default.
- W4313304472 date "2022-12-28" @default.
- W4313304472 modified "2023-10-16" @default.
- W4313304472 title "Cramming: Training a Language Model on a Single GPU in One Day" @default.
- W4313304472 doi "https://doi.org/10.48550/arxiv.2212.14034" @default.
- W4313304472 hasPublicationYear "2022" @default.
- W4313304472 type Work @default.
- W4313304472 citedByCount "0" @default.
- W4313304472 crossrefType "posted-content" @default.
- W4313304472 hasAuthorship W4313304472A5049400969 @default.
- W4313304472 hasAuthorship W4313304472A5060687985 @default.
- W4313304472 hasBestOaLocation W43133044721 @default.
- W4313304472 hasConcept C11413529 @default.
- W4313304472 hasConcept C121332964 @default.
- W4313304472 hasConcept C123657996 @default.
- W4313304472 hasConcept C136264566 @default.
- W4313304472 hasConcept C137293760 @default.
- W4313304472 hasConcept C138885662 @default.
- W4313304472 hasConcept C142362112 @default.
- W4313304472 hasConcept C153349607 @default.
- W4313304472 hasConcept C154945302 @default.
- W4313304472 hasConcept C162324750 @default.
- W4313304472 hasConcept C165801399 @default.
- W4313304472 hasConcept C199360897 @default.
- W4313304472 hasConcept C2524010 @default.
- W4313304472 hasConcept C2778120072 @default.
- W4313304472 hasConcept C2781235140 @default.
- W4313304472 hasConcept C33923547 @default.
- W4313304472 hasConcept C41008148 @default.
- W4313304472 hasConcept C41895202 @default.
- W4313304472 hasConcept C43521106 @default.
- W4313304472 hasConcept C45374587 @default.
- W4313304472 hasConcept C62520636 @default.
- W4313304472 hasConcept C66322947 @default.
- W4313304472 hasConcept C90329073 @default.
- W4313304472 hasConcept C94124525 @default.
- W4313304472 hasConcept C99844830 @default.
- W4313304472 hasConceptScore W4313304472C11413529 @default.
- W4313304472 hasConceptScore W4313304472C121332964 @default.
- W4313304472 hasConceptScore W4313304472C123657996 @default.
- W4313304472 hasConceptScore W4313304472C136264566 @default.
- W4313304472 hasConceptScore W4313304472C137293760 @default.
- W4313304472 hasConceptScore W4313304472C138885662 @default.
- W4313304472 hasConceptScore W4313304472C142362112 @default.
- W4313304472 hasConceptScore W4313304472C153349607 @default.
- W4313304472 hasConceptScore W4313304472C154945302 @default.
- W4313304472 hasConceptScore W4313304472C162324750 @default.
- W4313304472 hasConceptScore W4313304472C165801399 @default.
- W4313304472 hasConceptScore W4313304472C199360897 @default.
- W4313304472 hasConceptScore W4313304472C2524010 @default.
- W4313304472 hasConceptScore W4313304472C2778120072 @default.
- W4313304472 hasConceptScore W4313304472C2781235140 @default.
- W4313304472 hasConceptScore W4313304472C33923547 @default.
- W4313304472 hasConceptScore W4313304472C41008148 @default.
- W4313304472 hasConceptScore W4313304472C41895202 @default.
- W4313304472 hasConceptScore W4313304472C43521106 @default.
- W4313304472 hasConceptScore W4313304472C45374587 @default.
- W4313304472 hasConceptScore W4313304472C62520636 @default.
- W4313304472 hasConceptScore W4313304472C66322947 @default.
- W4313304472 hasConceptScore W4313304472C90329073 @default.
- W4313304472 hasConceptScore W4313304472C94124525 @default.
- W4313304472 hasConceptScore W4313304472C99844830 @default.
- W4313304472 hasLocation W43133044721 @default.
- W4313304472 hasOpenAccess W4313304472 @default.
- W4313304472 hasPrimaryLocation W43133044721 @default.
- W4313304472 hasRelatedWork W1572523360 @default.
- W4313304472 hasRelatedWork W2997712488 @default.
- W4313304472 hasRelatedWork W3015233032 @default.
- W4313304472 hasRelatedWork W3114619416 @default.
- W4313304472 hasRelatedWork W3128902667 @default.
- W4313304472 hasRelatedWork W4221146852 @default.
- W4313304472 hasRelatedWork W4287549856 @default.
- W4313304472 hasRelatedWork W4288091956 @default.
- W4313304472 hasRelatedWork W4291908500 @default.
- W4313304472 hasRelatedWork W4308434687 @default.
- W4313304472 isParatext "false" @default.
- W4313304472 isRetracted "false" @default.
- W4313304472 workType "article" @default.