Matches in SemOpenAlex for { <https://semopenalex.org/work/W4383860315> ?p ?o ?g. }
Showing items 1 to 61 of 61 with 100 items per page.
- W4383860315 abstract "Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-word prediction objective for rapidly eliciting arithmetic capabilities." @default.
- W4383860315 created "2023-07-11" @default.
- W4383860315 creator A5002813601 @default.
- W4383860315 creator A5005605490 @default.
- W4383860315 creator A5018630988 @default.
- W4383860315 creator A5059740024 @default.
- W4383860315 creator A5073193050 @default.
- W4383860315 date "2023-07-07" @default.
- W4383860315 modified "2023-10-05" @default.
- W4383860315 title "Teaching Arithmetic to Small Transformers" @default.
- W4383860315 doi "https://doi.org/10.48550/arxiv.2307.03381" @default.
- W4383860315 hasPublicationYear "2023" @default.
- W4383860315 type Work @default.
- W4383860315 citedByCount "0" @default.
- W4383860315 crossrefType "posted-content" @default.
- W4383860315 hasAuthorship W4383860315A5002813601 @default.
- W4383860315 hasAuthorship W4383860315A5005605490 @default.
- W4383860315 hasAuthorship W4383860315A5018630988 @default.
- W4383860315 hasAuthorship W4383860315A5059740024 @default.
- W4383860315 hasAuthorship W4383860315A5073193050 @default.
- W4383860315 hasBestOaLocation W43838603151 @default.
- W4383860315 hasConcept C111919701 @default.
- W4383860315 hasConcept C11413529 @default.
- W4383860315 hasConcept C114466953 @default.
- W4383860315 hasConcept C119857082 @default.
- W4383860315 hasConcept C154945302 @default.
- W4383860315 hasConcept C199360897 @default.
- W4383860315 hasConcept C33923547 @default.
- W4383860315 hasConcept C41008148 @default.
- W4383860315 hasConcept C80444323 @default.
- W4383860315 hasConcept C88006597 @default.
- W4383860315 hasConcept C94375191 @default.
- W4383860315 hasConcept C97256817 @default.
- W4383860315 hasConceptScore W4383860315C111919701 @default.
- W4383860315 hasConceptScore W4383860315C11413529 @default.
- W4383860315 hasConceptScore W4383860315C114466953 @default.
- W4383860315 hasConceptScore W4383860315C119857082 @default.
- W4383860315 hasConceptScore W4383860315C154945302 @default.
- W4383860315 hasConceptScore W4383860315C199360897 @default.
- W4383860315 hasConceptScore W4383860315C33923547 @default.
- W4383860315 hasConceptScore W4383860315C41008148 @default.
- W4383860315 hasConceptScore W4383860315C80444323 @default.
- W4383860315 hasConceptScore W4383860315C88006597 @default.
- W4383860315 hasConceptScore W4383860315C94375191 @default.
- W4383860315 hasConceptScore W4383860315C97256817 @default.
- W4383860315 hasLocation W43838603151 @default.
- W4383860315 hasOpenAccess W4383860315 @default.
- W4383860315 hasPrimaryLocation W43838603151 @default.
- W4383860315 hasRelatedWork W1979256031 @default.
- W4383860315 hasRelatedWork W2316636790 @default.
- W4383860315 hasRelatedWork W2351355159 @default.
- W4383860315 hasRelatedWork W2358639633 @default.
- W4383860315 hasRelatedWork W2368370270 @default.
- W4383860315 hasRelatedWork W2374442885 @default.
- W4383860315 hasRelatedWork W2374512474 @default.
- W4383860315 hasRelatedWork W4234402940 @default.
- W4383860315 hasRelatedWork W4367308673 @default.
- W4383860315 hasRelatedWork W4368275083 @default.
- W4383860315 isParatext "false" @default.
- W4383860315 isRetracted "false" @default.
- W4383860315 workType "article" @default.
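Each row above follows the same shape as the query pattern: a subject, a predicate (`?p`), an object (`?o`), and a graph (`?g`, here always `default`). The following is a minimal sketch of reading such rows into tuples. It assumes the plain-text layout shown in this listing, namely `- subject predicate object @graph.`, and the names `Quad`, `ROW`, and `parse_row` are hypothetical helpers, not part of SemOpenAlex.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed row layout (taken from this listing, not from any official format):
#   - <subject> <predicate> <object> @<graph>.
ROW = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$', re.DOTALL)


def parse_row(line: str) -> Optional[Quad]:
    """Split one listing row into (subject, predicate, object, graph).

    Returns None for lines that do not match the assumed layout,
    such as the query header or the paging line.
    """
    match = ROW.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Literal objects are wrapped in double quotes; strip one surrounding pair.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return Quad(subject, predicate, obj, graph)


if __name__ == "__main__":
    row = '- W4383860315 title "Teaching Arithmetic to Small Transformers" @default.'
    print(parse_row(row))
```

Rows whose object is an identifier rather than a quoted literal (for example `creator A5002813601`) come back with the identifier unchanged, so the caller can tell literals from links only by checking the original row for quotes.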