Matches in SemOpenAlex for { <https://semopenalex.org/work/W3133029875> ?p ?o ?g. }
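The quad pattern above can be issued programmatically via the standard SPARQL 1.1 protocol. A minimal sketch follows, assuming SemOpenAlex's public SPARQL endpoint at https://semopenalex.org/sparql (the endpoint URL is an assumption based on the service's documentation, not part of this listing); the matches it would return are the triples listed below.

    import requests

    # Quad pattern from the listing: every predicate/object/graph
    # match for the work W3133029875. The trailing ?g in the original
    # pattern names the graph, expressed in SPARQL via GRAPH ?g {...}.
    QUERY = """
    SELECT ?p ?o ?g WHERE {
      GRAPH ?g { <https://semopenalex.org/work/W3133029875> ?p ?o . }
    }
    """

    # Standard SPARQL protocol: the query goes in the `query` parameter
    # and JSON results are requested via the Accept header.
    resp = requests.get(
        "https://semopenalex.org/sparql",  # assumed endpoint URL
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()

    # SPARQL JSON results format: rows live under results.bindings,
    # each variable bound to a {"type": ..., "value": ...} object.
    for row in resp.json()["results"]["bindings"]:
        print(row["p"]["value"], row["o"]["value"])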
- W3133029875 abstract "The ability to perform arithmetic tasks is a remarkable trait of human intelligence and might form a critical component of more complex reasoning tasks. In this work, we investigate if the surface form of a number has any influence on how sequence-to-sequence language models learn simple arithmetic tasks such as addition and subtraction across a wide range of values. We find that how a number is represented in its surface form has a strong influence on the model's accuracy. In particular, the model fails to learn addition of five-digit numbers when using subwords (e.g., 32), and it struggles to learn with character-level representations (e.g., 3 2). By introducing position tokens (e.g., 3 10e1 2), the model learns to accurately add and subtract numbers up to 60 digits. We conclude that modern pretrained language models can easily learn arithmetic from very few examples, as long as we use the proper surface representation. This result bolsters evidence that subword tokenizers and positional encodings are components in current transformer designs that might need improvement. Moreover, we show that regardless of the number of parameters and training examples, models cannot learn addition rules that are independent of the length of the numbers seen during training. Code to reproduce our experiments is available at this https URL" @default.
- W3133029875 created "2021-03-01" @default.
- W3133029875 creator A5016996735 @default.
- W3133029875 creator A5030647281 @default.
- W3133029875 creator A5082997975 @default.
- W3133029875 date "2021-02-25" @default.
- W3133029875 modified "2023-10-18" @default.
- W3133029875 title "Investigating the Limitations of Transformers with Simple Arithmetic Tasks" @default.
- W3133029875 cites W1732222442 @default.
- W3133029875 cites W1771459135 @default.
- W3133029875 cites W2173051530 @default.
- W3133029875 cites W2548137223 @default.
- W3133029875 cites W2766736793 @default.
- W3133029875 cites W2866343820 @default.
- W3133029875 cites W2887020936 @default.
- W3133029875 cites W2896457183 @default.
- W3133029875 cites W2908510526 @default.
- W3133029875 cites W2919420119 @default.
- W3133029875 cites W2946417913 @default.
- W3133029875 cites W2950645060 @default.
- W3133029875 cites W2951107864 @default.
- W3133029875 cites W2951756287 @default.
- W3133029875 cites W2962739339 @default.
- W3133029875 cites W2963005248 @default.
- W3133029875 cites W2963403868 @default.
- W3133029875 cites W2970308008 @default.
- W3133029875 cites W2970609357 @default.
- W3133029875 cites W2970900584 @default.
- W3133029875 cites W2971094176 @default.
- W3133029875 cites W2981037730 @default.
- W3133029875 cites W2984812384 @default.
- W3133029875 cites W2986266667 @default.
- W3133029875 cites W2995359496 @default.
- W3133029875 cites W2995971510 @default.
- W3133029875 cites W2998209000 @default.
- W3133029875 cites W3021524072 @default.
- W3133029875 cites W3022766797 @default.
- W3133029875 cites W3024482470 @default.
- W3133029875 cites W3033187248 @default.
- W3133029875 cites W3034811314 @default.
- W3133029875 cites W3035428952 @default.
- W3133029875 cites W3037983807 @default.
- W3133029875 cites W3082274269 @default.
- W3133029875 cites W3083835029 @default.
- W3133029875 cites W3092044512 @default.
- W3133029875 cites W3092689172 @default.
- W3133029875 cites W3095645723 @default.
- W3133029875 cites W3098903812 @default.
- W3133029875 cites W3100778284 @default.
- W3133029875 cites W3106210592 @default.
- W3133029875 cites W3106531402 @default.
- W3133029875 cites W3111372685 @default.
- W3133029875 cites W3111739346 @default.
- W3133029875 cites W3128590981 @default.
- W3133029875 cites W3166890286 @default.
- W3133029875 cites W3100879603 @default.
- W3133029875 hasPublicationYear "2021" @default.
- W3133029875 type Work @default.
- W3133029875 sameAs 3133029875 @default.
- W3133029875 citedByCount "9" @default.
- W3133029875 countsByYear W31330298752021 @default.
- W3133029875 countsByYear W31330298752022 @default.
- W3133029875 crossrefType "posted-content" @default.
- W3133029875 hasAuthorship W3133029875A5016996735 @default.
- W3133029875 hasAuthorship W3133029875A5030647281 @default.
- W3133029875 hasAuthorship W3133029875A5082997975 @default.
- W3133029875 hasConcept C111472728 @default.
- W3133029875 hasConcept C11413529 @default.
- W3133029875 hasConcept C121332964 @default.
- W3133029875 hasConcept C138885662 @default.
- W3133029875 hasConcept C145420912 @default.
- W3133029875 hasConcept C154945302 @default.
- W3133029875 hasConcept C165801399 @default.
- W3133029875 hasConcept C177264268 @default.
- W3133029875 hasConcept C17744445 @default.
- W3133029875 hasConcept C199360897 @default.
- W3133029875 hasConcept C199539241 @default.
- W3133029875 hasConcept C2524010 @default.
- W3133029875 hasConcept C2776359362 @default.
- W3133029875 hasConcept C2776760102 @default.
- W3133029875 hasConcept C2778112365 @default.
- W3133029875 hasConcept C2780586882 @default.
- W3133029875 hasConcept C2780861071 @default.
- W3133029875 hasConcept C33923547 @default.
- W3133029875 hasConcept C41008148 @default.
- W3133029875 hasConcept C54355233 @default.
- W3133029875 hasConcept C62520636 @default.
- W3133029875 hasConcept C66322947 @default.
- W3133029875 hasConcept C68060419 @default.
- W3133029875 hasConcept C80444323 @default.
- W3133029875 hasConcept C83817169 @default.
- W3133029875 hasConcept C86803240 @default.
- W3133029875 hasConcept C94375191 @default.
- W3133029875 hasConcept C94625758 @default.
- W3133029875 hasConcept C94957134 @default.
- W3133029875 hasConceptScore W3133029875C111472728 @default.
- W3133029875 hasConceptScore W3133029875C11413529 @default.
- W3133029875 hasConceptScore W3133029875C121332964 @default.
- W3133029875 hasConceptScore W3133029875C138885662 @default.
- W3133029875 hasConceptScore W3133029875C145420912 @default.
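The abstract above reports that position tokens (e.g., "3 10e1 2") let the model add and subtract numbers of up to 60 digits. A minimal sketch of that encoding, assuming a marker "10e{k}" follows each digit whose place value is 10^k; the function name is hypothetical, and the abstract's example omits the trailing units marker "10e0", so the sketch does too by default (the paper's full scheme may include it, toggled here via a flag).

    def to_position_tokens(n: int, include_units: bool = False) -> str:
        """Encode a non-negative integer with explicit position tokens,
        matching the abstract's example: 32 -> "3 10e1 2"."""
        digits = str(n)
        tokens = []
        for i, d in enumerate(digits):
            tokens.append(d)
            power = len(digits) - 1 - i  # place value of this digit
            if power > 0 or include_units:
                tokens.append(f"10e{power}")
        return " ".join(tokens)

    assert to_position_tokens(32) == "3 10e1 2"
    print(to_position_tokens(528, include_units=True))
    # -> "5 10e2 2 10e1 8 10e0"

Making each digit's place value an explicit token sidesteps the subword-tokenization and positional-encoding weaknesses the abstract identifies, since the model no longer has to infer a digit's magnitude from its sequence position.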