Matches in SemOpenAlex for { <https://semopenalex.org/work/W3130868440> ?p ?o ?g. }
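For reference, a listing of this shape can be reproduced with a standard SPARQL query against the SemOpenAlex endpoint (assumed here to be https://semopenalex.org/sparql; the quad pattern above is written below in its equivalent GRAPH form). This is an illustrative sketch, not necessarily the exact query that produced the listing:

    SELECT ?p ?o ?g
    WHERE {
      GRAPH ?g {
        <https://semopenalex.org/work/W3130868440> ?p ?o .
      }
    }

Each result row corresponds to one line below: predicate ?p, object ?o, and the graph ?g (shown here as @default).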
- W3130868440 abstract "Large language models have become increasingly difficult to train because of the growing computation time and cost. In this work, we present SRU++, a highly-efficient architecture that combines fast recurrence and attention for sequence modeling. SRU++ exhibits strong modeling capacity and training efficiency. On standard language modeling tasks such as Enwik8, Wiki-103 and Billion Word datasets, our model obtains better bits-per-character and perplexity while using 3x-10x less training cost compared to top-performing Transformer models. For instance, our model achieves a state-of-the-art result on the Enwik8 dataset using 1.6 days of training on an 8-GPU machine. We further demonstrate that SRU++ requires minimal attention for near state-of-the-art performance. Our results suggest jointly leveraging fast recurrence with little attention as a promising direction for accelerating model training and inference." @default.
- W3130868440 created "2021-03-01" @default.
- W3130868440 creator A5030752617 @default.
- W3130868440 date "2021-01-01" @default.
- W3130868440 modified "2023-09-26" @default.
- W3130868440 title "When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute" @default.
- W3130868440 cites W1810943226 @default.
- W3130868440 cites W2064675550 @default.
- W3130868440 cites W2157331557 @default.
- W3130868440 cites W2342173569 @default.
- W3130868440 cites W2474388053 @default.
- W3130868440 cites W2553397501 @default.
- W3130868440 cites W2892090442 @default.
- W3130868440 cites W2946567085 @default.
- W3130868440 cites W2951714314 @default.
- W3130868440 cites W2952494384 @default.
- W3130868440 cites W2955227499 @default.
- W3130868440 cites W2963034893 @default.
- W3130868440 cites W2963174729 @default.
- W3130868440 cites W2963386218 @default.
- W3130868440 cites W2963403868 @default.
- W3130868440 cites W2963494889 @default.
- W3130868440 cites W2963631907 @default.
- W3130868440 cites W2963641307 @default.
- W3130868440 cites W2963655672 @default.
- W3130868440 cites W2963925437 @default.
- W3130868440 cites W2963938518 @default.
- W3130868440 cites W2964110616 @default.
- W3130868440 cites W2964121744 @default.
- W3130868440 cites W2964182247 @default.
- W3130868440 cites W2964308564 @default.
- W3130868440 cites W2970157301 @default.
- W3130868440 cites W2975044525 @default.
- W3130868440 cites W2983981554 @default.
- W3130868440 cites W2991324852 @default.
- W3130868440 cites W2994673210 @default.
- W3130868440 cites W2994689640 @default.
- W3130868440 cites W2995154514 @default.
- W3130868440 cites W2995575179 @default.
- W3130868440 cites W2996428491 @default.
- W3130868440 cites W3007773043 @default.
- W3130868440 cites W3010714856 @default.
- W3130868440 cites W3015468748 @default.
- W3130868440 cites W3025165719 @default.
- W3130868440 cites W3033529678 @default.
- W3130868440 cites W3034573343 @default.
- W3130868440 cites W3034772996 @default.
- W3130868440 cites W3085139254 @default.
- W3130868440 cites W3093960091 @default.
- W3130868440 cites W3101005014 @default.
- W3130868440 cites W3102129360 @default.
- W3130868440 cites W3103334733 @default.
- W3130868440 cites W3106147182 @default.
- W3130868440 cites W3123615524 @default.
- W3130868440 cites W3123673616 @default.
- W3130868440 cites W3131922516 @default.
- W3130868440 cites W3132672614 @default.
- W3130868440 cites W3172099915 @default.
- W3130868440 cites W3174401451 @default.
- W3130868440 cites W3196389743 @default.
- W3130868440 cites W3214897310 @default.
- W3130868440 doi "https://doi.org/10.18653/v1/2021.emnlp-main.602" @default.
- W3130868440 hasPublicationYear "2021" @default.
- W3130868440 type Work @default.
- W3130868440 sameAs 3130868440 @default.
- W3130868440 citedByCount "11" @default.
- W3130868440 countsByYear W31308684402021 @default.
- W3130868440 countsByYear W31308684402022 @default.
- W3130868440 countsByYear W31308684402023 @default.
- W3130868440 crossrefType "proceedings-article" @default.
- W3130868440 hasAuthorship W3130868440A5030752617 @default.
- W3130868440 hasBestOaLocation W31308684401 @default.
- W3130868440 hasConcept C100279451 @default.
- W3130868440 hasConcept C11413529 @default.
- W3130868440 hasConcept C119857082 @default.
- W3130868440 hasConcept C121332964 @default.
- W3130868440 hasConcept C137293760 @default.
- W3130868440 hasConcept C153294291 @default.
- W3130868440 hasConcept C154945302 @default.
- W3130868440 hasConcept C165801399 @default.
- W3130868440 hasConcept C204321447 @default.
- W3130868440 hasConcept C2776214188 @default.
- W3130868440 hasConcept C2777211547 @default.
- W3130868440 hasConcept C41008148 @default.
- W3130868440 hasConcept C45374587 @default.
- W3130868440 hasConcept C62520636 @default.
- W3130868440 hasConcept C66322947 @default.
- W3130868440 hasConceptScore W3130868440C100279451 @default.
- W3130868440 hasConceptScore W3130868440C11413529 @default.
- W3130868440 hasConceptScore W3130868440C119857082 @default.
- W3130868440 hasConceptScore W3130868440C121332964 @default.
- W3130868440 hasConceptScore W3130868440C137293760 @default.
- W3130868440 hasConceptScore W3130868440C153294291 @default.
- W3130868440 hasConceptScore W3130868440C154945302 @default.
- W3130868440 hasConceptScore W3130868440C165801399 @default.
- W3130868440 hasConceptScore W3130868440C204321447 @default.
- W3130868440 hasConceptScore W3130868440C2776214188 @default.
- W3130868440 hasConceptScore W3130868440C2777211547 @default.
- W3130868440 hasConceptScore W3130868440C41008148 @default.
- W3130868440 hasConceptScore W3130868440C45374587 @default.