Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313442864> ?p ?o ?g. }
Showing items 1 to 66 of 66, with 100 items per page.
- W4313442864 abstract "State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2× speedup on the long-range arena benchmark and allows hybrid language models to generate text 2.4× faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 2.7B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark." @default.
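The abstract above describes FlashConv accelerating the FFT-based long convolution at the heart of SSM layers. As a hedged illustration of that core operation only (not the paper's fused block-FFT or state-passing kernels, and using plain numpy rather than the authors' CUDA implementation), the O(L log L) causal convolution can be sketched as:

```python
import numpy as np

def fft_causal_conv(u, k):
    """Causal convolution y = k * u via FFT.

    SSM layers compute the output as a convolution of the input u
    with a kernel k derived from the state-space parameters; the FFT
    reduces the cost from O(L^2) to O(L log L). Illustrative sketch
    only -- FlashConv's contribution is doing this efficiently on GPU
    hardware with a fused block FFT, which this toy version omits.
    """
    L = len(u)
    n = 2 * L  # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)
    return y[:L]  # truncate to the causal part

# Agrees with direct convolution truncated to the sequence length:
u = np.random.randn(1024)
k = np.random.randn(1024)
y = fft_causal_conv(u, k)
assert np.allclose(y, np.convolve(u, k)[:1024])
```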
- W4313442864 created "2023-01-06" @default.
- W4313442864 creator A5001041485 @default.
- W4313442864 creator A5032865467 @default.
- W4313442864 creator A5070255304 @default.
- W4313442864 creator A5070267604 @default.
- W4313442864 creator A5078381759 @default.
- W4313442864 creator A5091734792 @default.
- W4313442864 date "2022-12-28" @default.
- W4313442864 modified "2023-10-16" @default.
- W4313442864 title "Hungry Hungry Hippos: Towards Language Modeling with State Space Models" @default.
- W4313442864 doi "https://doi.org/10.48550/arxiv.2212.14052" @default.
- W4313442864 hasPublicationYear "2022" @default.
- W4313442864 type Work @default.
- W4313442864 citedByCount "1" @default.
- W4313442864 countsByYear W43134428642023 @default.
- W4313442864 crossrefType "posted-content" @default.
- W4313442864 hasAuthorship W4313442864A5001041485 @default.
- W4313442864 hasAuthorship W4313442864A5032865467 @default.
- W4313442864 hasAuthorship W4313442864A5070255304 @default.
- W4313442864 hasAuthorship W4313442864A5070267604 @default.
- W4313442864 hasAuthorship W4313442864A5078381759 @default.
- W4313442864 hasAuthorship W4313442864A5091734792 @default.
- W4313442864 hasBestOaLocation W43134428641 @default.
- W4313442864 hasConcept C113775141 @default.
- W4313442864 hasConcept C119599485 @default.
- W4313442864 hasConcept C127413603 @default.
- W4313442864 hasConcept C137293760 @default.
- W4313442864 hasConcept C154945302 @default.
- W4313442864 hasConcept C165801399 @default.
- W4313442864 hasConcept C173608175 @default.
- W4313442864 hasConcept C2524010 @default.
- W4313442864 hasConcept C33923547 @default.
- W4313442864 hasConcept C41008148 @default.
- W4313442864 hasConcept C66322947 @default.
- W4313442864 hasConcept C68339613 @default.
- W4313442864 hasConcept C99844830 @default.
- W4313442864 hasConceptScore W4313442864C113775141 @default.
- W4313442864 hasConceptScore W4313442864C119599485 @default.
- W4313442864 hasConceptScore W4313442864C127413603 @default.
- W4313442864 hasConceptScore W4313442864C137293760 @default.
- W4313442864 hasConceptScore W4313442864C154945302 @default.
- W4313442864 hasConceptScore W4313442864C165801399 @default.
- W4313442864 hasConceptScore W4313442864C173608175 @default.
- W4313442864 hasConceptScore W4313442864C2524010 @default.
- W4313442864 hasConceptScore W4313442864C33923547 @default.
- W4313442864 hasConceptScore W4313442864C41008148 @default.
- W4313442864 hasConceptScore W4313442864C66322947 @default.
- W4313442864 hasConceptScore W4313442864C68339613 @default.
- W4313442864 hasConceptScore W4313442864C99844830 @default.
- W4313442864 hasLocation W43134428641 @default.
- W4313442864 hasOpenAccess W4313442864 @default.
- W4313442864 hasPrimaryLocation W43134428641 @default.
- W4313442864 hasRelatedWork W2972605955 @default.
- W4313442864 hasRelatedWork W2990510952 @default.
- W4313442864 hasRelatedWork W3093039666 @default.
- W4313442864 hasRelatedWork W3107474891 @default.
- W4313442864 hasRelatedWork W3127473729 @default.
- W4313442864 hasRelatedWork W3199241049 @default.
- W4313442864 hasRelatedWork W3205492857 @default.
- W4313442864 hasRelatedWork W4220785642 @default.
- W4313442864 hasRelatedWork W4307933444 @default.
- W4313442864 hasRelatedWork W4321636820 @default.
- W4313442864 isParatext "false" @default.
- W4313442864 isRetracted "false" @default.
- W4313442864 workType "article" @default.