Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385573721> ?p ?o ?g. }
Showing items 1 to 65 of 65, with 100 items per page.
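The listing below can be reproduced programmatically with a SPARQL query equivalent to the pattern shown above. The sketch that follows is a minimal example under assumptions: it targets the public SemOpenAlex SPARQL endpoint (assumed here to be `https://semopenalex.org/sparql`) and uses the standard SPARQL JSON results format; it drops the named-graph variable `?g` for simplicity.

```python
# Minimal sketch: fetch all (predicate, object) pairs for this Work from SemOpenAlex.
# The endpoint URL is an assumption; adjust it to the documented SemOpenAlex endpoint if it differs.
import requests

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public SPARQL endpoint

QUERY = """
SELECT ?p ?o
WHERE {
  <https://semopenalex.org/work/W4385573721> ?p ?o .
}
"""

def fetch_triples():
    """Return (predicate, object) pairs for the work as plain strings."""
    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return [(b["p"]["value"], b["o"]["value"]) for b in bindings]

if __name__ == "__main__":
    for predicate, obj in fetch_triples():
        print(predicate, obj)
```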
- W4385573721 abstract "Transformer-based pre-trained language models are vocabulary-dependent, mapping by default each token to its corresponding embedding. This one-to-one mapping results in embedding matrices that occupy a lot of memory (i.e. millions of parameters) and grow linearly with the size of the vocabulary. Previous work on on-device transformers dynamically generates token embeddings on the fly without embedding matrices, using locality-sensitive hashing over morphological information. These embeddings are subsequently fed into transformer layers for text classification. However, these methods are not pre-trained. Inspired by this line of work, we propose HashFormers, a new family of vocabulary-independent pre-trained transformers that support an unlimited vocabulary (i.e. all possible tokens in a corpus) given a substantially smaller fixed-size embedding matrix. We achieve this by first introducing computationally cheap hashing functions that bucket together individual tokens to embeddings. We also propose three variants that do not require an embedding matrix at all, further reducing the memory requirements. We empirically demonstrate that HashFormers are more memory-efficient than standard pre-trained transformers while achieving comparable predictive performance when fine-tuned on multiple text classification tasks. For example, our most efficient HashFormer variant has a negligible performance degradation (0.4% on GLUE) using only 99.1K parameters for representing the embeddings, compared to 12.3-38M parameters of state-of-the-art models." @default.
- W4385573721 created "2023-08-05" @default.
- W4385573721 creator A5010341007 @default.
- W4385573721 creator A5085083080 @default.
- W4385573721 date "2022-01-01" @default.
- W4385573721 modified "2023-09-24" @default.
- W4385573721 title "HashFormers: Towards Vocabulary-independent Pre-trained Transformers" @default.
- W4385573721 doi "https://doi.org/10.18653/v1/2022.emnlp-main.536" @default.
- W4385573721 hasPublicationYear "2022" @default.
- W4385573721 type Work @default.
- W4385573721 citedByCount "0" @default.
- W4385573721 crossrefType "proceedings-article" @default.
- W4385573721 hasAuthorship W4385573721A5010341007 @default.
- W4385573721 hasAuthorship W4385573721A5085083080 @default.
- W4385573721 hasBestOaLocation W43855737211 @default.
- W4385573721 hasConcept C119599485 @default.
- W4385573721 hasConcept C127413603 @default.
- W4385573721 hasConcept C137293760 @default.
- W4385573721 hasConcept C138885662 @default.
- W4385573721 hasConcept C154945302 @default.
- W4385573721 hasConcept C165801399 @default.
- W4385573721 hasConcept C199360897 @default.
- W4385573721 hasConcept C2777601683 @default.
- W4385573721 hasConcept C2779808786 @default.
- W4385573721 hasConcept C28490314 @default.
- W4385573721 hasConcept C38652104 @default.
- W4385573721 hasConcept C41008148 @default.
- W4385573721 hasConcept C41608201 @default.
- W4385573721 hasConcept C41895202 @default.
- W4385573721 hasConcept C48145219 @default.
- W4385573721 hasConcept C66322947 @default.
- W4385573721 hasConcept C99138194 @default.
- W4385573721 hasConceptScore W4385573721C119599485 @default.
- W4385573721 hasConceptScore W4385573721C127413603 @default.
- W4385573721 hasConceptScore W4385573721C137293760 @default.
- W4385573721 hasConceptScore W4385573721C138885662 @default.
- W4385573721 hasConceptScore W4385573721C154945302 @default.
- W4385573721 hasConceptScore W4385573721C165801399 @default.
- W4385573721 hasConceptScore W4385573721C199360897 @default.
- W4385573721 hasConceptScore W4385573721C2777601683 @default.
- W4385573721 hasConceptScore W4385573721C2779808786 @default.
- W4385573721 hasConceptScore W4385573721C28490314 @default.
- W4385573721 hasConceptScore W4385573721C38652104 @default.
- W4385573721 hasConceptScore W4385573721C41008148 @default.
- W4385573721 hasConceptScore W4385573721C41608201 @default.
- W4385573721 hasConceptScore W4385573721C41895202 @default.
- W4385573721 hasConceptScore W4385573721C48145219 @default.
- W4385573721 hasConceptScore W4385573721C66322947 @default.
- W4385573721 hasConceptScore W4385573721C99138194 @default.
- W4385573721 hasLocation W43855737211 @default.
- W4385573721 hasOpenAccess W4385573721 @default.
- W4385573721 hasPrimaryLocation W43855737211 @default.
- W4385573721 hasRelatedWork W2734503711 @default.
- W4385573721 hasRelatedWork W3094152627 @default.
- W4385573721 hasRelatedWork W3126822054 @default.
- W4385573721 hasRelatedWork W3127787589 @default.
- W4385573721 hasRelatedWork W3174977793 @default.
- W4385573721 hasRelatedWork W4226082499 @default.
- W4385573721 hasRelatedWork W4287633642 @default.
- W4385573721 hasRelatedWork W4288102732 @default.
- W4385573721 hasRelatedWork W4306707339 @default.
- W4385573721 hasRelatedWork W4321392417 @default.
- W4385573721 isParatext "false" @default.
- W4385573721 isRetracted "false" @default.
- W4385573721 workType "article" @default.
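The abstract above describes bucketing arbitrary tokens into a small, fixed set of embedding rows via cheap hash functions, instead of keeping one row per vocabulary item. The sketch below is a hypothetical illustration of that general idea, not the authors' exact method: the bucket count, hash choice, and embedding width are made up here, chosen only so the parameter arithmetic lands in the same ballpark as the figures quoted in the abstract.

```python
# Hypothetical illustration of hashing-based embedding lookup (not the paper's method).
import hashlib
import numpy as np

NUM_BUCKETS = 1024  # assumed small, fixed number of embedding rows
EMBED_DIM = 96      # assumed embedding width; 1024 * 96 = 98,304 parameters,
                    # roughly the scale of the 99.1K figure quoted above

# The only trainable table: NUM_BUCKETS x EMBED_DIM, independent of vocabulary size.
bucket_embeddings = np.random.randn(NUM_BUCKETS, EMBED_DIM).astype(np.float32)

def bucket_of(token: str) -> int:
    """Map any token string to a bucket index with a cheap, deterministic hash."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % NUM_BUCKETS

def embed(tokens: list[str]) -> np.ndarray:
    """Look up one embedding row per token; unseen tokens never require new rows."""
    return bucket_embeddings[[bucket_of(t) for t in tokens]]

# For comparison, a conventional embedding matrix grows with the vocabulary:
# e.g. a 30,000-token vocabulary at 768 dimensions is 30,000 * 768 = 23,040,000
# parameters, inside the 12.3-38M range cited for standard pre-trained models.
vectors = embed(["Hash", "##Formers", "are", "vocabulary", "-", "independent"])
print(vectors.shape)  # (6, EMBED_DIM)
```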