Matches in SemOpenAlex for { <https://semopenalex.org/work/W4285298148> ?p ?o ?g. }
Showing items 1 to 68 of 68 with 100 items per page.
- W4285298148 abstract "Neural language models (LMs) such as GPT-2 estimate the probability distribution over the next word by a softmax over the vocabulary. The softmax layer produces the distribution based on the dot products of a single hidden state and the embeddings of words in the vocabulary. However, we discover that this single hidden state cannot produce all probability distributions regardless of the LM size or training data size because the single hidden state embedding cannot be close to the embeddings of all the possible next words simultaneously when there are other interfering word embeddings between them. In this work, we demonstrate the importance of this limitation both theoretically and practically. Our work not only deepens our understanding of softmax bottleneck and mixture of softmax (MoS) but also inspires us to propose multi-facet softmax (MFS) to address the limitations of MoS. Extensive empirical analyses confirm our findings and show that against MoS, the proposed MFS achieves two-fold improvements in the perplexity of GPT-2 and BERT." @default.
- W4285298148 created "2022-07-14" @default.
- W4285298148 creator A5008354502 @default.
- W4285298148 creator A5080115221 @default.
- W4285298148 date "2022-01-01" @default.
- W4285298148 modified "2023-09-26" @default.
- W4285298148 title "Softmax Bottleneck Makes Language Models Unable to Represent Multi-mode Word Distributions" @default.
- W4285298148 doi "https://doi.org/10.18653/v1/2022.acl-long.554" @default.
- W4285298148 hasPublicationYear "2022" @default.
- W4285298148 type Work @default.
- W4285298148 citedByCount "1" @default.
- W4285298148 countsByYear W42852981482022 @default.
- W4285298148 crossrefType "proceedings-article" @default.
- W4285298148 hasAuthorship W4285298148A5008354502 @default.
- W4285298148 hasAuthorship W4285298148A5080115221 @default.
- W4285298148 hasBestOaLocation W42852981481 @default.
- W4285298148 hasConcept C100279451 @default.
- W4285298148 hasConcept C137293760 @default.
- W4285298148 hasConcept C138885662 @default.
- W4285298148 hasConcept C149635348 @default.
- W4285298148 hasConcept C154945302 @default.
- W4285298148 hasConcept C174348530 @default.
- W4285298148 hasConcept C188441871 @default.
- W4285298148 hasConcept C204321447 @default.
- W4285298148 hasConcept C2524010 @default.
- W4285298148 hasConcept C2777601683 @default.
- W4285298148 hasConcept C2780513914 @default.
- W4285298148 hasConcept C28490314 @default.
- W4285298148 hasConcept C31258907 @default.
- W4285298148 hasConcept C33923547 @default.
- W4285298148 hasConcept C41008148 @default.
- W4285298148 hasConcept C41895202 @default.
- W4285298148 hasConcept C50644808 @default.
- W4285298148 hasConcept C90805587 @default.
- W4285298148 hasConceptScore W4285298148C100279451 @default.
- W4285298148 hasConceptScore W4285298148C137293760 @default.
- W4285298148 hasConceptScore W4285298148C138885662 @default.
- W4285298148 hasConceptScore W4285298148C149635348 @default.
- W4285298148 hasConceptScore W4285298148C154945302 @default.
- W4285298148 hasConceptScore W4285298148C174348530 @default.
- W4285298148 hasConceptScore W4285298148C188441871 @default.
- W4285298148 hasConceptScore W4285298148C204321447 @default.
- W4285298148 hasConceptScore W4285298148C2524010 @default.
- W4285298148 hasConceptScore W4285298148C2777601683 @default.
- W4285298148 hasConceptScore W4285298148C2780513914 @default.
- W4285298148 hasConceptScore W4285298148C28490314 @default.
- W4285298148 hasConceptScore W4285298148C31258907 @default.
- W4285298148 hasConceptScore W4285298148C33923547 @default.
- W4285298148 hasConceptScore W4285298148C41008148 @default.
- W4285298148 hasConceptScore W4285298148C41895202 @default.
- W4285298148 hasConceptScore W4285298148C50644808 @default.
- W4285298148 hasConceptScore W4285298148C90805587 @default.
- W4285298148 hasLocation W42852981481 @default.
- W4285298148 hasOpenAccess W4285298148 @default.
- W4285298148 hasPrimaryLocation W42852981481 @default.
- W4285298148 hasRelatedWork W130046785 @default.
- W4285298148 hasRelatedWork W1989705153 @default.
- W4285298148 hasRelatedWork W2496228846 @default.
- W4285298148 hasRelatedWork W2835793135 @default.
- W4285298148 hasRelatedWork W2896411932 @default.
- W4285298148 hasRelatedWork W2963494889 @default.
- W4285298148 hasRelatedWork W2963537482 @default.
- W4285298148 hasRelatedWork W4285298148 @default.
- W4285298148 hasRelatedWork W4298422451 @default.
- W4285298148 hasRelatedWork W59929963 @default.
- W4285298148 isParatext "false" @default.
- W4285298148 isRetracted "false" @default.
- W4285298148 workType "article" @default.
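
The abstract in this record says the softmax layer builds the next-word distribution from the dot products of a single hidden state with the word embeddings, and that an interfering embedding lying between two candidate words limits which distributions are reachable. Below is a minimal sketch of that idea in pure Python. It is a toy illustration under stated assumptions, not the paper's experiment or code: the three 2-D embeddings and the names `next_word_distribution`, `TOY_EMBEDDINGS`, and `b_never_below_both` are hypothetical. Word B is placed exactly midway between A and C, so its logit is always the average of theirs, and no hidden state can rank both A and C above B.

```python
import math
import random


def next_word_distribution(hidden, embeddings):
    """Softmax over dot products of one hidden state with each word embedding."""
    logits = [sum(h * e for h, e in zip(hidden, emb)) for emb in embeddings]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical toy vocabulary (an assumption for illustration only):
# word B lies exactly midway between words A and C.
EMB_A = [1.0, 0.0]
EMB_B = [0.5, 0.5]
EMB_C = [0.0, 1.0]
TOY_EMBEDDINGS = [EMB_A, EMB_B, EMB_C]


def b_never_below_both(trials=1000, seed=0):
    """Check over random hidden states that P(B) >= min(P(A), P(C))."""
    rng = random.Random(seed)
    for _ in range(trials):
        hidden = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
        p_a, p_b, p_c = next_word_distribution(hidden, TOY_EMBEDDINGS)
        if p_b < min(p_a, p_c) - 1e-12:
            return False
    return True


if __name__ == "__main__":
    print(next_word_distribution([2.0, 2.0], TOY_EMBEDDINGS))
    print(b_never_below_both())
```

Because the logit of B equals the mean of the logits of A and C in this setup, a bimodal distribution that favors A and C while suppressing B is out of reach for any single hidden state, which is the kind of multi-mode failure the title refers to.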