Matches in SemOpenAlex for { <https://semopenalex.org/work/W4315705838> ?p ?o ?g. }
Showing items 1 to 91 of 91 with 100 items per page.
- W4315705838 abstract "Generative language models define distributions over sequences of tokens that can represent essentially any combination of data modalities (e.g., any permutation of image tokens from VQ-VAEs, speech tokens from HuBERT, BPE tokens for language or code, and so on). To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion, trained on 5-100 billion tokens. We report new mixed-modal scaling laws that unify the contributions of individual modalities and the interactions between them. Specifically, we explicitly model the optimal synergy and competition due to data and model size as an additive term to previous uni-modal scaling laws. We also find four empirical phenomena observed during the training, such as emergent coordinate-ascent style training that naturally alternates between modalities, guidelines for selecting critical hyper-parameters, and connections between mixed-modal competition and training stability. Finally, we test our scaling law by training a 30B speech-text model, which significantly outperforms the corresponding unimodal models. Overall, our research provides valuable insights into the design and training of mixed-modal generative models, an important new class of unified models that have unique distributional properties." @default.
- W4315705838 created "2023-01-12" @default.
- W4315705838 creator A5004696245 @default.
- W4315705838 creator A5024311574 @default.
- W4315705838 creator A5031662324 @default.
- W4315705838 creator A5049992350 @default.
- W4315705838 creator A5051860831 @default.
- W4315705838 creator A5051950818 @default.
- W4315705838 creator A5062471396 @default.
- W4315705838 creator A5067919401 @default.
- W4315705838 creator A5068394403 @default.
- W4315705838 creator A5075834790 @default.
- W4315705838 date "2023-01-09" @default.
- W4315705838 modified "2023-10-14" @default.
- W4315705838 title "Scaling Laws for Generative Mixed-Modal Language Models" @default.
- W4315705838 doi "https://doi.org/10.48550/arxiv.2301.03728" @default.
- W4315705838 hasPublicationYear "2023" @default.
- W4315705838 type Work @default.
- W4315705838 citedByCount "0" @default.
- W4315705838 crossrefType "posted-content" @default.
- W4315705838 hasAuthorship W4315705838A5004696245 @default.
- W4315705838 hasAuthorship W4315705838A5024311574 @default.
- W4315705838 hasAuthorship W4315705838A5031662324 @default.
- W4315705838 hasAuthorship W4315705838A5049992350 @default.
- W4315705838 hasAuthorship W4315705838A5051860831 @default.
- W4315705838 hasAuthorship W4315705838A5051950818 @default.
- W4315705838 hasAuthorship W4315705838A5062471396 @default.
- W4315705838 hasAuthorship W4315705838A5067919401 @default.
- W4315705838 hasAuthorship W4315705838A5068394403 @default.
- W4315705838 hasAuthorship W4315705838A5075834790 @default.
- W4315705838 hasBestOaLocation W43157058381 @default.
- W4315705838 hasConcept C112972136 @default.
- W4315705838 hasConcept C119857082 @default.
- W4315705838 hasConcept C137293760 @default.
- W4315705838 hasConcept C138885662 @default.
- W4315705838 hasConcept C144024400 @default.
- W4315705838 hasConcept C154945302 @default.
- W4315705838 hasConcept C167966045 @default.
- W4315705838 hasConcept C177264268 @default.
- W4315705838 hasConcept C185592680 @default.
- W4315705838 hasConcept C188027245 @default.
- W4315705838 hasConcept C199360897 @default.
- W4315705838 hasConcept C204321447 @default.
- W4315705838 hasConcept C2524010 @default.
- W4315705838 hasConcept C2776760102 @default.
- W4315705838 hasConcept C2779903281 @default.
- W4315705838 hasConcept C33923547 @default.
- W4315705838 hasConcept C36289849 @default.
- W4315705838 hasConcept C39890363 @default.
- W4315705838 hasConcept C41008148 @default.
- W4315705838 hasConcept C41895202 @default.
- W4315705838 hasConcept C71139939 @default.
- W4315705838 hasConcept C99844830 @default.
- W4315705838 hasConceptScore W4315705838C112972136 @default.
- W4315705838 hasConceptScore W4315705838C119857082 @default.
- W4315705838 hasConceptScore W4315705838C137293760 @default.
- W4315705838 hasConceptScore W4315705838C138885662 @default.
- W4315705838 hasConceptScore W4315705838C144024400 @default.
- W4315705838 hasConceptScore W4315705838C154945302 @default.
- W4315705838 hasConceptScore W4315705838C167966045 @default.
- W4315705838 hasConceptScore W4315705838C177264268 @default.
- W4315705838 hasConceptScore W4315705838C185592680 @default.
- W4315705838 hasConceptScore W4315705838C188027245 @default.
- W4315705838 hasConceptScore W4315705838C199360897 @default.
- W4315705838 hasConceptScore W4315705838C204321447 @default.
- W4315705838 hasConceptScore W4315705838C2524010 @default.
- W4315705838 hasConceptScore W4315705838C2776760102 @default.
- W4315705838 hasConceptScore W4315705838C2779903281 @default.
- W4315705838 hasConceptScore W4315705838C33923547 @default.
- W4315705838 hasConceptScore W4315705838C36289849 @default.
- W4315705838 hasConceptScore W4315705838C39890363 @default.
- W4315705838 hasConceptScore W4315705838C41008148 @default.
- W4315705838 hasConceptScore W4315705838C41895202 @default.
- W4315705838 hasConceptScore W4315705838C71139939 @default.
- W4315705838 hasConceptScore W4315705838C99844830 @default.
- W4315705838 hasLocation W43157058381 @default.
- W4315705838 hasOpenAccess W4315705838 @default.
- W4315705838 hasPrimaryLocation W43157058381 @default.
- W4315705838 hasRelatedWork W1652742547 @default.
- W4315705838 hasRelatedWork W1964591498 @default.
- W4315705838 hasRelatedWork W2919655474 @default.
- W4315705838 hasRelatedWork W3047164996 @default.
- W4315705838 hasRelatedWork W3088348751 @default.
- W4315705838 hasRelatedWork W3107474891 @default.
- W4315705838 hasRelatedWork W3177920269 @default.
- W4315705838 hasRelatedWork W4287553687 @default.
- W4315705838 hasRelatedWork W4309216727 @default.
- W4315705838 hasRelatedWork W89332836 @default.
- W4315705838 isParatext "false" @default.
- W4315705838 isRetracted "false" @default.
- W4315705838 workType "article" @default.
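
The listing above follows a regular "- subject predicate object @graph." shape, so it can be loaded into a simple record for inspection. The following is a minimal sketch under that assumption. The names `QUAD_LINE`, `parse_quad_line`, and `group_by_predicate` are hypothetical helpers introduced for illustration and are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the listing: "- <subject> <predicate> <object> @<graph>."
QUAD_LINE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_quad_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    match = QUAD_LINE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Strip the surrounding quotes from literals; bare identifiers are left unchanged.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that are not quads."""
    grouped = defaultdict(list)
    for line in lines:
        quad = parse_quad_line(line)
        if quad is not None:
            grouped[quad[1]].append(quad[2])
    return dict(grouped)


# A few lines copied from the listing, plus one non-quad header line.
sample = [
    'Showing items 1 to 91 of',
    '- W4315705838 creator A5004696245 @default.',
    '- W4315705838 creator A5024311574 @default.',
    '- W4315705838 title "Scaling Laws for Generative Mixed-Modal Language Models" @default.',
    '- W4315705838 hasPublicationYear "2023" @default.',
]

record = group_by_predicate(sample)
print(record["creator"])
print(record["title"][0])
```

Grouping by predicate keeps multi-valued properties such as `creator`, `hasConcept`, and `hasRelatedWork` together as lists, while single-valued ones such as `title` end up as one-element lists.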