Matches in SemOpenAlex for { <https://semopenalex.org/work/W4366197770> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W4366197770 abstract "In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via interleaving independent supervised (image, text) examples, but also, more complex prompts involving interaction between images, e.g., What do image A and image B have in common? To support this interface, pretraining occurs over web corpora that similarly contain interleaved images+text. To date, however, large-scale data of this form have not been publicly available. We release Multimodal C4, an augmentation of the popular text-only C4 corpus with images interleaved. We use a linear assignment algorithm to place images into longer bodies of text using CLIP features, a process that we show outperforms alternatives. Multimodal C4 spans everyday topics like cooking, travel, technology, etc. A manual inspection of a random sample of documents shows that a vast majority (88%) of images are topically relevant, and that linear assignment frequently selects individual sentences specifically well-aligned with each image (80%). After filtering NSFW images, ads, etc., the resulting corpus consists of 101.2M documents with 571M images interleaved in 43B English tokens." @default.
- W4366197770 created "2023-04-19" @default.
- W4366197770 creator A5008013895 @default.
- W4366197770 creator A5023400154 @default.
- W4366197770 creator A5029444496 @default.
- W4366197770 creator A5039046571 @default.
- W4366197770 creator A5043614405 @default.
- W4366197770 creator A5045464993 @default.
- W4366197770 creator A5050095135 @default.
- W4366197770 creator A5050195037 @default.
- W4366197770 creator A5070635591 @default.
- W4366197770 creator A5079129211 @default.
- W4366197770 date "2023-04-14" @default.
- W4366197770 modified "2023-09-30" @default.
- W4366197770 title "Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text" @default.
- W4366197770 doi "https://doi.org/10.48550/arxiv.2304.06939" @default.
- W4366197770 hasPublicationYear "2023" @default.
- W4366197770 type Work @default.
- W4366197770 citedByCount "0" @default.
- W4366197770 crossrefType "posted-content" @default.
- W4366197770 hasAuthorship W4366197770A5008013895 @default.
- W4366197770 hasAuthorship W4366197770A5023400154 @default.
- W4366197770 hasAuthorship W4366197770A5029444496 @default.
- W4366197770 hasAuthorship W4366197770A5039046571 @default.
- W4366197770 hasAuthorship W4366197770A5043614405 @default.
- W4366197770 hasAuthorship W4366197770A5045464993 @default.
- W4366197770 hasAuthorship W4366197770A5050095135 @default.
- W4366197770 hasAuthorship W4366197770A5050195037 @default.
- W4366197770 hasAuthorship W4366197770A5070635591 @default.
- W4366197770 hasAuthorship W4366197770A5079129211 @default.
- W4366197770 hasBestOaLocation W43661977701 @default.
- W4366197770 hasConcept C111919701 @default.
- W4366197770 hasConcept C115961682 @default.
- W4366197770 hasConcept C121332964 @default.
- W4366197770 hasConcept C151730666 @default.
- W4366197770 hasConcept C153180895 @default.
- W4366197770 hasConcept C154945302 @default.
- W4366197770 hasConcept C185592680 @default.
- W4366197770 hasConcept C198531522 @default.
- W4366197770 hasConcept C204321447 @default.
- W4366197770 hasConcept C23123220 @default.
- W4366197770 hasConcept C2776674983 @default.
- W4366197770 hasConcept C2778755073 @default.
- W4366197770 hasConcept C2779343474 @default.
- W4366197770 hasConcept C28034677 @default.
- W4366197770 hasConcept C41008148 @default.
- W4366197770 hasConcept C43617362 @default.
- W4366197770 hasConcept C62520636 @default.
- W4366197770 hasConcept C86803240 @default.
- W4366197770 hasConceptScore W4366197770C111919701 @default.
- W4366197770 hasConceptScore W4366197770C115961682 @default.
- W4366197770 hasConceptScore W4366197770C121332964 @default.
- W4366197770 hasConceptScore W4366197770C151730666 @default.
- W4366197770 hasConceptScore W4366197770C153180895 @default.
- W4366197770 hasConceptScore W4366197770C154945302 @default.
- W4366197770 hasConceptScore W4366197770C185592680 @default.
- W4366197770 hasConceptScore W4366197770C198531522 @default.
- W4366197770 hasConceptScore W4366197770C204321447 @default.
- W4366197770 hasConceptScore W4366197770C23123220 @default.
- W4366197770 hasConceptScore W4366197770C2776674983 @default.
- W4366197770 hasConceptScore W4366197770C2778755073 @default.
- W4366197770 hasConceptScore W4366197770C2779343474 @default.
- W4366197770 hasConceptScore W4366197770C28034677 @default.
- W4366197770 hasConceptScore W4366197770C41008148 @default.
- W4366197770 hasConceptScore W4366197770C43617362 @default.
- W4366197770 hasConceptScore W4366197770C62520636 @default.
- W4366197770 hasConceptScore W4366197770C86803240 @default.
- W4366197770 hasLocation W43661977701 @default.
- W4366197770 hasOpenAccess W4366197770 @default.
- W4366197770 hasPrimaryLocation W43661977701 @default.
- W4366197770 hasRelatedWork W2043941084 @default.
- W4366197770 hasRelatedWork W2062422262 @default.
- W4366197770 hasRelatedWork W2092957489 @default.
- W4366197770 hasRelatedWork W2357241418 @default.
- W4366197770 hasRelatedWork W2366644548 @default.
- W4366197770 hasRelatedWork W2371129605 @default.
- W4366197770 hasRelatedWork W2376314740 @default.
- W4366197770 hasRelatedWork W2384888906 @default.
- W4366197770 hasRelatedWork W2980434498 @default.
- W4366197770 hasRelatedWork W2583359890 @default.
- W4366197770 isParatext "false" @default.
- W4366197770 isRetracted "false" @default.
- W4366197770 workType "article" @default.
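
The abstract above says images are placed into text by linear assignment over CLIP features. The following is a minimal sketch of that general idea only, not the authors' implementation: it brute-forces a one-to-one image-to-sentence assignment that maximizes total similarity. The function name `assign_images_to_sentences` and the score matrix are hypothetical, and the numbers are made-up placeholders standing in for CLIP image-text similarities.

```python
from itertools import permutations


def assign_images_to_sentences(similarity):
    """Brute-force linear assignment (illustrative only).

    similarity[i][j] is a score between image i and sentence j.
    Returns (assignment, total), where assignment[i] is the sentence index
    chosen for image i, no sentence is used twice, and total is the
    maximum achievable sum of scores.
    """
    n_images = len(similarity)
    n_sentences = len(similarity[0])
    if n_images > n_sentences:
        # Assumption of this sketch: at least as many sentences as images.
        raise ValueError("need at least as many sentences as images")

    best_total = float("-inf")
    best_assignment = None
    # Try every way of giving each image its own distinct sentence.
    for candidate in permutations(range(n_sentences), n_images):
        total = sum(similarity[i][j] for i, j in enumerate(candidate))
        if total > best_total:
            best_total = total
            best_assignment = candidate
    return best_assignment, best_total


# Placeholder scores: 2 images (rows) x 3 sentences (columns).
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
assignment, total = assign_images_to_sentences(scores)
print(assignment)  # (1, 0): image 0 -> sentence 1, image 1 -> sentence 0
```

In this toy matrix, picking the best sentence for image 0 first would leave image 1 with a poor match, whereas the joint assignment gives up a little on image 0 to get a higher total. Enumerating permutations is only practical for tiny inputs; a polynomial-time solver such as the Hungarian algorithm would be the usual choice at scale.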