Matches in SemOpenAlex for { <https://semopenalex.org/work/W4378465107> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4378465107 abstract "Performant vision-language (VL) models like CLIP represent captions using a single vector. How much information about language is lost in this bottleneck? We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e.g., single object, to object+property, to multiple interacting objects). Then, we train text-only recovery probes that aim to reconstruct captions from single-vector text representations produced by several VL models. This approach doesn't require images, allowing us to test on a broader range of scenes compared to prior work. We find that: 1) CLIP's text encoder falls short on object relationships, attribute-object association, counting, and negations; 2) some text encoders work significantly better than others; and 3) text-only recovery performance predicts multi-modal matching performance on ControlledImCaps: a new evaluation benchmark we collect+release consisting of fine-grained compositional images+captions. Specifically -- our results suggest text-only recoverability is a necessary (but not sufficient) condition for modeling compositional factors in contrastive vision+language models. We release data+code." @default.
- W4378465107 created "2023-05-27" @default.
- W4378465107 creator A5014130591 @default.
- W4378465107 creator A5043614405 @default.
- W4378465107 creator A5087096372 @default.
- W4378465107 date "2023-05-24" @default.
- W4378465107 modified "2023-09-25" @default.
- W4378465107 title "Text encoders are performance bottlenecks in contrastive vision-language models" @default.
- W4378465107 doi "https://doi.org/10.48550/arxiv.2305.14897" @default.
- W4378465107 hasPublicationYear "2023" @default.
- W4378465107 type Work @default.
- W4378465107 citedByCount "0" @default.
- W4378465107 crossrefType "posted-content" @default.
- W4378465107 hasAuthorship W4378465107A5014130591 @default.
- W4378465107 hasAuthorship W4378465107A5043614405 @default.
- W4378465107 hasAuthorship W4378465107A5087096372 @default.
- W4378465107 hasBestOaLocation W43784651071 @default.
- W4378465107 hasConcept C105795698 @default.
- W4378465107 hasConcept C111919701 @default.
- W4378465107 hasConcept C118505674 @default.
- W4378465107 hasConcept C13280743 @default.
- W4378465107 hasConcept C137293760 @default.
- W4378465107 hasConcept C149635348 @default.
- W4378465107 hasConcept C154945302 @default.
- W4378465107 hasConcept C165064840 @default.
- W4378465107 hasConcept C177264268 @default.
- W4378465107 hasConcept C185798385 @default.
- W4378465107 hasConcept C199360897 @default.
- W4378465107 hasConcept C204321447 @default.
- W4378465107 hasConcept C205649164 @default.
- W4378465107 hasConcept C2776760102 @default.
- W4378465107 hasConcept C2780513914 @default.
- W4378465107 hasConcept C2781238097 @default.
- W4378465107 hasConcept C33923547 @default.
- W4378465107 hasConcept C41008148 @default.
- W4378465107 hasConcept C70437156 @default.
- W4378465107 hasConceptScore W4378465107C105795698 @default.
- W4378465107 hasConceptScore W4378465107C111919701 @default.
- W4378465107 hasConceptScore W4378465107C118505674 @default.
- W4378465107 hasConceptScore W4378465107C13280743 @default.
- W4378465107 hasConceptScore W4378465107C137293760 @default.
- W4378465107 hasConceptScore W4378465107C149635348 @default.
- W4378465107 hasConceptScore W4378465107C154945302 @default.
- W4378465107 hasConceptScore W4378465107C165064840 @default.
- W4378465107 hasConceptScore W4378465107C177264268 @default.
- W4378465107 hasConceptScore W4378465107C185798385 @default.
- W4378465107 hasConceptScore W4378465107C199360897 @default.
- W4378465107 hasConceptScore W4378465107C204321447 @default.
- W4378465107 hasConceptScore W4378465107C205649164 @default.
- W4378465107 hasConceptScore W4378465107C2776760102 @default.
- W4378465107 hasConceptScore W4378465107C2780513914 @default.
- W4378465107 hasConceptScore W4378465107C2781238097 @default.
- W4378465107 hasConceptScore W4378465107C33923547 @default.
- W4378465107 hasConceptScore W4378465107C41008148 @default.
- W4378465107 hasConceptScore W4378465107C70437156 @default.
- W4378465107 hasLocation W43784651071 @default.
- W4378465107 hasOpenAccess W4378465107 @default.
- W4378465107 hasPrimaryLocation W43784651071 @default.
- W4378465107 hasRelatedWork W142374489 @default.
- W4378465107 hasRelatedWork W2062543035 @default.
- W4378465107 hasRelatedWork W2249736983 @default.
- W4378465107 hasRelatedWork W2547835662 @default.
- W4378465107 hasRelatedWork W2756241593 @default.
- W4378465107 hasRelatedWork W3040872486 @default.
- W4378465107 hasRelatedWork W3107474891 @default.
- W4378465107 hasRelatedWork W3133724979 @default.
- W4378465107 hasRelatedWork W3154023894 @default.
- W4378465107 hasRelatedWork W3208409104 @default.
- W4378465107 isParatext "false" @default.
- W4378465107 isRetracted "false" @default.
- W4378465107 workType "article" @default.