Matches in SemOpenAlex for { <https://semopenalex.org/work/W4298052184> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4298052184 abstract "Most existing methods in vision-language retrieval match two modalities by either comparing their global feature vectors which misses sufficient information and lacks interpretability, detecting objects in images or videos and aligning the text with fine-grained features which relies on complicated model designs, or modeling fine-grained interaction via cross-attention upon visual and textual tokens which suffers from inferior efficiency. To address these limitations, some recent works simply aggregate the token-wise similarities to achieve fine-grained alignment, but they lack intuitive explanations as well as neglect the relationships between token-level features and global representations with high-level semantics. In this work, we rethink fine-grained cross-modal alignment and devise a new model-agnostic formulation for it. We additionally demystify the recent popular works and subsume them into our scheme. Furthermore, inspired by optimal transport theory, we introduce TokenFlow, an instantiation of the proposed scheme. By modifying only the similarity function, the performance of our method is comparable to the SoTA algorithms with heavy model designs on major video-text retrieval benchmarks. The visualization further indicates that TokenFlow successfully leverages the fine-grained information and achieves better interpretability." @default.
- W4298052184 created "2022-10-01" @default.
- W4298052184 creator A5038459817 @default.
- W4298052184 creator A5069622043 @default.
- W4298052184 creator A5072568834 @default.
- W4298052184 creator A5077816945 @default.
- W4298052184 date "2022-09-28" @default.
- W4298052184 modified "2023-09-28" @default.
- W4298052184 title "TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval" @default.
- W4298052184 doi "https://doi.org/10.48550/arxiv.2209.13822" @default.
- W4298052184 hasPublicationYear "2022" @default.
- W4298052184 type Work @default.
- W4298052184 citedByCount "0" @default.
- W4298052184 crossrefType "posted-content" @default.
- W4298052184 hasAuthorship W4298052184A5038459817 @default.
- W4298052184 hasAuthorship W4298052184A5069622043 @default.
- W4298052184 hasAuthorship W4298052184A5072568834 @default.
- W4298052184 hasAuthorship W4298052184A5077816945 @default.
- W4298052184 hasBestOaLocation W42980521841 @default.
- W4298052184 hasConcept C103278499 @default.
- W4298052184 hasConcept C115961682 @default.
- W4298052184 hasConcept C134306372 @default.
- W4298052184 hasConcept C138885662 @default.
- W4298052184 hasConcept C154945302 @default.
- W4298052184 hasConcept C184337299 @default.
- W4298052184 hasConcept C185592680 @default.
- W4298052184 hasConcept C188027245 @default.
- W4298052184 hasConcept C199360897 @default.
- W4298052184 hasConcept C204321447 @default.
- W4298052184 hasConcept C23123220 @default.
- W4298052184 hasConcept C2776401178 @default.
- W4298052184 hasConcept C2781067378 @default.
- W4298052184 hasConcept C33923547 @default.
- W4298052184 hasConcept C36464697 @default.
- W4298052184 hasConcept C38652104 @default.
- W4298052184 hasConcept C41008148 @default.
- W4298052184 hasConcept C41895202 @default.
- W4298052184 hasConcept C48145219 @default.
- W4298052184 hasConcept C71139939 @default.
- W4298052184 hasConcept C77618280 @default.
- W4298052184 hasConcept C80444323 @default.
- W4298052184 hasConceptScore W4298052184C103278499 @default.
- W4298052184 hasConceptScore W4298052184C115961682 @default.
- W4298052184 hasConceptScore W4298052184C134306372 @default.
- W4298052184 hasConceptScore W4298052184C138885662 @default.
- W4298052184 hasConceptScore W4298052184C154945302 @default.
- W4298052184 hasConceptScore W4298052184C184337299 @default.
- W4298052184 hasConceptScore W4298052184C185592680 @default.
- W4298052184 hasConceptScore W4298052184C188027245 @default.
- W4298052184 hasConceptScore W4298052184C199360897 @default.
- W4298052184 hasConceptScore W4298052184C204321447 @default.
- W4298052184 hasConceptScore W4298052184C23123220 @default.
- W4298052184 hasConceptScore W4298052184C2776401178 @default.
- W4298052184 hasConceptScore W4298052184C2781067378 @default.
- W4298052184 hasConceptScore W4298052184C33923547 @default.
- W4298052184 hasConceptScore W4298052184C36464697 @default.
- W4298052184 hasConceptScore W4298052184C38652104 @default.
- W4298052184 hasConceptScore W4298052184C41008148 @default.
- W4298052184 hasConceptScore W4298052184C41895202 @default.
- W4298052184 hasConceptScore W4298052184C48145219 @default.
- W4298052184 hasConceptScore W4298052184C71139939 @default.
- W4298052184 hasConceptScore W4298052184C77618280 @default.
- W4298052184 hasConceptScore W4298052184C80444323 @default.
- W4298052184 hasLocation W42980521841 @default.
- W4298052184 hasOpenAccess W4298052184 @default.
- W4298052184 hasPrimaryLocation W42980521841 @default.
- W4298052184 hasRelatedWork W1541271503 @default.
- W4298052184 hasRelatedWork W2068018629 @default.
- W4298052184 hasRelatedWork W2349125667 @default.
- W4298052184 hasRelatedWork W2805914100 @default.
- W4298052184 hasRelatedWork W2912445262 @default.
- W4298052184 hasRelatedWork W3027591380 @default.
- W4298052184 hasRelatedWork W3097853387 @default.
- W4298052184 hasRelatedWork W3214915308 @default.
- W4298052184 hasRelatedWork W4281690070 @default.
- W4298052184 hasRelatedWork W4296442150 @default.
- W4298052184 isParatext "false" @default.
- W4298052184 isRetracted "false" @default.
- W4298052184 workType "article" @default.