Matches in SemOpenAlex for { <https://semopenalex.org/work/W4311000294> ?p ?o ?g. }
Showing items 1 to 77 of 77 with 100 items per page.
- W4311000294 abstract "Vision-Language Pretraining (VLP) and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks. However, leveraging these powerful techniques for more complex vision-language tasks, such as cooking applications, with more structured input data, is still little investigated. In this work, we propose to leverage these techniques for structured-text based computational cuisine tasks. Our strategy, dubbed VLPCook, first transforms existing image-text pairs to image and structured-text pairs. This allows to pretrain our VLPCook model using VLP objectives adapted to the structured data of the resulting datasets, then finetuning it on downstream computational cooking tasks. During finetuning, we also enrich the visual encoder, leveraging pretrained foundation models (e.g. CLIP) to provide local and global textual context. VLPCook outperforms current SoTA by a significant margin (+3.3 Recall@1 absolute improvement) on the task of Cross-Modal Food Retrieval on the large Recipe1M dataset. We conduct further experiments on VLP to validate their importance, especially on the Recipe1M+ dataset. Finally, we validate the generalization of the approach to other tasks (i.e., Food Recognition) and domains with structured text such as the Medical domain on the ROCO dataset. The code is available here: https://github.com/mshukor/VLPCook" @default.
- W4311000294 created "2022-12-22" @default.
- W4311000294 creator A5017490804 @default.
- W4311000294 creator A5022871131 @default.
- W4311000294 creator A5036799594 @default.
- W4311000294 date "2022-12-08" @default.
- W4311000294 modified "2023-09-28" @default.
- W4311000294 title "Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval" @default.
- W4311000294 doi "https://doi.org/10.48550/arxiv.2212.04267" @default.
- W4311000294 hasPublicationYear "2022" @default.
- W4311000294 type Work @default.
- W4311000294 citedByCount "0" @default.
- W4311000294 crossrefType "posted-content" @default.
- W4311000294 hasAuthorship W4311000294A5017490804 @default.
- W4311000294 hasAuthorship W4311000294A5022871131 @default.
- W4311000294 hasAuthorship W4311000294A5036799594 @default.
- W4311000294 hasBestOaLocation W43110002941 @default.
- W4311000294 hasConcept C111919701 @default.
- W4311000294 hasConcept C118505674 @default.
- W4311000294 hasConcept C119857082 @default.
- W4311000294 hasConcept C134306372 @default.
- W4311000294 hasConcept C151730666 @default.
- W4311000294 hasConcept C153083717 @default.
- W4311000294 hasConcept C154945302 @default.
- W4311000294 hasConcept C162324750 @default.
- W4311000294 hasConcept C185592680 @default.
- W4311000294 hasConcept C187736073 @default.
- W4311000294 hasConcept C188027245 @default.
- W4311000294 hasConcept C204321447 @default.
- W4311000294 hasConcept C22367795 @default.
- W4311000294 hasConcept C23123220 @default.
- W4311000294 hasConcept C2779343474 @default.
- W4311000294 hasConcept C2780451532 @default.
- W4311000294 hasConcept C33923547 @default.
- W4311000294 hasConcept C36503486 @default.
- W4311000294 hasConcept C41008148 @default.
- W4311000294 hasConcept C71139939 @default.
- W4311000294 hasConcept C774472 @default.
- W4311000294 hasConcept C86803240 @default.
- W4311000294 hasConceptScore W4311000294C111919701 @default.
- W4311000294 hasConceptScore W4311000294C118505674 @default.
- W4311000294 hasConceptScore W4311000294C119857082 @default.
- W4311000294 hasConceptScore W4311000294C134306372 @default.
- W4311000294 hasConceptScore W4311000294C151730666 @default.
- W4311000294 hasConceptScore W4311000294C153083717 @default.
- W4311000294 hasConceptScore W4311000294C154945302 @default.
- W4311000294 hasConceptScore W4311000294C162324750 @default.
- W4311000294 hasConceptScore W4311000294C185592680 @default.
- W4311000294 hasConceptScore W4311000294C187736073 @default.
- W4311000294 hasConceptScore W4311000294C188027245 @default.
- W4311000294 hasConceptScore W4311000294C204321447 @default.
- W4311000294 hasConceptScore W4311000294C22367795 @default.
- W4311000294 hasConceptScore W4311000294C23123220 @default.
- W4311000294 hasConceptScore W4311000294C2779343474 @default.
- W4311000294 hasConceptScore W4311000294C2780451532 @default.
- W4311000294 hasConceptScore W4311000294C33923547 @default.
- W4311000294 hasConceptScore W4311000294C36503486 @default.
- W4311000294 hasConceptScore W4311000294C41008148 @default.
- W4311000294 hasConceptScore W4311000294C71139939 @default.
- W4311000294 hasConceptScore W4311000294C774472 @default.
- W4311000294 hasConceptScore W4311000294C86803240 @default.
- W4311000294 hasLocation W43110002941 @default.
- W4311000294 hasOpenAccess W4311000294 @default.
- W4311000294 hasPrimaryLocation W43110002941 @default.
- W4311000294 hasRelatedWork W1504101963 @default.
- W4311000294 hasRelatedWork W1509467138 @default.
- W4311000294 hasRelatedWork W1716025118 @default.
- W4311000294 hasRelatedWork W2081647779 @default.
- W4311000294 hasRelatedWork W2990109640 @default.
- W4311000294 hasRelatedWork W3107474891 @default.
- W4311000294 hasRelatedWork W3172706523 @default.
- W4311000294 hasRelatedWork W3185852197 @default.
- W4311000294 hasRelatedWork W32283444 @default.
- W4311000294 hasRelatedWork W4295267149 @default.
- W4311000294 isParatext "false" @default.
- W4311000294 isRetracted "false" @default.
- W4311000294 workType "article" @default.
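
The rows above all follow the same visible layout: a leading "- ", a subject, a predicate, an object (quoted when it is a literal, bare when it is an identifier), and a graph label such as "@default.". As a minimal sketch, assuming only that layout and nothing about SemOpenAlex's actual export formats or APIs, the listing could be parsed and grouped by predicate as follows. The names `parse_row` and `group_by_predicate` are hypothetical helpers introduced for illustration.

```python
import re
from collections import defaultdict

# Assumed row layout, taken from the listing itself:
#   - <subject> <predicate> <object> @<graph>.
ROW = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\S+?)\.?$')


def parse_row(line):
    """Parse one listing row into a dict, or return None if it is not a row."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Quoted objects are treated as literals; bare tokens as identifiers.
    is_literal = len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"'
    if is_literal:
        obj = obj[1:-1]
    return {
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        "is_literal": is_literal,
        "graph": graph,
    }


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that are not rows."""
    grouped = defaultdict(list)
    for line in lines:
        row = parse_row(line)
        if row is not None:
            grouped[row["predicate"]].append(row["object"])
    return dict(grouped)
```

Applied to this listing, `group_by_predicate` would gather the three `creator` identifiers under one key and the ten `hasRelatedWork` identifiers under another, while header lines such as the pagination notice are skipped because they do not start with "- ".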