SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4324291273> ?p ?o ?g. }

Showing items 1 to 100 of ±139 with 100 items per page.

W4324291273 endingPage "5497" @default.
W4324291273 startingPage "5486" @default.
W4324291273 abstract "Cross-modal retrieval aims to enable a flexible bi-directional retrieval experience across different modalities (e.g., searching for videos with texts). Many existing efforts tend to learn a common semantic representation embedding space in which items of different modalities can be directly compared, wherein the positive global representations of video-text pairs are pulled close while the negative ones are pushed apart via pair-wise ranking loss. However, such a vanilla loss would unfortunately yield ambiguous feature embeddings for texts of different videos, causing inaccurate cross-modal matching and unreliable retrievals. Toward this end, we propose a multimodal contrastive knowledge distillation method for instance video-text retrieval, called MCKD, by adaptively using the general knowledge of self-supervised model (teacher) to calibrate mixed boundaries. Specifically, the teacher model is tailored for robust (less-ambiguous) visual-text joint semantic space by maximizing mutual information of co-occurred modalities during multimodal contrastive learning. This robust and structural inter-instance knowledge is then distilled, with the help of explicit discrimination loss, to a student model for improved matching performance. Extensive experiments on four public benchmark video-text datasets (MSR-VTT, TGIF, VATEX, and Youtube2Text) demonstrate that our MCKD can achieve at most 8.8%, 6.4%, 5.9%, and 5.3% improvement in text-to-video performance by the R@1 metric, compared with 14 SoTA baselines." @default.
W4324291273 created "2023-03-16" @default.
W4324291273 creator A5006334685 @default.
W4324291273 creator A5007745122 @default.
W4324291273 creator A5031638251 @default.
W4324291273 creator A5044528861 @default.
W4324291273 creator A5069484115 @default.
W4324291273 date "2023-10-01" @default.
W4324291273 modified "2023-10-17" @default.
W4324291273 title "Using Multimodal Contrastive Knowledge Distillation for Video-Text Retrieval" @default.
W4324291273 cites W1964073652 @default.
W4324291273 cites W2127944900 @default.
W4324291273 cites W2132551672 @default.
W4324291273 cites W2142900973 @default.
W4324291273 cites W2194775991 @default.
W4324291273 cites W2250539671 @default.
W4324291273 cites W2425121537 @default.
W4324291273 cites W2588534625 @default.
W4324291273 cites W2808847742 @default.
W4324291273 cites W2885775891 @default.
W4324291273 cites W2956018683 @default.
W4324291273 cites W2963293463 @default.
W4324291273 cites W2963524571 @default.
W4324291273 cites W2975813532 @default.
W4324291273 cites W2981716253 @default.
W4324291273 cites W2989322838 @default.
W4324291273 cites W3029678209 @default.
W4324291273 cites W3034368386 @default.
W4324291273 cites W3034882096 @default.
W4324291273 cites W3034890701 @default.
W4324291273 cites W3035309251 @default.
W4324291273 cites W3035356601 @default.
W4324291273 cites W3099614098 @default.
W4324291273 cites W3102887392 @default.
W4324291273 cites W3108655343 @default.
W4324291273 cites W3128401049 @default.
W4324291273 cites W3130796238 @default.
W4324291273 cites W3153005511 @default.
W4324291273 cites W3154682722 @default.
W4324291273 cites W3161771873 @default.
W4324291273 cites W3162694035 @default.
W4324291273 cites W3174873881 @default.
W4324291273 cites W3175939205 @default.
W4324291273 cites W3176451698 @default.
W4324291273 cites W3197447668 @default.
W4324291273 cites W3205408642 @default.
W4324291273 cites W3207042189 @default.
W4324291273 cites W4211053420 @default.
W4324291273 cites W4304083193 @default.
W4324291273 cites W4307233751 @default.
W4324291273 doi "https://doi.org/10.1109/tcsvt.2023.3257193" @default.
W4324291273 hasPublicationYear "2023" @default.
W4324291273 type Work @default.
W4324291273 citedByCount "0" @default.
W4324291273 crossrefType "journal-article" @default.
W4324291273 hasAuthorship W4324291273A5006334685 @default.
W4324291273 hasAuthorship W4324291273A5007745122 @default.
W4324291273 hasAuthorship W4324291273A5031638251 @default.
W4324291273 hasAuthorship W4324291273A5044528861 @default.
W4324291273 hasAuthorship W4324291273A5069484115 @default.
W4324291273 hasConcept C103278499 @default.
W4324291273 hasConcept C105795698 @default.
W4324291273 hasConcept C115961682 @default.
W4324291273 hasConcept C119857082 @default.
W4324291273 hasConcept C13280743 @default.
W4324291273 hasConcept C138885662 @default.
W4324291273 hasConcept C144024400 @default.
W4324291273 hasConcept C154945302 @default.
W4324291273 hasConcept C162324750 @default.
W4324291273 hasConcept C165064840 @default.
W4324291273 hasConcept C176217482 @default.
W4324291273 hasConcept C17744445 @default.
W4324291273 hasConcept C185798385 @default.
W4324291273 hasConcept C189430467 @default.
W4324291273 hasConcept C199539241 @default.
W4324291273 hasConcept C204321447 @default.
W4324291273 hasConcept C205649164 @default.
W4324291273 hasConcept C21547014 @default.
W4324291273 hasConcept C23123220 @default.
W4324291273 hasConcept C2776359362 @default.
W4324291273 hasConcept C2776401178 @default.
W4324291273 hasConcept C2779903281 @default.
W4324291273 hasConcept C33923547 @default.
W4324291273 hasConcept C36289849 @default.
W4324291273 hasConcept C41008148 @default.
W4324291273 hasConcept C41608201 @default.
W4324291273 hasConcept C41895202 @default.
W4324291273 hasConcept C59404180 @default.
W4324291273 hasConcept C94625758 @default.
W4324291273 hasConceptScore W4324291273C103278499 @default.
W4324291273 hasConceptScore W4324291273C105795698 @default.
W4324291273 hasConceptScore W4324291273C115961682 @default.
W4324291273 hasConceptScore W4324291273C119857082 @default.
W4324291273 hasConceptScore W4324291273C13280743 @default.
W4324291273 hasConceptScore W4324291273C138885662 @default.
W4324291273 hasConceptScore W4324291273C144024400 @default.
W4324291273 hasConceptScore W4324291273C154945302 @default.
W4324291273 hasConceptScore W4324291273C162324750 @default.