Matches in SemOpenAlex for { <https://semopenalex.org/work/W4306795416> ?p ?o ?g. }
Showing items 1 to 69 of 69 with 100 items per page.
- W4306795416 abstract "Cross-modal video retrieval aims to retrieve the semantically relevant videos given a text as a query, and is one of the fundamental tasks in Multimedia. Most of top-performing methods primarily leverage Visual Transformer (ViT) to extract video features [1, 2, 3], suffering from high computational complexity of ViT especially for encoding long videos. A common and simple solution is to uniformly sample a small number (say, 4 or 8) of frames from the video (instead of using the whole video) as input to ViT. The number of frames has a strong influence on the performance of ViT, e.g., using 8 frames performs better than using 4 frames yet needs more computational resources, resulting in a trade-off. To get free from this trade-off, this paper introduces an automatic video compression method based on a bilevel optimization program (BOP) consisting of both model-level (i.e., base-level) and frame-level (i.e., meta-level) optimizations. The model-level learns a cross-modal video retrieval model whose input is the compressed frames learned by frame-level optimization. In turn, the frame-level optimization is through gradient descent using the meta loss of video retrieval model computed on the whole video. We call this BOP method as well as the compressed frames as Meta-Optimized Frames (MOF). By incorporating MOF, the video retrieval model is able to utilize the information of whole videos (for training) while taking only a small number of input frames in actual implementation. The convergence of MOF is guaranteed by meta gradient descent algorithms. For evaluation, we conduct extensive experiments of cross-modal video retrieval on three large-scale benchmarks: MSR-VTT, MSVD, and DiDeMo. Our results show that MOF is a generic and efficient method to boost multiple baseline methods, and can achieve a new state-of-the-art performance." @default.
- W4306795416 created "2022-10-20" @default.
- W4306795416 creator A5009930587 @default.
- W4306795416 creator A5022499603 @default.
- W4306795416 creator A5034434932 @default.
- W4306795416 creator A5039617569 @default.
- W4306795416 creator A5075943848 @default.
- W4306795416 date "2022-10-16" @default.
- W4306795416 modified "2023-10-16" @default.
- W4306795416 title "Efficient Cross-Modal Video Retrieval with Meta-Optimized Frames" @default.
- W4306795416 doi "https://doi.org/10.48550/arxiv.2210.08452" @default.
- W4306795416 hasPublicationYear "2022" @default.
- W4306795416 type Work @default.
- W4306795416 citedByCount "0" @default.
- W4306795416 crossrefType "posted-content" @default.
- W4306795416 hasAuthorship W4306795416A5009930587 @default.
- W4306795416 hasAuthorship W4306795416A5022499603 @default.
- W4306795416 hasAuthorship W4306795416A5034434932 @default.
- W4306795416 hasAuthorship W4306795416A5039617569 @default.
- W4306795416 hasAuthorship W4306795416A5075943848 @default.
- W4306795416 hasBestOaLocation W43067954161 @default.
- W4306795416 hasConcept C106030495 @default.
- W4306795416 hasConcept C126042441 @default.
- W4306795416 hasConcept C153083717 @default.
- W4306795416 hasConcept C154945302 @default.
- W4306795416 hasConcept C167510206 @default.
- W4306795416 hasConcept C172849965 @default.
- W4306795416 hasConcept C185592680 @default.
- W4306795416 hasConcept C188027245 @default.
- W4306795416 hasConcept C202474056 @default.
- W4306795416 hasConcept C204641915 @default.
- W4306795416 hasConcept C23123220 @default.
- W4306795416 hasConcept C31972630 @default.
- W4306795416 hasConcept C41008148 @default.
- W4306795416 hasConcept C65483669 @default.
- W4306795416 hasConcept C71139939 @default.
- W4306795416 hasConcept C76155785 @default.
- W4306795416 hasConceptScore W4306795416C106030495 @default.
- W4306795416 hasConceptScore W4306795416C126042441 @default.
- W4306795416 hasConceptScore W4306795416C153083717 @default.
- W4306795416 hasConceptScore W4306795416C154945302 @default.
- W4306795416 hasConceptScore W4306795416C167510206 @default.
- W4306795416 hasConceptScore W4306795416C172849965 @default.
- W4306795416 hasConceptScore W4306795416C185592680 @default.
- W4306795416 hasConceptScore W4306795416C188027245 @default.
- W4306795416 hasConceptScore W4306795416C202474056 @default.
- W4306795416 hasConceptScore W4306795416C204641915 @default.
- W4306795416 hasConceptScore W4306795416C23123220 @default.
- W4306795416 hasConceptScore W4306795416C31972630 @default.
- W4306795416 hasConceptScore W4306795416C41008148 @default.
- W4306795416 hasConceptScore W4306795416C65483669 @default.
- W4306795416 hasConceptScore W4306795416C71139939 @default.
- W4306795416 hasConceptScore W4306795416C76155785 @default.
- W4306795416 hasLocation W43067954161 @default.
- W4306795416 hasOpenAccess W4306795416 @default.
- W4306795416 hasPrimaryLocation W43067954161 @default.
- W4306795416 hasRelatedWork W2107302173 @default.
- W4306795416 hasRelatedWork W2118574600 @default.
- W4306795416 hasRelatedWork W2118995646 @default.
- W4306795416 hasRelatedWork W2137513187 @default.
- W4306795416 hasRelatedWork W2186219439 @default.
- W4306795416 hasRelatedWork W2350572889 @default.
- W4306795416 hasRelatedWork W2375154311 @default.
- W4306795416 hasRelatedWork W2404934409 @default.
- W4306795416 hasRelatedWork W2541131230 @default.
- W4306795416 hasRelatedWork W2187171999 @default.
- W4306795416 isParatext "false" @default.
- W4306795416 isRetracted "false" @default.
- W4306795416 workType "article" @default.
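
The abstract in this record describes the common baseline of uniformly sampling a small number of frames (say, 4 or 8) from a video before encoding. A minimal sketch of that baseline only, not of the paper's MOF method, might look like the following. The function name `uniform_frame_indices` and the choice of taking the centre of each equal-width segment are illustrative assumptions, not details given in the record.

```python
from typing import List


def uniform_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Return indices of frames spread evenly across a video.

    Hypothetical helper: the video is split into `num_samples`
    equal-width segments and the centre frame of each is chosen.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        # Fewer frames than requested samples: keep every frame.
        return list(range(num_frames))
    segment = num_frames / num_samples
    return [int((i + 0.5) * segment) for i in range(num_samples)]


if __name__ == "__main__":
    # A 32-frame clip sampled at the two budgets the abstract mentions.
    print(uniform_frame_indices(32, 4))
    print(uniform_frame_indices(32, 8))
```

Doubling the budget from 4 to 8 frames doubles the encoder input, which is the accuracy versus cost trade-off the abstract says the method aims to avoid.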