Matches in SemOpenAlex for { <https://semopenalex.org/work/W4225414521> ?p ?o ?g. }
Showing items 1 to 84 of 84, with 100 items per page.
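The listing above is the result of a triple-pattern lookup against the SemOpenAlex SPARQL endpoint. A minimal sketch of issuing the same lookup programmatically is shown below; the endpoint URL `https://semopenalex.org/sparql` and the `format=json` parameter are assumptions about the service and should be checked against the SemOpenAlex documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint address; verify against the SemOpenAlex site.
ENDPOINT = "https://semopenalex.org/sparql"


def build_work_query(work_id: str) -> str:
    """Build a SPARQL query listing all predicate/object pairs for a work IRI."""
    return (
        "SELECT ?p ?o WHERE { "
        f"<https://semopenalex.org/work/{work_id}> ?p ?o . "
        "}"
    )


def fetch_triples(work_id: str) -> bytes:
    """POST the query to the endpoint (requires network access)."""
    body = urlencode({"query": build_work_query(work_id)}).encode()
    req = Request(
        ENDPOINT,
        data=body,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(req) as resp:
        return resp.read()
```

For example, `build_work_query("W4225414521")` produces the pattern that generated this page's 84 rows.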
- W4225414521 abstract "Recently, large-scale pre-training methods like CLIP have made great progress in multi-modal research such as text-video retrieval. In CLIP, transformers are vital for modeling complex multi-modal relations. However, in the vision transformer of CLIP, the essential visual tokenization process, which produces discrete visual token sequences, generates many homogeneous tokens due to the redundant nature of consecutive and similar frames in videos. This significantly increases computation costs and hinders the deployment of video retrieval models in web applications. In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones. As the frame redundancy occurs mostly in consecutive frames, we divide videos into multiple segments and conduct segment-level clustering. Center tokens from each segment are later concatenated into a new sequence, while their original spatial-temporal relations are well maintained. We instantiate two clustering algorithms to efficiently find deterministic medoids and iteratively partition groups in high-dimensional space. Through this token clustering and center selection procedure, we successfully reduce computation costs by removing redundant visual tokens. This method further enhances segment-level semantic alignment between video and text representations, enforcing the spatio-temporal interactions of tokens from within-segment frames. Our method, coined CenterCLIP, surpasses the existing state-of-the-art by a large margin on typical text-video benchmarks, while reducing the training memory cost by 35% and accelerating the inference speed by 14% in the best case. The code is available at https://github.com/mzhaoshuai/CenterCLIP." @default.
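The abstract describes a segment-level token clustering step: divide a video's token sequence into segments, cluster within each segment, keep only the medoid (center) tokens, and concatenate them. The sketch below illustrates that idea with a simple greedy medoid selection in NumPy; the function name, the greedy selection rule, and all parameters are illustrative assumptions, not the paper's actual algorithms (the paper instantiates two specific clustering methods).

```python
import numpy as np


def segment_medoid_clustering(tokens, num_segments, centers_per_segment):
    """Hypothetical sketch of segment-level token reduction.

    tokens: array of shape (num_frames, tokens_per_frame, dim).
    Returns the concatenated medoid tokens from each segment,
    shape (num_segments * centers_per_segment, dim).
    """
    num_frames, _, dim = tokens.shape
    seg_len = num_frames // num_segments  # assume even division for simplicity
    kept = []
    for s in range(num_segments):
        # Flatten all tokens of this segment's consecutive frames.
        seg = tokens[s * seg_len:(s + 1) * seg_len].reshape(-1, dim)
        # Pairwise Euclidean distances between segment tokens.
        dist = np.linalg.norm(seg[:, None, :] - seg[None, :, :], axis=-1)
        # Greedily pick medoids: tokens minimizing total distance to the rest.
        chosen, remaining = [], list(range(len(seg)))
        for _ in range(centers_per_segment):
            costs = dist[np.ix_(remaining, remaining)].sum(axis=1)
            best = remaining[int(np.argmin(costs))]
            chosen.append(best)
            remaining.remove(best)
        kept.append(seg[chosen])
    # Concatenate segment medoids into one shortened token sequence.
    return np.concatenate(kept, axis=0)
```

Dropping all but the medoids is what cuts the quadratic attention cost in the transformer, since the sequence fed to later layers is much shorter than the original frame-by-frame tokenization.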
- W4225414521 created "2022-05-05" @default.
- W4225414521 creator A5005421447 @default.
- W4225414521 creator A5022703708 @default.
- W4225414521 creator A5028165423 @default.
- W4225414521 creator A5043617790 @default.
- W4225414521 date "2022-07-06" @default.
- W4225414521 modified "2023-10-18" @default.
- W4225414521 title "CenterCLIP" @default.
- W4225414521 cites W1893116441 @default.
- W4225414521 cites W2081495115 @default.
- W4225414521 cites W2122051577 @default.
- W4225414521 cites W2151148091 @default.
- W4225414521 cites W2425121537 @default.
- W4225414521 cites W2507009361 @default.
- W4225414521 cites W2565656701 @default.
- W4225414521 cites W2606982687 @default.
- W4225414521 cites W2750526644 @default.
- W4225414521 cites W2765716052 @default.
- W4225414521 cites W2885775891 @default.
- W4225414521 cites W2897439619 @default.
- W4225414521 cites W2962784628 @default.
- W4225414521 cites W2963576560 @default.
- W4225414521 cites W2981851019 @default.
- W4225414521 cites W2984008963 @default.
- W4225414521 cites W3034730770 @default.
- W4225414521 cites W3035265375 @default.
- W4225414521 cites W3035635319 @default.
- W4225414521 cites W3043840704 @default.
- W4225414521 cites W3131922516 @default.
- W4225414521 cites W3153005511 @default.
- W4225414521 cites W3155119115 @default.
- W4225414521 cites W3168640669 @default.
- W4225414521 cites W3174873881 @default.
- W4225414521 cites W3205497712 @default.
- W4225414521 cites W4214926101 @default.
- W4225414521 doi "https://doi.org/10.1145/3477495.3531950" @default.
- W4225414521 hasPublicationYear "2022" @default.
- W4225414521 type Work @default.
- W4225414521 citedByCount "15" @default.
- W4225414521 countsByYear W42254145212022 @default.
- W4225414521 countsByYear W42254145212023 @default.
- W4225414521 crossrefType "proceedings-article" @default.
- W4225414521 hasAuthorship W4225414521A5005421447 @default.
- W4225414521 hasAuthorship W4225414521A5022703708 @default.
- W4225414521 hasAuthorship W4225414521A5028165423 @default.
- W4225414521 hasAuthorship W4225414521A5043617790 @default.
- W4225414521 hasBestOaLocation W42254145212 @default.
- W4225414521 hasConcept C111919701 @default.
- W4225414521 hasConcept C152124472 @default.
- W4225414521 hasConcept C153180895 @default.
- W4225414521 hasConcept C154945302 @default.
- W4225414521 hasConcept C2776214188 @default.
- W4225414521 hasConcept C31258907 @default.
- W4225414521 hasConcept C41008148 @default.
- W4225414521 hasConcept C48145219 @default.
- W4225414521 hasConcept C73555534 @default.
- W4225414521 hasConceptScore W4225414521C111919701 @default.
- W4225414521 hasConceptScore W4225414521C152124472 @default.
- W4225414521 hasConceptScore W4225414521C153180895 @default.
- W4225414521 hasConceptScore W4225414521C154945302 @default.
- W4225414521 hasConceptScore W4225414521C2776214188 @default.
- W4225414521 hasConceptScore W4225414521C31258907 @default.
- W4225414521 hasConceptScore W4225414521C41008148 @default.
- W4225414521 hasConceptScore W4225414521C48145219 @default.
- W4225414521 hasConceptScore W4225414521C73555534 @default.
- W4225414521 hasLocation W42254145211 @default.
- W4225414521 hasLocation W42254145212 @default.
- W4225414521 hasLocation W42254145213 @default.
- W4225414521 hasOpenAccess W4225414521 @default.
- W4225414521 hasPrimaryLocation W42254145211 @default.
- W4225414521 hasRelatedWork W1985412924 @default.
- W4225414521 hasRelatedWork W2033914206 @default.
- W4225414521 hasRelatedWork W2146076056 @default.
- W4225414521 hasRelatedWork W2163831990 @default.
- W4225414521 hasRelatedWork W2375389409 @default.
- W4225414521 hasRelatedWork W2488051804 @default.
- W4225414521 hasRelatedWork W2917049097 @default.
- W4225414521 hasRelatedWork W3003836766 @default.
- W4225414521 hasRelatedWork W3211815623 @default.
- W4225414521 hasRelatedWork W4299878869 @default.
- W4225414521 isParatext "false" @default.
- W4225414521 isRetracted "false" @default.
- W4225414521 workType "article" @default.