Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386083119> ?p ?o ?g. }
Showing items 1 to 90 of 90 with 100 items per page.
- W4386083119 abstract "Vision Transformers (ViTs) are built on the assumption of treating image patches as “visual tokens” and learn patch-to-patch attention. The patch embedding based tokenizer has a semantic gap with respect to its counterpart, the textual tokenizer. The patch-to-patch attention suffers from the quadratic complexity issue, and also makes it non-trivial to explain learned ViTs. To address these issues in ViT, this paper proposes to learn Patch-to-Cluster attention (PaCa) in ViT. Queries in our PaCa-ViT start with patches, while keys and values are directly based on clustering (with a predefined small number of clusters). The clusters are learned end-to-end, leading to better tokenizers and inducing joint clustering-for-attention and attention-for-clustering for better and interpretable models. The quadratic complexity is relaxed to linear complexity. The proposed PaCa module is used in designing efficient and interpretable ViT backbones and semantic segmentation head networks. In experiments, the proposed methods are tested on ImageNet-1k image classification, MS-COCO object detection and instance segmentation, and MIT-ADE20k semantic segmentation. Compared with the prior art, it obtains better performance in all three benchmarks than the SWin [32] and the PVTs [47], [48], by significant margins in ImageNet-1k and MIT-ADE20k. It is also significantly more efficient than PVT models in MS-COCO and MIT-ADE20k due to the linear complexity. The learned clusters are semantically meaningful. Code and model checkpoints are available at https://github.com/iVMCL/PaCaViT." @default.
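The abstract states that queries come from the N patches while keys and values come from a small, predefined number M of learned clusters, which is why the attention cost drops from quadratic to linear in N. The following is a minimal plain-Python sketch of that idea, not the authors' implementation (their code is in the linked repository). It assumes a single attention head, a hypothetical linear clustering projection `cluster_weights` whose logits are softmax-normalised over the patches, and no learned Q/K/V projections; the function and parameter names are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def patch_to_cluster_attention(patches: Matrix, cluster_weights: Matrix) -> Matrix:
    """Hypothetical sketch of patch-to-cluster attention.

    patches:         N x d patch tokens, used directly as queries here.
    cluster_weights: d x M projection giving one logit per cluster (assumption;
                     the paper learns its clustering module end-to-end).
    Returns N x d output tokens.
    """
    d = len(patches[0])
    # Cluster logits (M x N), normalised over the N patches so that each cluster
    # token is a convex combination of patches (assumed normalisation axis).
    logits = transpose(matmul(patches, cluster_weights))
    assign = [softmax(row) for row in logits]
    clusters = matmul(assign, patches)            # M x d cluster tokens
    # Keys and values are the M cluster tokens (learned projections omitted).
    keys = values = clusters
    scores = matmul(patches, transpose(keys))     # N x M, not N x N
    scale = 1.0 / math.sqrt(d)
    attn = [softmax([s * scale for s in row]) for row in scores]
    return matmul(attn, values)                   # N x d


patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # N = 4, d = 2
cluster_weights = [[2.0, -2.0], [-2.0, 2.0]]                # d = 2 -> M = 2 clusters
out = patch_to_cluster_attention(patches, cluster_weights)
print(len(out), len(out[0]))  # 4 2
```

The score matrix has shape N x M rather than N x N, so for a fixed small M the work grows linearly with the number of patches, matching the complexity claim in the abstract.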
- W4386083119 created "2023-08-23" @default.
- W4386083119 creator A5005521128 @default.
- W4386083119 creator A5016812434 @default.
- W4386083119 creator A5064889561 @default.
- W4386083119 creator A5071333372 @default.
- W4386083119 creator A5085488411 @default.
- W4386083119 creator A5091159460 @default.
- W4386083119 date "2023-06-01" @default.
- W4386083119 modified "2023-09-26" @default.
- W4386083119 title "PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers" @default.
- W4386083119 cites W2097117768 @default.
- W4386083119 cites W2108598243 @default.
- W4386083119 cites W2183341477 @default.
- W4386083119 cites W2194775991 @default.
- W4386083119 cites W2507296351 @default.
- W4386083119 cites W2565639579 @default.
- W4386083119 cites W2568841898 @default.
- W4386083119 cites W2910628332 @default.
- W4386083119 cites W2963150697 @default.
- W4386083119 cites W2963163009 @default.
- W4386083119 cites W2995523160 @default.
- W4386083119 cites W2998508940 @default.
- W4386083119 cites W3105238007 @default.
- W4386083119 cites W3108981297 @default.
- W4386083119 cites W3121523901 @default.
- W4386083119 cites W3138516171 @default.
- W4386083119 cites W3176196997 @default.
- W4386083119 cites W3188427387 @default.
- W4386083119 cites W4214493665 @default.
- W4386083119 cites W4252182960 @default.
- W4386083119 cites W4312849330 @default.
- W4386083119 doi "https://doi.org/10.1109/cvpr52729.2023.01781" @default.
- W4386083119 hasPublicationYear "2023" @default.
- W4386083119 type Work @default.
- W4386083119 citedByCount "0" @default.
- W4386083119 crossrefType "proceedings-article" @default.
- W4386083119 hasAuthorship W4386083119A5005521128 @default.
- W4386083119 hasAuthorship W4386083119A5016812434 @default.
- W4386083119 hasAuthorship W4386083119A5064889561 @default.
- W4386083119 hasAuthorship W4386083119A5071333372 @default.
- W4386083119 hasAuthorship W4386083119A5085488411 @default.
- W4386083119 hasAuthorship W4386083119A5091159460 @default.
- W4386083119 hasConcept C119857082 @default.
- W4386083119 hasConcept C121332964 @default.
- W4386083119 hasConcept C126255220 @default.
- W4386083119 hasConcept C153180895 @default.
- W4386083119 hasConcept C154945302 @default.
- W4386083119 hasConcept C165801399 @default.
- W4386083119 hasConcept C33923547 @default.
- W4386083119 hasConcept C41008148 @default.
- W4386083119 hasConcept C41608201 @default.
- W4386083119 hasConcept C62520636 @default.
- W4386083119 hasConcept C66322947 @default.
- W4386083119 hasConcept C73555534 @default.
- W4386083119 hasConcept C81845259 @default.
- W4386083119 hasConcept C89600930 @default.
- W4386083119 hasConceptScore W4386083119C119857082 @default.
- W4386083119 hasConceptScore W4386083119C121332964 @default.
- W4386083119 hasConceptScore W4386083119C126255220 @default.
- W4386083119 hasConceptScore W4386083119C153180895 @default.
- W4386083119 hasConceptScore W4386083119C154945302 @default.
- W4386083119 hasConceptScore W4386083119C165801399 @default.
- W4386083119 hasConceptScore W4386083119C33923547 @default.
- W4386083119 hasConceptScore W4386083119C41008148 @default.
- W4386083119 hasConceptScore W4386083119C41608201 @default.
- W4386083119 hasConceptScore W4386083119C62520636 @default.
- W4386083119 hasConceptScore W4386083119C66322947 @default.
- W4386083119 hasConceptScore W4386083119C73555534 @default.
- W4386083119 hasConceptScore W4386083119C81845259 @default.
- W4386083119 hasConceptScore W4386083119C89600930 @default.
- W4386083119 hasFunder F4320306076 @default.
- W4386083119 hasFunder F4320312530 @default.
- W4386083119 hasFunder F4320338281 @default.
- W4386083119 hasLocation W43860831191 @default.
- W4386083119 hasOpenAccess W4386083119 @default.
- W4386083119 hasPrimaryLocation W43860831191 @default.
- W4386083119 hasRelatedWork W2961085424 @default.
- W4386083119 hasRelatedWork W3046775127 @default.
- W4386083119 hasRelatedWork W3170094116 @default.
- W4386083119 hasRelatedWork W4206148502 @default.
- W4386083119 hasRelatedWork W4210265465 @default.
- W4386083119 hasRelatedWork W4285260836 @default.
- W4386083119 hasRelatedWork W4286629047 @default.
- W4386083119 hasRelatedWork W4306321456 @default.
- W4386083119 hasRelatedWork W4306674287 @default.
- W4386083119 hasRelatedWork W4224009465 @default.
- W4386083119 isParatext "false" @default.
- W4386083119 isRetracted "false" @default.
- W4386083119 workType "article" @default.