Matches in SemOpenAlex for { <https://semopenalex.org/work/W3110662498> ?p ?o ?g. }
- W3110662498 abstract "Recent vision-language (VL) studies have shown remarkable progress by learning generic representations from massive image-text pairs with transformer models and then fine-tuning on downstream VL tasks. While existing research has focused on achieving high accuracy with large pre-trained models, building a lightweight model is of great value in practice but is less explored. In this paper, we propose a smaller and faster VL model, MiniVLM, which can be fine-tuned with good performance on various downstream tasks like its larger counterpart. MiniVLM consists of two modules, a vision feature extractor and a transformer-based vision-language fusion module. We design a Two-stage Efficient feature Extractor (TEE), inspired by the one-stage EfficientDet network, to significantly reduce the time cost of visual feature extraction by 95% compared to a baseline model. We adopt the MiniLM structure to reduce the computation cost of the transformer module after comparing different compact BERT models. In addition, we improve the MiniVLM pre-training by adding 7M Open Images data, which are pseudo-labeled by a state-of-the-art captioning model. We also pre-train with high-quality image tags obtained from a strong tagging model to enhance cross-modality alignment. The large models are used offline without adding any overhead in fine-tuning and inference. With the above design choices, our MiniVLM reduces the model size by 73% and the inference time cost by 94% while being able to retain 94-97% of the accuracy on multiple VL tasks. We hope that MiniVLM helps ease the use of state-of-the-art VL research for on-the-edge applications." @default.
- W3110662498 created "2020-12-21" @default.
- W3110662498 creator A5021140826 @default.
- W3110662498 creator A5025592561 @default.
- W3110662498 creator A5027851405 @default.
- W3110662498 creator A5047233371 @default.
- W3110662498 creator A5048295582 @default.
- W3110662498 creator A5059735251 @default.
- W3110662498 creator A5071798264 @default.
- W3110662498 creator A5073435344 @default.
- W3110662498 date "2020-12-13" @default.
- W3110662498 modified "2023-09-23" @default.
- W3110662498 title "MiniVLM: A Smaller and Faster Vision-Language Model." @default.
- W3110662498 cites W1821462560 @default.
- W3110662498 cites W1905882502 @default.
- W3110662498 cites W1956340063 @default.
- W3110662498 cites W2101105183 @default.
- W3110662498 cites W2109586012 @default.
- W3110662498 cites W2117539524 @default.
- W3110662498 cites W2133459682 @default.
- W3110662498 cites W2185175083 @default.
- W3110662498 cites W2194775991 @default.
- W3110662498 cites W2277195237 @default.
- W3110662498 cites W2506483933 @default.
- W3110662498 cites W2565639579 @default.
- W3110662498 cites W2612624696 @default.
- W3110662498 cites W2613718673 @default.
- W3110662498 cites W2745461083 @default.
- W3110662498 cites W2806070179 @default.
- W3110662498 cites W2884561390 @default.
- W3110662498 cites W2886641317 @default.
- W3110662498 cites W2908510526 @default.
- W3110662498 cites W2936404177 @default.
- W3110662498 cites W2943152387 @default.
- W3110662498 cites W2952228917 @default.
- W3110662498 cites W2955425717 @default.
- W3110662498 cites W2962772649 @default.
- W3110662498 cites W2963037989 @default.
- W3110662498 cites W2963263347 @default.
- W3110662498 cites W2963341956 @default.
- W3110662498 cites W2963403868 @default.
- W3110662498 cites W2963518342 @default.
- W3110662498 cites W2963530300 @default.
- W3110662498 cites W2964444661 @default.
- W3110662498 cites W2968124245 @default.
- W3110662498 cites W2970231061 @default.
- W3110662498 cites W2970608575 @default.
- W3110662498 cites W2974875810 @default.
- W3110662498 cites W2978017171 @default.
- W3110662498 cites W2981927700 @default.
- W3110662498 cites W2983943451 @default.
- W3110662498 cites W2986670728 @default.
- W3110662498 cites W2995460200 @default.
- W3110662498 cites W2997591391 @default.
- W3110662498 cites W2998356391 @default.
- W3110662498 cites W3008374555 @default.
- W3110662498 cites W3014611590 @default.
- W3110662498 cites W3026176584 @default.
- W3110662498 cites W3034399919 @default.
- W3110662498 cites W3034971973 @default.
- W3110662498 cites W3035396860 @default.
- W3110662498 cites W3035497460 @default.
- W3110662498 cites W3035652667 @default.
- W3110662498 cites W3035694605 @default.
- W3110662498 cites W3090449556 @default.
- W3110662498 cites W3091588028 @default.
- W3110662498 cites W3106250896 @default.
- W3110662498 cites W3121480429 @default.
- W3110662498 cites W3138819159 @default.
- W3110662498 cites W3190043560 @default.
- W3110662498 hasPublicationYear "2020" @default.
- W3110662498 type Work @default.
- W3110662498 sameAs 3110662498 @default.
- W3110662498 citedByCount "5" @default.
- W3110662498 countsByYear W31106624982021 @default.
- W3110662498 countsByYear W31106624982022 @default.
- W3110662498 crossrefType "posted-content" @default.
- W3110662498 hasAuthorship W3110662498A5021140826 @default.
- W3110662498 hasAuthorship W3110662498A5025592561 @default.
- W3110662498 hasAuthorship W3110662498A5027851405 @default.
- W3110662498 hasAuthorship W3110662498A5047233371 @default.
- W3110662498 hasAuthorship W3110662498A5048295582 @default.
- W3110662498 hasAuthorship W3110662498A5059735251 @default.
- W3110662498 hasAuthorship W3110662498A5071798264 @default.
- W3110662498 hasAuthorship W3110662498A5073435344 @default.
- W3110662498 hasConcept C113775141 @default.
- W3110662498 hasConcept C115961682 @default.
- W3110662498 hasConcept C119857082 @default.
- W3110662498 hasConcept C121332964 @default.
- W3110662498 hasConcept C137293760 @default.
- W3110662498 hasConcept C154945302 @default.
- W3110662498 hasConcept C157657479 @default.
- W3110662498 hasConcept C165801399 @default.
- W3110662498 hasConcept C199360897 @default.
- W3110662498 hasConcept C2776214188 @default.
- W3110662498 hasConcept C2779960059 @default.
- W3110662498 hasConcept C41008148 @default.
- W3110662498 hasConcept C52622490 @default.
- W3110662498 hasConcept C62520636 @default.
- W3110662498 hasConcept C66322947 @default.