Matches in SemOpenAlex for { <https://semopenalex.org/work/W3214192224> ?p ?o ?g. }
- W3214192224 abstract "Current metrics for video captioning are mostly based on the text-level comparison between reference and candidate captions. However, they have some insuperable drawbacks, e.g., they cannot handle videos without references, and they may result in biased evaluation due to the one-to-many nature of video-to-text and the neglect of visual relevance. From the human evaluator's viewpoint, a high-quality caption should be consistent with the provided video, but need not be similar to the reference literally or semantically. Inspired by human evaluation, we propose EMScore (Embedding Matching-based score), a novel reference-free metric for video captioning, which directly measures similarity between video and candidate captions. Benefiting from the recent development of large-scale pre-training models, we exploit a well pre-trained vision-language model to extract visual and linguistic embeddings for computing EMScore. Specifically, EMScore combines matching scores of both coarse-grained (video and caption) and fine-grained (frames and words) levels, which takes the overall understanding and detailed characteristics of the video into account. Furthermore, considering the potential information gain, EMScore can be flexibly extended to the conditions where human-labeled references are available. Last but not least, we collect the VATEX-EVAL and ActivityNet-FOIL datasets to systematically evaluate the existing metrics. VATEX-EVAL experiments demonstrate that EMScore has higher human correlation and lower reference dependency. The ActivityNet-FOIL experiment verifies that EMScore can effectively identify hallucinating captions. The datasets will be released to facilitate the development of video captioning metrics. The code is available at: https://github.com/ShiYaya/emscore." @default.
- W3214192224 created "2021-11-22" @default.
- W3214192224 creator A5003217535 @default.
- W3214192224 creator A5005377211 @default.
- W3214192224 creator A5019915855 @default.
- W3214192224 creator A5047332568 @default.
- W3214192224 creator A5064305881 @default.
- W3214192224 creator A5066544410 @default.
- W3214192224 creator A5083581319 @default.
- W3214192224 date "2022-06-01" @default.
- W3214192224 modified "2023-10-12" @default.
- W3214192224 title "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching" @default.
- W3214192224 cites W1956340063 @default.
- W3214192224 cites W2101105183 @default.
- W3214192224 cites W2745461083 @default.
- W3214192224 cites W2886641317 @default.
- W3214192224 cites W2887585070 @default.
- W3214192224 cites W2938603906 @default.
- W3214192224 cites W2962735233 @default.
- W3214192224 cites W2962964995 @default.
- W3214192224 cites W2970858040 @default.
- W3214192224 cites W2984121207 @default.
- W3214192224 cites W2989322838 @default.
- W3214192224 cites W3034417909 @default.
- W3214192224 cites W3035365026 @default.
- W3214192224 cites W3035372819 @default.
- W3214192224 cites W3035635319 @default.
- W3214192224 cites W3104379732 @default.
- W3214192224 cites W3105232955 @default.
- W3214192224 cites W3134875898 @default.
- W3214192224 cites W3153469116 @default.
- W3214192224 cites W3174151851 @default.
- W3214192224 cites W3175939205 @default.
- W3214192224 doi "https://doi.org/10.1109/cvpr52688.2022.01740" @default.
- W3214192224 hasPublicationYear "2022" @default.
- W3214192224 type Work @default.
- W3214192224 sameAs 3214192224 @default.
- W3214192224 citedByCount "4" @default.
- W3214192224 countsByYear W32141922242023 @default.
- W3214192224 crossrefType "proceedings-article" @default.
- W3214192224 hasAuthorship W3214192224A5003217535 @default.
- W3214192224 hasAuthorship W3214192224A5005377211 @default.
- W3214192224 hasAuthorship W3214192224A5019915855 @default.
- W3214192224 hasAuthorship W3214192224A5047332568 @default.
- W3214192224 hasAuthorship W3214192224A5064305881 @default.
- W3214192224 hasAuthorship W3214192224A5066544410 @default.
- W3214192224 hasAuthorship W3214192224A5083581319 @default.
- W3214192224 hasBestOaLocation W32141922242 @default.
- W3214192224 hasConcept C103278499 @default.
- W3214192224 hasConcept C105795698 @default.
- W3214192224 hasConcept C115961682 @default.
- W3214192224 hasConcept C121332964 @default.
- W3214192224 hasConcept C154945302 @default.
- W3214192224 hasConcept C157657479 @default.
- W3214192224 hasConcept C162324750 @default.
- W3214192224 hasConcept C165064840 @default.
- W3214192224 hasConcept C165696696 @default.
- W3214192224 hasConcept C176217482 @default.
- W3214192224 hasConcept C184337299 @default.
- W3214192224 hasConcept C199360897 @default.
- W3214192224 hasConcept C204321447 @default.
- W3214192224 hasConcept C21547014 @default.
- W3214192224 hasConcept C23123220 @default.
- W3214192224 hasConcept C2778755073 @default.
- W3214192224 hasConcept C28490314 @default.
- W3214192224 hasConcept C33923547 @default.
- W3214192224 hasConcept C38652104 @default.
- W3214192224 hasConcept C41008148 @default.
- W3214192224 hasConcept C41608201 @default.
- W3214192224 hasConcept C62520636 @default.
- W3214192224 hasConceptScore W3214192224C103278499 @default.
- W3214192224 hasConceptScore W3214192224C105795698 @default.
- W3214192224 hasConceptScore W3214192224C115961682 @default.
- W3214192224 hasConceptScore W3214192224C121332964 @default.
- W3214192224 hasConceptScore W3214192224C154945302 @default.
- W3214192224 hasConceptScore W3214192224C157657479 @default.
- W3214192224 hasConceptScore W3214192224C162324750 @default.
- W3214192224 hasConceptScore W3214192224C165064840 @default.
- W3214192224 hasConceptScore W3214192224C165696696 @default.
- W3214192224 hasConceptScore W3214192224C176217482 @default.
- W3214192224 hasConceptScore W3214192224C184337299 @default.
- W3214192224 hasConceptScore W3214192224C199360897 @default.
- W3214192224 hasConceptScore W3214192224C204321447 @default.
- W3214192224 hasConceptScore W3214192224C21547014 @default.
- W3214192224 hasConceptScore W3214192224C23123220 @default.
- W3214192224 hasConceptScore W3214192224C2778755073 @default.
- W3214192224 hasConceptScore W3214192224C28490314 @default.
- W3214192224 hasConceptScore W3214192224C33923547 @default.
- W3214192224 hasConceptScore W3214192224C38652104 @default.
- W3214192224 hasConceptScore W3214192224C41008148 @default.
- W3214192224 hasConceptScore W3214192224C41608201 @default.
- W3214192224 hasConceptScore W3214192224C62520636 @default.
- W3214192224 hasFunder F4320322919 @default.
- W3214192224 hasLocation W32141922241 @default.
- W3214192224 hasLocation W32141922242 @default.
- W3214192224 hasLocation W32141922243 @default.
- W3214192224 hasOpenAccess W3214192224 @default.
- W3214192224 hasPrimaryLocation W32141922241 @default.
- W3214192224 hasRelatedWork W2767048014 @default.
- W3214192224 hasRelatedWork W2775506363 @default.
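
The abstract of W3214192224 describes combining a coarse-grained (video and caption) match with a fine-grained (frames and words) match. The following is a minimal sketch of that idea only, not the authors' implementation (see the repository linked in the abstract for that). Everything here is an assumption made for illustration: the function names, mean pooling of frame embeddings for the video vector, greedy best-match precision/recall with an F-score for the fine-grained part, and an unweighted average of the two levels. Embeddings are plain lists of floats standing in for the output of a pre-trained vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pool(vectors):
    """Average a list of vectors component-wise (assumed video-level pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def coarse_grained_score(frame_embs, caption_emb):
    """Video-level match: pooled frame embedding vs. whole-caption embedding."""
    return cosine(mean_pool(frame_embs), caption_emb)


def fine_grained_score(frame_embs, word_embs):
    """Frame/word match via greedy best-match precision, recall, and F-score.

    Precision: each word is matched to its most similar frame.
    Recall: each frame is matched to its most similar word.
    """
    precision = sum(
        max(cosine(w, f) for f in frame_embs) for w in word_embs
    ) / len(word_embs)
    recall = sum(
        max(cosine(f, w) for w in word_embs) for f in frame_embs
    ) / len(frame_embs)
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing matches
    return 2 * precision * recall / (precision + recall)


def emscore_sketch(frame_embs, word_embs, caption_emb):
    """Hypothetical combination: unweighted mean of the two levels."""
    coarse = coarse_grained_score(frame_embs, caption_emb)
    fine = fine_grained_score(frame_embs, word_embs)
    return 0.5 * (coarse + fine)


if __name__ == "__main__":
    # Toy 2-D embeddings; real ones would come from a vision-language model.
    frames = [[1.0, 0.0], [0.0, 1.0]]
    words = [[1.0, 0.0], [0.0, 1.0]]
    caption = [1.0, 1.0]
    print(emscore_sketch(frames, words, caption))
```

No reference caption appears anywhere in the sketch, which is the sense in which the abstract calls the metric reference-free: only the video's frame embeddings and the candidate caption's embeddings are compared.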