Matches in SemOpenAlex for { <https://semopenalex.org/work/W4298052603> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4298052603 abstract "With the advent of rich visual representations and pre-trained language models, video captioning has seen continuous improvement over time. Despite the performance improvement, video captioning models are prone to hallucination. Hallucination refers to the generation of highly pathological descriptions that are detached from the source material. In video captioning, there are two kinds of hallucination: object and action hallucination. Instead of endeavoring to learn better representations of a video, in this work, we investigate the fundamental sources of the hallucination problem. We identify three main factors: (i) inadequate visual features extracted from pre-trained models, (ii) improper influences of source and target contexts during multi-modal fusion, and (iii) exposure bias in the training strategy. To alleviate these problems, we propose two robust solutions: (a) the introduction of auxiliary heads trained in multi-label settings on top of the extracted visual features and (b) the addition of context gates, which dynamically select the features during fusion. The standard evaluation metrics for video captioning measure similarity with ground truth captions and do not adequately capture object and action relevance. To this end, we propose a new metric, COAHA (caption object and action hallucination assessment), which assesses the degree of hallucination. Our method achieves state-of-the-art performance on the MSR-Video to Text (MSR-VTT) and the Microsoft Research Video Description Corpus (MSVD) datasets, especially by a massive margin in CIDEr score." @default.
- W4298052603 created "2022-10-01" @default.
- W4298052603 creator A5000405859 @default.
- W4298052603 creator A5021779406 @default.
- W4298052603 date "2022-09-28" @default.
- W4298052603 modified "2023-09-27" @default.
- W4298052603 title "Thinking Hallucination for Video Captioning" @default.
- W4298052603 doi "https://doi.org/10.48550/arxiv.2209.13853" @default.
- W4298052603 hasPublicationYear "2022" @default.
- W4298052603 type Work @default.
- W4298052603 citedByCount "0" @default.
- W4298052603 crossrefType "posted-content" @default.
- W4298052603 hasAuthorship W4298052603A5000405859 @default.
- W4298052603 hasAuthorship W4298052603A5021779406 @default.
- W4298052603 hasBestOaLocation W42980526031 @default.
- W4298052603 hasConcept C103278499 @default.
- W4298052603 hasConcept C115961682 @default.
- W4298052603 hasConcept C118552586 @default.
- W4298052603 hasConcept C119857082 @default.
- W4298052603 hasConcept C121332964 @default.
- W4298052603 hasConcept C151730666 @default.
- W4298052603 hasConcept C154945302 @default.
- W4298052603 hasConcept C15744967 @default.
- W4298052603 hasConcept C157657479 @default.
- W4298052603 hasConcept C162324750 @default.
- W4298052603 hasConcept C176217482 @default.
- W4298052603 hasConcept C204321447 @default.
- W4298052603 hasConcept C21547014 @default.
- W4298052603 hasConcept C2779343474 @default.
- W4298052603 hasConcept C2780791683 @default.
- W4298052603 hasConcept C2781238097 @default.
- W4298052603 hasConcept C2908998935 @default.
- W4298052603 hasConcept C41008148 @default.
- W4298052603 hasConcept C62520636 @default.
- W4298052603 hasConcept C774472 @default.
- W4298052603 hasConcept C86803240 @default.
- W4298052603 hasConceptScore W4298052603C103278499 @default.
- W4298052603 hasConceptScore W4298052603C115961682 @default.
- W4298052603 hasConceptScore W4298052603C118552586 @default.
- W4298052603 hasConceptScore W4298052603C119857082 @default.
- W4298052603 hasConceptScore W4298052603C121332964 @default.
- W4298052603 hasConceptScore W4298052603C151730666 @default.
- W4298052603 hasConceptScore W4298052603C154945302 @default.
- W4298052603 hasConceptScore W4298052603C15744967 @default.
- W4298052603 hasConceptScore W4298052603C157657479 @default.
- W4298052603 hasConceptScore W4298052603C162324750 @default.
- W4298052603 hasConceptScore W4298052603C176217482 @default.
- W4298052603 hasConceptScore W4298052603C204321447 @default.
- W4298052603 hasConceptScore W4298052603C21547014 @default.
- W4298052603 hasConceptScore W4298052603C2779343474 @default.
- W4298052603 hasConceptScore W4298052603C2780791683 @default.
- W4298052603 hasConceptScore W4298052603C2781238097 @default.
- W4298052603 hasConceptScore W4298052603C2908998935 @default.
- W4298052603 hasConceptScore W4298052603C41008148 @default.
- W4298052603 hasConceptScore W4298052603C62520636 @default.
- W4298052603 hasConceptScore W4298052603C774472 @default.
- W4298052603 hasConceptScore W4298052603C86803240 @default.
- W4298052603 hasLocation W42980526031 @default.
- W4298052603 hasOpenAccess W4298052603 @default.
- W4298052603 hasPrimaryLocation W42980526031 @default.
- W4298052603 hasRelatedWork W2013066329 @default.
- W4298052603 hasRelatedWork W2045954434 @default.
- W4298052603 hasRelatedWork W2505639562 @default.
- W4298052603 hasRelatedWork W2791000049 @default.
- W4298052603 hasRelatedWork W3107474891 @default.
- W4298052603 hasRelatedWork W3202965415 @default.
- W4298052603 hasRelatedWork W4281690070 @default.
- W4298052603 hasRelatedWork W4286911391 @default.
- W4298052603 hasRelatedWork W4298379512 @default.
- W4298052603 hasRelatedWork W4299291873 @default.
- W4298052603 isParatext "false" @default.
- W4298052603 isRetracted "false" @default.
- W4298052603 workType "article" @default.
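
The entries above all follow the same shape: subject, predicate, object, then a graph label such as `@default`. The following is a minimal sketch of how such lines could be read into a per-predicate dictionary. It assumes that exact line shape; the names `LINE_RE`, `parse_line`, and `group_by_predicate` are hypothetical helpers introduced for illustration and are not part of any SemOpenAlex tooling.

```python
import re
from collections import defaultdict

# Assumed line shape (taken from the listing above, not from a formal spec):
#   - <subject> <predicate> <object> @<graph>.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\S+?)\.?$', re.DOTALL)


def parse_line(line):
    """Split one listing line into (subject, predicate, object, graph).

    Returns None when the line does not match the assumed shape.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Literal values appear wrapped in double quotes; drop the wrapper.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values under their predicate, keeping listing order."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


sample = [
    '- W4298052603 title "Thinking Hallucination for Video Captioning" @default.',
    '- W4298052603 creator A5000405859 @default.',
    '- W4298052603 creator A5021779406 @default.',
    '- W4298052603 citedByCount "0" @default.',
]
record = group_by_predicate(sample)
print(record["title"])
print(record["creator"])
```

Repeated predicates such as `creator`, `hasConcept`, and `hasRelatedWork` end up as lists with one value per entry, which is why every predicate maps to a list rather than a single value.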