Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387385612> ?p ?o ?g. }
Showing items 1 to 60 of 60, with 100 items per page.
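The triple pattern above can also be run programmatically. Below is a minimal sketch, assuming the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql and the SPARQLWrapper Python library; the endpoint URL is an assumption and should be checked against the current SemOpenAlex documentation. The named-graph variable ?g from the page's pattern is omitted here for simplicity.

```python
# Minimal sketch: list all predicate/object pairs for work W4387385612.
# Assumes the public SemOpenAlex SPARQL endpoint at
# https://semopenalex.org/sparql (verify against current docs).
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://semopenalex.org/sparql")
endpoint.setQuery("""
    SELECT ?p ?o
    WHERE { <https://semopenalex.org/work/W4387385612> ?p ?o . }
""")
endpoint.setReturnFormat(JSON)

# Execute the query and print each predicate/object binding,
# mirroring the triple listing shown on this page.
results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["p"]["value"], binding["o"]["value"])
```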
- W4387385612 endingPage "12" @default.
- W4387385612 startingPage "1" @default.
- W4387385612 abstract "Video captioning is a more challenging task than image captioning, primarily due to differences in content density. Video data contains redundant visual content, making it difficult for captioners to generalize diverse content and avoid being misled by irrelevant elements. Moreover, redundant content is not well-trimmed to match the corresponding visual semantics in the ground truth, further increasing the difficulty of video captioning. Current research in video captioning predominantly focuses on captioner design, neglecting the impact of content density on captioner performance. Considering the differences between videos and images, there exists another line of improvement: leveraging concise, easily-learned image samples to further diversify video samples. This modification to content density compels the captioner to learn more effectively in the face of redundancy and ambiguity. In this paper, we propose a novel approach called Image-Compounded learning for video Captioners (IcoCap) to facilitate better learning of complex video semantics. IcoCap comprises two components: the Image-Video Compounding Strategy (ICS) and Visual-Semantic Guided Captioning (VGC). ICS compounds easily-learned image semantics into video semantics, further diversifying video content and prompting the network to generalize across more diverse samples. Moreover, by learning from samples compounded with image content, the captioner is compelled to better extract valuable video cues in the presence of straightforward image semantics, helping it focus on relevant information while filtering out extraneous content. VGC then guides the network in flexibly learning ground-truth captions based on the compounded samples, helping to mitigate the mismatch between the ground truth and the ambiguous semantics in video samples. Our experimental results demonstrate the effectiveness of IcoCap in improving the learning of video captioners. Applied to the widely-used MSVD, MSR-VTT, and VATEX datasets, our approach achieves competitive or superior results compared to state-of-the-art methods, illustrating its capacity to handle redundant and ambiguous video data." @default.
- W4387385612 created "2023-10-06" @default.
- W4387385612 creator A5005421447 @default.
- W4387385612 creator A5043617790 @default.
- W4387385612 creator A5083809031 @default.
- W4387385612 creator A5092091508 @default.
- W4387385612 date "2023-01-01" @default.
- W4387385612 modified "2023-10-16" @default.
- W4387385612 title "IcoCap: Improving Video Captioning by Compounding Images" @default.
- W4387385612 doi "https://doi.org/10.1109/tmm.2023.3322329" @default.
- W4387385612 hasPublicationYear "2023" @default.
- W4387385612 type Work @default.
- W4387385612 citedByCount "0" @default.
- W4387385612 crossrefType "journal-article" @default.
- W4387385612 hasAuthorship W4387385612A5005421447 @default.
- W4387385612 hasAuthorship W4387385612A5043617790 @default.
- W4387385612 hasAuthorship W4387385612A5083809031 @default.
- W4387385612 hasAuthorship W4387385612A5092091508 @default.
- W4387385612 hasConcept C111919701 @default.
- W4387385612 hasConcept C115961682 @default.
- W4387385612 hasConcept C152124472 @default.
- W4387385612 hasConcept C154945302 @default.
- W4387385612 hasConcept C157657479 @default.
- W4387385612 hasConcept C184337299 @default.
- W4387385612 hasConcept C199360897 @default.
- W4387385612 hasConcept C204321447 @default.
- W4387385612 hasConcept C23123220 @default.
- W4387385612 hasConcept C36464697 @default.
- W4387385612 hasConcept C41008148 @default.
- W4387385612 hasConcept C49774154 @default.
- W4387385612 hasConceptScore W4387385612C111919701 @default.
- W4387385612 hasConceptScore W4387385612C115961682 @default.
- W4387385612 hasConceptScore W4387385612C152124472 @default.
- W4387385612 hasConceptScore W4387385612C154945302 @default.
- W4387385612 hasConceptScore W4387385612C157657479 @default.
- W4387385612 hasConceptScore W4387385612C184337299 @default.
- W4387385612 hasConceptScore W4387385612C199360897 @default.
- W4387385612 hasConceptScore W4387385612C204321447 @default.
- W4387385612 hasConceptScore W4387385612C23123220 @default.
- W4387385612 hasConceptScore W4387385612C36464697 @default.
- W4387385612 hasConceptScore W4387385612C41008148 @default.
- W4387385612 hasConceptScore W4387385612C49774154 @default.
- W4387385612 hasLocation W43873856121 @default.
- W4387385612 hasOpenAccess W4387385612 @default.
- W4387385612 hasPrimaryLocation W43873856121 @default.
- W4387385612 hasRelatedWork W1938708284 @default.
- W4387385612 hasRelatedWork W2775506363 @default.
- W4387385612 hasRelatedWork W2949362007 @default.
- W4387385612 hasRelatedWork W3122720459 @default.
- W4387385612 hasRelatedWork W3164229987 @default.
- W4387385612 hasRelatedWork W3215212336 @default.
- W4387385612 hasRelatedWork W4210416330 @default.
- W4387385612 hasRelatedWork W4290852288 @default.
- W4387385612 hasRelatedWork W4298897568 @default.
- W4387385612 hasRelatedWork W4380190185 @default.
- W4387385612 isParatext "false" @default.
- W4387385612 isRetracted "false" @default.
- W4387385612 workType "article" @default.