Matches in SemOpenAlex for { <https://semopenalex.org/work/W4229444697> ?p ?o ?g. }
- W4229444697 abstract "Video captioning aims to understand the spatio-temporal semantic concept of the video and generate descriptive sentences. The de-facto approach to this task has a text generator learn from motion or appearance features extracted offline by pre-trained vision models. However, these methods may suffer from so-called couple drawbacks in both video spatio-temporal representation and sentence generation. For the former, couple means learning the spatio-temporal representation in a single model (3D CNN), resulting in a disconnection between the task and pre-training domains and making end-to-end training hard. For the latter, couple means treating the generation of visual semantic words and syntax-related words equally. To this end, we present D2, a dual-level decoupled transformer pipeline, to solve the above drawbacks: (i) for video spatio-temporal representation, we decouple the process into a first-spatial-then-temporal paradigm, unlocking the potential of using a dedicated model (e.g. image-text pre-training) to connect the pre-training and downstream tasks, and making the entire model end-to-end trainable; (ii) for sentence generation, we propose a Syntax-Aware Decoder to dynamically measure the contribution of visual semantic and syntax-related words. Extensive experiments on three widely used benchmarks (MSVD, MSR-VTT and VATEX) show the great potential of the proposed D2, which surpasses previous methods by a large margin in the task of video captioning." @default.
- W4229444697 created "2022-05-11" @default.
- W4229444697 creator A5027308029 @default.
- W4229444697 creator A5044936257 @default.
- W4229444697 creator A5049078993 @default.
- W4229444697 creator A5055917672 @default.
- W4229444697 creator A5074655314 @default.
- W4229444697 creator A5077386763 @default.
- W4229444697 creator A5080036850 @default.
- W4229444697 date "2022-06-27" @default.
- W4229444697 modified "2023-10-16" @default.
- W4229444697 title "Dual-Level Decoupled Transformer for Video Captioning" @default.
- W4229444697 cites W1586939924 @default.
- W4229444697 cites W1601567445 @default.
- W4229444697 cites W2194775991 @default.
- W4229444697 cites W2425121537 @default.
- W4229444697 cites W2584992898 @default.
- W4229444697 cites W2739107216 @default.
- W4229444697 cites W2740388348 @default.
- W4229444697 cites W2755876276 @default.
- W4229444697 cites W2765658575 @default.
- W4229444697 cites W2896878184 @default.
- W4229444697 cites W2905145027 @default.
- W4229444697 cites W2948358897 @default.
- W4229444697 cites W2951390634 @default.
- W4229444697 cites W2962681491 @default.
- W4229444697 cites W2962990649 @default.
- W4229444697 cites W2963084599 @default.
- W4229444697 cites W2963524571 @default.
- W4229444697 cites W2979826702 @default.
- W4229444697 cites W2984862483 @default.
- W4229444697 cites W2988753485 @default.
- W4229444697 cites W2989322838 @default.
- W4229444697 cites W3034221024 @default.
- W4229444697 cites W3034655362 @default.
- W4229444697 cites W3035160838 @default.
- W4229444697 cites W3035365026 @default.
- W4229444697 cites W3035392611 @default.
- W4229444697 cites W3035635319 @default.
- W4229444697 cites W3039060838 @default.
- W4229444697 cites W3096935578 @default.
- W4229444697 cites W3121523901 @default.
- W4229444697 cites W3131500599 @default.
- W4229444697 cites W3176425931 @default.
- W4229444697 cites W3176689360 @default.
- W4229444697 doi "https://doi.org/10.1145/3512527.3531380" @default.
- W4229444697 hasPublicationYear "2022" @default.
- W4229444697 type Work @default.
- W4229444697 citedByCount "1" @default.
- W4229444697 countsByYear W42294446972023 @default.
- W4229444697 crossrefType "proceedings-article" @default.
- W4229444697 hasAuthorship W4229444697A5027308029 @default.
- W4229444697 hasAuthorship W4229444697A5044936257 @default.
- W4229444697 hasAuthorship W4229444697A5049078993 @default.
- W4229444697 hasAuthorship W4229444697A5055917672 @default.
- W4229444697 hasAuthorship W4229444697A5074655314 @default.
- W4229444697 hasAuthorship W4229444697A5077386763 @default.
- W4229444697 hasAuthorship W4229444697A5080036850 @default.
- W4229444697 hasBestOaLocation W42294446972 @default.
- W4229444697 hasConcept C115961682 @default.
- W4229444697 hasConcept C119857082 @default.
- W4229444697 hasConcept C121332964 @default.
- W4229444697 hasConcept C154945302 @default.
- W4229444697 hasConcept C157657479 @default.
- W4229444697 hasConcept C162324750 @default.
- W4229444697 hasConcept C165801399 @default.
- W4229444697 hasConcept C187736073 @default.
- W4229444697 hasConcept C199360897 @default.
- W4229444697 hasConcept C204321447 @default.
- W4229444697 hasConcept C2777530160 @default.
- W4229444697 hasConcept C2780451532 @default.
- W4229444697 hasConcept C28490314 @default.
- W4229444697 hasConcept C41008148 @default.
- W4229444697 hasConcept C43521106 @default.
- W4229444697 hasConcept C62520636 @default.
- W4229444697 hasConcept C66322947 @default.
- W4229444697 hasConcept C774472 @default.
- W4229444697 hasConceptScore W4229444697C115961682 @default.
- W4229444697 hasConceptScore W4229444697C119857082 @default.
- W4229444697 hasConceptScore W4229444697C121332964 @default.
- W4229444697 hasConceptScore W4229444697C154945302 @default.
- W4229444697 hasConceptScore W4229444697C157657479 @default.
- W4229444697 hasConceptScore W4229444697C162324750 @default.
- W4229444697 hasConceptScore W4229444697C165801399 @default.
- W4229444697 hasConceptScore W4229444697C187736073 @default.
- W4229444697 hasConceptScore W4229444697C199360897 @default.
- W4229444697 hasConceptScore W4229444697C204321447 @default.
- W4229444697 hasConceptScore W4229444697C2777530160 @default.
- W4229444697 hasConceptScore W4229444697C2780451532 @default.
- W4229444697 hasConceptScore W4229444697C28490314 @default.
- W4229444697 hasConceptScore W4229444697C41008148 @default.
- W4229444697 hasConceptScore W4229444697C43521106 @default.
- W4229444697 hasConceptScore W4229444697C62520636 @default.
- W4229444697 hasConceptScore W4229444697C66322947 @default.
- W4229444697 hasConceptScore W4229444697C774472 @default.
- W4229444697 hasFunder F4320321001 @default.
- W4229444697 hasFunder F4320336567 @default.
- W4229444697 hasLocation W42294446971 @default.
- W4229444697 hasLocation W42294446972 @default.
- W4229444697 hasOpenAccess W4229444697 @default.