Matches in SemOpenAlex for { <https://semopenalex.org/work/W3211772574> ?p ?o ?g. }
- W3211772574 abstract "Video grounding aims to localize the temporal segment corresponding to a sentence query from an untrimmed video. Almost all existing video grounding methods fall into two frameworks: 1) Top-down model: It predefines a set of segment candidates and then conducts segment classification and regression. 2) Bottom-up model: It directly predicts frame-wise probabilities of the referential segment boundaries. However, none of these methods is end-to-end, i.e., they always rely on some time-consuming post-processing steps to refine predictions. To address this, we reformulate video grounding as a set prediction task and propose a novel end-to-end multi-modal Transformer model, dubbed GTR. Specifically, GTR has two encoders for video and language encoding, and a cross-modal decoder for grounding prediction. To facilitate the end-to-end training, we use a Cubic Embedding layer to transform the raw videos into a set of visual tokens. To better fuse these two modalities in the decoder, we design a new Multi-head Cross-Modal Attention. The whole GTR is optimized via a Many-to-One matching loss. Furthermore, we conduct comprehensive studies to investigate different model design choices. Extensive results on three benchmarks validate the superiority of GTR. All three typical GTR variants achieve record-breaking performance on all datasets and metrics, with several times faster inference speed." @default.
- W3211772574 created "2021-11-22" @default.
- W3211772574 creator A5002795838 @default.
- W3211772574 creator A5012324763 @default.
- W3211772574 creator A5067002890 @default.
- W3211772574 creator A5068937750 @default.
- W3211772574 creator A5076252609 @default.
- W3211772574 date "2021-01-01" @default.
- W3211772574 modified "2023-10-16" @default.
- W3211772574 title "On Pursuit of Designing Multi-modal Transformer for Video Grounding" @default.
- W3211772574 cites W1522734439 @default.
- W3211772574 cites W2111078031 @default.
- W3211772574 cites W2250539671 @default.
- W3211772574 cites W2798354744 @default.
- W3211772574 cites W2890502146 @default.
- W3211772574 cites W2894280539 @default.
- W3211772574 cites W2897628926 @default.
- W3211772574 cites W2903901502 @default.
- W3211772574 cites W2950541952 @default.
- W3211772574 cites W2962766617 @default.
- W3211772574 cites W2962869524 @default.
- W3211772574 cites W2963017553 @default.
- W3211772574 cites W2963095467 @default.
- W3211772574 cites W2963341956 @default.
- W3211772574 cites W2963393391 @default.
- W3211772574 cites W2963403868 @default.
- W3211772574 cites W2963521717 @default.
- W3211772574 cites W2963916161 @default.
- W3211772574 cites W2964089981 @default.
- W3211772574 cites W2964214371 @default.
- W3211772574 cites W2964216549 @default.
- W3211772574 cites W2964232540 @default.
- W3211772574 cites W2970401629 @default.
- W3211772574 cites W2970898753 @default.
- W3211772574 cites W2997429269 @default.
- W3211772574 cites W2997762001 @default.
- W3211772574 cites W2998495542 @default.
- W3211772574 cites W3025323587 @default.
- W3211772574 cites W3034743747 @default.
- W3211772574 cites W3035022492 @default.
- W3211772574 cites W3035339529 @default.
- W3211772574 cites W3035640828 @default.
- W3211772574 cites W3092739351 @default.
- W3211772574 cites W3096609285 @default.
- W3211772574 cites W3116489684 @default.
- W3211772574 cites W3119686997 @default.
- W3211772574 cites W3122239467 @default.
- W3211772574 cites W3128723389 @default.
- W3211772574 cites W3132890542 @default.
- W3211772574 cites W3138878737 @default.
- W3211772574 cites W3145269263 @default.
- W3211772574 cites W3174364033 @default.
- W3211772574 cites W3174421047 @default.
- W3211772574 cites W3190216403 @default.
- W3211772574 cites W3207520933 @default.
- W3211772574 cites W3212735573 @default.
- W3211772574 cites W3214586131 @default.
- W3211772574 cites W3216763528 @default.
- W3211772574 cites W607748843 @default.
- W3211772574 doi "https://doi.org/10.18653/v1/2021.emnlp-main.773" @default.
- W3211772574 hasPublicationYear "2021" @default.
- W3211772574 type Work @default.
- W3211772574 sameAs 3211772574 @default.
- W3211772574 citedByCount "17" @default.
- W3211772574 countsByYear W32117725742021 @default.
- W3211772574 countsByYear W32117725742022 @default.
- W3211772574 countsByYear W32117725742023 @default.
- W3211772574 crossrefType "proceedings-article" @default.
- W3211772574 hasAuthorship W3211772574A5002795838 @default.
- W3211772574 hasAuthorship W3211772574A5012324763 @default.
- W3211772574 hasAuthorship W3211772574A5067002890 @default.
- W3211772574 hasAuthorship W3211772574A5068937750 @default.
- W3211772574 hasAuthorship W3211772574A5076252609 @default.
- W3211772574 hasBestOaLocation W32117725741 @default.
- W3211772574 hasConcept C111919701 @default.
- W3211772574 hasConcept C118505674 @default.
- W3211772574 hasConcept C121332964 @default.
- W3211772574 hasConcept C153180895 @default.
- W3211772574 hasConcept C154945302 @default.
- W3211772574 hasConcept C165801399 @default.
- W3211772574 hasConcept C168993435 @default.
- W3211772574 hasConcept C177264268 @default.
- W3211772574 hasConcept C185592680 @default.
- W3211772574 hasConcept C188027245 @default.
- W3211772574 hasConcept C199360897 @default.
- W3211772574 hasConcept C31972630 @default.
- W3211772574 hasConcept C38652104 @default.
- W3211772574 hasConcept C41008148 @default.
- W3211772574 hasConcept C41608201 @default.
- W3211772574 hasConcept C48145219 @default.
- W3211772574 hasConcept C62520636 @default.
- W3211772574 hasConcept C66322947 @default.
- W3211772574 hasConcept C71139939 @default.
- W3211772574 hasConceptScore W3211772574C111919701 @default.
- W3211772574 hasConceptScore W3211772574C118505674 @default.
- W3211772574 hasConceptScore W3211772574C121332964 @default.
- W3211772574 hasConceptScore W3211772574C153180895 @default.
- W3211772574 hasConceptScore W3211772574C154945302 @default.
- W3211772574 hasConceptScore W3211772574C165801399 @default.
- W3211772574 hasConceptScore W3211772574C168993435 @default.