Matches in SemOpenAlex for { <https://semopenalex.org/work/W4375801629> ?p ?o ?g. }
- W4375801629 endingPage "580" @default.
- W4375801629 startingPage "569" @default.
- W4375801629 abstract "Temporal action localization plays an important role in video analysis, which aims to localize and classify actions in untrimmed videos. Previous methods often predict actions on a feature space of a single temporal scale. However, the temporal features of a low-level scale lack sufficient semantics for action classification, while a high-level scale cannot provide the rich details of the action boundaries. In addition, the long-range dependencies of video frames are often ignored. To address these issues, a novel multitemporal-scale spatial–temporal transformer (MSST) network is proposed for temporal action localization, which predicts actions on a feature space of multiple temporal scales. Specifically, we first use refined feature pyramids of different scales to pass semantics from high-level scales to low-level scales. Second, to establish the long temporal scale of the entire video, we use a spatial–temporal transformer encoder to capture the long-range dependencies of video frames. Then, the refined features with long-range dependencies are fed into a classifier for coarse action prediction. Finally, to further improve the prediction accuracy, we propose a frame-level self-attention module to refine the classification and boundaries of each action instance. Most importantly, these three modules are jointly explored in a unified framework, and MSST has an anchor-free and end-to-end architecture. Extensive experiments show that the proposed method can outperform state-of-the-art approaches on the THUMOS14 dataset and achieve comparable performance on the ActivityNet1.3 dataset. Compared with A2Net (TIP20, Avg{0.3:0.7}), Sub-Action (CSVT2022, Avg{0.1:0.5}), and AFSD (CVPR21, Avg{0.3:0.7}) on the THUMOS14 dataset, the proposed method can achieve improvements of 12.6%, 17.4%, and 2.2%, respectively." @default.
- W4375801629 created "2023-05-09" @default.
- W4375801629 creator A5042406159 @default.
- W4375801629 creator A5046690683 @default.
- W4375801629 creator A5048947531 @default.
- W4375801629 creator A5049674712 @default.
- W4375801629 creator A5055627037 @default.
- W4375801629 creator A5068843001 @default.
- W4375801629 creator A5080551725 @default.
- W4375801629 date "2023-06-01" @default.
- W4375801629 modified "2023-10-16" @default.
- W4375801629 title "A Multitemporal Scale and Spatial–Temporal Transformer Network for Temporal Action Localization" @default.
- W4375801629 cites W1536680647 @default.
- W4375801629 cites W1927052826 @default.
- W4375801629 cites W2072070523 @default.
- W4375801629 cites W2336403884 @default.
- W4375801629 cites W2344034899 @default.
- W4375801629 cites W2593722617 @default.
- W4375801629 cites W2597958930 @default.
- W4375801629 cites W2884561390 @default.
- W4375801629 cites W2905706796 @default.
- W4375801629 cites W2952435096 @default.
- W4375801629 cites W2962677524 @default.
- W4375801629 cites W2962766617 @default.
- W4375801629 cites W2962876901 @default.
- W4375801629 cites W2963247196 @default.
- W4375801629 cites W2963524571 @default.
- W4375801629 cites W2964107628 @default.
- W4375801629 cites W2964121718 @default.
- W4375801629 cites W2964214371 @default.
- W4375801629 cites W2964216549 @default.
- W4375801629 cites W2971915722 @default.
- W4375801629 cites W2983918066 @default.
- W4375801629 cites W2986407524 @default.
- W4375801629 cites W2997410994 @default.
- W4375801629 cites W2997706915 @default.
- W4375801629 cites W2999794487 @default.
- W4375801629 cites W3007751154 @default.
- W4375801629 cites W3012573144 @default.
- W4375801629 cites W3034623254 @default.
- W4375801629 cites W3069380482 @default.
- W4375801629 cites W3100481960 @default.
- W4375801629 cites W3106041614 @default.
- W4375801629 cites W3110589170 @default.
- W4375801629 cites W3110854813 @default.
- W4375801629 cites W3111420154 @default.
- W4375801629 cites W3128626728 @default.
- W4375801629 cites W3137592945 @default.
- W4375801629 cites W3171707680 @default.
- W4375801629 cites W3172837290 @default.
- W4375801629 cites W3174569083 @default.
- W4375801629 cites W3176444885 @default.
- W4375801629 cites W3176641851 @default.
- W4375801629 cites W3202076256 @default.
- W4375801629 cites W3215017813 @default.
- W4375801629 cites W4200630755 @default.
- W4375801629 cites W4205260486 @default.
- W4375801629 cites W4213183958 @default.
- W4375801629 cites W4214612132 @default.
- W4375801629 cites W4221160129 @default.
- W4375801629 cites W4230270698 @default.
- W4375801629 cites W4284965682 @default.
- W4375801629 cites W4312305353 @default.
- W4375801629 cites W4313555695 @default.
- W4375801629 cites W639708223 @default.
- W4375801629 doi "https://doi.org/10.1109/thms.2023.3266037" @default.
- W4375801629 hasPublicationYear "2023" @default.
- W4375801629 type Work @default.
- W4375801629 citedByCount "0" @default.
- W4375801629 crossrefType "journal-article" @default.
- W4375801629 hasAuthorship W4375801629A5042406159 @default.
- W4375801629 hasAuthorship W4375801629A5046690683 @default.
- W4375801629 hasAuthorship W4375801629A5048947531 @default.
- W4375801629 hasAuthorship W4375801629A5049674712 @default.
- W4375801629 hasAuthorship W4375801629A5055627037 @default.
- W4375801629 hasAuthorship W4375801629A5068843001 @default.
- W4375801629 hasAuthorship W4375801629A5080551725 @default.
- W4375801629 hasConcept C111919701 @default.
- W4375801629 hasConcept C118505674 @default.
- W4375801629 hasConcept C121332964 @default.
- W4375801629 hasConcept C138885662 @default.
- W4375801629 hasConcept C153180895 @default.
- W4375801629 hasConcept C154945302 @default.
- W4375801629 hasConcept C165801399 @default.
- W4375801629 hasConcept C18903297 @default.
- W4375801629 hasConcept C205649164 @default.
- W4375801629 hasConcept C2776401178 @default.
- W4375801629 hasConcept C2777489503 @default.
- W4375801629 hasConcept C2778755073 @default.
- W4375801629 hasConcept C41008148 @default.
- W4375801629 hasConcept C41895202 @default.
- W4375801629 hasConcept C58640448 @default.
- W4375801629 hasConcept C62520636 @default.
- W4375801629 hasConcept C66322947 @default.
- W4375801629 hasConcept C86803240 @default.
- W4375801629 hasConcept C95623464 @default.
- W4375801629 hasConceptScore W4375801629C111919701 @default.
- W4375801629 hasConceptScore W4375801629C118505674 @default.