Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313015773> ?p ?o ?g. }
- W4313015773 endingPage "691" @default.
- W4313015773 startingPage "675" @default.
- W4313015773 abstract "Image pre-training, the current de-facto paradigm for a wide range of visual tasks, is generally less favored in the field of video recognition. By contrast, a common strategy is to directly train with spatiotemporal convolutional neural networks (CNNs) from scratch. Nonetheless, interestingly, by taking a closer look at these from-scratch learned CNNs, we note there exist certain 3D kernels that exhibit much stronger appearance modeling ability than others, arguably suggesting appearance information is already well disentangled in learning. Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels. In addition, we propose Spatial-Temporal Separable (STS) convolution, which explicitly splits the feature channels into spatial and temporal groups, to further enable a more thorough decomposition of spatiotemporal features for fine-tuning 3D CNNs. Our experiments show that simply replacing 3D convolution with STS notably improves a wide range of 3D CNNs without increasing parameters and computation on both Kinetics-400 and Something-Something V2. Moreover, this new training pipeline consistently achieves better results on video recognition with significant speedup. For instance, we achieve +0.6% top-1 of Slowfast on Kinetics-400 over the strong 256-epoch 128-GPU baseline while fine-tuning for only 50 epochs with 4 GPUs. The code and models are available at https://github.com/UCSC-VLAA/Image-Pretraining-for-Video ." @default.
- W4313015773 created "2023-01-05" @default.
- W4313015773 creator A5022344556 @default.
- W4313015773 creator A5030048445 @default.
- W4313015773 creator A5034551921 @default.
- W4313015773 creator A5067640436 @default.
- W4313015773 creator A5077206511 @default.
- W4313015773 creator A5077260423 @default.
- W4313015773 creator A5086706224 @default.
- W4313015773 date "2022-01-01" @default.
- W4313015773 modified "2023-10-03" @default.
- W4313015773 title "In Defense of Image Pre-Training for Spatiotemporal Recognition" @default.
- W4313015773 cites W1522734439 @default.
- W4313015773 cites W1536680647 @default.
- W4313015773 cites W1903029394 @default.
- W4313015773 cites W2097117768 @default.
- W4313015773 cites W2108598243 @default.
- W4313015773 cites W2194775991 @default.
- W4313015773 cites W2507009361 @default.
- W4313015773 cites W2625366777 @default.
- W4313015773 cites W2770804203 @default.
- W4313015773 cites W2806331055 @default.
- W4313015773 cites W2883429621 @default.
- W4313015773 cites W2962843773 @default.
- W4313015773 cites W2962858109 @default.
- W4313015773 cites W2963091558 @default.
- W4313015773 cites W2963150697 @default.
- W4313015773 cites W2963155035 @default.
- W4313015773 cites W2963319519 @default.
- W4313015773 cites W2963524571 @default.
- W4313015773 cites W2963820951 @default.
- W4313015773 cites W2981385151 @default.
- W4313015773 cites W2982220924 @default.
- W4313015773 cites W2984287396 @default.
- W4313015773 cites W2990152177 @default.
- W4313015773 cites W2990503944 @default.
- W4313015773 cites W2991391304 @default.
- W4313015773 cites W2992308087 @default.
- W4313015773 cites W2996901793 @default.
- W4313015773 cites W3034572008 @default.
- W4313015773 cites W3034600407 @default.
- W4313015773 cites W3034768625 @default.
- W4313015773 cites W3035104321 @default.
- W4313015773 cites W3035303837 @default.
- W4313015773 cites W3035413240 @default.
- W4313015773 cites W3035619757 @default.
- W4313015773 cites W3035682985 @default.
- W4313015773 cites W3107634219 @default.
- W4313015773 cites W3173621652 @default.
- W4313015773 cites W3175528717 @default.
- W4313015773 cites W4214612132 @default.
- W4313015773 cites W4214614183 @default.
- W4313015773 cites W4312560592 @default.
- W4313015773 doi "https://doi.org/10.1007/978-3-031-19806-9_39" @default.
- W4313015773 hasPublicationYear "2022" @default.
- W4313015773 type Work @default.
- W4313015773 citedByCount "0" @default.
- W4313015773 crossrefType "book-chapter" @default.
- W4313015773 hasAuthorship W4313015773A5022344556 @default.
- W4313015773 hasAuthorship W4313015773A5030048445 @default.
- W4313015773 hasAuthorship W4313015773A5034551921 @default.
- W4313015773 hasAuthorship W4313015773A5067640436 @default.
- W4313015773 hasAuthorship W4313015773A5077206511 @default.
- W4313015773 hasAuthorship W4313015773A5077260423 @default.
- W4313015773 hasAuthorship W4313015773A5086706224 @default.
- W4313015773 hasBestOaLocation W43130157732 @default.
- W4313015773 hasConcept C108583219 @default.
- W4313015773 hasConcept C111919701 @default.
- W4313015773 hasConcept C114466953 @default.
- W4313015773 hasConcept C115961682 @default.
- W4313015773 hasConcept C138885662 @default.
- W4313015773 hasConcept C153180895 @default.
- W4313015773 hasConcept C154945302 @default.
- W4313015773 hasConcept C159985019 @default.
- W4313015773 hasConcept C173608175 @default.
- W4313015773 hasConcept C192562407 @default.
- W4313015773 hasConcept C199360897 @default.
- W4313015773 hasConcept C204323151 @default.
- W4313015773 hasConcept C2776401178 @default.
- W4313015773 hasConcept C2781235140 @default.
- W4313015773 hasConcept C41008148 @default.
- W4313015773 hasConcept C41895202 @default.
- W4313015773 hasConcept C43521106 @default.
- W4313015773 hasConcept C45347329 @default.
- W4313015773 hasConcept C50644808 @default.
- W4313015773 hasConcept C68339613 @default.
- W4313015773 hasConcept C81363708 @default.
- W4313015773 hasConceptScore W4313015773C108583219 @default.
- W4313015773 hasConceptScore W4313015773C111919701 @default.
- W4313015773 hasConceptScore W4313015773C114466953 @default.
- W4313015773 hasConceptScore W4313015773C115961682 @default.
- W4313015773 hasConceptScore W4313015773C138885662 @default.
- W4313015773 hasConceptScore W4313015773C153180895 @default.
- W4313015773 hasConceptScore W4313015773C154945302 @default.
- W4313015773 hasConceptScore W4313015773C159985019 @default.
- W4313015773 hasConceptScore W4313015773C173608175 @default.
- W4313015773 hasConceptScore W4313015773C192562407 @default.
- W4313015773 hasConceptScore W4313015773C199360897 @default.