Matches in SemOpenAlex for { <https://semopenalex.org/work/W4312689520> ?p ?o ?g. }
- W4312689520 abstract "Audio-Guided video object segmentation is a challenging problem in visual analysis and editing, which automatically separates foreground objects from the background in a video sequence according to the referring audio expressions. However, existing referring video object segmentation works mainly focus on the guidance of text-based referring expressions, due to the lack of modeling the semantic representations of audio-video interaction contents. In this paper, we consider the problem of audio-guided video semantic segmentation from the viewpoint of end-to-end denoising encoder-decoder network learning. We propose the wavelet-based encoder network to learn the cross-modal representations of the video contents with audio-form queries. Specifically, we adopt the multi-head cross-modal attention layers to explore the potential relations of video and query contents. A 2-dimension discrete wavelet transform is merged into the transformer encoder to decompose the audio-video features. Next, we maximize mutual information between the encoded features and multi-modal features after cross-modal attention layers to enhance the audio guidance. Then, a self attention-free decoder network is developed to generate the target masks with frequency-domain transforms. In addition, we construct the first large-scale audio-guided video semantic segmentation dataset. The extensive experiments show the effectiveness of our method. Code is available at: https://github.com/asudahkzj/Wnet.git." @default.
- W4312689520 created "2023-01-05" @default.
- W4312689520 creator A5004882141 @default.
- W4312689520 creator A5027937729 @default.
- W4312689520 creator A5044327434 @default.
- W4312689520 creator A5047455588 @default.
- W4312689520 creator A5048669373 @default.
- W4312689520 creator A5050817770 @default.
- W4312689520 creator A5066645546 @default.
- W4312689520 creator A5079003184 @default.
- W4312689520 creator A5079260216 @default.
- W4312689520 creator A5083350101 @default.
- W4312689520 date "2022-06-01" @default.
- W4312689520 modified "2023-09-27" @default.
- W4312689520 title "Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks" @default.
- W4312689520 cites W1605211334 @default.
- W4312689520 cites W1905722737 @default.
- W4312689520 cites W2034014085 @default.
- W4312689520 cites W2115619136 @default.
- W4312689520 cites W2194775991 @default.
- W4312689520 cites W2218975416 @default.
- W4312689520 cites W2302548814 @default.
- W4312689520 cites W2470139095 @default.
- W4312689520 cites W2509407805 @default.
- W4312689520 cites W2586148577 @default.
- W4312689520 cites W2796315435 @default.
- W4312689520 cites W2798556392 @default.
- W4312689520 cites W2876852810 @default.
- W4312689520 cites W2884561390 @default.
- W4312689520 cites W2894964039 @default.
- W4312689520 cites W2927673779 @default.
- W4312689520 cites W2962766617 @default.
- W4312689520 cites W2962862718 @default.
- W4312689520 cites W2962914239 @default.
- W4312689520 cites W2962942822 @default.
- W4312689520 cites W2963354481 @default.
- W4312689520 cites W2964001192 @default.
- W4312689520 cites W2964051877 @default.
- W4312689520 cites W2964099072 @default.
- W4312689520 cites W2964345792 @default.
- W4312689520 cites W2973049979 @default.
- W4312689520 cites W2973233205 @default.
- W4312689520 cites W2980088508 @default.
- W4312689520 cites W2982723417 @default.
- W4312689520 cites W2983693499 @default.
- W4312689520 cites W2989358187 @default.
- W4312689520 cites W2997063389 @default.
- W4312689520 cites W3015300171 @default.
- W4312689520 cites W3023463084 @default.
- W4312689520 cites W3034325957 @default.
- W4312689520 cites W3034692043 @default.
- W4312689520 cites W3034777757 @default.
- W4312689520 cites W3035097537 @default.
- W4312689520 cites W3084809789 @default.
- W4312689520 cites W3104844437 @default.
- W4312689520 cites W3122784054 @default.
- W4312689520 cites W3156632700 @default.
- W4312689520 cites W3159476814 @default.
- W4312689520 cites W3161348170 @default.
- W4312689520 cites W3171516518 @default.
- W4312689520 cites W3187664142 @default.
- W4312689520 cites W4238100585 @default.
- W4312689520 doi "https://doi.org/10.1109/cvpr52688.2022.00138" @default.
- W4312689520 hasPublicationYear "2022" @default.
- W4312689520 type Work @default.
- W4312689520 citedByCount "0" @default.
- W4312689520 crossrefType "proceedings-article" @default.
- W4312689520 hasAuthorship W4312689520A5004882141 @default.
- W4312689520 hasAuthorship W4312689520A5027937729 @default.
- W4312689520 hasAuthorship W4312689520A5044327434 @default.
- W4312689520 hasAuthorship W4312689520A5047455588 @default.
- W4312689520 hasAuthorship W4312689520A5048669373 @default.
- W4312689520 hasAuthorship W4312689520A5050817770 @default.
- W4312689520 hasAuthorship W4312689520A5066645546 @default.
- W4312689520 hasAuthorship W4312689520A5079003184 @default.
- W4312689520 hasAuthorship W4312689520A5079260216 @default.
- W4312689520 hasAuthorship W4312689520A5083350101 @default.
- W4312689520 hasConcept C111919701 @default.
- W4312689520 hasConcept C118505674 @default.
- W4312689520 hasConcept C120665830 @default.
- W4312689520 hasConcept C121332964 @default.
- W4312689520 hasConcept C154945302 @default.
- W4312689520 hasConcept C192209626 @default.
- W4312689520 hasConcept C202474056 @default.
- W4312689520 hasConcept C23431618 @default.
- W4312689520 hasConcept C2781238097 @default.
- W4312689520 hasConcept C28490314 @default.
- W4312689520 hasConcept C30814859 @default.
- W4312689520 hasConcept C31972630 @default.
- W4312689520 hasConcept C41008148 @default.
- W4312689520 hasConcept C89600930 @default.
- W4312689520 hasConceptScore W4312689520C111919701 @default.
- W4312689520 hasConceptScore W4312689520C118505674 @default.
- W4312689520 hasConceptScore W4312689520C120665830 @default.
- W4312689520 hasConceptScore W4312689520C121332964 @default.
- W4312689520 hasConceptScore W4312689520C154945302 @default.
- W4312689520 hasConceptScore W4312689520C192209626 @default.
- W4312689520 hasConceptScore W4312689520C202474056 @default.
- W4312689520 hasConceptScore W4312689520C23431618 @default.
- W4312689520 hasConceptScore W4312689520C2781238097 @default.