SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385570223> ?p ?o ?g. }
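
The pattern above fixes the subject and leaves the predicate (?p), object (?o), and graph (?g) open. As a rough sketch, the same lookup could be written as a SPARQL SELECT query built in Python. The helper name `build_match_query` and the `limit` parameter are hypothetical, not part of SemOpenAlex. Because every row listed here is in the default graph, the sketch uses a plain triple pattern and leaves out ?g. No endpoint is contacted; the code only produces the query text.

```python
def build_match_query(subject_iri: str, limit: int = 100) -> str:
    """Return SPARQL text selecting every predicate/object pair of one subject.

    Hypothetical helper for illustration: it queries the default graph only,
    so the ?g variable from the pattern above is not bound.
    """
    # Reject characters that cannot appear inside an IRI reference <...>.
    if any(ch in subject_iri for ch in '<>"{}|^`\\ '):
        raise ValueError("subject_iri contains characters not allowed in an IRI")
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{subject_iri}> ?p ?o .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = build_match_query("https://semopenalex.org/work/W4385570223")
print(query)
```

The resulting string can be sent to any SPARQL 1.1 endpoint that serves this data; which endpoint to use is left to the reader.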

W4385570223 abstract "The task of spoken video grounding aims to localize moments in videos that are relevant to descriptive spoken queries. However, extracting semantic information from speech and modeling the cross-modal correlation pose two critical challenges. Previous studies solve them by representing spoken queries based on the matched video frames, which require tremendous effort for frame-level labeling. In this work, we investigate weakly-supervised spoken video grounding, i.e., learning to localize moments without expensive temporal annotations. To effectively represent the cross-modal semantics, we propose Semantic Interaction Learning (SIL), a novel framework consisting of the acoustic-semantic pre-training (ASP) and acoustic-visual contrastive learning (AVCL). In ASP, we pre-train an effective encoder for the grounding task with three comprehensive tasks, where the robustness task enhances stability by explicitly capturing the invariance between time- and frequency-domain features, the conciseness task avoids over-smooth attention by compressing long sequence into segments, and the semantic task improves spoken language understanding by modeling the precise semantics. In AVCL, we mine pseudo labels with discriminative sampling strategies and directly strengthen the interaction between speech and video by maximizing their mutual information. Extensive experiments demonstrate the effectiveness and superiority of our method." @default.
W4385570223 created "2023-08-05" @default.
W4385570223 creator A5009897266 @default.
W4385570223 creator A5016950354 @default.
W4385570223 creator A5019373084 @default.
W4385570223 creator A5030574650 @default.
W4385570223 creator A5065592637 @default.
W4385570223 creator A5077333536 @default.
W4385570223 creator A5079260216 @default.
W4385570223 date "2023-01-01" @default.
W4385570223 modified "2023-09-24" @default.
W4385570223 title "Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning" @default.
W4385570223 doi "https://doi.org/10.18653/v1/2023.acl-long.611" @default.
W4385570223 hasPublicationYear "2023" @default.
W4385570223 type Work @default.
W4385570223 citedByCount "0" @default.
W4385570223 crossrefType "proceedings-article" @default.
W4385570223 hasAuthorship W4385570223A5009897266 @default.
W4385570223 hasAuthorship W4385570223A5016950354 @default.
W4385570223 hasAuthorship W4385570223A5019373084 @default.
W4385570223 hasAuthorship W4385570223A5030574650 @default.
W4385570223 hasAuthorship W4385570223A5065592637 @default.
W4385570223 hasAuthorship W4385570223A5077333536 @default.
W4385570223 hasAuthorship W4385570223A5079260216 @default.
W4385570223 hasBestOaLocation W43855702231 @default.
W4385570223 hasConcept C104317684 @default.
W4385570223 hasConcept C154945302 @default.
W4385570223 hasConcept C162324750 @default.
W4385570223 hasConcept C184337299 @default.
W4385570223 hasConcept C185592680 @default.
W4385570223 hasConcept C187736073 @default.
W4385570223 hasConcept C188027245 @default.
W4385570223 hasConcept C199360897 @default.
W4385570223 hasConcept C204321447 @default.
W4385570223 hasConcept C2776230583 @default.
W4385570223 hasConcept C2780451532 @default.
W4385570223 hasConcept C28490314 @default.
W4385570223 hasConcept C41008148 @default.
W4385570223 hasConcept C55493867 @default.
W4385570223 hasConcept C63479239 @default.
W4385570223 hasConcept C71139939 @default.
W4385570223 hasConcept C97931131 @default.
W4385570223 hasConceptScore W4385570223C104317684 @default.
W4385570223 hasConceptScore W4385570223C154945302 @default.
W4385570223 hasConceptScore W4385570223C162324750 @default.
W4385570223 hasConceptScore W4385570223C184337299 @default.
W4385570223 hasConceptScore W4385570223C185592680 @default.
W4385570223 hasConceptScore W4385570223C187736073 @default.
W4385570223 hasConceptScore W4385570223C188027245 @default.
W4385570223 hasConceptScore W4385570223C199360897 @default.
W4385570223 hasConceptScore W4385570223C204321447 @default.
W4385570223 hasConceptScore W4385570223C2776230583 @default.
W4385570223 hasConceptScore W4385570223C2780451532 @default.
W4385570223 hasConceptScore W4385570223C28490314 @default.
W4385570223 hasConceptScore W4385570223C41008148 @default.
W4385570223 hasConceptScore W4385570223C55493867 @default.
W4385570223 hasConceptScore W4385570223C63479239 @default.
W4385570223 hasConceptScore W4385570223C71139939 @default.
W4385570223 hasConceptScore W4385570223C97931131 @default.
W4385570223 hasLocation W43855702231 @default.
W4385570223 hasOpenAccess W4385570223 @default.
W4385570223 hasPrimaryLocation W43855702231 @default.
W4385570223 hasRelatedWork W1987863801 @default.
W4385570223 hasRelatedWork W2026121273 @default.
W4385570223 hasRelatedWork W2102106825 @default.
W4385570223 hasRelatedWork W2757507228 @default.
W4385570223 hasRelatedWork W2801772698 @default.
W4385570223 hasRelatedWork W2892923641 @default.
W4385570223 hasRelatedWork W2983744209 @default.
W4385570223 hasRelatedWork W3100092831 @default.
W4385570223 hasRelatedWork W4302060929 @default.
W4385570223 hasRelatedWork W66955737 @default.
W4385570223 isParatext "false" @default.
W4385570223 isRetracted "false" @default.
W4385570223 workType "article" @default.