SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387247768> ?p ?o ?g. }
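
The pattern above can be approximated as an ordinary SPARQL query. The following is a minimal sketch, not the query the site itself runs: the endpoint URL `https://semopenalex.org/sparql`, the helper names `build_work_query` and `build_request_url`, and the `LIMIT` value are assumptions made for illustration. Because every match in this listing is reported in the `@default` graph, the sketch uses a plain triple pattern and omits `?g`. It only builds the query text and the request URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the SemOpenAlex documentation before relying on it.
ENDPOINT = "https://semopenalex.org/sparql"


def build_work_query(subject_iri: str, limit: int = 100) -> str:
    """Return a SPARQL SELECT listing every predicate and object of one subject."""
    return (
        "SELECT ?p ?o WHERE { "
        f"<{subject_iri}> ?p ?o . "
        f"}} LIMIT {limit}"
    )


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a GET URL with a 'query' parameter (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query})


query = build_work_query("https://semopenalex.org/work/W4387247768")
url = build_request_url(query)
print(query)
print(url)
```

The printed URL can then be fetched with any HTTP client, requesting `application/sparql-results+json` in the `Accept` header if JSON results are wanted.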

W4387247768 endingPage "1" @default.
W4387247768 startingPage "1" @default.
W4387247768 abstract "As a newly emerging task, audio-visual question answering (AVQA) has attracted research attention. Compared with traditional single-modality (e.g., audio or visual) QA tasks, it poses new challenges due to the higher complexity of feature extraction and fusion brought by the multimodal inputs. First, AVQA requires more comprehensive understanding of the scene which involves both audio and visual information; Second, in the presence of more information, feature extraction has to be better connected with a given question; Third, features from different modalities need to be sufficiently correlated and fused. To address this situation, this work proposes a novel framework for multimodal question answering task. It characterises an audiovisual scene at both global and local levels, and within each level, the features from different modalities are well fused. Furthermore, the given question is utilised to guide not only the feature extraction at the local level but also the final fusion of global and local features to predict the answer. Our framework provides a new perspective for audio-visual scene understanding through focusing on both general and specific representations as well as aggregating multimodalities by prioritizing question-related information. As experimentally demonstrated, our method significantly improves the existing audio-visual question answering performance, with the averaged absolute gain of 3.3% and 3.1% on MUSIC-AVQA and AVQA datasets, respectively. Moreover, the ablation study verifies the necessity and effectiveness of our design. Our code will be publicly released." @default.
W4387247768 created "2023-10-03" @default.
W4387247768 creator A5029110813 @default.
W4387247768 creator A5034617242 @default.
W4387247768 creator A5041791232 @default.
W4387247768 creator A5059312754 @default.
W4387247768 date "2023-01-01" @default.
W4387247768 modified "2023-10-03" @default.
W4387247768 title "Question-Aware Global-Local Video Understanding Network for Audio-Visual Question Answering" @default.
W4387247768 doi "https://doi.org/10.1109/tcsvt.2023.3318220" @default.
W4387247768 hasPublicationYear "2023" @default.
W4387247768 type Work @default.
W4387247768 citedByCount "0" @default.
W4387247768 crossrefType "journal-article" @default.
W4387247768 hasAuthorship W4387247768A5029110813 @default.
W4387247768 hasAuthorship W4387247768A5034617242 @default.
W4387247768 hasAuthorship W4387247768A5041791232 @default.
W4387247768 hasAuthorship W4387247768A5059312754 @default.
W4387247768 hasConcept C12713177 @default.
W4387247768 hasConcept C138885662 @default.
W4387247768 hasConcept C144024400 @default.
W4387247768 hasConcept C154945302 @default.
W4387247768 hasConcept C162324750 @default.
W4387247768 hasConcept C187736073 @default.
W4387247768 hasConcept C23123220 @default.
W4387247768 hasConcept C2776401178 @default.
W4387247768 hasConcept C2779903281 @default.
W4387247768 hasConcept C2780226545 @default.
W4387247768 hasConcept C2780451532 @default.
W4387247768 hasConcept C28490314 @default.
W4387247768 hasConcept C3017588708 @default.
W4387247768 hasConcept C36289849 @default.
W4387247768 hasConcept C41008148 @default.
W4387247768 hasConcept C41895202 @default.
W4387247768 hasConcept C44291984 @default.
W4387247768 hasConcept C49774154 @default.
W4387247768 hasConcept C52622490 @default.
W4387247768 hasConceptScore W4387247768C12713177 @default.
W4387247768 hasConceptScore W4387247768C138885662 @default.
W4387247768 hasConceptScore W4387247768C144024400 @default.
W4387247768 hasConceptScore W4387247768C154945302 @default.
W4387247768 hasConceptScore W4387247768C162324750 @default.
W4387247768 hasConceptScore W4387247768C187736073 @default.
W4387247768 hasConceptScore W4387247768C23123220 @default.
W4387247768 hasConceptScore W4387247768C2776401178 @default.
W4387247768 hasConceptScore W4387247768C2779903281 @default.
W4387247768 hasConceptScore W4387247768C2780226545 @default.
W4387247768 hasConceptScore W4387247768C2780451532 @default.
W4387247768 hasConceptScore W4387247768C28490314 @default.
W4387247768 hasConceptScore W4387247768C3017588708 @default.
W4387247768 hasConceptScore W4387247768C36289849 @default.
W4387247768 hasConceptScore W4387247768C41008148 @default.
W4387247768 hasConceptScore W4387247768C41895202 @default.
W4387247768 hasConceptScore W4387247768C44291984 @default.
W4387247768 hasConceptScore W4387247768C49774154 @default.
W4387247768 hasConceptScore W4387247768C52622490 @default.
W4387247768 hasLocation W43872477681 @default.
W4387247768 hasOpenAccess W4387247768 @default.
W4387247768 hasPrimaryLocation W43872477681 @default.
W4387247768 hasRelatedWork W1976606981 @default.
W4387247768 hasRelatedWork W2012466265 @default.
W4387247768 hasRelatedWork W2051167396 @default.
W4387247768 hasRelatedWork W219090214 @default.
W4387247768 hasRelatedWork W2613123485 @default.
W4387247768 hasRelatedWork W2914599329 @default.
W4387247768 hasRelatedWork W2964067226 @default.
W4387247768 hasRelatedWork W4205137593 @default.
W4387247768 hasRelatedWork W4236838349 @default.
W4387247768 hasRelatedWork W4385373813 @default.
W4387247768 isParatext "false" @default.
W4387247768 isRetracted "false" @default.
W4387247768 workType "article" @default.