Matches in SemOpenAlex for { <https://semopenalex.org/work/W3163456394> ?p ?o ?g. }
Showing items 1 to 86 of 86, with 100 items per page.
- W3163456394 abstract "The asynchronization issue caused by different types of modalities is one of the major problems in audio-visual speech recognition (AVSR) research. However, most AVSR systems merely rely on up-sampling of video or down-sampling of audio to align audio and visual features, assuming that the feature sequences are aligned frame-by-frame. These pre-processing steps oversimplify the asynchrony relation between the acoustic signal and lip motion, lacking flexibility and impairing the performance of the system. Although there are systems modeling the asynchrony between the modalities, sometimes they fail to align speech and video precisely under some or even all noisy conditions. In this paper, we propose a mutual feature alignment method for AVSR which can make full use of cross-modal information to address the asynchronization issue by introducing a Mutual Iterative Attention (MIA) mechanism. Our method can automatically learn an alignment in a mutual way by performing mutual attention iteratively between the audio and visual features, relying on the modified encoder structure of the Transformer. Experimental results show that our proposed method obtains absolute improvements of up to 20.42% over the audio modality alone, depending on the signal-to-noise-ratio (SNR) level. Better recognition performance can also be achieved compared with the traditional feature concatenation method under both clean and noisy conditions. It is expected that our proposed mutual feature alignment method can be easily generalized to other multimodal tasks with semantically correlated information." @default.
- W3163456394 created "2021-05-24" @default.
- W3163456394 creator A5057287134 @default.
- W3163456394 creator A5072379029 @default.
- W3163456394 creator A5086513946 @default.
- W3163456394 date "2021-01-10" @default.
- W3163456394 modified "2023-09-27" @default.
- W3163456394 title "Mutual Alignment between Audiovisual Features for End-to-End Audiovisual Speech Recognition" @default.
- W3163456394 cites W1974387177 @default.
- W3163456394 cites W2015394094 @default.
- W3163456394 cites W2048198238 @default.
- W3163456394 cites W2071932093 @default.
- W3163456394 cites W2116258879 @default.
- W3163456394 cites W2143612262 @default.
- W3163456394 cites W2147768505 @default.
- W3163456394 cites W2155765376 @default.
- W3163456394 cites W2157190406 @default.
- W3163456394 cites W2194775991 @default.
- W3163456394 cites W2889624961 @default.
- W3163456394 cites W2897067191 @default.
- W3163456394 cites W2901907199 @default.
- W3163456394 cites W2962824709 @default.
- W3163456394 cites W2963654155 @default.
- W3163456394 doi "https://doi.org/10.1109/icpr48806.2021.9412349" @default.
- W3163456394 hasPublicationYear "2021" @default.
- W3163456394 type Work @default.
- W3163456394 sameAs 3163456394 @default.
- W3163456394 citedByCount "2" @default.
- W3163456394 countsByYear W31634563942022 @default.
- W3163456394 crossrefType "proceedings-article" @default.
- W3163456394 hasAuthorship W3163456394A5057287134 @default.
- W3163456394 hasAuthorship W3163456394A5072379029 @default.
- W3163456394 hasAuthorship W3163456394A5086513946 @default.
- W3163456394 hasConcept C104317684 @default.
- W3163456394 hasConcept C111919701 @default.
- W3163456394 hasConcept C114614502 @default.
- W3163456394 hasConcept C118505674 @default.
- W3163456394 hasConcept C138885662 @default.
- W3163456394 hasConcept C152139883 @default.
- W3163456394 hasConcept C153180895 @default.
- W3163456394 hasConcept C154945302 @default.
- W3163456394 hasConcept C185592680 @default.
- W3163456394 hasConcept C2776401178 @default.
- W3163456394 hasConcept C28490314 @default.
- W3163456394 hasConcept C33923547 @default.
- W3163456394 hasConcept C41008148 @default.
- W3163456394 hasConcept C41895202 @default.
- W3163456394 hasConcept C55493867 @default.
- W3163456394 hasConcept C63479239 @default.
- W3163456394 hasConcept C87619178 @default.
- W3163456394 hasConceptScore W3163456394C104317684 @default.
- W3163456394 hasConceptScore W3163456394C111919701 @default.
- W3163456394 hasConceptScore W3163456394C114614502 @default.
- W3163456394 hasConceptScore W3163456394C118505674 @default.
- W3163456394 hasConceptScore W3163456394C138885662 @default.
- W3163456394 hasConceptScore W3163456394C152139883 @default.
- W3163456394 hasConceptScore W3163456394C153180895 @default.
- W3163456394 hasConceptScore W3163456394C154945302 @default.
- W3163456394 hasConceptScore W3163456394C185592680 @default.
- W3163456394 hasConceptScore W3163456394C2776401178 @default.
- W3163456394 hasConceptScore W3163456394C28490314 @default.
- W3163456394 hasConceptScore W3163456394C33923547 @default.
- W3163456394 hasConceptScore W3163456394C41008148 @default.
- W3163456394 hasConceptScore W3163456394C41895202 @default.
- W3163456394 hasConceptScore W3163456394C55493867 @default.
- W3163456394 hasConceptScore W3163456394C63479239 @default.
- W3163456394 hasConceptScore W3163456394C87619178 @default.
- W3163456394 hasFunder F4320321001 @default.
- W3163456394 hasFunder F4320325571 @default.
- W3163456394 hasLocation W31634563941 @default.
- W3163456394 hasOpenAccess W3163456394 @default.
- W3163456394 hasPrimaryLocation W31634563941 @default.
- W3163456394 hasRelatedWork W1966366482 @default.
- W3163456394 hasRelatedWork W2005071453 @default.
- W3163456394 hasRelatedWork W2382607599 @default.
- W3163456394 hasRelatedWork W2546942002 @default.
- W3163456394 hasRelatedWork W2789928768 @default.
- W3163456394 hasRelatedWork W2938286185 @default.
- W3163456394 hasRelatedWork W2951125461 @default.
- W3163456394 hasRelatedWork W2970216048 @default.
- W3163456394 hasRelatedWork W3206503703 @default.
- W3163456394 hasRelatedWork W4303038597 @default.
- W3163456394 isParatext "false" @default.
- W3163456394 isRetracted "false" @default.
- W3163456394 magId "3163456394" @default.
- W3163456394 workType "article" @default.
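Each bullet above is a subject–predicate–object triple for the work `W3163456394`, with a trailing `@default` graph marker. A minimal Python sketch of how such listing lines can be parsed back into triples (the helper name and the sample lines are illustrative, not part of any SemOpenAlex API):

```python
def parse_triples(lines):
    """Parse listing lines of the form '- <s> <p> <o> @default.' into tuples."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line.startswith("- "):
            continue  # skip header/pagination lines
        # Drop the leading "- " and the trailing " @default." graph marker.
        body = line[2:].removesuffix(" @default.")
        # Predicate and subject contain no spaces; the object may (e.g. quoted
        # literals such as the abstract), so split at most twice.
        subject, predicate, obj = body.split(" ", 2)
        triples.append((subject, predicate, obj))
    return triples

# A few lines copied from the listing above.
listing = [
    '- W3163456394 created "2021-05-24" @default.',
    '- W3163456394 citedByCount "2" @default.',
    '- W3163456394 cites W1974387177 @default.',
]
for s, p, o in parse_triples(listing):
    print(s, p, o)
```

Note that quoted literals keep their surrounding quotes here; a full RDF toolchain (e.g. an N-Triples parser) would additionally strip quoting and handle datatypes.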