Matches in SemOpenAlex for { <https://semopenalex.org/work/W3011234510> ?p ?o ?g. }
- W3011234510 endingPage "1064" @default.
- W3011234510 startingPage "1052" @default.
- W3011234510 abstract "Audio-Visual Speech Recognition (AVSR) seeks to model, and thereby exploit, the dynamic relationship between a human voice and the corresponding mouth movements. A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence to sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech. This study investigates the inner workings of AV Align and visualises the audio-visual alignment patterns. Our experiments are performed on two of the largest publicly available AVSR datasets, TCD-TIMIT and LRS2. We find that AV Align learns to align acoustic and visual representations of speech at the frame level on TCD-TIMIT in a generally monotonic pattern. We also determine the cause of initially seeing no improvement over audio-only speech recognition on the more challenging LRS2. We propose a regularisation method which involves predicting lip-related Action Units from visual representations. Our regularisation method leads to better exploitation of the visual modality, with performance improvements between 7% and 30% depending on the noise level. Furthermore, we show that the alternative Watch, Listen, Attend, and Spell network is affected by the same problem as AV Align, and that our proposed approach can effectively help it learn visual representations. Our findings validate the suitability of the regularisation method to AVSR and encourage researchers to rethink the multimodal convergence problem when having one dominant modality." @default.
- W3011234510 created "2020-03-23" @default.
- W3011234510 creator A5034046565 @default.
- W3011234510 creator A5042231269 @default.
- W3011234510 creator A5042852762 @default.
- W3011234510 date "2020-01-01" @default.
- W3011234510 modified "2023-09-24" @default.
- W3011234510 title "How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition" @default.
- W3011234510 cites W1531976713 @default.
- W3011234510 cites W1583939596 @default.
- W3011234510 cites W1595126664 @default.
- W3011234510 cites W1902237438 @default.
- W3011234510 cites W2029199293 @default.
- W3011234510 cites W2038548569 @default.
- W3011234510 cites W2096391593 @default.
- W3011234510 cites W2116261113 @default.
- W3011234510 cites W2136848157 @default.
- W3011234510 cites W2142075667 @default.
- W3011234510 cites W2184188583 @default.
- W3011234510 cites W2302255633 @default.
- W3011234510 cites W2327501763 @default.
- W3011234510 cites W2402144811 @default.
- W3011234510 cites W2550980560 @default.
- W3011234510 cites W2584992898 @default.
- W3011234510 cites W2594690981 @default.
- W3011234510 cites W2603880695 @default.
- W3011234510 cites W2612405771 @default.
- W3011234510 cites W2612754342 @default.
- W3011234510 cites W2619383789 @default.
- W3011234510 cites W2735762732 @default.
- W3011234510 cites W2739634219 @default.
- W3011234510 cites W2763942302 @default.
- W3011234510 cites W2785608500 @default.
- W3011234510 cites W2790326622 @default.
- W3011234510 cites W2806872492 @default.
- W3011234510 cites W2807126412 @default.
- W3011234510 cites W2883383043 @default.
- W3011234510 cites W2886945201 @default.
- W3011234510 cites W2889448058 @default.
- W3011234510 cites W2889624961 @default.
- W3011234510 cites W2890952074 @default.
- W3011234510 cites W2897067191 @default.
- W3011234510 cites W2901907199 @default.
- W3011234510 cites W2931364255 @default.
- W3011234510 cites W2952746495 @default.
- W3011234510 cites W2963096987 @default.
- W3011234510 cites W2963403868 @default.
- W3011234510 cites W2963528589 @default.
- W3011234510 cites W2963654155 @default.
- W3011234510 cites W2963658982 @default.
- W3011234510 cites W2963744813 @default.
- W3011234510 cites W2963785710 @default.
- W3011234510 cites W2964308564 @default.
- W3011234510 cites W2991391304 @default.
- W3011234510 cites W302213031 @default.
- W3011234510 cites W3127686677 @default.
- W3011234510 cites W3137695714 @default.
- W3011234510 cites W854541894 @default.
- W3011234510 doi "https://doi.org/10.1109/taslp.2020.2980436" @default.
- W3011234510 hasPublicationYear "2020" @default.
- W3011234510 type Work @default.
- W3011234510 sameAs 3011234510 @default.
- W3011234510 citedByCount "20" @default.
- W3011234510 countsByYear W30112345102020 @default.
- W3011234510 countsByYear W30112345102021 @default.
- W3011234510 countsByYear W30112345102022 @default.
- W3011234510 countsByYear W30112345102023 @default.
- W3011234510 crossrefType "journal-article" @default.
- W3011234510 hasAuthorship W3011234510A5034046565 @default.
- W3011234510 hasAuthorship W3011234510A5042231269 @default.
- W3011234510 hasAuthorship W3011234510A5042852762 @default.
- W3011234510 hasBestOaLocation W30112345102 @default.
- W3011234510 hasConcept C144024400 @default.
- W3011234510 hasConcept C154945302 @default.
- W3011234510 hasConcept C19165224 @default.
- W3011234510 hasConcept C23224414 @default.
- W3011234510 hasConcept C2778724510 @default.
- W3011234510 hasConcept C2780226545 @default.
- W3011234510 hasConcept C2780957641 @default.
- W3011234510 hasConcept C28490314 @default.
- W3011234510 hasConcept C41008148 @default.
- W3011234510 hasConceptScore W3011234510C144024400 @default.
- W3011234510 hasConceptScore W3011234510C154945302 @default.
- W3011234510 hasConceptScore W3011234510C19165224 @default.
- W3011234510 hasConceptScore W3011234510C23224414 @default.
- W3011234510 hasConceptScore W3011234510C2778724510 @default.
- W3011234510 hasConceptScore W3011234510C2780226545 @default.
- W3011234510 hasConceptScore W3011234510C2780957641 @default.
- W3011234510 hasConceptScore W3011234510C28490314 @default.
- W3011234510 hasConceptScore W3011234510C41008148 @default.
- W3011234510 hasFunder F4320309480 @default.
- W3011234510 hasFunder F4320335322 @default.
- W3011234510 hasLocation W30112345101 @default.
- W3011234510 hasLocation W30112345102 @default.
- W3011234510 hasOpenAccess W3011234510 @default.
- W3011234510 hasPrimaryLocation W30112345101 @default.
- W3011234510 hasRelatedWork W1586532344 @default.
- W3011234510 hasRelatedWork W1990589093 @default.