Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386907507> ?p ?o ?g. }
Showing items 1 to 55 of 55 with 100 items per page.
- W4386907507 abstract "Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement. This paper proposes the incorporation of ultrasound tongue images to improve the performance of lip-based AV-SE systems further. To address the challenge of acquiring ultrasound tongue images during inference, we first propose to employ knowledge distillation during training to investigate the feasibility of leveraging tongue-related information without directly inputting ultrasound tongue images. Specifically, we guide an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model, thus transferring tongue-related knowledge. To better model the alignment between the lip and tongue modalities, we further propose the introduction of a lip-tongue key-value memory network into the AV-SE model. This network enables the retrieval of tongue features based on readily available lip features, thereby assisting the subsequent speech enhancement task. Experimental results demonstrate that both methods significantly improve the quality and intelligibility of the enhanced speech compared to traditional lip-based AV-SE baselines. Moreover, both proposed methods exhibit strong generalization performance on unseen speakers and in the presence of unseen noises. Furthermore, phone error rate (PER) analysis of automatic speech recognition (ASR) reveals that while all phonemes benefit from introducing ultrasound tongue images, palatal and velar consonants benefit most." @default.
- W4386907507 created "2023-09-21" @default.
- W4386907507 creator A5014746276 @default.
- W4386907507 creator A5059767940 @default.
- W4386907507 creator A5066498315 @default.
- W4386907507 date "2023-09-19" @default.
- W4386907507 modified "2023-10-16" @default.
- W4386907507 title "Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement" @default.
- W4386907507 doi "https://doi.org/10.48550/arxiv.2309.10455" @default.
- W4386907507 hasPublicationYear "2023" @default.
- W4386907507 type Work @default.
- W4386907507 citedByCount "0" @default.
- W4386907507 crossrefType "posted-content" @default.
- W4386907507 hasAuthorship W4386907507A5014746276 @default.
- W4386907507 hasAuthorship W4386907507A5059767940 @default.
- W4386907507 hasAuthorship W4386907507A5066498315 @default.
- W4386907507 hasBestOaLocation W43869075071 @default.
- W4386907507 hasConcept C111472728 @default.
- W4386907507 hasConcept C138885662 @default.
- W4386907507 hasConcept C154945302 @default.
- W4386907507 hasConcept C163294075 @default.
- W4386907507 hasConcept C2776182073 @default.
- W4386907507 hasConcept C2776214188 @default.
- W4386907507 hasConcept C2779744641 @default.
- W4386907507 hasConcept C28490314 @default.
- W4386907507 hasConcept C41008148 @default.
- W4386907507 hasConcept C41895202 @default.
- W4386907507 hasConcept C60048801 @default.
- W4386907507 hasConceptScore W4386907507C111472728 @default.
- W4386907507 hasConceptScore W4386907507C138885662 @default.
- W4386907507 hasConceptScore W4386907507C154945302 @default.
- W4386907507 hasConceptScore W4386907507C163294075 @default.
- W4386907507 hasConceptScore W4386907507C2776182073 @default.
- W4386907507 hasConceptScore W4386907507C2776214188 @default.
- W4386907507 hasConceptScore W4386907507C2779744641 @default.
- W4386907507 hasConceptScore W4386907507C28490314 @default.
- W4386907507 hasConceptScore W4386907507C41008148 @default.
- W4386907507 hasConceptScore W4386907507C41895202 @default.
- W4386907507 hasConceptScore W4386907507C60048801 @default.
- W4386907507 hasLocation W43869075071 @default.
- W4386907507 hasOpenAccess W4386907507 @default.
- W4386907507 hasPrimaryLocation W43869075071 @default.
- W4386907507 hasRelatedWork W1986772939 @default.
- W4386907507 hasRelatedWork W2037635165 @default.
- W4386907507 hasRelatedWork W2061366489 @default.
- W4386907507 hasRelatedWork W2542098180 @default.
- W4386907507 hasRelatedWork W3080561272 @default.
- W4386907507 hasRelatedWork W3129072390 @default.
- W4386907507 hasRelatedWork W4200562864 @default.
- W4386907507 hasRelatedWork W4221152531 @default.
- W4386907507 hasRelatedWork W4375869276 @default.
- W4386907507 hasRelatedWork W4386858351 @default.
- W4386907507 isParatext "false" @default.
- W4386907507 isRetracted "false" @default.
- W4386907507 workType "article" @default.