Matches in SemOpenAlex for { <https://semopenalex.org/work/W4375868954> ?p ?o ?g. }
Showing items 1 to 93 of 93 with 100 items per page.
- W4375868954 abstract "The audio-visual direction of arrival (DOA) estimation has demonstrated superior performance recently. In this paper, we present a novel audio-visual multi-speaker DOA estimation network, which for the first time incorporates multi-speaker lip features to adapt the complex overlapping and noisy scenarios. Firstly, we encode the multi-channel audio features, the reference angles and the lip Regions of Interest (RoIs) detected from the video respectively to acquire high-level representations. Then the multi-modal embeddings of audio, speaker angles and lips are fused by a tri-modal gated fusion module to balance their contributions to the output. The fused embedding is sent to the backend network to obtain the accurate DOA estimation with the combination of the predicted speaker angular vectors and the speaker activities. Experimental results show that our proposed approach can reduce the localization error by 73.48% compared to the previous work on the 2021 Multi-modal Information based Speech Processing (MISP) Challenge corpus. Meanwhile, the high accuracy and stability of localization results demonstrate the robustness of the proposed model in multi-speaker scenarios." @default.
- W4375868954 created "2023-05-10" @default.
- W4375868954 creator A5038529708 @default.
- W4375868954 creator A5066595711 @default.
- W4375868954 creator A5066868860 @default.
- W4375868954 creator A5072315367 @default.
- W4375868954 creator A5077319251 @default.
- W4375868954 date "2023-06-04" @default.
- W4375868954 modified "2023-09-27" @default.
- W4375868954 title "Incorporating Lip Features into Audio-Visual Multi-Speaker DOA Estimation by Gated Fusion" @default.
- W4375868954 cites W2011396567 @default.
- W4375868954 cites W2046317813 @default.
- W4375868954 cites W2113638573 @default.
- W4375868954 cites W2114219351 @default.
- W4375868954 cites W2128970593 @default.
- W4375868954 cites W2222512263 @default.
- W4375868954 cites W2551990143 @default.
- W4375868954 cites W2810934215 @default.
- W4375868954 cites W2903040254 @default.
- W4375868954 cites W2964342924 @default.
- W4375868954 cites W2977809668 @default.
- W4375868954 cites W3011890046 @default.
- W4375868954 cites W3016011581 @default.
- W4375868954 cites W3035449864 @default.
- W4375868954 cites W3081461453 @default.
- W4375868954 cites W3168662520 @default.
- W4375868954 cites W3198730349 @default.
- W4375868954 cites W4221162997 @default.
- W4375868954 cites W4226301259 @default.
- W4375868954 cites W4312356258 @default.
- W4375868954 cites W4319780182 @default.
- W4375868954 doi "https://doi.org/10.1109/icassp49357.2023.10095549" @default.
- W4375868954 hasPublicationYear "2023" @default.
- W4375868954 type Work @default.
- W4375868954 citedByCount "0" @default.
- W4375868954 crossrefType "proceedings-article" @default.
- W4375868954 hasAuthorship W4375868954A5038529708 @default.
- W4375868954 hasAuthorship W4375868954A5066595711 @default.
- W4375868954 hasAuthorship W4375868954A5066868860 @default.
- W4375868954 hasAuthorship W4375868954A5072315367 @default.
- W4375868954 hasAuthorship W4375868954A5077319251 @default.
- W4375868954 hasBestOaLocation W43758689541 @default.
- W4375868954 hasConcept C104317684 @default.
- W4375868954 hasConcept C138885662 @default.
- W4375868954 hasConcept C153180895 @default.
- W4375868954 hasConcept C154945302 @default.
- W4375868954 hasConcept C158525013 @default.
- W4375868954 hasConcept C185592680 @default.
- W4375868954 hasConcept C188027245 @default.
- W4375868954 hasConcept C28490314 @default.
- W4375868954 hasConcept C3017588708 @default.
- W4375868954 hasConcept C41008148 @default.
- W4375868954 hasConcept C41608201 @default.
- W4375868954 hasConcept C41895202 @default.
- W4375868954 hasConcept C49774154 @default.
- W4375868954 hasConcept C55493867 @default.
- W4375868954 hasConcept C63479239 @default.
- W4375868954 hasConcept C66746571 @default.
- W4375868954 hasConcept C71139939 @default.
- W4375868954 hasConceptScore W4375868954C104317684 @default.
- W4375868954 hasConceptScore W4375868954C138885662 @default.
- W4375868954 hasConceptScore W4375868954C153180895 @default.
- W4375868954 hasConceptScore W4375868954C154945302 @default.
- W4375868954 hasConceptScore W4375868954C158525013 @default.
- W4375868954 hasConceptScore W4375868954C185592680 @default.
- W4375868954 hasConceptScore W4375868954C188027245 @default.
- W4375868954 hasConceptScore W4375868954C28490314 @default.
- W4375868954 hasConceptScore W4375868954C3017588708 @default.
- W4375868954 hasConceptScore W4375868954C41008148 @default.
- W4375868954 hasConceptScore W4375868954C41608201 @default.
- W4375868954 hasConceptScore W4375868954C41895202 @default.
- W4375868954 hasConceptScore W4375868954C49774154 @default.
- W4375868954 hasConceptScore W4375868954C55493867 @default.
- W4375868954 hasConceptScore W4375868954C63479239 @default.
- W4375868954 hasConceptScore W4375868954C66746571 @default.
- W4375868954 hasConceptScore W4375868954C71139939 @default.
- W4375868954 hasFunder F4320321001 @default.
- W4375868954 hasLocation W43758689541 @default.
- W4375868954 hasOpenAccess W4375868954 @default.
- W4375868954 hasPrimaryLocation W43758689541 @default.
- W4375868954 hasRelatedWork W1650988205 @default.
- W4375868954 hasRelatedWork W2055709700 @default.
- W4375868954 hasRelatedWork W2140997121 @default.
- W4375868954 hasRelatedWork W2182112479 @default.
- W4375868954 hasRelatedWork W2275988210 @default.
- W4375868954 hasRelatedWork W2331065455 @default.
- W4375868954 hasRelatedWork W2359640100 @default.
- W4375868954 hasRelatedWork W2389073067 @default.
- W4375868954 hasRelatedWork W2905846897 @default.
- W4375868954 hasRelatedWork W3134175397 @default.
- W4375868954 isParatext "false" @default.
- W4375868954 isRetracted "false" @default.
- W4375868954 workType "article" @default.