Matches in SemOpenAlex for { <https://semopenalex.org/work/W4312565955> ?p ?o ?g. }
Showing items 1 to 75 of 75, with 100 items per page.
- W4312565955 abstract "Most speaker recognition systems presume that the language for enrollment and testing is the same. Cross-lingual speaker recognition is rarely investigated. This study collected trilingual (Mandarin, English, and Taiwanese) cross-language recordings named MET-40. A total of 40 participants (20 male, 20 female) contributed to the dataset, which contains 740 minutes of audio. Spoken texts are mainly taken from elementary school textbooks, and some English texts use TIMIT. We employ ResNet, vision transformer (ViT), and convolutional vision transformer (CvT) in combination with three acoustic features, namely, spectrogram, Mel spectrogram, and Mel-frequency cepstral coefficients (MFCC), for single-, mixed-, and cross-language speaker recognition tasks. In the mixed-language setting, the language to be tested is included in the training set, while in the cross-language scenario the language to be tested is not used for training. Experimental results show that the highest accuracy is 97.16% for single-language models. Training on a mixture of two languages improves the performance to 99.17%. In cross-language situations, the accuracy drops significantly to 79.64%, as the spoken language is not present in the training data. When two languages are employed for training, the accuracy rises to 90.92%. In general, CvT-based models demonstrate the best stability in all cases. The robustness of the model is critical to security in practical applications. Therefore, we analyze how adversarial attacks impact different speaker identification models. The results show that although the CvT-based model exhibits excellent performance, it is easily affected by the perturbation caused by the adversarial attack. The effect is less pronounced when more languages are used for training, with an average increase of 5.11% in accuracy. Finally, extra caution needs to be taken when MFCC is chosen as the acoustic feature, as attacks can still take place without training data, and the recognition rate is reduced by 31.57% under an FGSM cross-language attack." @default.
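The abstract's FGSM (fast gradient sign method) attack perturbs an input in the direction of the sign of the loss gradient, bounded by a step size ε. The paper applies it to CvT/ResNet speaker-identification models over MFCC features; as a minimal illustration only, the sketch below applies one FGSM step to a toy logistic-regression "verifier" on a random feature vector (the model, weights, and ε value are hypothetical stand-ins, not the paper's setup):

```python
import numpy as np

def fgsm_perturb(x, w, b, y_true, eps):
    """One-step FGSM on a logistic-regression model.

    For cross-entropy loss L = -log p(y_true | x), the gradient
    w.r.t. the input is (p - y_true) * w, so the adversarial
    example is x + eps * sign(grad).
    """
    z = np.dot(w, x) + b
    p = 1.0 / (1.0 + np.exp(-z))      # predicted probability of the target speaker
    grad_x = (p - y_true) * w         # dL/dx for binary cross-entropy
    return x + eps * np.sign(grad_x)  # perturbation bounded elementwise by eps

rng = np.random.default_rng(0)
x = rng.normal(size=8)                # stand-in for an MFCC feature frame
w = rng.normal(size=8)
b = 0.0
x_adv = fgsm_perturb(x, w, b, y_true=1.0, eps=0.1)
print(np.max(np.abs(x_adv - x)))      # never exceeds eps
```

The elementwise bound |x_adv − x| ≤ ε is what makes FGSM perturbations hard to notice in the input while still degrading recognition accuracy, which is the effect the paper measures (a 31.57% drop under the cross-language FGSM attack on MFCC).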
- W4312565955 created "2023-01-05" @default.
- W4312565955 creator A5005486605 @default.
- W4312565955 creator A5034290509 @default.
- W4312565955 creator A5053445228 @default.
- W4312565955 date "2022-08-21" @default.
- W4312565955 modified "2023-09-23" @default.
- W4312565955 title "On the Robustness of Cross-lingual Speaker Recognition using Transformer-based Approaches" @default.
- W4312565955 cites W1494198834 @default.
- W4312565955 cites W2194775991 @default.
- W4312565955 cites W2963173190 @default.
- W4312565955 cites W3096025930 @default.
- W4312565955 cites W3153453329 @default.
- W4312565955 cites W3159867998 @default.
- W4312565955 cites W3184648662 @default.
- W4312565955 cites W3196974791 @default.
- W4312565955 cites W4214493665 @default.
- W4312565955 doi "https://doi.org/10.1109/icpr56361.2022.9956274" @default.
- W4312565955 hasPublicationYear "2022" @default.
- W4312565955 type Work @default.
- W4312565955 citedByCount "0" @default.
- W4312565955 crossrefType "proceedings-article" @default.
- W4312565955 hasAuthorship W4312565955A5005486605 @default.
- W4312565955 hasAuthorship W4312565955A5034290509 @default.
- W4312565955 hasAuthorship W4312565955A5053445228 @default.
- W4312565955 hasConcept C104317684 @default.
- W4312565955 hasConcept C119599485 @default.
- W4312565955 hasConcept C127413603 @default.
- W4312565955 hasConcept C129792486 @default.
- W4312565955 hasConcept C137293760 @default.
- W4312565955 hasConcept C154945302 @default.
- W4312565955 hasConcept C165801399 @default.
- W4312565955 hasConcept C185592680 @default.
- W4312565955 hasConcept C195324797 @default.
- W4312565955 hasConcept C204321447 @default.
- W4312565955 hasConcept C28490314 @default.
- W4312565955 hasConcept C37736160 @default.
- W4312565955 hasConcept C41008148 @default.
- W4312565955 hasConcept C45273575 @default.
- W4312565955 hasConcept C55493867 @default.
- W4312565955 hasConcept C63479239 @default.
- W4312565955 hasConcept C66322947 @default.
- W4312565955 hasConceptScore W4312565955C104317684 @default.
- W4312565955 hasConceptScore W4312565955C119599485 @default.
- W4312565955 hasConceptScore W4312565955C127413603 @default.
- W4312565955 hasConceptScore W4312565955C129792486 @default.
- W4312565955 hasConceptScore W4312565955C137293760 @default.
- W4312565955 hasConceptScore W4312565955C154945302 @default.
- W4312565955 hasConceptScore W4312565955C165801399 @default.
- W4312565955 hasConceptScore W4312565955C185592680 @default.
- W4312565955 hasConceptScore W4312565955C195324797 @default.
- W4312565955 hasConceptScore W4312565955C204321447 @default.
- W4312565955 hasConceptScore W4312565955C28490314 @default.
- W4312565955 hasConceptScore W4312565955C37736160 @default.
- W4312565955 hasConceptScore W4312565955C41008148 @default.
- W4312565955 hasConceptScore W4312565955C45273575 @default.
- W4312565955 hasConceptScore W4312565955C55493867 @default.
- W4312565955 hasConceptScore W4312565955C63479239 @default.
- W4312565955 hasConceptScore W4312565955C66322947 @default.
- W4312565955 hasLocation W43125659551 @default.
- W4312565955 hasOpenAccess W4312565955 @default.
- W4312565955 hasPrimaryLocation W43125659551 @default.
- W4312565955 hasRelatedWork W1803932089 @default.
- W4312565955 hasRelatedWork W2894869510 @default.
- W4312565955 hasRelatedWork W3107474891 @default.
- W4312565955 hasRelatedWork W3183228686 @default.
- W4312565955 hasRelatedWork W4297798711 @default.
- W4312565955 hasRelatedWork W4304891817 @default.
- W4312565955 hasRelatedWork W4307309570 @default.
- W4312565955 hasRelatedWork W4308231854 @default.
- W4312565955 hasRelatedWork W4312628544 @default.
- W4312565955 hasRelatedWork W4313067245 @default.
- W4312565955 isParatext "false" @default.
- W4312565955 isRetracted "false" @default.
- W4312565955 workType "article" @default.