Matches in SemOpenAlex for { <https://semopenalex.org/work/W2897360952> ?p ?o ?g. }
Showing items 1 to 84 of 84 with 100 items per page.
- W2897360952 endingPage "175" @default.
- W2897360952 startingPage "151" @default.
- W2897360952 abstract "Code-switching is the phenomenon whereby multilingual speakers spontaneously alternate between more than one language during discourse and is widespread in multilingual societies. Current state-of-the-art automatic speech recognition (ASR) systems are optimised for monolingual speech, but performance degrades severely when presented with multiple languages. We address ASR of speech containing switches between English and four South African Bantu languages. No comparable study on code-switched speech for these languages has been conducted before and consequently no directly applicable benchmarks exist. A new and unique corpus containing 14.3 hours of spontaneous speech extracted from South African soap operas was used to perform our study. The varied nature of the code-switching in this data presents many challenges to ASR. We focus specifically on how the language model can be improved to better model bilingual language switches for English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. Code-switching examples in the corpus transcriptions were extremely sparse, with the majority of code-switched bigrams occurring only once. Furthermore, differences in language typology between English and the Bantu languages and among the Bantu languages themselves contribute further challenges. We propose a new method using word embeddings trained on text data that is both out-of-domain and monolingual for the synthesis of artificial bilingual code-switched bigrams to augment the sparse language modelling training data. This technique has the particular advantage of not requiring any additional training data that includes code-switching. We show that the proposed approach is able to synthesise valid code-switched bigrams not seen in the training set. We also show that, by augmenting the training set with these bigrams, we are able to achieve notable reductions for all language pairs in the overall perplexity and particularly substantial reductions in the perplexity calculated across a language switch boundary (between 5 and 31%). We demonstrate that the proposed approach is able to reduce the unseen code-switched bigram types in the test sets by up to 20.5%. Finally, we show that the augmented language models achieve reductions in the word error rate for three of the four language pairs considered. The gains were larger for language pairs with disjunctive orthography than for those with conjunctive orthography. We conclude that the augmentation of language model training data with code-switched bigrams synthesised using word embeddings trained on out-of-domain monolingual text is a viable means of improving the performance of ASR for code-switched speech." @default.
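The synthesis technique the abstract describes — generating unseen bilingual bigrams by substituting embedding-space neighbours into the observed code-switched bigrams — can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: the toy 2-D vectors and the example English/isiZulu word pairs are hypothetical stand-ins for embeddings actually trained on out-of-domain monolingual text.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(word, vocab, embeddings, k=2):
    """The k words in vocab most similar to `word` (excluding itself)."""
    scores = [(cosine(embeddings[word], embeddings[w]), w)
              for w in vocab if w != word]
    return [w for _, w in sorted(scores, reverse=True)[:k]]

def synthesise_bigrams(seen_bigrams, eng_vocab, bantu_vocab, embeddings, k=2):
    """For each observed English->Bantu switch bigram, pair each side with
    its embedding-space neighbours to create new, unseen bigrams that can
    then augment the language model training data."""
    synthetic = set()
    for eng, bantu in seen_bigrams:
        for e in [eng] + nearest(eng, eng_vocab, embeddings, k):
            for b in [bantu] + nearest(bantu, bantu_vocab, embeddings, k):
                if (e, b) not in seen_bigrams:
                    synthetic.add((e, b))
    return synthetic

# Hypothetical toy embeddings; a real system would load vectors trained
# on large monolingual corpora for each language.
embeddings = {
    "money": [1.0, 0.0], "cash": [0.95, 0.1], "time": [0.0, 1.0],
    "imali": [0.9, 0.05], "isikhathi": [0.05, 0.95],
}
eng_vocab = {"money", "cash", "time"}
bantu_vocab = {"imali", "isikhathi"}
seen = {("money", "imali")}  # a single observed switch bigram

new_bigrams = synthesise_bigrams(seen, eng_vocab, bantu_vocab, embeddings, k=1)
```

With `k=1`, the single seen bigram ("money", "imali") yields synthetic pairs such as ("cash", "imali"), since "cash" is the nearest English neighbour of "money" in this toy space. Note this simplified variant freely combines both substituted sides; a production system would also filter candidates for validity before adding them to the training counts.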
- W2897360952 created "2018-10-26" @default.
- W2897360952 creator A5039075258 @default.
- W2897360952 creator A5050849172 @default.
- W2897360952 date "2019-03-01" @default.
- W2897360952 modified "2023-10-18" @default.
- W2897360952 title "Synthesised bigrams using word embeddings for code-switched ASR of four South African language pairs" @default.
- W2897360952 cites W1615991656 @default.
- W2897360952 cites W1978400666 @default.
- W2897360952 cites W1981706894 @default.
- W2897360952 cites W1993721840 @default.
- W2897360952 cites W2094655846 @default.
- W2897360952 cites W2164499211 @default.
- W2897360952 doi "https://doi.org/10.1016/j.csl.2018.10.002" @default.
- W2897360952 hasPublicationYear "2019" @default.
- W2897360952 type Work @default.
- W2897360952 sameAs 2897360952 @default.
- W2897360952 citedByCount "9" @default.
- W2897360952 countsByYear W28973609522019 @default.
- W2897360952 countsByYear W28973609522020 @default.
- W2897360952 countsByYear W28973609522021 @default.
- W2897360952 countsByYear W28973609522022 @default.
- W2897360952 countsByYear W28973609522023 @default.
- W2897360952 crossrefType "journal-article" @default.
- W2897360952 hasAuthorship W2897360952A5039075258 @default.
- W2897360952 hasAuthorship W2897360952A5050849172 @default.
- W2897360952 hasConcept C108494575 @default.
- W2897360952 hasConcept C108757681 @default.
- W2897360952 hasConcept C120665830 @default.
- W2897360952 hasConcept C121332964 @default.
- W2897360952 hasConcept C137293760 @default.
- W2897360952 hasConcept C137546455 @default.
- W2897360952 hasConcept C138885662 @default.
- W2897360952 hasConcept C154945302 @default.
- W2897360952 hasConcept C177264268 @default.
- W2897360952 hasConcept C18552078 @default.
- W2897360952 hasConcept C192209626 @default.
- W2897360952 hasConcept C199360897 @default.
- W2897360952 hasConcept C204321447 @default.
- W2897360952 hasConcept C2776760102 @default.
- W2897360952 hasConcept C28490314 @default.
- W2897360952 hasConcept C41008148 @default.
- W2897360952 hasConcept C41895202 @default.
- W2897360952 hasConcept C90805587 @default.
- W2897360952 hasConcept C99878080 @default.
- W2897360952 hasConceptScore W2897360952C108494575 @default.
- W2897360952 hasConceptScore W2897360952C108757681 @default.
- W2897360952 hasConceptScore W2897360952C120665830 @default.
- W2897360952 hasConceptScore W2897360952C121332964 @default.
- W2897360952 hasConceptScore W2897360952C137293760 @default.
- W2897360952 hasConceptScore W2897360952C137546455 @default.
- W2897360952 hasConceptScore W2897360952C138885662 @default.
- W2897360952 hasConceptScore W2897360952C154945302 @default.
- W2897360952 hasConceptScore W2897360952C177264268 @default.
- W2897360952 hasConceptScore W2897360952C18552078 @default.
- W2897360952 hasConceptScore W2897360952C192209626 @default.
- W2897360952 hasConceptScore W2897360952C199360897 @default.
- W2897360952 hasConceptScore W2897360952C204321447 @default.
- W2897360952 hasConceptScore W2897360952C2776760102 @default.
- W2897360952 hasConceptScore W2897360952C28490314 @default.
- W2897360952 hasConceptScore W2897360952C41008148 @default.
- W2897360952 hasConceptScore W2897360952C41895202 @default.
- W2897360952 hasConceptScore W2897360952C90805587 @default.
- W2897360952 hasConceptScore W2897360952C99878080 @default.
- W2897360952 hasLocation W28973609521 @default.
- W2897360952 hasOpenAccess W2897360952 @default.
- W2897360952 hasPrimaryLocation W28973609521 @default.
- W2897360952 hasRelatedWork W1500873938 @default.
- W2897360952 hasRelatedWork W1700330385 @default.
- W2897360952 hasRelatedWork W2002221802 @default.
- W2897360952 hasRelatedWork W2020757772 @default.
- W2897360952 hasRelatedWork W2041167939 @default.
- W2897360952 hasRelatedWork W2105076537 @default.
- W2897360952 hasRelatedWork W2131111393 @default.
- W2897360952 hasRelatedWork W2250909759 @default.
- W2897360952 hasRelatedWork W2330996469 @default.
- W2897360952 hasRelatedWork W2562995433 @default.
- W2897360952 hasVolume "54" @default.
- W2897360952 isParatext "false" @default.
- W2897360952 isRetracted "false" @default.
- W2897360952 magId "2897360952" @default.
- W2897360952 workType "article" @default.