Matches in SemOpenAlex for { <https://semopenalex.org/work/W3197163313> ?p ?o ?g. }
- W3197163313 endingPage "107847" @default.
- W3197163313 startingPage "107847" @default.
- W3197163313 abstract "In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to tackle the issue of translating communication speech into human-readable text in air traffic control (ATC) systems. In the proposed framework, we focus on integrating multilingual automatic speech recognition (ASR) into one model, in which an end-to-end paradigm is developed to convert speech waveforms into text directly, without any feature engineering or lexicon. To compensate for the deficiencies of handcrafted feature engineering caused by ATC challenges, including multilingual and multispeaker dialogue and unstable speech rates, a speech representation learning (SRL) network is proposed to capture robust and discriminative speech representations from raw waves. A self-supervised training strategy is adopted to optimize the SRL network on unlabeled data and to further predict the speech features, i.e., wave-to-feature. An end-to-end architecture is improved to complete the ASR task, in which a grapheme-based modeling unit is applied to address the multilingual ASR issue. Facing the problem of small transcribed sample sizes in the ATC domain, an unsupervised approach with mask prediction is applied to pretrain the backbone network of the ASR model on unlabeled data through a feature-to-feature process. Finally, by integrating the SRL with ASR, an end-to-end multilingual ASR framework is formulated in a supervised manner, which is able to translate a raw wave into text with one model, i.e., wave-to-text. Experimental results on the ATCSpeech corpus demonstrate that the proposed approach achieves high performance with a very small labeled corpus and less resource consumption: only a 4.20% label error rate on the 58-hour transcribed corpus. Compared to the baseline model, the proposed approach obtains over 100% relative performance improvement, which can be further enhanced as the size of the transcribed samples increases. It is also confirmed that the proposed SRL and training strategies make significant contributions to improving the final performance. In addition, the effectiveness of the proposed framework is validated on common corpora (AISHELL, LibriSpeech, and cv-fr). More importantly, the proposed multilingual framework not only reduces the system complexity but also obtains higher accuracy compared to that of independent monolingual ASR models. The proposed approach can also greatly reduce the cost of annotating samples, which helps advance ASR techniques toward industrial applications." @default.
- W3197163313 created "2021-09-13" @default.
- W3197163313 creator A5006247366 @default.
- W3197163313 creator A5018178311 @default.
- W3197163313 creator A5044544424 @default.
- W3197163313 creator A5057805812 @default.
- W3197163313 creator A5066617875 @default.
- W3197163313 creator A5067325797 @default.
- W3197163313 creator A5079789406 @default.
- W3197163313 date "2021-11-01" @default.
- W3197163313 modified "2023-10-02" @default.
- W3197163313 title "ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems" @default.
- W3197163313 cites W1494198834 @default.
- W3197163313 cites W2091432990 @default.
- W3197163313 cites W2098457679 @default.
- W3197163313 cites W2112739286 @default.
- W3197163313 cites W2113852278 @default.
- W3197163313 cites W2127141656 @default.
- W3197163313 cites W2134587001 @default.
- W3197163313 cites W2143612262 @default.
- W3197163313 cites W2160815625 @default.
- W3197163313 cites W2169415433 @default.
- W3197163313 cites W2327501763 @default.
- W3197163313 cites W2403743561 @default.
- W3197163313 cites W2580596079 @default.
- W3197163313 cites W2914275764 @default.
- W3197163313 cites W2914586490 @default.
- W3197163313 cites W2963242190 @default.
- W3197163313 cites W2964002616 @default.
- W3197163313 cites W2964309797 @default.
- W3197163313 cites W2976884277 @default.
- W3197163313 cites W3036417315 @default.
- W3197163313 cites W3129824274 @default.
- W3197163313 cites W3133953372 @default.
- W3197163313 cites W3174448166 @default.
- W3197163313 cites W4252331534 @default.
- W3197163313 doi "https://doi.org/10.1016/j.asoc.2021.107847" @default.
- W3197163313 hasPublicationYear "2021" @default.
- W3197163313 type Work @default.
- W3197163313 sameAs 3197163313 @default.
- W3197163313 citedByCount "8" @default.
- W3197163313 countsByYear W31971633132021 @default.
- W3197163313 countsByYear W31971633132022 @default.
- W3197163313 countsByYear W31971633132023 @default.
- W3197163313 crossrefType "journal-article" @default.
- W3197163313 hasAuthorship W3197163313A5006247366 @default.
- W3197163313 hasAuthorship W3197163313A5018178311 @default.
- W3197163313 hasAuthorship W3197163313A5044544424 @default.
- W3197163313 hasAuthorship W3197163313A5057805812 @default.
- W3197163313 hasAuthorship W3197163313A5066617875 @default.
- W3197163313 hasAuthorship W3197163313A5067325797 @default.
- W3197163313 hasAuthorship W3197163313A5079789406 @default.
- W3197163313 hasBestOaLocation W31971633132 @default.
- W3197163313 hasConcept C108583219 @default.
- W3197163313 hasConcept C111919701 @default.
- W3197163313 hasConcept C138885662 @default.
- W3197163313 hasConcept C154945302 @default.
- W3197163313 hasConcept C204321447 @default.
- W3197163313 hasConcept C2776401178 @default.
- W3197163313 hasConcept C2778827112 @default.
- W3197163313 hasConcept C28490314 @default.
- W3197163313 hasConcept C41008148 @default.
- W3197163313 hasConcept C41895202 @default.
- W3197163313 hasConcept C59404180 @default.
- W3197163313 hasConcept C61328038 @default.
- W3197163313 hasConcept C74296488 @default.
- W3197163313 hasConcept C97931131 @default.
- W3197163313 hasConcept C98045186 @default.
- W3197163313 hasConceptScore W3197163313C108583219 @default.
- W3197163313 hasConceptScore W3197163313C111919701 @default.
- W3197163313 hasConceptScore W3197163313C138885662 @default.
- W3197163313 hasConceptScore W3197163313C154945302 @default.
- W3197163313 hasConceptScore W3197163313C204321447 @default.
- W3197163313 hasConceptScore W3197163313C2776401178 @default.
- W3197163313 hasConceptScore W3197163313C2778827112 @default.
- W3197163313 hasConceptScore W3197163313C28490314 @default.
- W3197163313 hasConceptScore W3197163313C41008148 @default.
- W3197163313 hasConceptScore W3197163313C41895202 @default.
- W3197163313 hasConceptScore W3197163313C59404180 @default.
- W3197163313 hasConceptScore W3197163313C61328038 @default.
- W3197163313 hasConceptScore W3197163313C74296488 @default.
- W3197163313 hasConceptScore W3197163313C97931131 @default.
- W3197163313 hasConceptScore W3197163313C98045186 @default.
- W3197163313 hasFunder F4320321001 @default.
- W3197163313 hasLocation W31971633131 @default.
- W3197163313 hasLocation W31971633132 @default.
- W3197163313 hasOpenAccess W3197163313 @default.
- W3197163313 hasPrimaryLocation W31971633131 @default.
- W3197163313 hasRelatedWork W2050806332 @default.
- W3197163313 hasRelatedWork W2308097916 @default.
- W3197163313 hasRelatedWork W2952024438 @default.
- W3197163313 hasRelatedWork W2970216048 @default.
- W3197163313 hasRelatedWork W2998168123 @default.
- W3197163313 hasRelatedWork W3164948662 @default.
- W3197163313 hasRelatedWork W3177373753 @default.
- W3197163313 hasRelatedWork W4287995534 @default.
- W3197163313 hasRelatedWork W4298816048 @default.
- W3197163313 hasRelatedWork W4319166497 @default.
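A record like the one above comes back from a SPARQL query against the SemOpenAlex graph, where the work URI is the fixed subject and `?p ?o` range over its predicates and objects. The sketch below shows, under the assumption that results arrive in the standard W3C SPARQL 1.1 JSON results format, how such a query could be built for a given work ID and how the bindings could be flattened into predicate/object pairs; the endpoint URL and the helper names (`build_work_query`, `bindings_to_pairs`) are illustrative, not part of the record itself.

```python
import json

# Assumed SPARQL endpoint for SemOpenAlex (verify before use).
SEMOPENALEX_SPARQL = "https://semopenalex.org/sparql"

def build_work_query(work_id: str) -> str:
    """Build a SPARQL query listing all predicate/object pairs for a work."""
    return (
        "SELECT ?p ?o WHERE { "
        f"<https://semopenalex.org/work/{work_id}> ?p ?o . "
        "}"
    )

def bindings_to_pairs(result: dict) -> list:
    """Flatten SPARQL 1.1 JSON result bindings into (predicate, object) pairs."""
    return [
        (b["p"]["value"], b["o"]["value"])
        for b in result["results"]["bindings"]
    ]

# A hand-written sample response in the SPARQL 1.1 JSON results shape,
# standing in for an actual HTTP round trip to the endpoint.
sample_response = json.loads("""{
  "results": {"bindings": [
    {"p": {"type": "uri", "value": "http://purl.org/dc/terms/title"},
     "o": {"type": "literal",
           "value": "ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems"}}
  ]}
}""")

print(build_work_query("W3197163313"))
print(bindings_to_pairs(sample_response))
```

In a real client, the query string would be sent to the endpoint with an `Accept: application/sparql-results+json` header and the parsed body passed to `bindings_to_pairs`.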