Matches in SemOpenAlex for { <https://semopenalex.org/work/W2567339923> ?p ?o ?g. }
- W2567339923 abstract "Speech is a desirable communication method between humans and computers. The major concerns of automatic speech recognition (ASR) are determining a set of classification features and finding a suitable recognition model for these features. Hidden Markov Models (HMMs) have been demonstrated to be powerful models for representing time-varying signals. Artificial Neural Networks (ANNs) have also been widely used for representing time-varying quasi-stationary signals. Arabic is one of the oldest living languages and one of the oldest Semitic languages in the world; it is also the fifth most widely used language and is the mother tongue of roughly 200 million people. Arabic speech recognition has been a fertile area of research over the previous two decades, as attested by the various papers that have been published on this subject. This thesis investigates phoneme and acoustic models based on Deep Neural Networks (DNN) and Deep Echo State Networks for multi-dialect Arabic Speech Recognition. Moreover, the TIMIT corpus, with a wide variety of American dialects, is also used to evaluate the proposed models. The availability of speech data that is time-aligned and labelled at phonemic level is a fundamental requirement for building speech recognition systems. An Arabic phoneme database (APD) was developed and was manually timed and phonetically labelled. This dataset was constructed from the King Abdul-Aziz Arabic Phonetics Database (KAPD) for the Saudi Arabian dialect and the Centre for Spoken Language Understanding (CSLU2002) database for different Arabic dialects. This dataset covers 8148 Arabic phonemes. In addition, a corpus randomly selected from the Levantine Arabic dialect database, with 120 speakers (13 hours of Arabic speech) used for training and 24 speakers (2.4 hours) for testing, was revised, and its transcription errors were manually corrected. The selected dataset is labelled automatically using the HTK Hidden Markov Model toolkit. 
The TIMIT corpus is also used for the phone recognition and acoustic modelling tasks. We used 462 speakers (3.14 hours) for training and 24 speakers (0.81 hours) for testing. For Automatic Speech Recognition (ASR), a Deep Neural Network (DNN) is evaluated for developing a framewise phoneme recognition system and an acoustic modelling system for Arabic speech recognition. Restricted Boltzmann Machine (RBM) DNN models have not been explored for any Arabic corpora previously. This allows us to claim priority for adopting this RBM DNN model for the Levantine Arabic acoustic models. A post-processing enhancement was also applied to the DNN acoustic model outputs in order to improve the recognition accuracy and to obtain the accuracy at the phoneme level instead of the frame level. This post-processing significantly improved the recognition performance. An Echo State Network (ESN) is developed and evaluated for Arabic phoneme recognition with different learning algorithms. This work investigated the use of the conventional ESN trained with supervised and forced learning algorithms. A novel combined supervised/forced supervised learning algorithm (unsupervised adaptation) was developed and tested on the proposed optimised Arabic phoneme recognition datasets. This new model is evaluated on the Levantine dataset and empirically compared with the results obtained from the baseline Deep Neural Networks (DNNs). The ESN model achieved a significant improvement in recognition performance over the baseline RBM DNN model. The results show that the ESN model has a better ability to recognize phoneme sequences than the DNN model for a small-vocabulary dataset. 
Adopting ESN models for acoustic modeling is seen to be more valid than adopting DNN models for speech recognition, as ESNs are recurrent models and are expected to support sequence modeling better than RBM DNN models, even with the contextual input window. The TIMIT corpus is also used to investigate deep learning for framewise phoneme classification and acoustic modelling using Deep Neural Networks (DNNs) and Echo State Networks (ESNs), to allow a direct and valid comparison between the systems investigated in this thesis and published work in equivalent projects on framewise phoneme recognition using the TIMIT corpus. Our main finding on this corpus is that ESN networks outperform time-windowed RBM DNN ones. However, our ESN-based system shows 10% lower performance than other systems recently reported in the literature that used the same corpus. This is due to hardware availability and to not applying speaker and noise adaptation, which could improve the results; our aim in this thesis is to investigate the proposed models for speech recognition and to make a direct comparison between these models." @default.
- W2567339923 created "2017-01-06" @default.
- W2567339923 creator A5007111488 @default.
- W2567339923 date "2015-07-01" @default.
- W2567339923 modified "2023-09-26" @default.
- W2567339923 title "Deep neural network acoustic models for multi-dialect Arabic speech recognition" @default.
- W2567339923 cites W1898467614 @default.
- W2567339923 cites W1979447841 @default.
- W2567339923 cites W2029996593 @default.
- W2567339923 cites W2130607791 @default.
- W2567339923 cites W2401167848 @default.
- W2567339923 cites W88081813 @default.
- W2567339923 hasPublicationYear "2015" @default.
- W2567339923 type Work @default.
- W2567339923 sameAs 2567339923 @default.
- W2567339923 citedByCount "0" @default.
- W2567339923 crossrefType "dissertation" @default.
- W2567339923 hasAuthorship W2567339923A5007111488 @default.
- W2567339923 hasConcept C132165367 @default.
- W2567339923 hasConcept C136197465 @default.
- W2567339923 hasConcept C138885662 @default.
- W2567339923 hasConcept C14999030 @default.
- W2567339923 hasConcept C154945302 @default.
- W2567339923 hasConcept C155635449 @default.
- W2567339923 hasConcept C204321447 @default.
- W2567339923 hasConcept C23224414 @default.
- W2567339923 hasConcept C2778724510 @default.
- W2567339923 hasConcept C28490314 @default.
- W2567339923 hasConcept C41008148 @default.
- W2567339923 hasConcept C41895202 @default.
- W2567339923 hasConcept C50644808 @default.
- W2567339923 hasConcept C61328038 @default.
- W2567339923 hasConcept C91863865 @default.
- W2567339923 hasConcept C96455323 @default.
- W2567339923 hasConceptScore W2567339923C132165367 @default.
- W2567339923 hasConceptScore W2567339923C136197465 @default.
- W2567339923 hasConceptScore W2567339923C138885662 @default.
- W2567339923 hasConceptScore W2567339923C14999030 @default.
- W2567339923 hasConceptScore W2567339923C154945302 @default.
- W2567339923 hasConceptScore W2567339923C155635449 @default.
- W2567339923 hasConceptScore W2567339923C204321447 @default.
- W2567339923 hasConceptScore W2567339923C23224414 @default.
- W2567339923 hasConceptScore W2567339923C2778724510 @default.
- W2567339923 hasConceptScore W2567339923C28490314 @default.
- W2567339923 hasConceptScore W2567339923C41008148 @default.
- W2567339923 hasConceptScore W2567339923C41895202 @default.
- W2567339923 hasConceptScore W2567339923C50644808 @default.
- W2567339923 hasConceptScore W2567339923C61328038 @default.
- W2567339923 hasConceptScore W2567339923C91863865 @default.
- W2567339923 hasConceptScore W2567339923C96455323 @default.
- W2567339923 hasLocation W25673399231 @default.
- W2567339923 hasOpenAccess W2567339923 @default.
- W2567339923 hasPrimaryLocation W25673399231 @default.
- W2567339923 hasRelatedWork W1892788530 @default.
- W2567339923 hasRelatedWork W2050469586 @default.
- W2567339923 hasRelatedWork W2059019626 @default.
- W2567339923 hasRelatedWork W2147590749 @default.
- W2567339923 hasRelatedWork W2161511440 @default.
- W2567339923 hasRelatedWork W2294962864 @default.
- W2567339923 hasRelatedWork W2463237750 @default.
- W2567339923 hasRelatedWork W2558302074 @default.
- W2567339923 hasRelatedWork W2558378600 @default.
- W2567339923 hasRelatedWork W2586614823 @default.
- W2567339923 hasRelatedWork W2587496545 @default.
- W2567339923 hasRelatedWork W2741147977 @default.
- W2567339923 hasRelatedWork W2756127416 @default.
- W2567339923 hasRelatedWork W2793012778 @default.
- W2567339923 hasRelatedWork W2894690744 @default.
- W2567339923 hasRelatedWork W2955075648 @default.
- W2567339923 hasRelatedWork W3091088345 @default.
- W2567339923 hasRelatedWork W3120338682 @default.
- W2567339923 hasRelatedWork W3127644309 @default.
- W2567339923 hasRelatedWork W3164643878 @default.
- W2567339923 isParatext "false" @default.
- W2567339923 isRetracted "false" @default.
- W2567339923 magId "2567339923" @default.
- W2567339923 workType "dissertation" @default.