Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385327158> ?p ?o ?g. }
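The listing below can be reproduced programmatically. A minimal Python sketch using SPARQLWrapper is given here; it assumes SemOpenAlex's documented public SPARQL endpoint (https://semopenalex.org/sparql) and drops the `?g` graph variable for a plain triple query:

```python
# Minimal sketch: fetch all predicate/object pairs for this work from the
# SemOpenAlex SPARQL endpoint (endpoint URL assumed from the SemOpenAlex docs).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://semopenalex.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE {
        <https://semopenalex.org/work/W4385327158> ?p ?o .
    }
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```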
- W4385327158 endingPage "101550" @default.
- W4385327158 startingPage "101550" @default.
- W4385327158 abstract "Prior studies on the automatic classification of voice quality have mainly used the acoustic speech signal as input. Recently, a few studies have jointly used speech and neck surface accelerometer (NSA) signals as inputs, extracting mel-frequency cepstral coefficients (MFCCs) and glottal source features. This study examines simultaneously recorded speech and NSA signals in the classification of voice quality (breathy, modal, and pressed) using features derived from three self-supervised pre-trained models (wav2vec2-BASE, wav2vec2-LARGE, and HuBERT) and using a support vector machine (SVM) as well as convolutional neural networks (CNNs) as classifiers. Furthermore, the effectiveness of the pre-trained models in feature extraction is compared between glottal source waveforms and raw signal waveforms for both speech and NSA inputs. Glottal source waveforms are estimated from both speech and NSA signals using two signal processing methods: quasi-closed phase (QCP) glottal inverse filtering and zero frequency filtering (ZFF). The study has three main goals: (1) to study whether features derived from pre-trained models improve classification accuracy compared to conventional features (spectrogram, mel-spectrogram, MFCCs, i-vector, and x-vector); (2) to investigate which of the two modalities (speech vs. NSA) is more effective as input in the classification task with pre-trained model-based features; and (3) to evaluate whether the deep learning-based CNN classifier can improve classification accuracy compared to the SVM classifier. The results revealed that the NSA input yielded better classification performance than the speech signal. Among the features, the pre-trained model-based features gave better classification accuracies than the conventional features for both speech and NSA inputs. The two classifiers performed equally well for all the pre-trained model-based features for both speech and NSA signals. The HuBERT features also performed better than the wav2vec2-BASE and wav2vec2-LARGE features for both inputs. In particular, compared to the conventional features, the HuBERT features showed an absolute accuracy improvement of 3%–6% for both speech and NSA signals in the classification of voice quality." @default.
- W4385327158 created "2023-07-28" @default.
- W4385327158 creator A5042186400 @default.
- W4385327158 creator A5078656545 @default.
- W4385327158 creator A5080870546 @default.
- W4385327158 date "2024-01-01" @default.
- W4385327158 modified "2023-09-26" @default.
- W4385327158 title "Investigation of self-supervised pre-trained models for classification of voice quality from speech and neck surface accelerometer signals" @default.
- W4385327158 cites W1546220687 @default.
- W4385327158 cites W1783956044 @default.
- W4385327158 cites W1970278793 @default.
- W4385327158 cites W1970747557 @default.
- W4385327158 cites W1987041627 @default.
- W4385327158 cites W1990496939 @default.
- W4385327158 cites W2002044908 @default.
- W4385327158 cites W2008162323 @default.
- W4385327158 cites W2013374607 @default.
- W4385327158 cites W2014856271 @default.
- W4385327158 cites W2025386793 @default.
- W4385327158 cites W2029874859 @default.
- W4385327158 cites W2038936610 @default.
- W4385327158 cites W2040305186 @default.
- W4385327158 cites W2042723094 @default.
- W4385327158 cites W2051457717 @default.
- W4385327158 cites W2062460420 @default.
- W4385327158 cites W2099076569 @default.
- W4385327158 cites W2109020278 @default.
- W4385327158 cites W2109138290 @default.
- W4385327158 cites W2112844139 @default.
- W4385327158 cites W2115846934 @default.
- W4385327158 cites W2132290247 @default.
- W4385327158 cites W2136879537 @default.
- W4385327158 cites W2146665221 @default.
- W4385327158 cites W2149828546 @default.
- W4385327158 cites W2150769028 @default.
- W4385327158 cites W2153903782 @default.
- W4385327158 cites W2295624080 @default.
- W4385327158 cites W2332809169 @default.
- W4385327158 cites W2397464817 @default.
- W4385327158 cites W2408021097 @default.
- W4385327158 cites W2587150483 @default.
- W4385327158 cites W2683949293 @default.
- W4385327158 cites W2745166798 @default.
- W4385327158 cites W2770890514 @default.
- W4385327158 cites W2807627734 @default.
- W4385327158 cites W2889118666 @default.
- W4385327158 cites W2889254982 @default.
- W4385327158 cites W2889378348 @default.
- W4385327158 cites W2889494795 @default.
- W4385327158 cites W2937510224 @default.
- W4385327158 cites W2972382841 @default.
- W4385327158 cites W2984726089 @default.
- W4385327158 cites W3008479202 @default.
- W4385327158 cites W3014647115 @default.
- W4385327158 cites W3023579893 @default.
- W4385327158 cites W30609886 @default.
- W4385327158 cites W3159416871 @default.
- W4385327158 cites W3198786495 @default.
- W4385327158 cites W3202370288 @default.
- W4385327158 cites W3209059054 @default.
- W4385327158 cites W4200517558 @default.
- W4385327158 cites W4292876226 @default.
- W4385327158 cites W4296069119 @default.
- W4385327158 cites W4319303099 @default.
- W4385327158 cites W4319878709 @default.
- W4385327158 cites W4360584387 @default.
- W4385327158 doi "https://doi.org/10.1016/j.csl.2023.101550" @default.
- W4385327158 hasPublicationYear "2024" @default.
- W4385327158 type Work @default.
- W4385327158 citedByCount "0" @default.
- W4385327158 crossrefType "journal-article" @default.
- W4385327158 hasAuthorship W4385327158A5042186400 @default.
- W4385327158 hasAuthorship W4385327158A5078656545 @default.
- W4385327158 hasAuthorship W4385327158A5080870546 @default.
- W4385327158 hasBestOaLocation W43853271581 @default.
- W4385327158 hasConcept C12267149 @default.
- W4385327158 hasConcept C151989614 @default.
- W4385327158 hasConcept C153180895 @default.
- W4385327158 hasConcept C154945302 @default.
- W4385327158 hasConcept C197424946 @default.
- W4385327158 hasConcept C28490314 @default.
- W4385327158 hasConcept C41008148 @default.
- W4385327158 hasConcept C45273575 @default.
- W4385327158 hasConcept C52622490 @default.
- W4385327158 hasConcept C554190296 @default.
- W4385327158 hasConcept C66905080 @default.
- W4385327158 hasConcept C76155785 @default.
- W4385327158 hasConcept C81363708 @default.
- W4385327158 hasConcept C95623464 @default.
- W4385327158 hasConceptScore W4385327158C12267149 @default.
- W4385327158 hasConceptScore W4385327158C151989614 @default.
- W4385327158 hasConceptScore W4385327158C153180895 @default.
- W4385327158 hasConceptScore W4385327158C154945302 @default.
- W4385327158 hasConceptScore W4385327158C197424946 @default.
- W4385327158 hasConceptScore W4385327158C28490314 @default.
- W4385327158 hasConceptScore W4385327158C41008148 @default.
- W4385327158 hasConceptScore W4385327158C45273575 @default.
- W4385327158 hasConceptScore W4385327158C52622490 @default.
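The abstract above describes a two-stage pipeline: utterance-level features are extracted with a self-supervised pre-trained model and then passed to an SVM (or CNN) classifier. A minimal sketch of that idea follows, assuming the Hugging Face `facebook/hubert-base-ls960` checkpoint and mean pooling of the final hidden layer; the paper does not specify these exact choices.

```python
# Minimal sketch, assuming the facebook/hubert-base-ls960 checkpoint and
# mean pooling over frames (illustrative choices, not the paper's settings).
import numpy as np
import torch
from sklearn.svm import SVC
from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

def hubert_embedding(waveform: np.ndarray) -> np.ndarray:
    """Map a raw 16 kHz mono signal (speech or NSA) to one fixed-size vector."""
    x = (waveform - waveform.mean()) / (waveform.std() + 1e-9)  # zero-mean, unit-variance
    x = torch.from_numpy(x.astype(np.float32)).unsqueeze(0)     # shape (1, samples)
    with torch.no_grad():
        frames = model(x).last_hidden_state                     # (1, n_frames, 768)
    return frames.mean(dim=1).squeeze(0).numpy()                # average over frames

# Hypothetical usage: train_signals is a list of 1-D arrays and train_labels
# holds "breathy" / "modal" / "pressed" tags.
# clf = SVC(kernel="rbf")
# clf.fit(np.stack([hubert_embedding(s) for s in train_signals]), train_labels)
```

The abstract also names zero frequency filtering (ZFF) as one of the two glottal source estimation methods. A rough sketch of the classic ZFF recipe (two cascaded zero-frequency resonators followed by local-mean trend removal) is shown below; the window length and the single trend-removal pass are illustrative assumptions, not the paper's settings.

```python
# Rough ZFF sketch: the positive zero crossings of the returned signal
# approximate glottal closure instants.
import numpy as np
from scipy.signal import lfilter

def zff(signal: np.ndarray, sr: int, win_ms: float = 15.0) -> np.ndarray:
    x = np.diff(signal, prepend=signal[0])          # difference to remove DC offset
    # Two cascaded ideal resonators at 0 Hz, each H(z) = 1 / (1 - z^-1)^2.
    for _ in range(2):
        x = lfilter([1.0], [1.0, -2.0, 1.0], x)
    half = int(win_ms * 1e-3 * sr / 2)              # window of roughly 1-2 pitch periods
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    return x - np.convolve(x, kernel, mode="same")  # remove the polynomial trend
```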