Matches in SemOpenAlex for { <https://semopenalex.org/work/W4384785384> ?p ?o ?g. }
Showing items 1 to 64 of 64, with 100 items per page.
- W4384785384 abstract "Human-computer interaction has seen a paradigm shift from textual or display-based control towards more intuitive modes such as voice, gesture and mimicry. In particular, speech recognition has attracted much attention because speech is the most prominent mode of communication. However, the performance of speech recognition systems varies significantly with background noise, talker characteristics and the listener's hearing ability. Lip-reading technology, which detects spoken words by tracking the speaker's lip movements, has therefore emerged. It provides an alternative for scenes with high background noise and for people with hearing impairments. Lip-reading technology also has widespread applications in public safety analysis, animation lip synthesis, identity authentication and other fields. Traditionally, most work in lip-reading was based on hand-engineered features, usually modeled by an HMM-based pipeline. Recently, deep learning methods have been deployed either to extract 'deep' features or to build end-to-end architectures. In this paper, we propose a neural network architecture combining a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) with a plug-in attention mechanism. The model consists of five parts: (1) Input: We use the Dlib library to detect 68 facial landmarks, crop the lip area and extract 29 consecutive frames from the video sequence. The frames pass through a simple C3D network for generic feature extraction. (2) CNN: As neural networks grow deeper, computational complexity increases significantly, which has motivated lightweight architecture design. We therefore use ShuffleNet, a lightweight CNN pre-trained on the ImageNet dataset, to perform spatial downsampling of each frame. 
ShuffleNet mainly relies on two novel operations, pointwise group convolution and channel shuffle, which greatly reduce the computational cost without affecting recognition accuracy. (3) CBAM: In image processing, a feature map contains a variety of important information. A traditional convolutional neural network performs convolution in the same way on all channels, but the importance of the information varies greatly across channels. To improve the feature-extraction performance of convolutional neural networks, we employ the Convolutional Block Attention Module (CBAM), a simple and effective attention module for feedforward convolutional neural networks. It contains two independent sub-modules, the Channel Attention Module (CAM) and the Spatial Attention Module (SAM), which perform channel and spatial attention respectively. (4) RNN: A traditional Recurrent Neural Network (RNN) is mainly used to process sequential data, but as sequences grow longer it may fail to connect all related information, causing key information loss. It cannot solve the long-distance dependence problem, and its performance may drop significantly. Because of this shortcoming of the traditional RNN, we select the GRU, a variant of the LSTM with a simpler structure and better performance than the LSTM in our setting. (5) Output: Finally, we pass the backend's result to a softmax layer to classify the final word. In our experiments, we compare several model architectures and find that our model achieves accuracy comparable to the current state-of-the-art model at a lower computational cost." @default.
- W4384785384 created "2023-07-20" @default.
- W4384785384 creator A5005015574 @default.
- W4384785384 creator A5013039708 @default.
- W4384785384 date "2023-01-01" @default.
- W4384785384 modified "2023-09-26" @default.
- W4384785384 title "Lip-Reading Research Based on ShuffleNet and Attention-GRU" @default.
- W4384785384 doi "https://doi.org/10.54941/ahfe1004024" @default.
- W4384785384 hasPublicationYear "2023" @default.
- W4384785384 type Work @default.
- W4384785384 citedByCount "0" @default.
- W4384785384 crossrefType "proceedings-article" @default.
- W4384785384 hasAuthorship W4384785384A5005015574 @default.
- W4384785384 hasAuthorship W4384785384A5013039708 @default.
- W4384785384 hasConcept C108583219 @default.
- W4384785384 hasConcept C115961682 @default.
- W4384785384 hasConcept C138885662 @default.
- W4384785384 hasConcept C147168706 @default.
- W4384785384 hasConcept C153180895 @default.
- W4384785384 hasConcept C154945302 @default.
- W4384785384 hasConcept C159437735 @default.
- W4384785384 hasConcept C207347870 @default.
- W4384785384 hasConcept C23224414 @default.
- W4384785384 hasConcept C2776401178 @default.
- W4384785384 hasConcept C28490314 @default.
- W4384785384 hasConcept C41008148 @default.
- W4384785384 hasConcept C41895202 @default.
- W4384785384 hasConcept C50644808 @default.
- W4384785384 hasConcept C52622490 @default.
- W4384785384 hasConcept C81363708 @default.
- W4384785384 hasConcept C99498987 @default.
- W4384785384 hasConceptScore W4384785384C108583219 @default.
- W4384785384 hasConceptScore W4384785384C115961682 @default.
- W4384785384 hasConceptScore W4384785384C138885662 @default.
- W4384785384 hasConceptScore W4384785384C147168706 @default.
- W4384785384 hasConceptScore W4384785384C153180895 @default.
- W4384785384 hasConceptScore W4384785384C154945302 @default.
- W4384785384 hasConceptScore W4384785384C159437735 @default.
- W4384785384 hasConceptScore W4384785384C207347870 @default.
- W4384785384 hasConceptScore W4384785384C23224414 @default.
- W4384785384 hasConceptScore W4384785384C2776401178 @default.
- W4384785384 hasConceptScore W4384785384C28490314 @default.
- W4384785384 hasConceptScore W4384785384C41008148 @default.
- W4384785384 hasConceptScore W4384785384C41895202 @default.
- W4384785384 hasConceptScore W4384785384C50644808 @default.
- W4384785384 hasConceptScore W4384785384C52622490 @default.
- W4384785384 hasConceptScore W4384785384C81363708 @default.
- W4384785384 hasConceptScore W4384785384C99498987 @default.
- W4384785384 hasLocation W43847853841 @default.
- W4384785384 hasOpenAccess W4384785384 @default.
- W4384785384 hasPrimaryLocation W43847853841 @default.
- W4384785384 hasRelatedWork W1999635775 @default.
- W4384785384 hasRelatedWork W2006347227 @default.
- W4384785384 hasRelatedWork W2071640615 @default.
- W4384785384 hasRelatedWork W2279398222 @default.
- W4384785384 hasRelatedWork W2773120646 @default.
- W4384785384 hasRelatedWork W2903018492 @default.
- W4384785384 hasRelatedWork W2984615118 @default.
- W4384785384 hasRelatedWork W3011074480 @default.
- W4384785384 hasRelatedWork W3156786002 @default.
- W4384785384 hasRelatedWork W4299822940 @default.
- W4384785384 isParatext "false" @default.
- W4384785384 isRetracted "false" @default.
- W4384785384 workType "article" @default.
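The abstract above names channel shuffle as one of ShuffleNet's two key operations. A minimal NumPy sketch (an illustration, not the authors' code) of how the shuffle mixes channels across group-convolution groups:

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle: fold the channel axis into
    (groups, channels_per_group), swap those two axes, and flatten
    back, so later group convolutions see channels from every group."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap group and per-group channel axes
    return x.reshape(n, c, h, w)

# Example: 6 channels in 2 groups -> interleaved order 0,3,1,4,2,5
x = np.arange(6, dtype=np.float32).reshape(1, 6, 1, 1)
print(channel_shuffle(x, groups=2).flatten())  # [0. 3. 1. 4. 2. 5.]
```

The operation is a pure permutation (reshape/transpose), so it adds essentially no computational cost, which is why the abstract can claim reduced cost without an accuracy penalty.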