SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W2100613388> ?p ?o ?g. }

W2100613388 abstract "In the last two decades we have witnessed a rapid increase in computational power, governed by Moore's Law. As a side effect, cheaper and faster CPUs became more affordable. As a result, many new “smart” devices flooded the market and made information systems widespread. The number of users of information systems has also increased manyfold, and the user population has changed to include not only a small number of initiates but also a majority of non-technical people. To make this transition possible, system developers had to change computer user interfaces to make them simpler and more intuitive. However, interaction was still based on rather artificial devices such as the mouse and keyboard. As Moore's Law continues to hold, we have reached a critical moment at which current systems can easily cope with other input streams, most importantly video and audio. We can, therefore, envision systems with which we can communicate through speech and body movements and that can automatically and transparently adapt to the environment and the user. This can be done, for instance, by recognizing the user's affective state, understanding the user's task, and recognizing the context of the interaction. Automatic speech recognition, which captures and processes the audio signal, is one development in this direction. Even though automatic speech recognition has achieved spectacular results in controlled settings, its performance still depends on the context (for instance, on the level of background noise). Automatic lip reading has appeared in this context as a way to enhance automatic speech recognition in noisy environments.
Even though it is still a relatively novel research domain, other applications have been found that employ lip reading on its own: interfaces for hearing-impaired persons, security applications, speech recovery from mute or deteriorated films, and silent interfaces. Advances in visual signal processing have also revitalized research in lip reading. However, at the time of writing this thesis, lip reading was still waiting for its great leap. This thesis investigates several techniques for directing lip reading towards more robust performance. The thesis starts by introducing the relevant methodologies that govern automatic lip reading. Next it introduces all the concepts needed to understand the technologies, experiments, results and discussions presented later on. It is, therefore, one of the most important parts of the thesis. The presentation of the state of the art in lip reading sets the starting point of the research presented. Before continuing to follow the lip reading process, the thesis introduces the mathematical foundations that give theoretical support to the analysis. All our systems are based on the Hidden Markov Model approach. This paradigm has proved to be very useful in similar problems and we successfully employed it for lip reading. The main idea behind it is Bayes' rule, which says that starting from some a priori knowledge we can always improve our understanding of a system through observation. Observation translates into processing the video stream into a set of features that describe what is being said by the speaker. However, in order to appropriately train lip reading systems, a large amount of data is needed. The first important contribution of our research is a large data corpus for the Dutch language. This corpus, named “New Delft University of Technology Audio Visual Speech Corpus”, is at the time of writing one of the largest corpora for lip reading in Dutch.
The corpus contains dual-view, high-speed recordings (i.e. 100 Hz) of continuous speech in Dutch. While building the corpus, we also produced an initial set of guidelines for building a data corpus for lip reading, which we hope will be useful to other researchers. However, the core of this thesis is data parametrization. Data parametrization is the process that transforms the input video data into a set of features that are later used for training and testing the resulting recognizer. The parametrization should reduce the size of the input data while preserving the most important information about what the speaker says. We investigated three data parametrization techniques, each from a different category of algorithms: Active Appearance Models, which generate a combined geometric and appearance-based set of features; optical flow analysis, an appearance-based approach that directly accounts for the apparent movement on the speaker's face; and a statistical approach that generates the geometry of the lips without starting from an a priori fixed model. In the research presented in this thesis, we investigated the performance of these data parametrization techniques and pointed out their strengths and weaknesses. We also analysed the performance of lip reading from other points of view: the influence of the sampling rate of the video data, the performance of the lip readers as a function of the recognition task, and the performance as a function of the size of the corpus used. Answering all these questions improved our understanding of the limitations and possible improvements of lip reading." @default.
W2100613388 created "2016-06-24" @default.
W2100613388 creator A5048401632 @default.
W2100613388 date "2010-11-02" @default.
W2100613388 modified "2023-09-24" @default.
W2100613388 title "Towards Robust Visual Speech Recognition : Automatic Systems for Lip Reading of Dutch" @default.
W2100613388 hasPublicationYear "2010" @default.
W2100613388 type Work @default.
W2100613388 sameAs 2100613388 @default.
W2100613388 citedByCount "0" @default.
W2100613388 crossrefType "journal-article" @default.
W2100613388 hasAuthorship W2100613388A5048401632 @default.
W2100613388 hasConcept C107457646 @default.
W2100613388 hasConcept C127413603 @default.
W2100613388 hasConcept C151730666 @default.
W2100613388 hasConcept C17744445 @default.
W2100613388 hasConcept C199539241 @default.
W2100613388 hasConcept C201995342 @default.
W2100613388 hasConcept C2779343474 @default.
W2100613388 hasConcept C2780451532 @default.
W2100613388 hasConcept C28490314 @default.
W2100613388 hasConcept C41008148 @default.
W2100613388 hasConcept C554936623 @default.
W2100613388 hasConcept C86803240 @default.
W2100613388 hasConceptScore W2100613388C107457646 @default.
W2100613388 hasConceptScore W2100613388C127413603 @default.
W2100613388 hasConceptScore W2100613388C151730666 @default.
W2100613388 hasConceptScore W2100613388C17744445 @default.
W2100613388 hasConceptScore W2100613388C199539241 @default.
W2100613388 hasConceptScore W2100613388C201995342 @default.
W2100613388 hasConceptScore W2100613388C2779343474 @default.
W2100613388 hasConceptScore W2100613388C2780451532 @default.
W2100613388 hasConceptScore W2100613388C28490314 @default.
W2100613388 hasConceptScore W2100613388C41008148 @default.
W2100613388 hasConceptScore W2100613388C554936623 @default.
W2100613388 hasConceptScore W2100613388C86803240 @default.
W2100613388 hasLocation W21006133881 @default.
W2100613388 hasOpenAccess W2100613388 @default.
W2100613388 hasPrimaryLocation W21006133881 @default.
W2100613388 hasRelatedWork W1001076335 @default.
W2100613388 hasRelatedWork W1588565267 @default.
W2100613388 hasRelatedWork W2101973992 @default.
W2100613388 hasRelatedWork W2298869102 @default.
W2100613388 hasRelatedWork W2403440562 @default.
W2100613388 hasRelatedWork W2490668544 @default.
W2100613388 hasRelatedWork W2612030847 @default.
W2100613388 hasRelatedWork W2612754342 @default.
W2100613388 hasRelatedWork W2613634895 @default.
W2100613388 hasRelatedWork W2740883964 @default.
W2100613388 hasRelatedWork W2809767522 @default.
W2100613388 hasRelatedWork W2963290645 @default.
W2100613388 hasRelatedWork W3012945836 @default.
W2100613388 hasRelatedWork W3034514362 @default.
W2100613388 hasRelatedWork W3041917307 @default.
W2100613388 hasRelatedWork W3105763085 @default.
W2100613388 hasRelatedWork W3164394706 @default.
W2100613388 hasRelatedWork W3167917117 @default.
W2100613388 hasRelatedWork W816280 @default.
W2100613388 hasRelatedWork W1584278479 @default.
W2100613388 isParatext "false" @default.
W2100613388 isRetracted "false" @default.
W2100613388 magId "2100613388" @default.
W2100613388 workType "article" @default.