Matches in SemOpenAlex for { <https://semopenalex.org/work/W3184890083> ?p ?o ?g. }
Showing items 1 to 74 of 74, with 100 items per page.
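The triples listed below were returned for this pattern. As a minimal sketch of how the same result could be retrieved programmatically, assuming the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql and the SPARQLWrapper Python package (both assumptions, not part of this listing), the query from the header could be run like this:

```python
# Sketch: fetch all (predicate, object) pairs for work W3184890083 from SemOpenAlex.
# The endpoint URL is an assumption; adjust it to the actual deployment if it differs.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public SPARQL endpoint

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    SELECT ?p ?o
    WHERE { <https://semopenalex.org/work/W3184890083> ?p ?o . }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    # Each binding holds one predicate/object pair, e.g. a title or creator triple.
    print(binding["p"]["value"], "->", binding["o"]["value"])
```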
- W3184890083 abstract "This thesis is concerned with multimodal machine learning for digital humanities. Multimodal machine learning integrates vision, speech, and language to solve a particular set of tasks, such as sentiment analysis, emotion recognition, personality recognition, and deceptive behaviour detection. These tasks benefit from the use of additional modalities, since human communication is multimodal by nature. The intersection between the humanities and computational methods defines the so-called digital humanities, i.e., the subset of the humanities and social sciences that leverages digital methods to conduct research. This thesis supports the claim that using audiovisual modalities when training computational models in digital humanities can benefit the performance of any cumbersome task where annotators use audiovisual sources of information to annotate the data. We hypothesise that audiovisual content studied in areas of the humanities and social sciences, such as psychology, pedagogy, and communication sciences, can be explained and categorised by audiovisual processing techniques. These techniques can increase the productivity of humanities and social sciences researchers by bootstrapping their analysis with machine learning and allowing their research to scale to much larger amounts of data. Beyond that, these methods could also support more socially aware virtual agents; this kind of technology enables more sophisticated human-computer interaction, which can enrich the user experience of commercial applications. Problems tackled by natural language processing techniques sometimes reach an upper bound due to the limitations of the knowledge present in textual information. Humans use prosody to convey meaning, so machine learning models that predict the sentiment of transcribed speech can lose much information when dealing solely with the text modality. Persuasiveness prediction is another good example, since factors beyond argumentation, such as prosody, visual appearance, and body language, can persuade people. Previous work in opinion mining and persuasiveness prediction has shown that multimodal approaches are quite successful when combining multiple modalities. However, textual transcripts and visual information might not be available due to technical constraints, so one may ask how accurately machine learning models can predict people's opinions using only prosodic information. Most work in computational paralinguistics relies on cumbersome feature-engineering approaches, so another question is whether domain-agnostic methods work in this field. Our results show that a simple recurrent neural architecture trained on Mel-Frequency Cepstral Coefficients can predict speakers' opinions. Speech is not the only channel besides the textual one that signals critical information; the visual channel is also significant. Humans can display many expressions, defined as cues under Brunswik's Lens Model. Researchers from the humanities and social sciences try to understand how relevant those signals are by manually annotating information that might be present in the facial expressions of the subjects under analysis. However, these annotation tasks are very time-consuming and prone to human error due to fatigue or lack of training.
We show that low- and high-level features automatically extracted with recent computer vision methods can explain the visual data studied by researchers in the humanities and social sciences, especially in areas such as pedagogy and communication sciences. We also demonstrate that an end-to-end approach can automatically predict the psychological construct of intrinsic motivation. Another problem widely studied in political science is understanding the persuasive factors in speeches and debates. For instance, Nagel et al. (2012) evaluated which features across all three modalities (text, speech, and vision) shaped the audience's impression of the national election debate between Angela Merkel and Gerhard Schroeder. However, no previous work in the literature presents an automated approach to predicting what impression a politician forms during a debate. Our results reveal that automatically extracted high-level features in a multimodal approach can indicate which elements of political communication mould an audience's impression and are also useful for training machine learning models to predict it. We run the experiments in this thesis with data from psychology, pedagogy, and communication science research, providing empirical evidence for the hypothesis that audiovisual content from the humanities and social sciences can be explained and automatically classified by audiovisual processing methods. This thesis presents new applications of multimodal machine learning in digital humanities, demonstrating different ways of modelling the tasks and reinforcing the well-known issue of fairness in artificial intelligence. In conclusion, this thesis strengthens the notion that audiovisual modalities are primary communication channels that should be carefully analysed and explored in multimodal machine learning for digital humanities." @default.
- W3184890083 created "2021-08-02" @default.
- W3184890083 creator A5013196391 @default.
- W3184890083 creator A5069325178 @default.
- W3184890083 date "2021-01-01" @default.
- W3184890083 modified "2023-09-24" @default.
- W3184890083 title "Multimodal Classification of Audiovisual Content" @default.
- W3184890083 doi "https://doi.org/10.26083/tuprints-00018590" @default.
- W3184890083 hasPublicationYear "2021" @default.
- W3184890083 type Work @default.
- W3184890083 sameAs 3184890083 @default.
- W3184890083 citedByCount "0" @default.
- W3184890083 crossrefType "dissertation" @default.
- W3184890083 hasAuthorship W3184890083A5013196391 @default.
- W3184890083 hasAuthorship W3184890083A5069325178 @default.
- W3184890083 hasConcept C106159729 @default.
- W3184890083 hasConcept C144024400 @default.
- W3184890083 hasConcept C154945302 @default.
- W3184890083 hasConcept C15744967 @default.
- W3184890083 hasConcept C162324750 @default.
- W3184890083 hasConcept C204321447 @default.
- W3184890083 hasConcept C207609745 @default.
- W3184890083 hasConcept C2522767166 @default.
- W3184890083 hasConcept C2779903281 @default.
- W3184890083 hasConcept C2780660688 @default.
- W3184890083 hasConcept C2780876879 @default.
- W3184890083 hasConcept C36289849 @default.
- W3184890083 hasConcept C41008148 @default.
- W3184890083 hasConcept C49774154 @default.
- W3184890083 hasConcept C542102704 @default.
- W3184890083 hasConcept C66402592 @default.
- W3184890083 hasConceptScore W3184890083C106159729 @default.
- W3184890083 hasConceptScore W3184890083C144024400 @default.
- W3184890083 hasConceptScore W3184890083C154945302 @default.
- W3184890083 hasConceptScore W3184890083C15744967 @default.
- W3184890083 hasConceptScore W3184890083C162324750 @default.
- W3184890083 hasConceptScore W3184890083C204321447 @default.
- W3184890083 hasConceptScore W3184890083C207609745 @default.
- W3184890083 hasConceptScore W3184890083C2522767166 @default.
- W3184890083 hasConceptScore W3184890083C2779903281 @default.
- W3184890083 hasConceptScore W3184890083C2780660688 @default.
- W3184890083 hasConceptScore W3184890083C2780876879 @default.
- W3184890083 hasConceptScore W3184890083C36289849 @default.
- W3184890083 hasConceptScore W3184890083C41008148 @default.
- W3184890083 hasConceptScore W3184890083C49774154 @default.
- W3184890083 hasConceptScore W3184890083C542102704 @default.
- W3184890083 hasConceptScore W3184890083C66402592 @default.
- W3184890083 hasLocation W31848900831 @default.
- W3184890083 hasOpenAccess W3184890083 @default.
- W3184890083 hasPrimaryLocation W31848900831 @default.
- W3184890083 hasRelatedWork W1608059655 @default.
- W3184890083 hasRelatedWork W2032355794 @default.
- W3184890083 hasRelatedWork W205723309 @default.
- W3184890083 hasRelatedWork W2082456776 @default.
- W3184890083 hasRelatedWork W2170436435 @default.
- W3184890083 hasRelatedWork W2294672650 @default.
- W3184890083 hasRelatedWork W2565775862 @default.
- W3184890083 hasRelatedWork W2803125506 @default.
- W3184890083 hasRelatedWork W2912982029 @default.
- W3184890083 hasRelatedWork W2945563815 @default.
- W3184890083 hasRelatedWork W2946072937 @default.
- W3184890083 hasRelatedWork W2948937448 @default.
- W3184890083 hasRelatedWork W2969662548 @default.
- W3184890083 hasRelatedWork W3012082522 @default.
- W3184890083 hasRelatedWork W3088420106 @default.
- W3184890083 hasRelatedWork W3096143076 @default.
- W3184890083 hasRelatedWork W3130558314 @default.
- W3184890083 hasRelatedWork W3185972086 @default.
- W3184890083 hasRelatedWork W3193573834 @default.
- W3184890083 hasRelatedWork W2462156405 @default.
- W3184890083 isParatext "false" @default.
- W3184890083 isRetracted "false" @default.
- W3184890083 magId "3184890083" @default.
- W3184890083 workType "dissertation" @default.
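The abstract above reports that a simple recurrent architecture trained on Mel-Frequency Cepstral Coefficients can predict speakers' opinions from prosody alone. The thesis itself is not reproduced in this record, so the following is only an illustrative sketch of that kind of pipeline, assuming PyTorch and torchaudio; the GRU, the layer sizes, the binary opinion label, and the file name are hypothetical and not the author's actual model:

```python
# Illustrative sketch (not the thesis's model): MFCC features + a recurrent classifier
# for utterance-level opinion prediction. Assumes PyTorch and torchaudio are installed.
import torch
import torch.nn as nn
import torchaudio

class OpinionRNN(nn.Module):
    def __init__(self, n_mfcc: int = 40, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.gru = nn.GRU(input_size=n_mfcc, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, mfcc_seq: torch.Tensor) -> torch.Tensor:
        # mfcc_seq: (batch, time, n_mfcc); the last hidden state summarises the utterance.
        _, last_hidden = self.gru(mfcc_seq)
        return self.head(last_hidden[-1])

# Feature extraction: waveform -> (n_mfcc, time) -> (1, time, n_mfcc) for the model.
waveform, sample_rate = torchaudio.load("utterance.wav")  # hypothetical input file
mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=40)(waveform)
features = mfcc.mean(dim=0).transpose(0, 1).unsqueeze(0)  # average channels, reorder axes

model = OpinionRNN()
logits = model(features)                                  # (1, n_classes) opinion scores
print(logits.softmax(dim=-1))
```

In a real setup the classifier would be trained on labelled utterances; the sketch only shows how MFCC sequences feed a recurrent model to produce an utterance-level prediction, which is the shape of approach the abstract describes.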