Matches in SemOpenAlex for { <https://semopenalex.org/work/W1579239598> ?p ?o ?g. }
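The quad pattern above can be evaluated directly against the public SemOpenAlex SPARQL endpoint. Below is a minimal sketch of retrieving the same matches; the endpoint URL (https://semopenalex.org/sparql), the JSON result format, and the result limit are assumptions, not part of the listing itself.

```python
# Minimal sketch: fetch ?p ?o ?g matches for the work from the assumed
# SemOpenAlex SPARQL endpoint and print them.
import requests

ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint location

QUERY = """
SELECT ?p ?o ?g WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W1579239598> ?p ?o .
  }
}
LIMIT 500
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# Standard SPARQL 1.1 JSON results: one binding dict per match.
for row in resp.json()["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"], row.get("g", {}).get("value", ""))
```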
- W1579239598 abstract "In this thesis, we developed and assessed a novel robust and unsupervised framework for semantic inference from composite audio signals. We focused on the problem of detecting audio scenes and grouping them into meaningful clusters. Our approach addressed all major steps in a general process of composite audio analysis, from low-level signal processing (feature extraction), via mid-level content representation (audio element extraction and weighting), to high-level semantic inference (audio scene detection and clustering). We showed experimentally that our proposed content discovery scheme involving mid-level semantic descriptors as an intermediate inference result can lead to more robustness, compared to the classical content-based audio indexing approach, where the semantics is inferred from the features directly. To the best of our knowledge, this is the first proposal exploring the possibilities for a realization of an entirely unsupervised audio content discovery system aiming at high-level semantic inference results. The first major algorithmic contribution of the thesis consists of an unsupervised approach to decompose an audio stream into (key) audio elements, based on a set of extracted audio signal features. Similar to speech recognition that transcribes a speech signal into text words, our proposed approach “transcribes” a composite audio signal into audio “words”, where each word corresponds to a short temporal segment with coherent signal properties (e.g. music, speech, noise or any combination of these). We refer to these audio words as audio elements. To extract audio elements, we deployed an iterative spectral clustering method with context-dependent scaling factors. In this process, the elementary audio segments with similar features are grouped together into clusters. Then, all audio segments belonging to the same cluster are said to represent the same audio element. We now see an audio signal as a concatenation of audio segments corresponding to different audio elements, and develop an approach similar to those known from the text document segmentation field to divide the signal into meaningful longer segments. We refer to these segments as audio scenes. To develop such an approach, we computed the weights indicating the potential of each obtained audio element to help detect an audio scene boundary. To compute these weights, again the concepts from text information retrieval have been adopted, such as the term frequency (TF) and inverse document frequency (IDF), based on which a number of their equivalents in the audio segmentation context have been introduced. As the second major algorithmic contribution of the thesis, we presented a novel approach to audio scene segmentation and clustering. We first proposed a semantic affinity measure to determine whether two audio segments are likely to belong to the same audio scene. This measure considers the audio elements contained in the analyzed segments, their importance weights and their co-occurrence statistics. Then, the presence of an audio scene boundary at a given time stamp is investigated by jointly considering the values of the semantic affinity computed for a representative number of segment pairs surrounding the observed time stamp. Once the audio scenes are detected, a scheme based on the co-clustering concept was deployed to exploit the grouping tendency among audio elements when searching for optimal audio scene clusters. 
Here a method based on the Bayesian information criterion (BIC) was adopted to select the numbers of clusters in the co-clustering process. Experimental evaluations on a large and representative audio data set have shown that the proposed approach can achieve encouraging results and outperform the existing related approaches. The obtained results show a relatively high purity of the obtained audio elements. The number of the obtained elements, the type of sounds they represent and the importance weights assigned to them were shown to largely correspond to the judgment of our test user panel. Moreover, for audio scene segmentation and clustering, we obtained a 70% recall of audio scene boundaries with an 80% precision, based on the ground-truth annotation obtained using a panel of human annotators. Our co-clustering based approach achieved better performance than a traditional one-directional clustering, regarding both the clustering accuracy and cluster number estimation. We completed the thesis by making an attempt to envision a possible expansion of the proposed approach towards an application scope broader than the one considered in the thesis. We first considered the applications where domain knowledge is available. For such an application we investigated the possibilities to combine our unsupervised approach with a supervised one to benefit from the available domain knowledge and so improve the content discovery performance for that domain. Then, we also performed preliminary experiments to extrapolate the applicability of the proposed approach from a single document context to a collection of (long) audio documents. This involved a shift from the concept of document-specific audio elements to an anchor space representing a large collection of audio documents." @default.
- W1579239598 created "2016-06-24" @default.
- W1579239598 creator A5051926349 @default.
- W1579239598 date "2009-12-02" @default.
- W1579239598 modified "2023-09-28" @default.
- W1579239598 title "Content Discovery from Composite Audio: An unsupervised approach" @default.
- W1579239598 cites W143678519 @default.
- W1579239598 cites W1545046063 @default.
- W1579239598 cites W1548802052 @default.
- W1579239598 cites W1557074680 @default.
- W1579239598 cites W1560013842 @default.
- W1579239598 cites W1575829986 @default.
- W1579239598 cites W1576922930 @default.
- W1579239598 cites W1585610988 @default.
- W1579239598 cites W1607155129 @default.
- W1579239598 cites W1615454278 @default.
- W1579239598 cites W1660390307 @default.
- W1579239598 cites W1861596447 @default.
- W1579239598 cites W1940377967 @default.
- W1579239598 cites W1943968395 @default.
- W1579239598 cites W1954587310 @default.
- W1579239598 cites W1968491588 @default.
- W1579239598 cites W1971784203 @default.
- W1579239598 cites W1985333512 @default.
- W1579239598 cites W1985593448 @default.
- W1579239598 cites W2006180404 @default.
- W1579239598 cites W2039275978 @default.
- W1579239598 cites W2048390151 @default.
- W1579239598 cites W2049073556 @default.
- W1579239598 cites W2058189943 @default.
- W1579239598 cites W2062170755 @default.
- W1579239598 cites W2066636486 @default.
- W1579239598 cites W2075662881 @default.
- W1579239598 cites W2078306367 @default.
- W1579239598 cites W2083837083 @default.
- W1579239598 cites W2095892407 @default.
- W1579239598 cites W2097419036 @default.
- W1579239598 cites W2098707568 @default.
- W1579239598 cites W2098981776 @default.
- W1579239598 cites W2100813490 @default.
- W1579239598 cites W2102453655 @default.
- W1579239598 cites W2106055371 @default.
- W1579239598 cites W2106339924 @default.
- W1579239598 cites W2110215948 @default.
- W1579239598 cites W2111331420 @default.
- W1579239598 cites W2114025269 @default.
- W1579239598 cites W2114762973 @default.
- W1579239598 cites W2115453400 @default.
- W1579239598 cites W2119577488 @default.
- W1579239598 cites W2121947440 @default.
- W1579239598 cites W2124000984 @default.
- W1579239598 cites W2124660252 @default.
- W1579239598 cites W2125838338 @default.
- W1579239598 cites W2126109423 @default.
- W1579239598 cites W2127097022 @default.
- W1579239598 cites W2131981197 @default.
- W1579239598 cites W2132124894 @default.
- W1579239598 cites W2132603077 @default.
- W1579239598 cites W2133576408 @default.
- W1579239598 cites W2134584261 @default.
- W1579239598 cites W2135674549 @default.
- W1579239598 cites W2137343183 @default.
- W1579239598 cites W2137918516 @default.
- W1579239598 cites W2139855016 @default.
- W1579239598 cites W2141282920 @default.
- W1579239598 cites W2142524229 @default.
- W1579239598 cites W2144544802 @default.
- W1579239598 cites W2144577430 @default.
- W1579239598 cites W2147174722 @default.
- W1579239598 cites W2148898881 @default.
- W1579239598 cites W2149022377 @default.
- W1579239598 cites W2151299225 @default.
- W1579239598 cites W2152222281 @default.
- W1579239598 cites W2152322845 @default.
- W1579239598 cites W2154318594 @default.
- W1579239598 cites W2155754954 @default.
- W1579239598 cites W2156336347 @default.
- W1579239598 cites W2157933833 @default.
- W1579239598 cites W2158449659 @default.
- W1579239598 cites W2160149277 @default.
- W1579239598 cites W2160167256 @default.
- W1579239598 cites W2161755617 @default.
- W1579239598 cites W2165874743 @default.
- W1579239598 cites W2170798597 @default.
- W1579239598 cites W2339897949 @default.
- W1579239598 cites W2542529521 @default.
- W1579239598 cites W658559791 @default.
- W1579239598 hasPublicationYear "2009" @default.
- W1579239598 type Work @default.
- W1579239598 sameAs 1579239598 @default.
- W1579239598 citedByCount "0" @default.
- W1579239598 crossrefType "journal-article" @default.
- W1579239598 hasAuthorship W1579239598A5051926349 @default.
- W1579239598 hasConcept C104317684 @default.
- W1579239598 hasConcept C127220857 @default.
- W1579239598 hasConcept C13895895 @default.
- W1579239598 hasConcept C153180895 @default.
- W1579239598 hasConcept C154945302 @default.
- W1579239598 hasConcept C155635449 @default.
- W1579239598 hasConcept C157968479 @default.
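The abstract above describes importance weights for audio elements computed with analogues of term frequency (TF) and inverse document frequency (IDF) borrowed from text retrieval. The sketch below illustrates a generic TF-IDF computation over sequences of audio-element labels; it is an illustration under assumed inputs, not the weighting scheme defined in the thesis (the function name and data layout are hypothetical).

```python
# Illustrative TF-IDF-style weighting over audio-element labels.
# `segments` is a hypothetical input: one list of element labels per segment.
import math
from collections import Counter

def tfidf_weights(segments):
    n_segments = len(segments)
    # Document frequency: in how many segments does each element occur?
    df = Counter()
    for labels in segments:
        df.update(set(labels))
    weights = []
    for labels in segments:
        tf = Counter(labels)
        total = len(labels)
        weights.append({
            element: (count / total) * math.log(n_segments / df[element])
            for element, count in tf.items()
        })
    return weights

# Example: three segments described by their audio elements.
segments = [
    ["speech", "speech", "music"],
    ["music", "noise"],
    ["speech", "noise", "noise"],
]
for i, w in enumerate(tfidf_weights(segments)):
    print(i, w)
```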
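The abstract also mentions selecting the number of clusters with the Bayesian information criterion (BIC). As a generic illustration of BIC-based model selection, and not the co-clustering procedure from the thesis, the sketch below scores Gaussian mixture models with different cluster counts and keeps the count with the lowest BIC; the synthetic feature matrix and candidate range are assumptions.

```python
# Generic BIC-based selection of a cluster count, shown with scikit-learn's
# GaussianMixture; this stands in for, but is not, the thesis's co-clustering step.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_cluster_count(features, candidate_counts=range(2, 11)):
    best_k, best_bic = None, np.inf
    for k in candidate_counts:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(features)
        bic = gmm.bic(features)  # lower BIC = better penalized fit
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k

# Example with synthetic feature vectors standing in for audio-scene descriptors.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(loc=c, size=(50, 8)) for c in (0.0, 3.0, 6.0)])
print(select_cluster_count(features))
```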