Matches in SemOpenAlex for { <https://semopenalex.org/work/W2238333665> ?p ?o ?g. }
- W2238333665 endingPage "151" @default.
- W2238333665 startingPage "150" @default.
- W2238333665 abstract "The process whereby inferences are made from textual data is broadly referred to as text mining. To ensure the quality and effectiveness of the derived inferences, several approaches have been proposed for different text mining applications. Among these applications, classifying a piece of text into pre-defined classes using training data falls under supervised approaches, while arranging related documents or terms into clusters falls under unsupervised approaches. In both cases, processing is undertaken at the level of documents to make sense of the text within those documents. Recent research efforts have begun exploring the role of knowledge bases in solving the various problems that arise in the domain of text mining. Of all the knowledge bases, Wikipedia, as one of the largest human-curated online encyclopaedias, has proven to be one of the most valuable resources for dealing with such problems. However, previous Wikipedia-based research efforts have not taken both Wikipedia categories and Wikipedia articles together as a source of information. This thesis serves as a first step towards eliminating this gap; through its contributions, we show the effectiveness of the Wikipedia category-article structure for various text mining tasks. Wikipedia categories are organized taxonomically, serving as semantic tags for Wikipedia articles, which provides a strong abstraction and an expressive mode of knowledge representation. In this thesis, we explore the effectiveness of this mode of Wikipedia's expression (i.e., the category-article structure) via its application in the domains of text classification, subjectivity analysis (via a notion of perspective in news search), and keyword extraction.
First, we show the effectiveness of exploiting Wikipedia for two classification tasks: (1) classifying tweets as relevant/irrelevant to an entity or brand, and (2) classifying tweets into different topical dimensions, such as tweets related to the workplace, innovation, etc. To do so, we define a notion of relatedness between the text in a tweet and the information embedded within the Wikipedia category-article structure. Then, we present an application in the area of news search that uses the same notion of relatedness to show more information related to each search result, highlighting the amount of perspective or subjective bias in each returned result towards a certain opinion, topical drift, etc. Finally, we present a keyword extraction strategy using community detection over the Wikipedia categories to discover related keywords arranged in different communities. The relationship between Wikipedia categories and articles is explored via a textual phrase matching framework whose starting point is textual phrases that match Wikipedia articles' titles/redirects. The Wikipedia articles for which a match occurs are then utilised by extracting their associated categories, and these categories are used to derive various structural measures, such as those relating to taxonomical depth and the Wikipedia articles they contain. These measures are utilised in our proposed text classification, subjectivity analysis, and keyword extraction frameworks, whose performance is analysed via extensive experimental evaluations. These evaluations compare against standard text mining approaches from the literature, and our framework based on Wikipedia's category-article structure outperforms the standard text mining techniques." @default.
- W2238333665 created "2016-06-24" @default.
- W2238333665 creator A5070809975 @default.
- W2238333665 date "2016-01-29" @default.
- W2238333665 modified "2023-10-16" @default.
- W2238333665 title "Utilising Wikipedia for Text Mining Applications" @default.
- W2238333665 cites W102708294 @default.
- W2238333665 cites W103965747 @default.
- W2238333665 cites W107935384 @default.
- W2238333665 cites W1119807432 @default.
- W2238333665 cites W14574270 @default.
- W2238333665 cites W1482214997 @default.
- W2238333665 cites W1486865875 @default.
- W2238333665 cites W1490343430 @default.
- W2238333665 cites W1505083828 @default.
- W2238333665 cites W1520377376 @default.
- W2238333665 cites W1521908097 @default.
- W2238333665 cites W1525595230 @default.
- W2238333665 cites W1526703372 @default.
- W2238333665 cites W1533510585 @default.
- W2238333665 cites W1533642089 @default.
- W2238333665 cites W1537118343 @default.
- W2238333665 cites W1544240449 @default.
- W2238333665 cites W1548663377 @default.
- W2238333665 cites W1549343721 @default.
- W2238333665 cites W158057341 @default.
- W2238333665 cites W1593239840 @default.
- W2238333665 cites W1598683382 @default.
- W2238333665 cites W1646006088 @default.
- W2238333665 cites W1654905138 @default.
- W2238333665 cites W167355512 @default.
- W2238333665 cites W1724344851 @default.
- W2238333665 cites W1743429370 @default.
- W2238333665 cites W177984263 @default.
- W2238333665 cites W1782572861 @default.
- W2238333665 cites W1788602 @default.
- W2238333665 cites W1812489009 @default.
- W2238333665 cites W184397588 @default.
- W2238333665 cites W1854214752 @default.
- W2238333665 cites W187228978 @default.
- W2238333665 cites W1880262756 @default.
- W2238333665 cites W1897880214 @default.
- W2238333665 cites W1907578970 @default.
- W2238333665 cites W1958077162 @default.
- W2238333665 cites W1964424111 @default.
- W2238333665 cites W1967345182 @default.
- W2238333665 cites W1970544520 @default.
- W2238333665 cites W1971421925 @default.
- W2238333665 cites W1971784203 @default.
- W2238333665 cites W1972644898 @default.
- W2238333665 cites W197270748 @default.
- W2238333665 cites W1974339500 @default.
- W2238333665 cites W1974820763 @default.
- W2238333665 cites W1975879668 @default.
- W2238333665 cites W1978394996 @default.
- W2238333665 cites W1979068940 @default.
- W2238333665 cites W1983315846 @default.
- W2238333665 cites W1983873791 @default.
- W2238333665 cites W1992914835 @default.
- W2238333665 cites W1993320088 @default.
- W2238333665 cites W1994081067 @default.
- W2238333665 cites W1996359281 @default.
- W2238333665 cites W1997486487 @default.
- W2238333665 cites W2000569744 @default.
- W2238333665 cites W2004384146 @default.
- W2238333665 cites W2012561700 @default.
- W2238333665 cites W2013416264 @default.
- W2238333665 cites W2013579020 @default.
- W2238333665 cites W2013988990 @default.
- W2238333665 cites W2015953751 @default.
- W2238333665 cites W2017729405 @default.
- W2238333665 cites W2020278455 @default.
- W2238333665 cites W2022166150 @default.
- W2238333665 cites W2023408138 @default.
- W2238333665 cites W2024278742 @default.
- W2238333665 cites W2026439336 @default.
- W2238333665 cites W2026487812 @default.
- W2238333665 cites W2027491471 @default.
- W2238333665 cites W2028007965 @default.
- W2238333665 cites W2028009320 @default.
- W2238333665 cites W2030903088 @default.
- W2238333665 cites W2031046392 @default.
- W2238333665 cites W2031160476 @default.
- W2238333665 cites W2032218291 @default.
- W2238333665 cites W2034721576 @default.
- W2238333665 cites W2035569891 @default.
- W2238333665 cites W2037802158 @default.
- W2238333665 cites W2045181608 @default.
- W2238333665 cites W2048585890 @default.
- W2238333665 cites W2050125880 @default.
- W2238333665 cites W2050331639 @default.
- W2238333665 cites W2051442094 @default.
- W2238333665 cites W2053606794 @default.
- W2238333665 cites W2055518963 @default.
- W2238333665 cites W2060704337 @default.
- W2238333665 cites W2060772621 @default.
- W2238333665 cites W2061834489 @default.
- W2238333665 cites W2063142656 @default.