Matches in SemOpenAlex for { <https://semopenalex.org/work/W1812489009> ?p ?o ?g. }
- W1812489009 abstract "With the rise of social media channels such as Twitter (the most popular microblogging service), the control of what is said online about entities (companies, people or products) has shifted from the entities themselves to users and consumers. This has created the need to monitor the reputation of those entities online. In this context, it is only natural to witness significant growth in demand for text mining software for Online Reputation Monitoring: automatic tools that help process, understand and aggregate large streams of facts and opinions about a company or individual. Despite the variety of Online Reputation Monitoring tools on the market, there is no standard evaluation framework yet: a widely accepted set of task definitions, evaluation measures and reusable test collections to tackle this problem. In fact, there is not even consensus on which tasks make up the Online Reputation Monitoring process, the tasks in which a system should minimize the user's effort. In the context of a collective effort to identify and formalize the main challenges in the Online Reputation Monitoring process on Twitter, we have participated in the definition of tasks and the subsequent creation of suitable test collections (the WePS-3, RepLab 2012 and RepLab 2013 evaluation campaigns), and we have studied in depth two of the identified challenges: filtering (Is a tweet related to a given entity of interest?), modeled as a binary classification task, and topic detection (What is being said about an entity in a given tweet stream?), which consists of clustering tweets by topic. Compared to previous studies on Twitter, our problem lies in its long tail: with few exceptions, the volume of information related to a specific entity (organization or company) at a given time is orders of magnitude smaller than Twitter trending topics, making the problem much more challenging than identifying Twitter trends.
We rely on three building blocks to propose different approaches to these two tasks: filter keywords, external resources (such as Wikipedia or representative pages of the entity of interest), and entity-specific training data when available. We have found that the notion of filter keywords (expressions that, if present in a tweet, indicate a high probability that it is either related or unrelated to the entity of interest) can be effectively used to tackle the filtering task. Here, (i) the specificity of a term to the entity's tweet stream is a useful feature for identifying keywords, and (ii) the association between a term and the entity's Wikipedia page is useful for differentiating positive from negative filter keywords, especially when it is averaged over the term's most co-occurrent terms. In addition, exploring the nature of filter keywords led us to the conclusion that there is a gap between the vocabulary that characterizes a company on Twitter and the vocabulary associated with the company on its homepage, on Wikipedia, and even on the Web at large. We have also found that, when entity-specific training data is available (as in the known-entity scenario), it is more cost-effective to use a simple Bag-of-Words classifier. When enough training data is available (around 700 tweets per entity), Bag-of-Words classifiers can be used effectively for the filtering task. Moreover, they can be used effectively in an active learning scenario, where the system updates its classification model with the stream of annotations and interactions produced by the reputation expert throughout the monitoring process. In this context, we found that by selecting for labeling the tweets on which the classifier is least confident (margin sampling), the cost of creating a bulk training set can be reduced by 90% after inspecting 10% of the test data.
Unlike many other applications of active learning to Natural Language Processing tasks, margin sampling works better than random sampling. For the topic detection problem, we considered two main strategies: the first is inspired by the notion of filter keywords and works by clustering terms as an intermediate step towards document clustering. The second, and most successful, learns a pairwise tweet similarity function from previously annotated data, using a wide range of content-based and Twitter-based features, and then applies a clustering algorithm on top of the learned similarity function. Our experiments indicate that (i) Twitter signals can be used to improve the topic detection process with respect to using content signals only, and (ii) learning a similarity function is a flexible and efficient way of introducing supervision into the topic detection clustering process. The performance of our best system is substantially better than state-of-the-art approaches and gets close to the inter-annotator agreement rate of the topic detection annotations in the RepLab 2013 dataset (to our knowledge, the largest dataset available for Online Reputation Monitoring). A detailed qualitative inspection of the data further reveals two types of topics detected by reputation experts: reputation alerts/issues (which usually spike in time) and organizational topics (which are usually stable across time). Along with our contribution to building a standard evaluation framework to study the Online Reputation Monitoring problem from a scientific perspective, we believe that the outcome of our research has practical implications and may help the development of semi-automatic tools to assist reputation experts in their daily work." @default.
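The margin-sampling step described in the abstract (label the tweets on which the classifier is least confident) can be sketched in a few lines. This is an illustrative outline, not the thesis code; the probability values in the example are invented:

```python
def margin_sample(probs, k):
    """Pick the k unlabeled tweets the classifier is least sure about.

    probs: list of P(related-to-entity) for each unlabeled tweet
    (binary filtering task). For binary classification the margin
    reduces to |p - 0.5|: the smaller it is, the less confident
    the classifier, so those tweets are the most informative to label.
    """
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

# Example: ask the reputation expert to label the 2 most uncertain tweets.
probs = [0.95, 0.52, 0.10, 0.45, 0.80]
print(margin_sample(probs, 2))  # → [1, 3]
```

After the expert labels the selected tweets, they are added to the training set and the Bag-of-Words model is retrained, repeating the select-label-retrain loop over the monitoring stream.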
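The second topic-detection strategy (learn a pairwise tweet similarity, then cluster on top of it) can be sketched as follows. The thesis learns the similarity function from annotated data using content-based and Twitter-based features; this sketch substitutes a simple Jaccard overlap of tweet terms as a stand-in, and uses a union-find single-link merge as one illustrative clustering choice:

```python
def jaccard(a, b):
    """Stand-in similarity: term overlap between two tokenized tweets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_similarity(tweets, sim, threshold):
    """Single-link clustering: put two tweets in the same topic whenever
    their pairwise similarity reaches the threshold (union-find)."""
    parent = list(range(len(tweets)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(len(tweets)):
        for j in range(i + 1, len(tweets)):
            if sim(tweets[i], tweets[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(tweets)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Toy tweet stream, already tokenized.
tweets = [["new", "phone", "launch"],
          ["phone", "launch", "event"],
          ["stock", "price", "drop"]]
print(cluster_by_similarity(tweets, jaccard, 0.4))  # → [[0, 1], [2]]
```

Swapping `jaccard` for a function learned from annotated tweet pairs is what lets supervision enter the clustering step without changing the clustering algorithm itself.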
- W1812489009 created "2016-06-24" @default.
- W1812489009 creator A5066915873 @default.
- W1812489009 date "2014-01-01" @default.
- W1812489009 modified "2023-09-24" @default.
- W1812489009 title "Entity-based filtering and topic detection for online reputation monitoring in Twitter" @default.
- W1812489009 cites W11244355 @default.
- W1812489009 cites W123226248 @default.
- W1812489009 cites W1257366642 @default.
- W1812489009 cites W1447066968 @default.
- W1812489009 cites W147379627 @default.
- W1812489009 cites W1494273468 @default.
- W1812489009 cites W1496393218 @default.
- W1812489009 cites W1498797055 @default.
- W1812489009 cites W1502876877 @default.
- W1812489009 cites W1510293779 @default.
- W1812489009 cites W1525595230 @default.
- W1812489009 cites W1527181724 @default.
- W1812489009 cites W1532325895 @default.
- W1812489009 cites W1533510585 @default.
- W1812489009 cites W1534625513 @default.
- W1812489009 cites W1537118343 @default.
- W1812489009 cites W1541691357 @default.
- W1812489009 cites W1548209984 @default.
- W1812489009 cites W1548663377 @default.
- W1812489009 cites W1555354714 @default.
- W1812489009 cites W1558753140 @default.
- W1812489009 cites W1567663560 @default.
- W1812489009 cites W1570448133 @default.
- W1812489009 cites W158057341 @default.
- W1812489009 cites W1599935123 @default.
- W1812489009 cites W1601068082 @default.
- W1812489009 cites W1601302864 @default.
- W1812489009 cites W1647729745 @default.
- W1812489009 cites W171888312 @default.
- W1812489009 cites W1728658630 @default.
- W1812489009 cites W1752870744 @default.
- W1812489009 cites W1800296434 @default.
- W1812489009 cites W1814023381 @default.
- W1812489009 cites W1828219555 @default.
- W1812489009 cites W1828830618 @default.
- W1812489009 cites W184373350 @default.
- W1812489009 cites W1880262756 @default.
- W1812489009 cites W1907578970 @default.
- W1812489009 cites W1922017469 @default.
- W1812489009 cites W1965555277 @default.
- W1812489009 cites W1965751147 @default.
- W1812489009 cites W1967274749 @default.
- W1812489009 cites W1972978214 @default.
- W1812489009 cites W1973897992 @default.
- W1812489009 cites W1974877536 @default.
- W1812489009 cites W1977004736 @default.
- W1812489009 cites W1977931290 @default.
- W1812489009 cites W1978394996 @default.
- W1812489009 cites W1978588883 @default.
- W1812489009 cites W1990709028 @default.
- W1812489009 cites W199708266 @default.
- W1812489009 cites W2000200507 @default.
- W1812489009 cites W2001653897 @default.
- W1812489009 cites W2013579020 @default.
- W1812489009 cites W201361503 @default.
- W1812489009 cites W2014902591 @default.
- W1812489009 cites W2018165284 @default.
- W1812489009 cites W2018277822 @default.
- W1812489009 cites W2020278455 @default.
- W1812489009 cites W2021314079 @default.
- W1812489009 cites W2024278742 @default.
- W1812489009 cites W2037140704 @default.
- W1812489009 cites W2046804949 @default.
- W1812489009 cites W2051442094 @default.
- W1812489009 cites W2053968437 @default.
- W1812489009 cites W2060772621 @default.
- W1812489009 cites W2062467277 @default.
- W1812489009 cites W2063904635 @default.
- W1812489009 cites W2065866960 @default.
- W1812489009 cites W2068882115 @default.
- W1812489009 cites W2069870183 @default.
- W1812489009 cites W2072240081 @default.
- W1812489009 cites W2097388131 @default.
- W1812489009 cites W2097606805 @default.
- W1812489009 cites W2097726431 @default.
- W1812489009 cites W2098647075 @default.
- W1812489009 cites W2100341149 @default.
- W1812489009 cites W2101196063 @default.
- W1812489009 cites W2102733276 @default.
- W1812489009 cites W2103759455 @default.
- W1812489009 cites W2105400363 @default.
- W1812489009 cites W2105745072 @default.
- W1812489009 cites W2107610218 @default.
- W1812489009 cites W2107743791 @default.
- W1812489009 cites W2108646579 @default.
- W1812489009 cites W2112699412 @default.
- W1812489009 cites W2113125055 @default.
- W1812489009 cites W2113227740 @default.
- W1812489009 cites W2113586662 @default.
- W1812489009 cites W2113889316 @default.
- W1812489009 cites W2114134660 @default.
- W1812489009 cites W2115352105 @default.
- W1812489009 cites W2116216325 @default.
- W1812489009 cites W2116235440 @default.