Matches in SemOpenAlex for { <https://semopenalex.org/work/W359945596> ?p ?o ?g. }
- W359945596 abstract "Depending on the needs of a user, searching on Twitter streams has proven to be very useful for a wide range of applications. Twitter however, has large volumes of information available in the form of tweets, that are shared by millions of users. We have observed that many of these tweets contain mutually redundant information and due to this, there is a consequent impact on the search and retrieval process on Twitter streams. Redundancy in the information presented to a user, hinders with the objective of a search engine to limit a user’s effort, by presenting content that has already been seen. Detecting redundant information in tweets is all the more challenging, due to the colloquial nature of the language that is used. In addition to this, the presence of tweet-specific constructs like hash-tags, shortened URLs and @-mentions further complicates the task of detecting mutually redundant tweets or duplicate tweets. Although it is relatively easy to automatically detect duplicate tweets which are syntactically identical (i.e., the text in the tweets is identical), it is challenging to detect tweets that are semantically equivalent (i.e., tweets have the same underlying meaning) but syntactically different. By analyzing a large Twitter benchmark dataset (provided as a part of the TREC microblog search challenge) used for the micro-blog search task in 2011, we observed that there is a varying extent to which a pair of tweets can be duplicates. We developed a scale that can be used to measure the extent of semantic overlapping in a pair of tweets. Through our analysis, we identified the key aspects around which strategies for the detection of duplicate tweets can be designed, namely, terminological, semantic and linguistic aspects. We designed and implemented strategies for the detection of duplicate tweets from that perspective. Our main contributions are four-fold. Firstly, we developed a scale that can be used to measure the degree of a pair of duplicate tweets. Secondly, we analyzed the top-k search results on Twitter and observed the extent to which duplicates appear in top-k retrieval. Thirdly, we designed and implemented the strategies for the detection of duplicates. Our best strategy combination results in a commendable F-measure of just over 80% (0.82). Finally, we develop a prediction model that can predict the duplicity order of a tweet-pair with an average accuracy of over 80%." @default.
- W359945596 created "2016-06-24" @default.
- W359945596 creator A5038081564 @default.
- W359945596 date "2012-08-31" @default.
- W359945596 modified "2023-09-22" @default.
- W359945596 title "Detection of Duplicate Content on Twitter" @default.
- W359945596 cites W1532325895 @default.
- W359945596 cites W155754185 @default.
- W359945596 cites W1600782690 @default.
- W359945596 cites W1654173042 @default.
- W359945596 cites W1660390307 @default.
- W359945596 cites W1801078307 @default.
- W359945596 cites W1887296892 @default.
- W359945596 cites W1901600440 @default.
- W359945596 cites W1972594216 @default.
- W359945596 cites W1975583660 @default.
- W359945596 cites W1978052141 @default.
- W359945596 cites W1993318811 @default.
- W359945596 cites W1995672491 @default.
- W359945596 cites W2004214228 @default.
- W359945596 cites W2004902747 @default.
- W359945596 cites W2015186536 @default.
- W359945596 cites W2019177758 @default.
- W359945596 cites W2046804949 @default.
- W359945596 cites W2051323285 @default.
- W359945596 cites W2081580037 @default.
- W359945596 cites W2093842354 @default.
- W359945596 cites W2098162425 @default.
- W359945596 cites W2101196063 @default.
- W359945596 cites W2102428892 @default.
- W359945596 cites W2110166424 @default.
- W359945596 cites W2124156373 @default.
- W359945596 cites W2127785456 @default.
- W359945596 cites W2128509431 @default.
- W359945596 cites W2133990480 @default.
- W359945596 cites W2138801182 @default.
- W359945596 cites W2139398774 @default.
- W359945596 cites W2140173168 @default.
- W359945596 cites W2145111356 @default.
- W359945596 cites W2146341589 @default.
- W359945596 cites W2146867136 @default.
- W359945596 cites W2150449434 @default.
- W359945596 cites W2153848201 @default.
- W359945596 cites W2156037541 @default.
- W359945596 cites W2156496568 @default.
- W359945596 cites W2157765050 @default.
- W359945596 cites W215984515 @default.
- W359945596 cites W2166322082 @default.
- W359945596 cites W2172000360 @default.
- W359945596 cites W2308071406 @default.
- W359945596 cites W2406682228 @default.
- W359945596 cites W2603519596 @default.
- W359945596 cites W2898897959 @default.
- W359945596 cites W3122139608 @default.
- W359945596 cites W7530263 @default.
- W359945596 hasPublicationYear "2012" @default.
- W359945596 type Work @default.
- W359945596 sameAs 359945596 @default.
- W359945596 citedByCount "0" @default.
- W359945596 crossrefType "journal-article" @default.
- W359945596 hasAuthorship W359945596A5038081564 @default.
- W359945596 hasConcept C111919701 @default.
- W359945596 hasConcept C13280743 @default.
- W359945596 hasConcept C136764020 @default.
- W359945596 hasConcept C143275388 @default.
- W359945596 hasConcept C152124472 @default.
- W359945596 hasConcept C162324750 @default.
- W359945596 hasConcept C185798385 @default.
- W359945596 hasConcept C187736073 @default.
- W359945596 hasConcept C205649164 @default.
- W359945596 hasConcept C23123220 @default.
- W359945596 hasConcept C2780451532 @default.
- W359945596 hasConcept C41008148 @default.
- W359945596 hasConcept C518677369 @default.
- W359945596 hasConceptScore W359945596C111919701 @default.
- W359945596 hasConceptScore W359945596C13280743 @default.
- W359945596 hasConceptScore W359945596C136764020 @default.
- W359945596 hasConceptScore W359945596C143275388 @default.
- W359945596 hasConceptScore W359945596C152124472 @default.
- W359945596 hasConceptScore W359945596C162324750 @default.
- W359945596 hasConceptScore W359945596C185798385 @default.
- W359945596 hasConceptScore W359945596C187736073 @default.
- W359945596 hasConceptScore W359945596C205649164 @default.
- W359945596 hasConceptScore W359945596C23123220 @default.
- W359945596 hasConceptScore W359945596C2780451532 @default.
- W359945596 hasConceptScore W359945596C41008148 @default.
- W359945596 hasConceptScore W359945596C518677369 @default.
- W359945596 hasLocation W3599455961 @default.
- W359945596 hasOpenAccess W359945596 @default.
- W359945596 hasPrimaryLocation W3599455961 @default.
- W359945596 hasRelatedWork W1485941188 @default.
- W359945596 hasRelatedWork W1897246731 @default.
- W359945596 hasRelatedWork W1906699800 @default.
- W359945596 hasRelatedWork W1966421434 @default.
- W359945596 hasRelatedWork W1984629433 @default.
- W359945596 hasRelatedWork W2137520313 @default.
- W359945596 hasRelatedWork W2187232411 @default.
- W359945596 hasRelatedWork W2188514829 @default.
- W359945596 hasRelatedWork W2250992972 @default.
- W359945596 hasRelatedWork W2487843204 @default.