Matches in SemOpenAlex for { <https://semopenalex.org/work/W602665507> ?p ?o ?g. }
- W602665507 abstract "This dissertation studies the problem of preparing good-quality social network data for data analysis and mining. Modern online social networks such as Twitter, Facebook, and LinkedIn have rapidly grown in popularity. The consequent availability of a wealth of social network data provides an unprecedented opportunity for data analysis and mining researchers to determine useful and actionable information in a wide variety of fields such as social sciences, marketing, management, and security. However, raw social network data are vast, noisy, distributed, and sensitive in nature, which poses challenges for data mining and analysis tasks in terms of storage, efficiency, accuracy, etc. Many mining algorithms cannot operate or generate accurate results on the vast and messy data. Thus social network data preparation deserves special attention as it processes raw data and transforms them into usable forms for data mining and analysis tasks. Data preparation consists of four main steps, namely data collection, data cleaning, data reduction, and data conversion, each of which deals with different challenges of the raw data. In this dissertation, we consider three important problems related to the data collection and data conversion steps in social network data preparation. The first problem is the sampling issue for social network data collection. Restricted by processing power and resources, most research that analyzes user-generated content from social networks relies on samples obtained via social network APIs. But the lack of consideration for the quality and potential bias of the samples reduces the effectiveness and validity of the analysis results. To fill this gap, in the first work of the dissertation, we perform an exploratory analysis of data samples obtained from social network stream APIs to understand how well the samples represent the corresponding complete data and their potential for use in various data mining tasks.
The second problem is the privacy protection issue at the data conversion step. We discover a new type of attack in which malicious adversaries utilize the connection information of a victim (anonymous) user to some known public users in a social network to re-identify the user and compromise identity privacy. We name this type of attack the connection fingerprint (CFP) attack. In the second work of the dissertation, we investigate the potential risk of CFP attacks on social networks and propose two efficient k-anonymity-based network conversion algorithms to protect social networks against CFP attacks and preserve the utility of converted networks. The third problem is the utility issue in privacy-preserving data conversion. Existing k-anonymization algorithms convert networks to protect privacy via modifying edges, and they preserve utility by minimizing the number of edges modified. We find this simple utility model cannot reflect real utility changes of networks with complex structure. Thus, existing k-anonymization algorithms designed based on this simple utility model cannot guarantee that the generated social networks have high utility. To solve this problem, in the third work of this dissertation, we propose a new utility benchmark that directly measures the change in network community structure caused by a network conversion algorithm. We also design a general k-anonymization algorithm framework based on this new utility model. Our algorithm can significantly improve the utility of generated networks compared with existing algorithms. Our work in this dissertation emphasizes the importance of data preparation for social network analysis and mining tasks. Our study of the sampling issue in social network collection provides guidelines for people to use or not to use sampled social network content data for their research.
Our work on privacy-preserving social network conversion provides methods to better protect the identity privacy of social network users and maintain the utility of social network data." @default.
- W602665507 created "2016-06-24" @default.
- W602665507 creator A5033238983 @default.
- W602665507 date "2014-01-01" @default.
- W602665507 modified "2023-09-27" @default.
- W602665507 title "Data Preparation for Social Network Mining and Analysis" @default.
- W602665507 cites W1491317234 @default.
- W602665507 cites W1517590677 @default.
- W602665507 cites W1526460642 @default.
- W602665507 cites W153069569 @default.
- W602665507 cites W1533841329 @default.
- W602665507 cites W1549390321 @default.
- W602665507 cites W1581750077 @default.
- W602665507 cites W1587629899 @default.
- W602665507 cites W1669474105 @default.
- W602665507 cites W1873763122 @default.
- W602665507 cites W1896743484 @default.
- W602665507 cites W1964878798 @default.
- W602665507 cites W1965936846 @default.
- W602665507 cites W1967579779 @default.
- W602665507 cites W1969890816 @default.
- W602665507 cites W1976320242 @default.
- W602665507 cites W1977843657 @default.
- W602665507 cites W1986678144 @default.
- W602665507 cites W1986697120 @default.
- W602665507 cites W1986828474 @default.
- W602665507 cites W1992263322 @default.
- W602665507 cites W1998091733 @default.
- W602665507 cites W1998801249 @default.
- W602665507 cites W2000200507 @default.
- W602665507 cites W2010273307 @default.
- W602665507 cites W2017509273 @default.
- W602665507 cites W2018165284 @default.
- W602665507 cites W2019331423 @default.
- W602665507 cites W2029852131 @default.
- W602665507 cites W2032186932 @default.
- W602665507 cites W2033995706 @default.
- W602665507 cites W204323566 @default.
- W602665507 cites W2046804949 @default.
- W602665507 cites W2047624089 @default.
- W602665507 cites W2047940964 @default.
- W602665507 cites W2060009247 @default.
- W602665507 cites W2061901927 @default.
- W602665507 cites W2065228400 @default.
- W602665507 cites W20667085 @default.
- W602665507 cites W2071776923 @default.
- W602665507 cites W2084012970 @default.
- W602665507 cites W2089077498 @default.
- W602665507 cites W2089458547 @default.
- W602665507 cites W2096296626 @default.
- W602665507 cites W2101196063 @default.
- W602665507 cites W2108614537 @default.
- W602665507 cites W2109369020 @default.
- W602665507 cites W2113401427 @default.
- W602665507 cites W2114734373 @default.
- W602665507 cites W2115022330 @default.
- W602665507 cites W2115209166 @default.
- W602665507 cites W2117410972 @default.
- W602665507 cites W2118519969 @default.
- W602665507 cites W2119404697 @default.
- W602665507 cites W2121761994 @default.
- W602665507 cites W2122710250 @default.
- W602665507 cites W2124499489 @default.
- W602665507 cites W2124849257 @default.
- W602665507 cites W2128248866 @default.
- W602665507 cites W2136486572 @default.
- W602665507 cites W2137135938 @default.
- W602665507 cites W2137349054 @default.
- W602665507 cites W2137845970 @default.
- W602665507 cites W2139575250 @default.
- W602665507 cites W2140096141 @default.
- W602665507 cites W2143445293 @default.
- W602665507 cites W2144143615 @default.
- W602665507 cites W2146008005 @default.
- W602665507 cites W2147734964 @default.
- W602665507 cites W2149510050 @default.
- W602665507 cites W2150611147 @default.
- W602665507 cites W2151961366 @default.
- W602665507 cites W2152284345 @default.
- W602665507 cites W2153910905 @default.
- W602665507 cites W2155058903 @default.
- W602665507 cites W2158908968 @default.
- W602665507 cites W2159024459 @default.
- W602665507 cites W2159397589 @default.
- W602665507 cites W2163263459 @default.
- W602665507 cites W2165081249 @default.
- W602665507 cites W2165515835 @default.
- W602665507 cites W2165971212 @default.
- W602665507 cites W2168332560 @default.
- W602665507 cites W2168433392 @default.
- W602665507 cites W2169861334 @default.
- W602665507 cites W2171468534 @default.
- W602665507 cites W2171935404 @default.
- W602665507 cites W2256977455 @default.
- W602665507 cites W2406036028 @default.
- W602665507 cites W2481169411 @default.
- W602665507 cites W2542727820 @default.
- W602665507 cites W2579030831 @default.
- W602665507 cites W3122139608 @default.
- W602665507 hasPublicationYear "2014" @default.