Matches in SemOpenAlex for { <https://semopenalex.org/work/W2104975219> ?p ?o ?g. }
- W2104975219 abstract "Explosive growth in data size, data complexity, and data rates, triggered by the emergence of high-throughput technologies such as remote sensing, crowd-sourcing, social networks, or computational advertising, has in recent years led to an increasing availability of data sets of unprecedented scale, with billions of high-dimensional data examples stored on hundreds of terabytes of memory. In order to make use of this large-scale data and extract useful knowledge, researchers in the machine learning and data mining communities are faced with numerous challenges, since the data mining and machine learning tools designed for standard desktop computers are not capable of addressing these problems due to memory and time constraints. As a result, there exists an evident need for the development of novel, scalable algorithms for big data. In this thesis we address these important problems, and propose both supervised and unsupervised tools for handling large-scale data. First, we consider an unsupervised approach to big data analysis, and explore a scalable, efficient visualization method that allows fast knowledge extraction. Next, we consider the supervised learning setting and propose algorithms for fast training of accurate classification models on large data sets, capable of learning state-of-the-art classifiers on data sets with millions of examples and features within minutes. Data visualization has been used for hundreds of years in scientific research, as it allows humans to easily gain better insight into the complex data they are studying. Despite its long history, there is a clear need for further development of visualization methods when working with large-scale, high-dimensional data, where commonly used visualization tools are either too simplistic to give deeper insight into the data properties, or are too cumbersome or computationally costly. We present a novel method for data ordering and visualization. 
By combining efficient clustering using the k-means algorithm and near-optimal ordering of the found clusters using a state-of-the-art TSP solver, we obtain an efficient algorithm that achieves performance better than existing, computationally intensive methods. In addition, we present a visualization method for smaller-scale problems based on object matching. The experiments show that the methods allow for fast detection of hidden patterns, even by users without expertise in the areas of data mining and machine learning. Supervised learning is another important task, often intractable in many modern applications due to time and memory constraints, given the prohibitively large scale of the data sets. To address this issue, we first consider the Multi-hyperplane Machine (MM) classification model, and propose an online Adaptive MM algorithm, which represents a trade-off between linear and kernel Support Vector Machines (SVMs), as it trains MMs in linear time on limited memory while achieving competitive accuracies on large-scale non-linear problems. Moreover, we present a C++ toolbox for developing scalable classification models, which provides an Application Programming Interface (API) for training of large-scale classifiers, as well as highly optimized implementations of several state-of-the-art SVM approximators. Lastly, we consider parallelization and distributed learning approaches to large-scale supervised learning, and propose AROW-MapReduce, a distributed learning algorithm for confidence-weighted models using the MapReduce framework. Experimental evaluation of the proposed methods shows state-of-the-art performance on a number of synthetic and real-world data sets, further paving the way for efficient and effective knowledge extraction from big data problems." @default.
- W2104975219 created "2016-06-24" @default.
- W2104975219 creator A5059847153 @default.
- W2104975219 creator A5082498217 @default.
- W2104975219 date "2013-01-01" @default.
- W2104975219 modified "2023-09-24" @default.
- W2104975219 title "Big data algorithms for visualization and supervised learning" @default.
- W2104975219 cites W1512098439 @default.
- W2104975219 cites W1516981301 @default.
- W2104975219 cites W1521843029 @default.
- W2104975219 cites W1525460779 @default.
- W2104975219 cites W1571087346 @default.
- W2104975219 cites W1581799170 @default.
- W2104975219 cites W1596354426 @default.
- W2104975219 cites W1604938182 @default.
- W2104975219 cites W1605479404 @default.
- W2104975219 cites W1669813703 @default.
- W2104975219 cites W1923046654 @default.
- W2104975219 cites W1924623148 @default.
- W2104975219 cites W1946137962 @default.
- W2104975219 cites W1966771059 @default.
- W2104975219 cites W1966815444 @default.
- W2104975219 cites W1970722248 @default.
- W2104975219 cites W1975442866 @default.
- W2104975219 cites W1978954664 @default.
- W2104975219 cites W1979081881 @default.
- W2104975219 cites W1985419898 @default.
- W2104975219 cites W1991174119 @default.
- W2104975219 cites W1992907670 @default.
- W2104975219 cites W1999608852 @default.
- W2104975219 cites W2001141328 @default.
- W2104975219 cites W2003447360 @default.
- W2104975219 cites W2009683816 @default.
- W2104975219 cites W2010150441 @default.
- W2104975219 cites W2014044260 @default.
- W2104975219 cites W2017708378 @default.
- W2104975219 cites W2022179164 @default.
- W2104975219 cites W2025341678 @default.
- W2104975219 cites W2028678069 @default.
- W2104975219 cites W2035720976 @default.
- W2104975219 cites W2042986967 @default.
- W2104975219 cites W2044221681 @default.
- W2104975219 cites W2047046780 @default.
- W2104975219 cites W2050883273 @default.
- W2104975219 cites W2053186076 @default.
- W2104975219 cites W2054322519 @default.
- W2104975219 cites W2058937865 @default.
- W2104975219 cites W205960552 @default.
- W2104975219 cites W2066356146 @default.
- W2104975219 cites W2069840066 @default.
- W2104975219 cites W2071467620 @default.
- W2104975219 cites W2081347936 @default.
- W2104975219 cites W2085955040 @default.
- W2104975219 cites W2091825929 @default.
- W2104975219 cites W2095895508 @default.
- W2104975219 cites W2096544401 @default.
- W2104975219 cites W2097156270 @default.
- W2104975219 cites W2097308346 @default.
- W2104975219 cites W2099262739 @default.
- W2104975219 cites W2100439325 @default.
- W2104975219 cites W2101577816 @default.
- W2104975219 cites W2103633133 @default.
- W2104975219 cites W2109436073 @default.
- W2104975219 cites W2109722477 @default.
- W2104975219 cites W2115364117 @default.
- W2104975219 cites W2115440939 @default.
- W2104975219 cites W2117990954 @default.
- W2104975219 cites W2118585731 @default.
- W2104975219 cites W2119430004 @default.
- W2104975219 cites W2119738171 @default.
- W2104975219 cites W2119821739 @default.
- W2104975219 cites W2121990650 @default.
- W2104975219 cites W2125564169 @default.
- W2104975219 cites W2125993116 @default.
- W2104975219 cites W2126321334 @default.
- W2104975219 cites W2129379475 @default.
- W2104975219 cites W2131456395 @default.
- W2104975219 cites W2133120537 @default.
- W2104975219 cites W2135106139 @default.
- W2104975219 cites W2137515395 @default.
- W2104975219 cites W2137983211 @default.
- W2104975219 cites W2138754805 @default.
- W2104975219 cites W2139224857 @default.
- W2104975219 cites W2139688603 @default.
- W2104975219 cites W2140406733 @default.
- W2104975219 cites W2140541004 @default.
- W2104975219 cites W2141642784 @default.
- W2104975219 cites W2142248489 @default.
- W2104975219 cites W2142623206 @default.
- W2104975219 cites W2143570267 @default.
- W2104975219 cites W2144902422 @default.
- W2104975219 cites W2145646037 @default.
- W2104975219 cites W2146077544 @default.
- W2104975219 cites W2150621701 @default.
- W2104975219 cites W2150926065 @default.
- W2104975219 cites W2151530263 @default.
- W2104975219 cites W2151922881 @default.
- W2104975219 cites W2152132697 @default.
- W2104975219 cites W2153635508 @default.
- W2104975219 cites W2155319834 @default.