Matches in SemOpenAlex for { <https://semopenalex.org/work/W2471104003> ?p ?o ?g. }
- W2471104003 abstract "Support vector machines (SVMs) are a very popular method for binary classification. Traditional training algorithms for SVMs, such as chunking and SMO, scale superlinearly with the number of examples, which quickly becomes infeasible for large training sets. Since it has been commonly observed that dataset sizes have been growing steadily larger over the past few years, this necessitates the development of training algorithms that scale at worst linearly with the number of examples. We survey work on SVM training methods that target this large-scale learning regime. Most of these algorithms use either (1) variants of primal stochastic gradient descent (SGD), or (2) quadratic programming in the dual. For (1), we discuss why SGD generalizes well even though it is poor at optimization, and describe algorithms such as Pegasos and FOLOS that extend basic SGD to quickly solve the SVM problem. For (2), we survey recent methods such as dual coordinate-descent and BMRM, which have proven competitive with the SGD-based solvers. We also discuss the recent work of [Shalev-Shwartz and Srebro, 2008] that concludes that training time for SVMs should actually decrease as the training set size increases, and explain why SGD-based algorithms are able to satisfy this desideratum. 1. WHY LARGE-SCALE LEARNING? Supervised learning involves analyzing a given set of labelled observations (the training set) so as to predict the labels of unlabelled future data (the test set). Specifically, the goal is to learn some function that describes the relationship between observations and their labels. Archetypal examples of supervised learning include recognizing handwritten digits and spam classification. One parameter of interest for a supervised learning problem is the size of the training set. We call a learning problem large-scale if its training set cannot be stored in a modern computer’s memory [Langford, 2008]. A deeper definition of large-scale learning is that it consists of problems where the main computational constraint is the amount of time available, rather than the number of examples [Bottou and Bousquet, 2007]. A large training set poses a challenge for the computational complexity of a learning algorithm: in order for algorithms to be feasible on such datasets, they must scale at worst linearly with the number of examples. Most learning problems that have been studied thus far are mediumscale, in that they assume that the training set can be stored in memory and repeatedly scanned. However, with the growing volume of data in the last few years, we have started to see problems that are large-scale. An example of this is ad-click data for search engines. When most modern search engines produce results for a query, they also display a number of (hopefully) relevant ads. When the user clicks on an ad, the search engine receives some commission from the ad sponsor. This means that to price the ad reasonably, the search company needs to have a good estimate of whether, for a given query, an ad is likely to be clicked or not. One way to formulate this as a learning problem is to have training examples consisting of an ad and its corresponding search query, and a label denoting whether or not the ad was clicked. We wish to learn a classifier that tells us whether a given ad is likely to be clicked if it were generated for a given query. 
Given the volume of queries search engines process (Google processes around 7.5 billion queries a month [Searchenginewatch.com, 2008]), the potential size of such a training set can far exceed the memory capacity of a modern system. Conventional learning algorithms cannot handle such problems, because we can no longer store and have ready access to the data in memory. This necessitates the development of new algorithms, and a careful study of the challenges posed by this scale of problem. An extra motivation for studying such algorithms is that they can also be applied to medium-scale problems, which are still of immediate practical interest currently. Our focus in this document is how a support vector machine (SVM), a popular method for binary classification that is based on strong theory and enjoys good practical performance, can be scaled to work with large training sets. There have been two strands of work in the literature on this topic. The first is a theoretical analysis of the problem, in an attempt to understand how learning algorithms need to be changed to adapt to a large-scale setting. The other is the design of training algorithms for SVMs that work well for these large datasets, including the recent Pegasos solver [Shalev-Shwartz et al., 2007], which leverages the theoretical results on large-scale learning to actually decrease its runtime when given more examples. We discuss both strands, and attempt to identify the limitations of current solvers. First, let us define more precisely the large-scale setting that we are considering, and describe some general approaches to solving such problems. 1.1 Batch and online algorithms When we discuss supervised learning problems with a large training set, we are implicitly assuming that the learning is done in the batch framework. We do not focus on the online learning scenario, which consists of a potentially infinite stream of training examples presented one at a time, although such a setting can certainly be thought of as large-scale learning. However, it is possible for an online algorithm to solve a batch problem, and in fact this might be desirable in the large-scale setting, as we discuss below. More generally, an intermediate between batch and online algorithms is what we call an online-style algorithm. This is an algorithm that assumes a batch setting, but only uses a sublinear amount of memory, and whose computational complexity scales only sublinearly with the number of examples. This precludes batch algorithms that repeatedly process the training set at each iteration. A standard online algorithm can be converted into an online-style algorithm" @default.
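As context for the abstract above (this is added here for the reader, not part of the SemOpenAlex record), the objective that the surveyed primal solvers minimize is the standard soft-margin linear SVM with regularization parameter $\lambda$ over $n$ examples $(\mathbf{x}_i, y_i)$:

$$\min_{\mathbf{w} \in \mathbb{R}^d} \; \frac{\lambda}{2}\,\lVert \mathbf{w} \rVert^2 \;+\; \frac{1}{n} \sum_{i=1}^{n} \max\bigl\{0,\; 1 - y_i \langle \mathbf{w}, \mathbf{x}_i \rangle\bigr\}, \qquad y_i \in \{-1, +1\}.$$

Below is a minimal sketch of the Pegasos-style stochastic subgradient update from [Shalev-Shwartz et al., 2007], in its single-example variant with the optional projection step. It assumes dense NumPy arrays and labels in {-1, +1}; it is an illustration under those assumptions, not the authors' reference implementation, and the function name `pegasos_train` is hypothetical.

```python
import numpy as np

def pegasos_train(X, y, lam=0.1, n_iters=1000, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    X: (n, d) array of examples; y: (n,) array of labels in {-1, +1}.
    Each iteration touches one sampled example, so the per-step cost is
    independent of n -- the property that makes SGD attractive at scale.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)  # Pegasos step-size schedule
        # Subgradient step on (lam/2)||w||^2 + hinge loss at example i.
        if y[i] * (X[i] @ w) < 1.0:
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1.0 - eta * lam) * w
        # Optional projection onto the ball of radius 1/sqrt(lam),
        # as in the original Pegasos analysis.
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w

# Usage sketch: w = pegasos_train(X_train, y_train); predict with sign(X @ w).
```

Note that the update never scans the full training set; this is what allows runtime to shrink, rather than grow, as more examples become available, per the [Shalev-Shwartz and Srebro, 2008] argument cited in the abstract.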
- W2471104003 created "2016-07-22" @default.
- W2471104003 creator A5049656925 @default.
- W2471104003 date "2009-01-01" @default.
- W2471104003 modified "2023-09-26" @default.
- W2471104003 title "Large-Scale Support Vector Machines: Algorithms and Theory" @default.
- W2471104003 cites W1491622225 @default.
- W2471104003 cites W1512098439 @default.
- W2471104003 cites W1514940655 @default.
- W2471104003 cites W1530699444 @default.
- W2471104003 cites W1574862351 @default.
- W2471104003 cites W1585021961 @default.
- W2471104003 cites W1601740268 @default.
- W2471104003 cites W1604585277 @default.
- W2471104003 cites W19621276 @default.
- W2471104003 cites W1965059296 @default.
- W2471104003 cites W1975588358 @default.
- W2471104003 cites W2009593947 @default.
- W2471104003 cites W2015904350 @default.
- W2471104003 cites W2025732832 @default.
- W2471104003 cites W2030811966 @default.
- W2471104003 cites W2035720976 @default.
- W2471104003 cites W2051381803 @default.
- W2471104003 cites W2051434435 @default.
- W2471104003 cites W2070771761 @default.
- W2471104003 cites W2091825929 @default.
- W2471104003 cites W2102486516 @default.
- W2471104003 cites W2105636360 @default.
- W2471104003 cites W2105867876 @default.
- W2471104003 cites W2112530506 @default.
- W2471104003 cites W2113651538 @default.
- W2471104003 cites W2114690085 @default.
- W2471104003 cites W2117990954 @default.
- W2471104003 cites W2119821739 @default.
- W2471104003 cites W2120286392 @default.
- W2471104003 cites W2128097790 @default.
- W2471104003 cites W2138682935 @default.
- W2471104003 cites W2138745909 @default.
- W2471104003 cites W2139432235 @default.
- W2471104003 cites W2142623206 @default.
- W2471104003 cites W2147079026 @default.
- W2471104003 cites W2147898188 @default.
- W2471104003 cites W2149684865 @default.
- W2471104003 cites W2150621701 @default.
- W2471104003 cites W2155319834 @default.
- W2471104003 cites W2165966284 @default.
- W2471104003 cites W2296319761 @default.
- W2471104003 cites W3150049578 @default.
- W2471104003 cites W56743589 @default.
- W2471104003 hasPublicationYear "2009" @default.
- W2471104003 type Work @default.
- W2471104003 sameAs 2471104003 @default.
- W2471104003 citedByCount "14" @default.
- W2471104003 countsByYear W24711040032012 @default.
- W2471104003 countsByYear W24711040032013 @default.
- W2471104003 countsByYear W24711040032014 @default.
- W2471104003 countsByYear W24711040032016 @default.
- W2471104003 countsByYear W24711040032018 @default.
- W2471104003 countsByYear W24711040032019 @default.
- W2471104003 crossrefType "journal-article" @default.
- W2471104003 hasAuthorship W2471104003A5049656925 @default.
- W2471104003 hasConcept C11413529 @default.
- W2471104003 hasConcept C119857082 @default.
- W2471104003 hasConcept C121332964 @default.
- W2471104003 hasConcept C12267149 @default.
- W2471104003 hasConcept C126255220 @default.
- W2471104003 hasConcept C154945302 @default.
- W2471104003 hasConcept C177264268 @default.
- W2471104003 hasConcept C199360897 @default.
- W2471104003 hasConcept C203357204 @default.
- W2471104003 hasConcept C206688291 @default.
- W2471104003 hasConcept C2778755073 @default.
- W2471104003 hasConcept C33923547 @default.
- W2471104003 hasConcept C41008148 @default.
- W2471104003 hasConcept C50644808 @default.
- W2471104003 hasConcept C62520636 @default.
- W2471104003 hasConcept C66905080 @default.
- W2471104003 hasConcept C81845259 @default.
- W2471104003 hasConceptScore W2471104003C11413529 @default.
- W2471104003 hasConceptScore W2471104003C119857082 @default.
- W2471104003 hasConceptScore W2471104003C121332964 @default.
- W2471104003 hasConceptScore W2471104003C12267149 @default.
- W2471104003 hasConceptScore W2471104003C126255220 @default.
- W2471104003 hasConceptScore W2471104003C154945302 @default.
- W2471104003 hasConceptScore W2471104003C177264268 @default.
- W2471104003 hasConceptScore W2471104003C199360897 @default.
- W2471104003 hasConceptScore W2471104003C203357204 @default.
- W2471104003 hasConceptScore W2471104003C206688291 @default.
- W2471104003 hasConceptScore W2471104003C2778755073 @default.
- W2471104003 hasConceptScore W2471104003C33923547 @default.
- W2471104003 hasConceptScore W2471104003C41008148 @default.
- W2471104003 hasConceptScore W2471104003C50644808 @default.
- W2471104003 hasConceptScore W2471104003C62520636 @default.
- W2471104003 hasConceptScore W2471104003C66905080 @default.
- W2471104003 hasConceptScore W2471104003C81845259 @default.
- W2471104003 hasLocation W24711040031 @default.
- W2471104003 hasOpenAccess W2471104003 @default.
- W2471104003 hasPrimaryLocation W24711040031 @default.
- W2471104003 hasRelatedWork W1512098439 @default.
- W2471104003 hasRelatedWork W1563716968 @default.