Matches in SemOpenAlex for { <https://semopenalex.org/work/W129487613> ?p ?o ?g. }
Showing items 1 to 82 of 82 with 100 items per page.
- W129487613 abstract "This thesis studies the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions D = {D1, ..., DT} and mixing weights {w1, ..., wT} such that Σi wi = 1. A sample from a mixture is generated by choosing i with probability wi and choosing a sample from distribution Di. Given samples from a mixture of distributions, the problem of learning the mixture is that of finding the parameters of the distributions comprising D and grouping the samples according to source distribution. A common theoretical framework for addressing the problem also assumes that we are given a separation condition, which is a promise that any two distributions in the mixture are sufficiently different. In this thesis, we study three aspects of the problem. First, in Chapter 3, we focus on optimizing the separation condition while learning mixtures of distributions. The most common algorithms in practice are singular value decomposition based algorithms, which work when the separation is Θ(σ/√wmin), where σ is the maximum directional standard deviation of any distribution in the mixture, and wmin is the minimum mixing weight. We show an algorithm which successfully learns mixtures of distributions with a separation condition that depends only logarithmically on the skewed mixing weights. In particular, it succeeds for a separation between the centers that is Θ(√(T log Λ)), where T is the number of distributions, and Λ is polynomial in T and the imbalance in the mixing weights. We require that the distance between the centers be spread across Θ(T log Λ) coordinates. In addition, we show that if every vector in the subspace spanned by the centers has a small projection, of the order of 1/√(T log Λ), on each coordinate vector, then our algorithm succeeds for a separation of only O(σ*√(T log Λ)), where σ* is the maximum directional standard deviation in the space containing the centers.
Our algorithm works for Binary Product Distributions and Axis-Aligned Gaussians. The spreading condition above is implied by the separation condition for binary product distributions, and is necessary for algorithms that rely on linear correlations. Motivated by the application in population genetics, in Chapter 4, we study the sample complexity of learning mixtures of binary product distributions. In this thesis, we take a step towards learning mixtures of binary product distributions with optimal sample complexity by providing an algorithm which learns a mixture of two binary product distributions with uniform mixing weights and low sample complexity. Our algorithm clusters all the samples correctly with high probability, so long as d(μ1, μ2), the square of the Euclidean distance between the centers of the distributions, is at least polylogarithmic in s, the number of samples, and the following trade-off holds between the separation and the number of samples: s · d²(μ1, μ2) ≥ a · n log s log(ns) for some constant a. Finally, in Chapter 5, we study the problem of learning mixtures of heavy-tailed product distributions. To this end, we provide an embedding from R^n to {0, 1}^n', which maps random samples from distributions with medians that are far apart to random samples from distributions on {0, 1}^n' with centers that are far apart. The main application of our embedding is in designing an algorithm for learning mixtures of heavy-tailed distributions. We provide a polynomial-time algorithm which learns mixtures of general product distributions, as long as the distribution of each coordinate satisfies two properties: symmetry about the median and ¾-radius upper-bounded by R. The separation required by our algorithm to correctly classify a 1–δ fraction of the samples is that the distance between the medians of any two distributions in the mixture should be O(R√(T log Λ) + R√(T log Td)), and this distance should be spread across O(T log Λ + T log Td) coordinates.
A second application of our embedding is in designing algorithms for learning mixtures of distributions with finite variance, which work under a separation requirement of O(σ*√(T log Λ)) and a spreading requirement of O(T log Λ + T log Td). This algorithm does not require the more stringent spreading condition needed by the algorithm which offers similar guarantees in Chapter 3." @default.
- W129487613 created "2016-06-24" @default.
- W129487613 creator A5010790447 @default.
- W129487613 creator A5082369624 @default.
- W129487613 date "2007-01-01" @default.
- W129487613 modified "2023-09-27" @default.
- W129487613 title "Learning mixtures of distributions" @default.
- W129487613 cites W1530239281 @default.
- W129487613 cites W1549189261 @default.
- W129487613 cites W1554772789 @default.
- W129487613 cites W1573820523 @default.
- W129487613 cites W1574816920 @default.
- W129487613 cites W1605711022 @default.
- W129487613 cites W1772739125 @default.
- W129487613 cites W1956647075 @default.
- W129487613 cites W1969015668 @default.
- W129487613 cites W1976238508 @default.
- W129487613 cites W1989274820 @default.
- W129487613 cites W2001536670 @default.
- W129487613 cites W2098126593 @default.
- W129487613 cites W2146756121 @default.
- W129487613 cites W2798909945 @default.
- W129487613 cites W305865050 @default.
- W129487613 hasPublicationYear "2007" @default.
- W129487613 type Work @default.
- W129487613 sameAs 129487613 @default.
- W129487613 citedByCount "4" @default.
- W129487613 countsByYear W1294876132014 @default.
- W129487613 crossrefType "journal-article" @default.
- W129487613 hasAuthorship W129487613A5010790447 @default.
- W129487613 hasAuthorship W129487613A5082369624 @default.
- W129487613 hasConcept C105795698 @default.
- W129487613 hasConcept C110121322 @default.
- W129487613 hasConcept C121332964 @default.
- W129487613 hasConcept C134306372 @default.
- W129487613 hasConcept C138777275 @default.
- W129487613 hasConcept C149441793 @default.
- W129487613 hasConcept C197055811 @default.
- W129487613 hasConcept C33923547 @default.
- W129487613 hasConcept C56672385 @default.
- W129487613 hasConcept C61224824 @default.
- W129487613 hasConcept C62520636 @default.
- W129487613 hasConcept C73555534 @default.
- W129487613 hasConceptScore W129487613C105795698 @default.
- W129487613 hasConceptScore W129487613C110121322 @default.
- W129487613 hasConceptScore W129487613C121332964 @default.
- W129487613 hasConceptScore W129487613C134306372 @default.
- W129487613 hasConceptScore W129487613C138777275 @default.
- W129487613 hasConceptScore W129487613C149441793 @default.
- W129487613 hasConceptScore W129487613C197055811 @default.
- W129487613 hasConceptScore W129487613C33923547 @default.
- W129487613 hasConceptScore W129487613C56672385 @default.
- W129487613 hasConceptScore W129487613C61224824 @default.
- W129487613 hasConceptScore W129487613C62520636 @default.
- W129487613 hasConceptScore W129487613C73555534 @default.
- W129487613 hasLocation W1294876131 @default.
- W129487613 hasOpenAccess W129487613 @default.
- W129487613 hasPrimaryLocation W1294876131 @default.
- W129487613 hasRelatedWork W12339747 @default.
- W129487613 hasRelatedWork W1595074123 @default.
- W129487613 hasRelatedWork W1650252864 @default.
- W129487613 hasRelatedWork W1956647075 @default.
- W129487613 hasRelatedWork W2034868258 @default.
- W129487613 hasRelatedWork W2051333815 @default.
- W129487613 hasRelatedWork W2112274965 @default.
- W129487613 hasRelatedWork W2275003932 @default.
- W129487613 hasRelatedWork W2399300752 @default.
- W129487613 hasRelatedWork W2411951720 @default.
- W129487613 hasRelatedWork W2419768696 @default.
- W129487613 hasRelatedWork W2593705081 @default.
- W129487613 hasRelatedWork W2905417051 @default.
- W129487613 hasRelatedWork W2950228521 @default.
- W129487613 hasRelatedWork W2952425406 @default.
- W129487613 hasRelatedWork W2952875270 @default.
- W129487613 hasRelatedWork W2963414662 @default.
- W129487613 hasRelatedWork W2977226414 @default.
- W129487613 hasRelatedWork W3042717943 @default.
- W129487613 hasRelatedWork W305865050 @default.
- W129487613 isParatext "false" @default.
- W129487613 isRetracted "false" @default.
- W129487613 magId "129487613" @default.
- W129487613 workType "article" @default.