Matches in SemOpenAlex for { <https://semopenalex.org/work/W2950700385> ?p ?o ?g. }
Showing items 1 to 92 of 92 with 100 items per page.
- W2950700385 abstract "Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from admixtures is NP-hard. Assuming separability, a strong assumption, [4] gave the first provable algorithm for inference. For the LDA model, [6] gave a provable algorithm using tensor methods. But [4,6] do not learn topic vectors with bounded $l_1$ error (a natural measure for probability vectors). Our aim is to develop a model which makes intuitive and empirically supported assumptions and to design an algorithm with natural, simple components such as SVD, which provably solves the inference problem for the model with bounded $l_1$ error. A topic in LDA and other models is essentially characterized by a group of co-occurring words. Motivated by this, we introduce topic-specific Catchwords, groups of words which occur with strictly greater frequency in a topic than in any other topic individually and are required to have high frequency together rather than individually. A major contribution of the paper is to show that under this more realistic assumption, which is empirically verified on real corpora, a singular value decomposition (SVD) based algorithm with a crucial pre-processing step of thresholding can provably recover the topics from a collection of documents drawn from Dominant admixtures. Dominant admixtures are convex combinations of distributions in which one distribution has a significantly higher contribution than the others. Apart from the simplicity of the algorithm, the sample complexity has near-optimal dependence on $w_0$, the lowest probability that a topic is dominant, and is better than that of [4]. Empirical evidence shows that on several real-world corpora, both the Catchwords and Dominant admixture assumptions hold and the proposed algorithm substantially outperforms the state of the art [5]." @default.
- W2950700385 created "2019-06-27" @default.
- W2950700385 creator A5007922392 @default.
- W2950700385 creator A5032650028 @default.
- W2950700385 creator A5084271903 @default.
- W2950700385 date "2014-10-26" @default.
- W2950700385 modified "2023-09-27" @default.
- W2950700385 title "A provable SVD-based algorithm for learning topics in dominant admixture corpus" @default.
- W2950700385 cites W1880262756 @default.
- W2950700385 cites W2063392856 @default.
- W2950700385 cites W2073459066 @default.
- W2950700385 cites W2130339025 @default.
- W2950700385 cites W2131172946 @default.
- W2950700385 cites W2147152072 @default.
- W2950700385 cites W2150593711 @default.
- W2950700385 cites W2950342785 @default.
- W2950700385 cites W2952389066 @default.
- W2950700385 cites W2953266361 @default.
- W2950700385 cites W2953337630 @default.
- W2950700385 cites W2963625764 @default.
- W2950700385 cites W2989661724 @default.
- W2950700385 hasPublicationYear "2014" @default.
- W2950700385 type Work @default.
- W2950700385 sameAs 2950700385 @default.
- W2950700385 citedByCount "12" @default.
- W2950700385 countsByYear W29507003852015 @default.
- W2950700385 countsByYear W29507003852016 @default.
- W2950700385 countsByYear W29507003852017 @default.
- W2950700385 countsByYear W29507003852018 @default.
- W2950700385 countsByYear W29507003852019 @default.
- W2950700385 countsByYear W29507003852020 @default.
- W2950700385 crossrefType "posted-content" @default.
- W2950700385 hasAuthorship W2950700385A5007922392 @default.
- W2950700385 hasAuthorship W2950700385A5032650028 @default.
- W2950700385 hasAuthorship W2950700385A5084271903 @default.
- W2950700385 hasConcept C111472728 @default.
- W2950700385 hasConcept C11413529 @default.
- W2950700385 hasConcept C134306372 @default.
- W2950700385 hasConcept C138885662 @default.
- W2950700385 hasConcept C154945302 @default.
- W2950700385 hasConcept C169214877 @default.
- W2950700385 hasConcept C171686336 @default.
- W2950700385 hasConcept C182310444 @default.
- W2950700385 hasConcept C22789450 @default.
- W2950700385 hasConcept C2776214188 @default.
- W2950700385 hasConcept C2780586882 @default.
- W2950700385 hasConcept C33923547 @default.
- W2950700385 hasConcept C34388435 @default.
- W2950700385 hasConcept C41008148 @default.
- W2950700385 hasConcept C500882744 @default.
- W2950700385 hasConceptScore W2950700385C111472728 @default.
- W2950700385 hasConceptScore W2950700385C11413529 @default.
- W2950700385 hasConceptScore W2950700385C134306372 @default.
- W2950700385 hasConceptScore W2950700385C138885662 @default.
- W2950700385 hasConceptScore W2950700385C154945302 @default.
- W2950700385 hasConceptScore W2950700385C169214877 @default.
- W2950700385 hasConceptScore W2950700385C171686336 @default.
- W2950700385 hasConceptScore W2950700385C182310444 @default.
- W2950700385 hasConceptScore W2950700385C22789450 @default.
- W2950700385 hasConceptScore W2950700385C2776214188 @default.
- W2950700385 hasConceptScore W2950700385C2780586882 @default.
- W2950700385 hasConceptScore W2950700385C33923547 @default.
- W2950700385 hasConceptScore W2950700385C34388435 @default.
- W2950700385 hasConceptScore W2950700385C41008148 @default.
- W2950700385 hasConceptScore W2950700385C500882744 @default.
- W2950700385 hasLocation W29507003851 @default.
- W2950700385 hasOpenAccess W2950700385 @default.
- W2950700385 hasPrimaryLocation W29507003851 @default.
- W2950700385 hasRelatedWork W1755468095 @default.
- W2950700385 hasRelatedWork W1880262756 @default.
- W2950700385 hasRelatedWork W2119008358 @default.
- W2950700385 hasRelatedWork W2125477383 @default.
- W2950700385 hasRelatedWork W2130186484 @default.
- W2950700385 hasRelatedWork W2150731624 @default.
- W2950700385 hasRelatedWork W2174706414 @default.
- W2950700385 hasRelatedWork W2201177080 @default.
- W2950700385 hasRelatedWork W2275370112 @default.
- W2950700385 hasRelatedWork W2339987761 @default.
- W2950700385 hasRelatedWork W2395737356 @default.
- W2950700385 hasRelatedWork W2581504556 @default.
- W2950700385 hasRelatedWork W2804794314 @default.
- W2950700385 hasRelatedWork W2943476805 @default.
- W2950700385 hasRelatedWork W2953337630 @default.
- W2950700385 hasRelatedWork W2963625764 @default.
- W2950700385 hasRelatedWork W2963626317 @default.
- W2950700385 hasRelatedWork W3015293812 @default.
- W2950700385 hasRelatedWork W3046005725 @default.
- W2950700385 hasRelatedWork W3171327347 @default.
- W2950700385 isParatext "false" @default.
- W2950700385 isRetracted "false" @default.
- W2950700385 magId "2950700385" @default.
- W2950700385 workType "article" @default.
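
The abstract above defines two notions in words: a Dominant admixture (a convex combination of topic distributions in which one topic contributes significantly more than the others) and Catchwords (words that are strictly more frequent in one topic than in any other topic individually). The following is a minimal sketch of those two definitions only, not the paper's SVD-based algorithm. The function names, the toy topics, and the `margin` parameter standing in for "significantly higher" are assumptions made for illustration; the sketch also omits the abstract's extra requirement that catchwords have high frequency together.

```python
from typing import List, Optional, Sequence


def admixture(topics: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Convex combination of topic word distributions."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    vocab_size = len(topics[0])
    return [sum(w * t[i] for w, t in zip(weights, topics))
            for i in range(vocab_size)]


def dominant_topic(weights: Sequence[float],
                   margin: float = 0.3) -> Optional[int]:
    """Index of the dominant topic, or None if no topic dominates.

    `margin` is a hypothetical stand-in for "significantly higher
    contribution"; the abstract does not give a numeric value.
    """
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    top, runner_up = order[0], order[1]
    if weights[top] - weights[runner_up] >= margin:
        return top
    return None


def catchwords(topics: Sequence[Sequence[float]], topic_index: int) -> List[int]:
    """Word indices strictly more probable in one topic than in every other.

    Only the pairwise "strictly greater frequency" condition is checked here.
    """
    target = topics[topic_index]
    return [i for i, p in enumerate(target)
            if all(p > other[i]
                   for j, other in enumerate(topics) if j != topic_index)]


# Toy example: two topics over a four-word vocabulary (made-up numbers).
toy_topics = [[0.5, 0.3, 0.1, 0.1],
              [0.1, 0.1, 0.5, 0.3]]
toy_weights = [0.8, 0.2]
document_distribution = admixture(toy_topics, toy_weights)
```

With these toy inputs, `dominant_topic(toy_weights)` picks topic 0, and `catchwords(toy_topics, 0)` and `catchwords(toy_topics, 1)` select disjoint word sets, which is the property the abstract's assumption relies on.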