Matches in SemOpenAlex for { <https://semopenalex.org/work/W2531644019> ?p ?o ?g. }
- W2531644019 abstract "Information technologies have recently led to a surge of electronic documents in the form of emails, webpages, blogs, news articles, etc. To help users decide which documents may be interesting to read, it is common practice to organize documents by categories/topics. A wide range of supervised and unsupervised learning techniques already exist for automated text classification and text clustering. However, supervised learning requires a training set of documents already labeled with topics/categories, which is not always readily available. In contrast, unsupervised learning techniques do not require labeled documents, but assigning a suitable category to each resulting cluster remains a difficult problem. The state of the art consists of extracting keywords based on word frequency (or related heuristics). In this thesis, we improve the extraction of keywords for unsupervised labeling of document clusters by designing a Bayesian approach based on topic modeling. More precisely, we describe an approach that uses a large side corpus to infer a language model that implicitly encodes the semantic relatedness of different words. This language model is then used to build a generative model of the cluster in such a way that the probability of generating each word depends on its frequency in the cluster as well as the frequency of its semantically related words. The words with the highest probability of generation are then extracted to label the cluster. In this approach, the side corpus can be thought of as a source of domain knowledge or context. However, there are two potential problems: processing a large side corpus can be time-consuming, and if the content of this corpus is not similar enough to the cluster, the resulting language model may be biased. We deal with those issues by designing a Bayesian transfer learning framework that allows us to process the side corpus just once offline and to weigh its importance based on the degree of similarity with the cluster." @default.
- W2531644019 created "2016-10-21" @default.
- W2531644019 creator A5063089557 @default.
- W2531644019 date "2011-08-30" @default.
- W2531644019 modified "2023-09-24" @default.
- W2531644019 title "Bayesian Unsupervised Labeling of Web Document Clusters" @default.
- W2531644019 cites W1489119587 @default.
- W2531644019 cites W1511468840 @default.
- W2531644019 cites W1574901103 @default.
- W2531644019 cites W159230833 @default.
- W2531644019 cites W1597533204 @default.
- W2531644019 cites W1602667807 @default.
- W2531644019 cites W1880262756 @default.
- W2531644019 cites W1973646734 @default.
- W2531644019 cites W2001082470 @default.
- W2531644019 cites W2013059333 @default.
- W2531644019 cites W2045656233 @default.
- W2531644019 cites W2069429561 @default.
- W2531644019 cites W2100163972 @default.
- W2531644019 cites W2104210067 @default.
- W2531644019 cites W2126163471 @default.
- W2531644019 cites W2126736494 @default.
- W2531644019 cites W2962762626 @default.
- W2531644019 cites W3211848854 @default.
- W2531644019 hasPublicationYear "2011" @default.
- W2531644019 type Work @default.
- W2531644019 sameAs 2531644019 @default.
- W2531644019 citedByCount "0" @default.
- W2531644019 crossrefType "dissertation" @default.
- W2531644019 hasAuthorship W2531644019A5063089557 @default.
- W2531644019 hasConcept C111919701 @default.
- W2531644019 hasConcept C119857082 @default.
- W2531644019 hasConcept C127705205 @default.
- W2531644019 hasConcept C136764020 @default.
- W2531644019 hasConcept C137293760 @default.
- W2531644019 hasConcept C151730666 @default.
- W2531644019 hasConcept C154945302 @default.
- W2531644019 hasConcept C171686336 @default.
- W2531644019 hasConcept C177264268 @default.
- W2531644019 hasConcept C177937566 @default.
- W2531644019 hasConcept C199360897 @default.
- W2531644019 hasConcept C204321447 @default.
- W2531644019 hasConcept C21959979 @default.
- W2531644019 hasConcept C23123220 @default.
- W2531644019 hasConcept C2779343474 @default.
- W2531644019 hasConcept C41008148 @default.
- W2531644019 hasConcept C73555534 @default.
- W2531644019 hasConcept C8038995 @default.
- W2531644019 hasConcept C86803240 @default.
- W2531644019 hasConceptScore W2531644019C111919701 @default.
- W2531644019 hasConceptScore W2531644019C119857082 @default.
- W2531644019 hasConceptScore W2531644019C127705205 @default.
- W2531644019 hasConceptScore W2531644019C136764020 @default.
- W2531644019 hasConceptScore W2531644019C137293760 @default.
- W2531644019 hasConceptScore W2531644019C151730666 @default.
- W2531644019 hasConceptScore W2531644019C154945302 @default.
- W2531644019 hasConceptScore W2531644019C171686336 @default.
- W2531644019 hasConceptScore W2531644019C177264268 @default.
- W2531644019 hasConceptScore W2531644019C177937566 @default.
- W2531644019 hasConceptScore W2531644019C199360897 @default.
- W2531644019 hasConceptScore W2531644019C204321447 @default.
- W2531644019 hasConceptScore W2531644019C21959979 @default.
- W2531644019 hasConceptScore W2531644019C23123220 @default.
- W2531644019 hasConceptScore W2531644019C2779343474 @default.
- W2531644019 hasConceptScore W2531644019C41008148 @default.
- W2531644019 hasConceptScore W2531644019C73555534 @default.
- W2531644019 hasConceptScore W2531644019C8038995 @default.
- W2531644019 hasConceptScore W2531644019C86803240 @default.
- W2531644019 hasLocation W25316440191 @default.
- W2531644019 hasOpenAccess W2531644019 @default.
- W2531644019 hasPrimaryLocation W25316440191 @default.
- W2531644019 hasRelatedWork W1160799 @default.
- W2531644019 hasRelatedWork W1484943052 @default.
- W2531644019 hasRelatedWork W1487059419 @default.
- W2531644019 hasRelatedWork W1596589983 @default.
- W2531644019 hasRelatedWork W1601236548 @default.
- W2531644019 hasRelatedWork W1614742790 @default.
- W2531644019 hasRelatedWork W1822425923 @default.
- W2531644019 hasRelatedWork W2186489521 @default.
- W2531644019 hasRelatedWork W2187800623 @default.
- W2531644019 hasRelatedWork W2285503065 @default.
- W2531644019 hasRelatedWork W2296747845 @default.
- W2531644019 hasRelatedWork W2348322200 @default.
- W2531644019 hasRelatedWork W2922081688 @default.
- W2531644019 hasRelatedWork W2980231952 @default.
- W2531644019 hasRelatedWork W3006644969 @default.
- W2531644019 hasRelatedWork W3017765460 @default.
- W2531644019 hasRelatedWork W3104970540 @default.
- W2531644019 hasRelatedWork W3122168634 @default.
- W2531644019 hasRelatedWork W3179163952 @default.
- W2531644019 hasRelatedWork W328659180 @default.
- W2531644019 isParatext "false" @default.
- W2531644019 isRetracted "false" @default.
- W2531644019 magId "2531644019" @default.
- W2531644019 workType "dissertation" @default.