Matches in SemOpenAlex for { <https://semopenalex.org/work/W3036585> ?p ?o ?g. }
Showing items 1 to 91 of 91 with 100 items per page.
- W3036585 abstract "In text stream analysis, one of the main problems is finding an effective method to classify documents quickly and correctly. This is why dimensionality reduction and related methods for representing significant information are critical to developing a good text classifier. In this report we describe a novel, purely combinatorial approach to obtaining a meaningful representation of text data. Two basic ideas are realized in the current development of this approach: (1) Layered Clusters, which induce over the entire data a stratification in a tower structure like a nesting doll (Russian Matreshka) [1][2], and (2) parallel clustering of documents and their features (word frequencies in our case). The clusters are sub-matrices of the data that include one another according to the ordering given by the clustering model: the deepest cluster-matrix represents the largest weighted quasi-clique if the input data-matrix is interpreted as a hypergraph, and its effective weight is also the largest possible; the second cluster includes the first one and represents the second level of a quasi-clique with a lower effective weight, and so on. The effective weight is used as the objective function whose optimization gives the above clustering structure for the data. Figure 1 shows the typical changes in effective weight values along the above-mentioned stratification (for one of the data-matrices used in our analysis). Table 1 shows an example of a small matrix and its sub-matrix clusters found by our method. It is clear that this tower structure gives an ordinal scale for both documents and their features. The document scale indicates which documents contain the most frequent words (after filtering stopwords, of course) and which contain really rare words; similarly, the related feature scale shows which words are most frequent and what their “location” in documents is.
In the present study we used only the ordinal scale for document features for the purpose of learning. As an improvement, we plan to use a similar scale for documents in the near future. So, if one gets a chain of nested sets of words (parts of our nesting clusters) present in the considered data-matrix, one can follow the order of the chain" @default.
- W3036585 created "2016-06-24" @default.
- W3036585 creator A5006005511 @default.
- W3036585 creator A5067035670 @default.
- W3036585 date "2003-01-01" @default.
- W3036585 modified "2023-09-27" @default.
- W3036585 title "Combinatorial Clustering for Textual Data Representation in Machine Learning Models" @default.
- W3036585 cites W2049187521 @default.
- W3036585 cites W2056571852 @default.
- W3036585 cites W2156305210 @default.
- W3036585 cites W313965908 @default.
- W3036585 hasPublicationYear "2003" @default.
- W3036585 type Work @default.
- W3036585 sameAs 3036585 @default.
- W3036585 citedByCount "0" @default.
- W3036585 crossrefType "journal-article" @default.
- W3036585 hasAuthorship W3036585A5006005511 @default.
- W3036585 hasAuthorship W3036585A5067035670 @default.
- W3036585 hasConcept C103275481 @default.
- W3036585 hasConcept C106487976 @default.
- W3036585 hasConcept C111030470 @default.
- W3036585 hasConcept C11413529 @default.
- W3036585 hasConcept C114614502 @default.
- W3036585 hasConcept C124101348 @default.
- W3036585 hasConcept C153180895 @default.
- W3036585 hasConcept C154945302 @default.
- W3036585 hasConcept C159985019 @default.
- W3036585 hasConcept C17744445 @default.
- W3036585 hasConcept C178790620 @default.
- W3036585 hasConcept C185592680 @default.
- W3036585 hasConcept C192562407 @default.
- W3036585 hasConcept C199539241 @default.
- W3036585 hasConcept C2776359362 @default.
- W3036585 hasConcept C2781221856 @default.
- W3036585 hasConcept C2781311116 @default.
- W3036585 hasConcept C33923547 @default.
- W3036585 hasConcept C41008148 @default.
- W3036585 hasConcept C70518039 @default.
- W3036585 hasConcept C73555534 @default.
- W3036585 hasConcept C80444323 @default.
- W3036585 hasConcept C94625758 @default.
- W3036585 hasConceptScore W3036585C103275481 @default.
- W3036585 hasConceptScore W3036585C106487976 @default.
- W3036585 hasConceptScore W3036585C111030470 @default.
- W3036585 hasConceptScore W3036585C11413529 @default.
- W3036585 hasConceptScore W3036585C114614502 @default.
- W3036585 hasConceptScore W3036585C124101348 @default.
- W3036585 hasConceptScore W3036585C153180895 @default.
- W3036585 hasConceptScore W3036585C154945302 @default.
- W3036585 hasConceptScore W3036585C159985019 @default.
- W3036585 hasConceptScore W3036585C17744445 @default.
- W3036585 hasConceptScore W3036585C178790620 @default.
- W3036585 hasConceptScore W3036585C185592680 @default.
- W3036585 hasConceptScore W3036585C192562407 @default.
- W3036585 hasConceptScore W3036585C199539241 @default.
- W3036585 hasConceptScore W3036585C2776359362 @default.
- W3036585 hasConceptScore W3036585C2781221856 @default.
- W3036585 hasConceptScore W3036585C2781311116 @default.
- W3036585 hasConceptScore W3036585C33923547 @default.
- W3036585 hasConceptScore W3036585C41008148 @default.
- W3036585 hasConceptScore W3036585C70518039 @default.
- W3036585 hasConceptScore W3036585C73555534 @default.
- W3036585 hasConceptScore W3036585C80444323 @default.
- W3036585 hasConceptScore W3036585C94625758 @default.
- W3036585 hasLocation W30365851 @default.
- W3036585 hasOpenAccess W3036585 @default.
- W3036585 hasPrimaryLocation W30365851 @default.
- W3036585 hasRelatedWork W1499870548 @default.
- W3036585 hasRelatedWork W1512553849 @default.
- W3036585 hasRelatedWork W1526769643 @default.
- W3036585 hasRelatedWork W1565664995 @default.
- W3036585 hasRelatedWork W1879046967 @default.
- W3036585 hasRelatedWork W1892228081 @default.
- W3036585 hasRelatedWork W1970167598 @default.
- W3036585 hasRelatedWork W2038411710 @default.
- W3036585 hasRelatedWork W2054886094 @default.
- W3036585 hasRelatedWork W2108939895 @default.
- W3036585 hasRelatedWork W2118731835 @default.
- W3036585 hasRelatedWork W2128808847 @default.
- W3036585 hasRelatedWork W2141587979 @default.
- W3036585 hasRelatedWork W2157522502 @default.
- W3036585 hasRelatedWork W2294137921 @default.
- W3036585 hasRelatedWork W2312960312 @default.
- W3036585 hasRelatedWork W2509477067 @default.
- W3036585 hasRelatedWork W2542652489 @default.
- W3036585 hasRelatedWork W41665960 @default.
- W3036585 hasRelatedWork W87822204 @default.
- W3036585 isParatext "false" @default.
- W3036585 isRetracted "false" @default.
- W3036585 magId "3036585" @default.
- W3036585 workType "article" @default.