Matches in SemOpenAlex for { <https://semopenalex.org/work/W1488854477> ?p ?o ?g. }
Showing items 1 to 52 of 52 with 100 items per page.
- W1488854477 endingPage "256" @default.
- W1488854477 startingPage "235" @default.
- W1488854477 abstract "Automatic content-based classification of text documents is highly important for information filtering, searching, and hyperscripting. State-of-the-art text mining tools are based on statistical pattern recognition working from relatively basic document features such as term frequency histograms. Since term lists are high-dimensional and we typically have access to rather limited labeled databases, representation becomes an important issue. The problem of high dimensions has been approached with principal component analysis (PCA) — in text mining called latent semantic indexing (LSI) [4]. In this chapter we will argue that PCA should be replaced by the closely related independent component analysis (ICA). We will apply the ICA algorithm presented in Chapter 9 which is able to identify a generalizable low-dimensional basis set in the face of high-dimensional noisy data. The major benefit of using ICA is that the representation is better aligned with the content group structure than PCA. We apply our ICA technology to two public domain data sets: a subset of the MED medical abstracts database and the CRAN set of aerodynamics abstracts. In the first set we find that the unsupervised classification based on the ICA conforms well with the associated labels, while in the second set we find that the independent text components are stable but show less agreement with the given labels." @default.
- W1488854477 created "2016-06-24" @default.
- W1488854477 creator A5002081827 @default.
- W1488854477 creator A5018292103 @default.
- W1488854477 creator A5072123143 @default.
- W1488854477 date "2000-01-01" @default.
- W1488854477 modified "2023-10-18" @default.
- W1488854477 title "Independent Components in Text" @default.
- W1488854477 cites W1969839048 @default.
- W1488854477 cites W1974339500 @default.
- W1488854477 cites W2076118331 @default.
- W1488854477 cites W2143132653 @default.
- W1488854477 cites W2147152072 @default.
- W1488854477 cites W4310638450 @default.
- W1488854477 doi "https://doi.org/10.1007/978-1-4471-0443-8_13" @default.
- W1488854477 hasPublicationYear "2000" @default.
- W1488854477 type Work @default.
- W1488854477 sameAs 1488854477 @default.
- W1488854477 citedByCount "52" @default.
- W1488854477 countsByYear W14888544772012 @default.
- W1488854477 countsByYear W14888544772014 @default.
- W1488854477 countsByYear W14888544772015 @default.
- W1488854477 countsByYear W14888544772016 @default.
- W1488854477 countsByYear W14888544772018 @default.
- W1488854477 countsByYear W14888544772020 @default.
- W1488854477 crossrefType "book-chapter" @default.
- W1488854477 hasAuthorship W1488854477A5002081827 @default.
- W1488854477 hasAuthorship W1488854477A5018292103 @default.
- W1488854477 hasAuthorship W1488854477A5072123143 @default.
- W1488854477 hasConcept C204321447 @default.
- W1488854477 hasConcept C41008148 @default.
- W1488854477 hasConceptScore W1488854477C204321447 @default.
- W1488854477 hasConceptScore W1488854477C41008148 @default.
- W1488854477 hasLocation W14888544771 @default.
- W1488854477 hasOpenAccess W1488854477 @default.
- W1488854477 hasPrimaryLocation W14888544771 @default.
- W1488854477 hasRelatedWork W1552159754 @default.
- W1488854477 hasRelatedWork W2131420137 @default.
- W1488854477 hasRelatedWork W2148757832 @default.
- W1488854477 hasRelatedWork W2293457016 @default.
- W1488854477 hasRelatedWork W2368651715 @default.
- W1488854477 hasRelatedWork W2611614995 @default.
- W1488854477 hasRelatedWork W2789919619 @default.
- W1488854477 hasRelatedWork W3107474891 @default.
- W1488854477 hasRelatedWork W3169305685 @default.
- W1488854477 hasRelatedWork W4321496520 @default.
- W1488854477 isParatext "false" @default.
- W1488854477 isRetracted "false" @default.
- W1488854477 magId "1488854477" @default.
- W1488854477 workType "book-chapter" @default.