Matches in SemOpenAlex for { <https://semopenalex.org/work/W2078144193> ?p ?o ?g. }
- W2078144193 endingPage "104" @default.
- W2078144193 startingPage "91" @default.
- W2078144193 abstract "Designing a good feature selection (FS) algorithm is of utmost importance, especially for text classification (TC), wherein a large number of features representing terms or words pose serious challenges to the effectiveness and efficiency of classifiers. FS algorithms are divided into two broad categories, namely, feature ranking (FR) and feature subset selection (FSS) algorithms. Unlike FSS, FR algorithms select those features that are individually highly relevant for the class or category without taking feature interactions into account. This makes FR algorithms simpler and computationally more efficient than FSS and thus mostly the preferred choice for TC. Bi-normal separation (BNS) (Forman, 2003) and information gain (IG) (Yang and Pedersen, 1997) are well-known FR metrics. However, FR algorithms output a set of highly relevant features or terms which can possibly be redundant and can thus deteriorate a classifier's performance. This paper suggests taking the interactions of words into account in order to eliminate redundant terms. Stand-alone FSS algorithms can be computationally expensive for high-dimensional text data. We therefore suggest a two-stage FS algorithm, which employs an FR metric such as BNS or IG in the first stage and an FSS algorithm such as the Markov blanket filter (MBF) (Koller and Sahami, 1996) in the second stage. Most of the two-stage algorithms proposed in the literature for TC combine feature ranking with feature transformation algorithms such as principal component analysis (PCA). To estimate the statistical significance of our two-stage algorithm, we carry out experiments on 10 different splits of training and test sets of each of the three data sets (Reuters-21578, TREC, OHSUMED) with naive Bayes and support vector machine classifiers. 
Our results based on a paired two-sided t-test show that the macro F1 performance of BNS+MBF is significantly better than that of stand-alone BNS in 69% of the total experimental trials. The macro F1 values of IG are enhanced in 72% of the trials when MBF is used in the second stage. We also compare our two-stage algorithm against two recently proposed FS algorithms, namely, the distinguishing feature selector (DFS) (Uysal and Gunal, 2012) and a two-stage algorithm consisting of IG and PCA (Uguz, 2011). BNS+MBF is found to be significantly better than DFS and IG+PCA in 74% and 78% of the trials, respectively. IG+MBF outperforms DFS and IG+PCA in 93% and 80% of the experimental trials, respectively. Similar results are observed for BNS+MBF and IG+MBF when the performances are evaluated in terms of balanced error rate." @default.
- W2078144193 created "2016-06-24" @default.
- W2078144193 creator A5042973836 @default.
- W2078144193 creator A5045662541 @default.
- W2078144193 creator A5047979646 @default.
- W2078144193 date "2015-06-01" @default.
- W2078144193 modified "2023-09-27" @default.
- W2078144193 title "A two-stage Markov blanket based feature selection algorithm for text classification" @default.
- W2078144193 cites W1972640883 @default.
- W2078144193 cites W1978394996 @default.
- W2078144193 cites W1979622093 @default.
- W2078144193 cites W1982589161 @default.
- W2078144193 cites W1983223005 @default.
- W2078144193 cites W1989127062 @default.
- W2078144193 cites W2009272467 @default.
- W2078144193 cites W2014527343 @default.
- W2078144193 cites W2017337590 @default.
- W2078144193 cites W2053608218 @default.
- W2078144193 cites W2068833644 @default.
- W2078144193 cites W2088937912 @default.
- W2078144193 cites W2092782467 @default.
- W2078144193 cites W2109676405 @default.
- W2078144193 cites W2118020653 @default.
- W2078144193 cites W2125109223 @default.
- W2078144193 cites W2133462743 @default.
- W2078144193 cites W2134090438 @default.
- W2078144193 cites W2149684865 @default.
- W2078144193 cites W2150874198 @default.
- W2078144193 cites W2154053567 @default.
- W2078144193 cites W2156758690 @default.
- W2078144193 cites W80229024 @default.
- W2078144193 doi "https://doi.org/10.1016/j.neucom.2015.01.031" @default.
- W2078144193 hasPublicationYear "2015" @default.
- W2078144193 type Work @default.
- W2078144193 sameAs 2078144193 @default.
- W2078144193 citedByCount "49" @default.
- W2078144193 countsByYear W20781441932015 @default.
- W2078144193 countsByYear W20781441932016 @default.
- W2078144193 countsByYear W20781441932017 @default.
- W2078144193 countsByYear W20781441932018 @default.
- W2078144193 countsByYear W20781441932019 @default.
- W2078144193 countsByYear W20781441932020 @default.
- W2078144193 countsByYear W20781441932021 @default.
- W2078144193 countsByYear W20781441932022 @default.
- W2078144193 countsByYear W20781441932023 @default.
- W2078144193 crossrefType "journal-article" @default.
- W2078144193 hasAuthorship W2078144193A5042973836 @default.
- W2078144193 hasAuthorship W2078144193A5045662541 @default.
- W2078144193 hasAuthorship W2078144193A5047979646 @default.
- W2078144193 hasConcept C106131492 @default.
- W2078144193 hasConcept C11413529 @default.
- W2078144193 hasConcept C119857082 @default.
- W2078144193 hasConcept C123867240 @default.
- W2078144193 hasConcept C138885662 @default.
- W2078144193 hasConcept C148483581 @default.
- W2078144193 hasConcept C153180895 @default.
- W2078144193 hasConcept C154945302 @default.
- W2078144193 hasConcept C163836022 @default.
- W2078144193 hasConcept C189430467 @default.
- W2078144193 hasConcept C189973286 @default.
- W2078144193 hasConcept C2776401178 @default.
- W2078144193 hasConcept C31972630 @default.
- W2078144193 hasConcept C41008148 @default.
- W2078144193 hasConcept C41895202 @default.
- W2078144193 hasConcept C95623464 @default.
- W2078144193 hasConcept C98763669 @default.
- W2078144193 hasConceptScore W2078144193C106131492 @default.
- W2078144193 hasConceptScore W2078144193C11413529 @default.
- W2078144193 hasConceptScore W2078144193C119857082 @default.
- W2078144193 hasConceptScore W2078144193C123867240 @default.
- W2078144193 hasConceptScore W2078144193C138885662 @default.
- W2078144193 hasConceptScore W2078144193C148483581 @default.
- W2078144193 hasConceptScore W2078144193C153180895 @default.
- W2078144193 hasConceptScore W2078144193C154945302 @default.
- W2078144193 hasConceptScore W2078144193C163836022 @default.
- W2078144193 hasConceptScore W2078144193C189430467 @default.
- W2078144193 hasConceptScore W2078144193C189973286 @default.
- W2078144193 hasConceptScore W2078144193C2776401178 @default.
- W2078144193 hasConceptScore W2078144193C31972630 @default.
- W2078144193 hasConceptScore W2078144193C41008148 @default.
- W2078144193 hasConceptScore W2078144193C41895202 @default.
- W2078144193 hasConceptScore W2078144193C95623464 @default.
- W2078144193 hasConceptScore W2078144193C98763669 @default.
- W2078144193 hasLocation W20781441931 @default.
- W2078144193 hasOpenAccess W2078144193 @default.
- W2078144193 hasPrimaryLocation W20781441931 @default.
- W2078144193 hasRelatedWork W2141009080 @default.
- W2078144193 hasRelatedWork W2356853483 @default.
- W2078144193 hasRelatedWork W2374344280 @default.
- W2078144193 hasRelatedWork W2563096758 @default.
- W2078144193 hasRelatedWork W3010923102 @default.
- W2078144193 hasRelatedWork W3200179079 @default.
- W2078144193 hasRelatedWork W4293525103 @default.
- W2078144193 hasRelatedWork W4386053843 @default.
- W2078144193 hasRelatedWork W2345184372 @default.
- W2078144193 hasRelatedWork W3158004940 @default.
- W2078144193 hasVolume "157" @default.
- W2078144193 isParatext "false" @default.
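
The abstract above names bi-normal separation (BNS) as a first-stage feature ranking metric. This record contains no code, so the following is only an illustrative sketch of how such a ranking stage might look, using the commonly cited definition of BNS as the absolute difference of the inverse normal CDF at a term's true-positive and false-positive rates. The function names, the clipping value `eps`, and the contingency-count input format are assumptions made for this example, not details taken from the paper. The second stage (the Markov blanket filter) is not shown.

```python
from statistics import NormalDist


def bns_score(tp, fn, fp, tn, eps=0.0005):
    """Bi-normal separation of one term from its 2x2 contingency counts.

    tp/fn: positive-class documents with/without the term.
    fp/tn: negative-class documents with/without the term.
    Assumes both classes are non-empty. eps (an assumed value) keeps the
    rates away from 0 and 1, where the inverse normal CDF is undefined.
    """
    inv = NormalDist().inv_cdf
    tpr = min(max(tp / (tp + fn), eps), 1 - eps)
    fpr = min(max(fp / (fp + tn), eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))


def rank_terms(counts, k):
    """Return the k highest-scoring terms (hypothetical first-stage filter).

    counts maps each term to its (tp, fn, fp, tn) tuple.
    """
    ranked = sorted(counts, key=lambda t: bns_score(*counts[t]), reverse=True)
    return ranked[:k]
```

In a two-stage setup of the kind the abstract describes, the terms returned by `rank_terms` would be passed on to a subset-selection step that removes redundant terms; ranking alone scores each term in isolation and cannot detect redundancy.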