Matches in SemOpenAlex for { <https://semopenalex.org/work/W2244473952> ?p ?o ?g. }
- W2244473952 abstract "Naive Bayes (NB) classifiers are well-suited to several applications owing to their easy interpretability and maintainability. However, text classification is often hampered by the lack of adequate training data. This motivates the question: how can we train NB more effectively when training data is very scarce? In this paper, we introduce an established subsampling technique from statistics -- the jackknife -- into machine learning. Our approach jackknifes documents themselves to create new pseudo-documents. The underlying idea is that although these pseudo-documents do not have semantic meaning, they are equally representative of the underlying distribution of terms. Therefore, they could be used to train any classifier that learns this underlying distribution, namely, any parametric classifier such as NB (but not, for example, non-parametric classifiers such as SVM and k-NN). Furthermore, the marginal value of this additional training data should be the highest precisely when the original data is inadequate. We then show that our jackknife technique is related to the question of additively smoothing NB via an appropriately defined notion of adjointness. This relation is surprising since it connects a statistical technique for handling scarce data to a question about the NB model. Accordingly, we are able to shed light on optimal values of the smoothing parameter for NB in the very scarce data regime. We validate our approach on a wide array of standard benchmarks -- both binary and multi-class -- for two event models of multinomial NB. We show that the jackknife technique can dramatically improve the accuracy for both event models of NB in the regime of very scarce training data. In particular, our experiments show that the jackknife can make NB more accurate than SVM for binary problems in the very scarce training data regime. We also provide a comprehensive characterization of the accuracy of these important classifiers (for both binary and multiclass) in the very scarce data regime for benchmark text datasets, without feature selection and class imbalance." @default.
- W2244473952 created "2016-06-24" @default.
- W2244473952 creator A5077747661 @default.
- W2244473952 date "2015-11-01" @default.
- W2244473952 modified "2023-09-28" @default.
- W2244473952 title "Jackknifing Documents and Additive Smoothing for Naive Bayes with Scarce Data" @default.
- W2244473952 cites W1481632895 @default.
- W2244473952 cites W1495837896 @default.
- W2244473952 cites W1524688041 @default.
- W2244473952 cites W1532325895 @default.
- W2244473952 cites W1550206324 @default.
- W2244473952 cites W1827453818 @default.
- W2244473952 cites W1858775570 @default.
- W2244473952 cites W2005422315 @default.
- W2244473952 cites W2076237237 @default.
- W2244473952 cites W2097089247 @default.
- W2244473952 cites W2097927681 @default.
- W2244473952 cites W2118020653 @default.
- W2244473952 cites W2140336868 @default.
- W2244473952 cites W2148143831 @default.
- W2244473952 cites W2149684865 @default.
- W2244473952 cites W2151752770 @default.
- W2244473952 cites W2163614729 @default.
- W2244473952 cites W2165744911 @default.
- W2244473952 cites W2912934387 @default.
- W2244473952 cites W3148472308 @default.
- W2244473952 cites W65718221 @default.
- W2244473952 doi "https://doi.org/10.1109/icdm.2015.94" @default.
- W2244473952 hasPublicationYear "2015" @default.
- W2244473952 type Work @default.
- W2244473952 sameAs 2244473952 @default.
- W2244473952 citedByCount "2" @default.
- W2244473952 countsByYear W22444739522018 @default.
- W2244473952 countsByYear W22444739522019 @default.
- W2244473952 crossrefType "proceedings-article" @default.
- W2244473952 hasAuthorship W2244473952A5077747661 @default.
- W2244473952 hasConcept C105795698 @default.
- W2244473952 hasConcept C117251300 @default.
- W2244473952 hasConcept C119857082 @default.
- W2244473952 hasConcept C121332964 @default.
- W2244473952 hasConcept C124101348 @default.
- W2244473952 hasConcept C154945302 @default.
- W2244473952 hasConcept C185429906 @default.
- W2244473952 hasConcept C199360897 @default.
- W2244473952 hasConcept C25343380 @default.
- W2244473952 hasConcept C2777212361 @default.
- W2244473952 hasConcept C2778565505 @default.
- W2244473952 hasConcept C2779662365 @default.
- W2244473952 hasConcept C2781067378 @default.
- W2244473952 hasConcept C31972630 @default.
- W2244473952 hasConcept C33923547 @default.
- W2244473952 hasConcept C3770464 @default.
- W2244473952 hasConcept C41008148 @default.
- W2244473952 hasConcept C62520636 @default.
- W2244473952 hasConcept C81790035 @default.
- W2244473952 hasConceptScore W2244473952C105795698 @default.
- W2244473952 hasConceptScore W2244473952C117251300 @default.
- W2244473952 hasConceptScore W2244473952C119857082 @default.
- W2244473952 hasConceptScore W2244473952C121332964 @default.
- W2244473952 hasConceptScore W2244473952C124101348 @default.
- W2244473952 hasConceptScore W2244473952C154945302 @default.
- W2244473952 hasConceptScore W2244473952C185429906 @default.
- W2244473952 hasConceptScore W2244473952C199360897 @default.
- W2244473952 hasConceptScore W2244473952C25343380 @default.
- W2244473952 hasConceptScore W2244473952C2777212361 @default.
- W2244473952 hasConceptScore W2244473952C2778565505 @default.
- W2244473952 hasConceptScore W2244473952C2779662365 @default.
- W2244473952 hasConceptScore W2244473952C2781067378 @default.
- W2244473952 hasConceptScore W2244473952C31972630 @default.
- W2244473952 hasConceptScore W2244473952C33923547 @default.
- W2244473952 hasConceptScore W2244473952C3770464 @default.
- W2244473952 hasConceptScore W2244473952C41008148 @default.
- W2244473952 hasConceptScore W2244473952C62520636 @default.
- W2244473952 hasConceptScore W2244473952C81790035 @default.
- W2244473952 hasLocation W22444739521 @default.
- W2244473952 hasOpenAccess W2244473952 @default.
- W2244473952 hasPrimaryLocation W22444739521 @default.
- W2244473952 hasRelatedWork W1488264517 @default.
- W2244473952 hasRelatedWork W1517032226 @default.
- W2244473952 hasRelatedWork W1551753208 @default.
- W2244473952 hasRelatedWork W1558666744 @default.
- W2244473952 hasRelatedWork W158127209 @default.
- W2244473952 hasRelatedWork W1669748612 @default.
- W2244473952 hasRelatedWork W1783422035 @default.
- W2244473952 hasRelatedWork W1981655323 @default.
- W2244473952 hasRelatedWork W2000655290 @default.
- W2244473952 hasRelatedWork W2048041872 @default.
- W2244473952 hasRelatedWork W2111702745 @default.
- W2244473952 hasRelatedWork W2114881090 @default.
- W2244473952 hasRelatedWork W2136787748 @default.
- W2244473952 hasRelatedWork W2403597675 @default.
- W2244473952 hasRelatedWork W2495342377 @default.
- W2244473952 hasRelatedWork W2757523618 @default.
- W2244473952 hasRelatedWork W2805208759 @default.
- W2244473952 hasRelatedWork W2920790122 @default.
- W2244473952 hasRelatedWork W3085378621 @default.
- W2244473952 hasRelatedWork W2021593743 @default.
- W2244473952 isParatext "false" @default.
- W2244473952 isRetracted "false" @default.
- W2244473952 magId "2244473952" @default.