Matches in SemOpenAlex for { <https://semopenalex.org/work/W197697942> ?p ?o ?g. }
- W197697942 endingPage "317" @default.
- W197697942 startingPage "316" @default.
- W197697942 abstract "Text categorization is one of the key functions for utilizing vast amounts of documents. It can be seen as a classification problem, which has been studied in the pattern recognition and machine learning fields for a long time, and several classification methods have been developed, such as statistical classification, decision trees, support vector machines, and so on. Many researchers have applied those classification methods to text categorization and reported their performance (e.g., decision tree [3], Bayes classifier [2], support vector machine [1]). Yang conducted a comprehensive comparative study of text categorization methods and reported that k nearest neighbor and support vector machines work well for text categorization [4]. In previous studies, classification methods were usually compared using a single pair of training and test data. However, a classification method with a more complex family of classifiers requires more training data, and small training data may result in an unreliable classifier; that is, the performance of the derived classifier varies greatly depending on the training data. Therefore, we need to take the size of the training data into account when comparing and selecting a classification method. In this paper, we discuss how to select a classifier from those derived by various classification methods and how the size of training data affects the performance of the derived classifier. In order to evaluate the reliability of a classification method, we consider the variance of the accuracy of the derived classifier. We first construct a statistical model. In text categorization, each document is usually represented by a feature vector that consists of weighted frequencies of terms. In the vector space model, a document is a point in a high-dimensional feature space, and a classifier separates the feature space into subspaces, each of which is labeled with a category.
Let us consider the problem of classifying documents into c categories, and suppose we obtain a classifier which separates the feature space into m subspaces $s_1, s_2, \ldots, s_m$. In the case of Rocchio's classification method, the number m of feature subspaces is the number c of categories, while it is the number of leaves for decision trees. Let $p_i$ denote the occurrence probability, that is, the probability that a document vector is in subspace $s_i$. Notice that $\sum_{i=1}^{m} p_i = 1$ holds. Suppose the category of a subspace $s_i$ is $c_i$; then the accuracy of $s_i$, denoted by $\alpha_i$, is" @default.
- W197697942 created "2016-06-24" @default.
- W197697942 creator A5030583821 @default.
- W197697942 creator A5087434029 @default.
- W197697942 date "2000-01-01" @default.
- W197697942 modified "2023-09-26" @default.
- W197697942 title "Variance Based Classifier Comparison in Text Categorization" @default.
- W197697942 cites W1620204465 @default.
- W197697942 cites W2005422315 @default.
- W197697942 cites W2149684865 @default.
- W197697942 hasPublicationYear "2000" @default.
- W197697942 type Work @default.
- W197697942 sameAs 197697942 @default.
- W197697942 citedByCount "0" @default.
- W197697942 crossrefType "proceedings-article" @default.
- W197697942 hasAuthorship W197697942A5030583821 @default.
- W197697942 hasAuthorship W197697942A5087434029 @default.
- W197697942 hasConcept C119857082 @default.
- W197697942 hasConcept C12267149 @default.
- W197697942 hasConcept C124101348 @default.
- W197697942 hasConcept C153180895 @default.
- W197697942 hasConcept C154945302 @default.
- W197697942 hasConcept C185207860 @default.
- W197697942 hasConcept C2986744138 @default.
- W197697942 hasConcept C41008148 @default.
- W197697942 hasConcept C52001869 @default.
- W197697942 hasConcept C84525736 @default.
- W197697942 hasConcept C94124525 @default.
- W197697942 hasConcept C95623464 @default.
- W197697942 hasConceptScore W197697942C119857082 @default.
- W197697942 hasConceptScore W197697942C12267149 @default.
- W197697942 hasConceptScore W197697942C124101348 @default.
- W197697942 hasConceptScore W197697942C153180895 @default.
- W197697942 hasConceptScore W197697942C154945302 @default.
- W197697942 hasConceptScore W197697942C185207860 @default.
- W197697942 hasConceptScore W197697942C2986744138 @default.
- W197697942 hasConceptScore W197697942C41008148 @default.
- W197697942 hasConceptScore W197697942C52001869 @default.
- W197697942 hasConceptScore W197697942C84525736 @default.
- W197697942 hasConceptScore W197697942C94124525 @default.
- W197697942 hasConceptScore W197697942C95623464 @default.
- W197697942 hasLocation W1976979421 @default.
- W197697942 hasOpenAccess W197697942 @default.
- W197697942 hasPrimaryLocation W1976979421 @default.
- W197697942 hasRelatedWork W142552274 @default.
- W197697942 hasRelatedWork W1491139408 @default.
- W197697942 hasRelatedWork W1965229441 @default.
- W197697942 hasRelatedWork W2005088791 @default.
- W197697942 hasRelatedWork W2139676742 @default.
- W197697942 hasRelatedWork W2188562394 @default.
- W197697942 hasRelatedWork W2215066417 @default.
- W197697942 hasRelatedWork W2322557215 @default.
- W197697942 hasRelatedWork W2345944844 @default.
- W197697942 hasRelatedWork W2732999188 @default.
- W197697942 hasRelatedWork W2784025567 @default.
- W197697942 hasRelatedWork W2791065414 @default.
- W197697942 hasRelatedWork W2810175598 @default.
- W197697942 hasRelatedWork W2914428932 @default.
- W197697942 hasRelatedWork W2964751504 @default.
- W197697942 hasRelatedWork W3183487064 @default.
- W197697942 hasRelatedWork W2840427028 @default.
- W197697942 hasRelatedWork W2861715009 @default.
- W197697942 hasRelatedWork W3001229373 @default.
- W197697942 hasRelatedWork W3142597132 @default.
- W197697942 isParatext "false" @default.
- W197697942 isRetracted "false" @default.
- W197697942 magId "197697942" @default.
- W197697942 workType "article" @default.