Matches in SemOpenAlex for { <https://semopenalex.org/work/W55373057> ?p ?o ?g. }
Showing items 1 to 94 of 94 with 100 items per page.
- W55373057 abstract "Given a number of documents, we are interested in automatically classifying documents or document sections into a number of predefined classes as efficiently as possible and with as few computational requirements as possible. This is done by using Natural Language Processing (NLP) techniques in combination with traditional high-dimensional document representation techniques such as Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), and machine learning techniques such as Support Vector Machines (SVM). Despite the availability of various statistical feature-selection techniques, the high dimensionality of the feature spaces causes computational problems, especially in collections containing old spelling and Optical Character Recognition (OCR) errors, which lead to exploding feature spaces. As a result, feature extraction, feature selection, training a supervised machine learning algorithm, or clustering can no longer practically be used because they are too slow and their memory requirements are too large. We show that by applying a variety of NLP techniques as pre-processing, it is possible to significantly increase the discrimination between the classes. In this paper, we report F1-measures that are up to 11.3% higher compared to a baseline performance model which does not use NLP techniques. At the same time, the dimensionality of the feature space is reduced by up to 54%, leading to highly reduced computational requirements and better response times in building the model of the feature space as well as in the machine learning and classification. Further experiments resulted in vector reductions of up to 80%, with results being only 4% worse than the baseline model." @default.
- W55373057 created "2016-06-24" @default.
- W55373057 creator A5015391401 @default.
- W55373057 creator A5050488447 @default.
- W55373057 date "2013-11-07" @default.
- W55373057 modified "2023-09-26" @default.
- W55373057 title "Improving Machine Learning Input for Automatic Document Classification with Natural Language Processing" @default.
- W55373057 cites W1490760466 @default.
- W55373057 cites W1521626219 @default.
- W55373057 cites W1532325895 @default.
- W55373057 cites W1561015309 @default.
- W55373057 cites W1574901103 @default.
- W55373057 cites W1587362683 @default.
- W55373057 cites W1647671624 @default.
- W55373057 cites W1996430422 @default.
- W55373057 cites W2024228866 @default.
- W55373057 cites W2034190452 @default.
- W55373057 cites W2087347434 @default.
- W55373057 cites W2098162425 @default.
- W55373057 cites W2098368939 @default.
- W55373057 cites W2119821739 @default.
- W55373057 cites W2147218300 @default.
- W55373057 cites W2150102617 @default.
- W55373057 cites W2251252397 @default.
- W55373057 cites W2402510845 @default.
- W55373057 cites W2503576123 @default.
- W55373057 cites W2911685943 @default.
- W55373057 cites W2915323653 @default.
- W55373057 cites W2993383518 @default.
- W55373057 cites W41404523 @default.
- W55373057 hasPublicationYear "2013" @default.
- W55373057 type Work @default.
- W55373057 sameAs 55373057 @default.
- W55373057 citedByCount "0" @default.
- W55373057 crossrefType "journal-article" @default.
- W55373057 hasAuthorship W55373057A5015391401 @default.
- W55373057 hasAuthorship W55373057A5050488447 @default.
- W55373057 hasConcept C111030470 @default.
- W55373057 hasConcept C119857082 @default.
- W55373057 hasConcept C12267149 @default.
- W55373057 hasConcept C138885662 @default.
- W55373057 hasConcept C148483581 @default.
- W55373057 hasConcept C153180895 @default.
- W55373057 hasConcept C154945302 @default.
- W55373057 hasConcept C204321447 @default.
- W55373057 hasConcept C2776401178 @default.
- W55373057 hasConcept C41008148 @default.
- W55373057 hasConcept C41895202 @default.
- W55373057 hasConcept C52622490 @default.
- W55373057 hasConcept C70518039 @default.
- W55373057 hasConcept C73555534 @default.
- W55373057 hasConcept C83665646 @default.
- W55373057 hasConceptScore W55373057C111030470 @default.
- W55373057 hasConceptScore W55373057C119857082 @default.
- W55373057 hasConceptScore W55373057C12267149 @default.
- W55373057 hasConceptScore W55373057C138885662 @default.
- W55373057 hasConceptScore W55373057C148483581 @default.
- W55373057 hasConceptScore W55373057C153180895 @default.
- W55373057 hasConceptScore W55373057C154945302 @default.
- W55373057 hasConceptScore W55373057C204321447 @default.
- W55373057 hasConceptScore W55373057C2776401178 @default.
- W55373057 hasConceptScore W55373057C41008148 @default.
- W55373057 hasConceptScore W55373057C41895202 @default.
- W55373057 hasConceptScore W55373057C52622490 @default.
- W55373057 hasConceptScore W55373057C70518039 @default.
- W55373057 hasConceptScore W55373057C73555534 @default.
- W55373057 hasConceptScore W55373057C83665646 @default.
- W55373057 hasLocation W553730571 @default.
- W55373057 hasOpenAccess W55373057 @default.
- W55373057 hasPrimaryLocation W553730571 @default.
- W55373057 hasRelatedWork W1492130155 @default.
- W55373057 hasRelatedWork W1516805180 @default.
- W55373057 hasRelatedWork W1540238861 @default.
- W55373057 hasRelatedWork W1992747827 @default.
- W55373057 hasRelatedWork W2026974139 @default.
- W55373057 hasRelatedWork W2120559187 @default.
- W55373057 hasRelatedWork W2136921057 @default.
- W55373057 hasRelatedWork W2159681823 @default.
- W55373057 hasRelatedWork W2413329758 @default.
- W55373057 hasRelatedWork W2585367509 @default.
- W55373057 hasRelatedWork W2596799992 @default.
- W55373057 hasRelatedWork W2789162741 @default.
- W55373057 hasRelatedWork W2798387376 @default.
- W55373057 hasRelatedWork W2883439553 @default.
- W55373057 hasRelatedWork W2954375764 @default.
- W55373057 hasRelatedWork W2982115308 @default.
- W55373057 hasRelatedWork W3019363731 @default.
- W55373057 hasRelatedWork W3023933756 @default.
- W55373057 hasRelatedWork W3091713579 @default.
- W55373057 hasRelatedWork W2187978162 @default.
- W55373057 isParatext "false" @default.
- W55373057 isRetracted "false" @default.
- W55373057 magId "55373057" @default.
- W55373057 workType "article" @default.