Matches in SemOpenAlex for { <https://semopenalex.org/work/W3022371575> ?p ?o ?g. }
- W3022371575 endingPage "102263" @default.
- W3022371575 startingPage "102263" @default.
- W3022371575 abstract "Text classification pipelines are sequences of tasks performed to classify documents into a set of predefined categories. The pre-processing phase (before training) of these pipelines involves different ways of transforming and manipulating the documents for the next (learning) phase. In this paper, we introduce three new steps into the pre-processing phase of text classification pipelines to improve effectiveness while reducing the associated costs. The distance-based Meta-Features (MFs) generation step aims at reducing the dimensionality of the original term-document matrix while producing a potentially more informative space that explicitly exploits discriminative labeled information. The second step is a sparsification step aimed at making the MF representation less dense to reduce training costs and noise. The third step is selective sampling (SS), aimed at removing lines (documents) of the matrix obtained in the previous step by carefully selecting the “best” documents for the learning phase. Our experiments show that the proposed extended pre-processing pipeline can achieve significant gains in effectiveness when compared to the original TF-IDF (up to 52%) and embedding-based representations (up to 46%), at a much lower cost (up to 9.7x faster in some datasets). Other main contributions of our work include a thorough and rigorous evaluation of the trade-offs between cost and effectiveness associated with the introduction of these new steps into the pipeline, as well as a comprehensive comparative experimental evaluation of many alternatives in terms of representations, approaches, etc." @default.
- W3022371575 created "2020-05-13" @default.
- W3022371575 creator A5001156386 @default.
- W3022371575 creator A5005630018 @default.
- W3022371575 creator A5008060404 @default.
- W3022371575 creator A5018690278 @default.
- W3022371575 creator A5024201468 @default.
- W3022371575 creator A5024659734 @default.
- W3022371575 creator A5046370637 @default.
- W3022371575 creator A5046683090 @default.
- W3022371575 creator A5057646226 @default.
- W3022371575 creator A5087565388 @default.
- W3022371575 date "2020-07-01" @default.
- W3022371575 modified "2023-10-14" @default.
- W3022371575 title "Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling" @default.
- W3022371575 cites W1437335841 @default.
- W3022371575 cites W1604460927 @default.
- W3022371575 cites W1978394996 @default.
- W3022371575 cites W2015738394 @default.
- W3022371575 cites W2015887370 @default.
- W3022371575 cites W2091118421 @default.
- W3022371575 cites W2153758664 @default.
- W3022371575 cites W2165388595 @default.
- W3022371575 cites W2170505850 @default.
- W3022371575 cites W2396382682 @default.
- W3022371575 cites W2752946155 @default.
- W3022371575 cites W2795240784 @default.
- W3022371575 cites W2910577570 @default.
- W3022371575 cites W2911964244 @default.
- W3022371575 cites W2946185384 @default.
- W3022371575 doi "https://doi.org/10.1016/j.ipm.2020.102263" @default.
- W3022371575 hasPublicationYear "2020" @default.
- W3022371575 type Work @default.
- W3022371575 sameAs 3022371575 @default.
- W3022371575 citedByCount "20" @default.
- W3022371575 countsByYear W30223715752021 @default.
- W3022371575 countsByYear W30223715752022 @default.
- W3022371575 countsByYear W30223715752023 @default.
- W3022371575 crossrefType "journal-article" @default.
- W3022371575 hasAuthorship W3022371575A5001156386 @default.
- W3022371575 hasAuthorship W3022371575A5005630018 @default.
- W3022371575 hasAuthorship W3022371575A5008060404 @default.
- W3022371575 hasAuthorship W3022371575A5018690278 @default.
- W3022371575 hasAuthorship W3022371575A5024201468 @default.
- W3022371575 hasAuthorship W3022371575A5024659734 @default.
- W3022371575 hasAuthorship W3022371575A5046370637 @default.
- W3022371575 hasAuthorship W3022371575A5046683090 @default.
- W3022371575 hasAuthorship W3022371575A5057646226 @default.
- W3022371575 hasAuthorship W3022371575A5087565388 @default.
- W3022371575 hasConcept C111030470 @default.
- W3022371575 hasConcept C115961682 @default.
- W3022371575 hasConcept C119857082 @default.
- W3022371575 hasConcept C124101348 @default.
- W3022371575 hasConcept C127413603 @default.
- W3022371575 hasConcept C138885662 @default.
- W3022371575 hasConcept C154945302 @default.
- W3022371575 hasConcept C165696696 @default.
- W3022371575 hasConcept C175309249 @default.
- W3022371575 hasConcept C177264268 @default.
- W3022371575 hasConcept C17744445 @default.
- W3022371575 hasConcept C199360897 @default.
- W3022371575 hasConcept C199539241 @default.
- W3022371575 hasConcept C23123220 @default.
- W3022371575 hasConcept C2776359362 @default.
- W3022371575 hasConcept C2776401178 @default.
- W3022371575 hasConcept C38652104 @default.
- W3022371575 hasConcept C41008148 @default.
- W3022371575 hasConcept C41608201 @default.
- W3022371575 hasConcept C41895202 @default.
- W3022371575 hasConcept C43521106 @default.
- W3022371575 hasConcept C87717796 @default.
- W3022371575 hasConcept C94625758 @default.
- W3022371575 hasConcept C97931131 @default.
- W3022371575 hasConcept C99498987 @default.
- W3022371575 hasConceptScore W3022371575C111030470 @default.
- W3022371575 hasConceptScore W3022371575C115961682 @default.
- W3022371575 hasConceptScore W3022371575C119857082 @default.
- W3022371575 hasConceptScore W3022371575C124101348 @default.
- W3022371575 hasConceptScore W3022371575C127413603 @default.
- W3022371575 hasConceptScore W3022371575C138885662 @default.
- W3022371575 hasConceptScore W3022371575C154945302 @default.
- W3022371575 hasConceptScore W3022371575C165696696 @default.
- W3022371575 hasConceptScore W3022371575C175309249 @default.
- W3022371575 hasConceptScore W3022371575C177264268 @default.
- W3022371575 hasConceptScore W3022371575C17744445 @default.
- W3022371575 hasConceptScore W3022371575C199360897 @default.
- W3022371575 hasConceptScore W3022371575C199539241 @default.
- W3022371575 hasConceptScore W3022371575C23123220 @default.
- W3022371575 hasConceptScore W3022371575C2776359362 @default.
- W3022371575 hasConceptScore W3022371575C2776401178 @default.
- W3022371575 hasConceptScore W3022371575C38652104 @default.
- W3022371575 hasConceptScore W3022371575C41008148 @default.
- W3022371575 hasConceptScore W3022371575C41608201 @default.
- W3022371575 hasConceptScore W3022371575C41895202 @default.
- W3022371575 hasConceptScore W3022371575C43521106 @default.
- W3022371575 hasConceptScore W3022371575C87717796 @default.
- W3022371575 hasConceptScore W3022371575C94625758 @default.
- W3022371575 hasConceptScore W3022371575C97931131 @default.