Matches in SemOpenAlex for { <https://semopenalex.org/work/W3021330142> ?p ?o ?g. }
- W3021330142 endingPage "87928" @default.
- W3021330142 startingPage "87918" @default.
- W3021330142 abstract "It is recognized the importance of knowing the descriptive properties of a dataset when tackling a data science problem. Having information about the redundancy, complexity and density of a problem allows us to make decisions as to which data preprocessing and machine learning techniques are most suitable. In classification problems, there are multiple metrics to describe the overlapping of the features between classes, class imbalances or separability, among others. However, these metrics may not scale up well when dealing with big datasets, or may not simply be sufficiently informative in this context. In this paper, we provide a package of metrics for big data classification problems. In particular, we propose two new big data metrics: Neighborhood Density and Decision Tree Progression, which study density and accuracy progression by discarding half of the samples. In addition, we enable a number of basic metrics to handle big data. The experimental study carried out in standard big data classification problems shows that our metrics can quickly characterize big datasets. We identified a clear redundancy of information in most datasets, so that, discarding randomly 75% of the samples does not drastically affect the accuracy of the classifiers used. Thus, the proposed big data metrics, which are available as a Spark-Package, provide a fast assessment of the shape of a classification dataset prior to applying big data preprocessing, toward smart data." @default.
- W3021330142 created "2020-05-13" @default.
- W3021330142 creator A5012082459 @default.
- W3021330142 creator A5045016749 @default.
- W3021330142 creator A5075541426 @default.
- W3021330142 date "2020-01-01" @default.
- W3021330142 modified "2023-10-02" @default.
- W3021330142 title "Redundancy and Complexity Metrics for Big Data Classification: Towards Smart Data" @default.
- W3021330142 cites W1575103269 @default.
- W3021330142 cites W1964402642 @default.
- W3021330142 cites W1990165991 @default.
- W3021330142 cites W2003984511 @default.
- W3021330142 cites W2082302018 @default.
- W3021330142 cites W2122111042 @default.
- W3021330142 cites W2157355837 @default.
- W3021330142 cites W2162210260 @default.
- W3021330142 cites W2398841771 @default.
- W3021330142 cites W2432436793 @default.
- W3021330142 cites W2510026128 @default.
- W3021330142 cites W2527889223 @default.
- W3021330142 cites W2555986604 @default.
- W3021330142 cites W2560528144 @default.
- W3021330142 cites W2620975687 @default.
- W3021330142 cites W2884090605 @default.
- W3021330142 cites W2902777483 @default.
- W3021330142 cites W2902834302 @default.
- W3021330142 cites W2945790622 @default.
- W3021330142 cites W2969219550 @default.
- W3021330142 cites W2973136425 @default.
- W3021330142 cites W2979526564 @default.
- W3021330142 cites W4213308398 @default.
- W3021330142 cites W4243367342 @default.
- W3021330142 cites W575847903 @default.
- W3021330142 cites W91088564 @default.
- W3021330142 doi "https://doi.org/10.1109/access.2020.2991800" @default.
- W3021330142 hasPublicationYear "2020" @default.
- W3021330142 type Work @default.
- W3021330142 sameAs 3021330142 @default.
- W3021330142 citedByCount "18" @default.
- W3021330142 countsByYear W30213301422020 @default.
- W3021330142 countsByYear W30213301422021 @default.
- W3021330142 countsByYear W30213301422022 @default.
- W3021330142 countsByYear W30213301422023 @default.
- W3021330142 crossrefType "journal-article" @default.
- W3021330142 hasAuthorship W3021330142A5012082459 @default.
- W3021330142 hasAuthorship W3021330142A5045016749 @default.
- W3021330142 hasAuthorship W3021330142A5075541426 @default.
- W3021330142 hasBestOaLocation W30213301421 @default.
- W3021330142 hasConcept C10551718 @default.
- W3021330142 hasConcept C110083411 @default.
- W3021330142 hasConcept C111919701 @default.
- W3021330142 hasConcept C119857082 @default.
- W3021330142 hasConcept C124101348 @default.
- W3021330142 hasConcept C152124472 @default.
- W3021330142 hasConcept C154945302 @default.
- W3021330142 hasConcept C199360897 @default.
- W3021330142 hasConcept C2781215313 @default.
- W3021330142 hasConcept C34736171 @default.
- W3021330142 hasConcept C41008148 @default.
- W3021330142 hasConcept C7545210 @default.
- W3021330142 hasConcept C75684735 @default.
- W3021330142 hasConcept C77088390 @default.
- W3021330142 hasConcept C84525736 @default.
- W3021330142 hasConceptScore W3021330142C10551718 @default.
- W3021330142 hasConceptScore W3021330142C110083411 @default.
- W3021330142 hasConceptScore W3021330142C111919701 @default.
- W3021330142 hasConceptScore W3021330142C119857082 @default.
- W3021330142 hasConceptScore W3021330142C124101348 @default.
- W3021330142 hasConceptScore W3021330142C152124472 @default.
- W3021330142 hasConceptScore W3021330142C154945302 @default.
- W3021330142 hasConceptScore W3021330142C199360897 @default.
- W3021330142 hasConceptScore W3021330142C2781215313 @default.
- W3021330142 hasConceptScore W3021330142C34736171 @default.
- W3021330142 hasConceptScore W3021330142C41008148 @default.
- W3021330142 hasConceptScore W3021330142C7545210 @default.
- W3021330142 hasConceptScore W3021330142C75684735 @default.
- W3021330142 hasConceptScore W3021330142C77088390 @default.
- W3021330142 hasConceptScore W3021330142C84525736 @default.
- W3021330142 hasLocation W30213301421 @default.
- W3021330142 hasLocation W30213301422 @default.
- W3021330142 hasLocation W30213301423 @default.
- W3021330142 hasOpenAccess W3021330142 @default.
- W3021330142 hasPrimaryLocation W30213301421 @default.
- W3021330142 hasRelatedWork W2053037595 @default.
- W3021330142 hasRelatedWork W2393709510 @default.
- W3021330142 hasRelatedWork W2803609773 @default.
- W3021330142 hasRelatedWork W3010890513 @default.
- W3021330142 hasRelatedWork W3096082097 @default.
- W3021330142 hasRelatedWork W3195341917 @default.
- W3021330142 hasRelatedWork W4213068940 @default.
- W3021330142 hasRelatedWork W4312632137 @default.
- W3021330142 hasRelatedWork W4316082183 @default.
- W3021330142 hasRelatedWork W3111893788 @default.
- W3021330142 hasVolume "8" @default.
- W3021330142 isParatext "false" @default.
- W3021330142 isRetracted "false" @default.
- W3021330142 magId "3021330142" @default.
- W3021330142 workType "article" @default.