Matches in SemOpenAlex for { <https://semopenalex.org/work/W4288365786> ?p ?o ?g. }
Showing items 1 to 64 of 64 with 100 items per page.
- W4288365786 abstract "Data quality affects machine learning (ML) model performances, and data scientists spend considerable amount of time on data cleaning before model training. However, to date, there does not exist a rigorous study on how exactly cleaning affects ML -- ML community usually focuses on developing ML algorithms that are robust to some particular noise types of certain distributions, while database (DB) community has been mostly studying the problem of data cleaning alone without considering how data is consumed by downstream ML analytics. We propose a CleanML study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both commonly used algorithms in practice as well as state-of-the-art solutions in academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we also control false discovery rate in our experiments using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a systematic way to derive many interesting and nontrivial observations. We also put forward multiple research directions for researchers." @default.
- W4288365786 created "2022-07-29" @default.
- W4288365786 creator A5010776860 @default.
- W4288365786 creator A5034733471 @default.
- W4288365786 creator A5040657484 @default.
- W4288365786 creator A5041232755 @default.
- W4288365786 creator A5070651426 @default.
- W4288365786 creator A5071458554 @default.
- W4288365786 date "2019-04-20" @default.
- W4288365786 modified "2023-10-16" @default.
- W4288365786 title "CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks" @default.
- W4288365786 doi "https://doi.org/10.48550/arxiv.1904.09483" @default.
- W4288365786 hasPublicationYear "2019" @default.
- W4288365786 type Work @default.
- W4288365786 citedByCount "0" @default.
- W4288365786 crossrefType "posted-content" @default.
- W4288365786 hasAuthorship W4288365786A5010776860 @default.
- W4288365786 hasAuthorship W4288365786A5034733471 @default.
- W4288365786 hasAuthorship W4288365786A5040657484 @default.
- W4288365786 hasAuthorship W4288365786A5041232755 @default.
- W4288365786 hasAuthorship W4288365786A5070651426 @default.
- W4288365786 hasAuthorship W4288365786A5071458554 @default.
- W4288365786 hasBestOaLocation W42883657861 @default.
- W4288365786 hasConcept C105795698 @default.
- W4288365786 hasConcept C115961682 @default.
- W4288365786 hasConcept C119857082 @default.
- W4288365786 hasConcept C124101348 @default.
- W4288365786 hasConcept C125112378 @default.
- W4288365786 hasConcept C154945302 @default.
- W4288365786 hasConcept C2775924081 @default.
- W4288365786 hasConcept C33923547 @default.
- W4288365786 hasConcept C40969351 @default.
- W4288365786 hasConcept C41008148 @default.
- W4288365786 hasConcept C79158427 @default.
- W4288365786 hasConcept C99498987 @default.
- W4288365786 hasConceptScore W4288365786C105795698 @default.
- W4288365786 hasConceptScore W4288365786C115961682 @default.
- W4288365786 hasConceptScore W4288365786C119857082 @default.
- W4288365786 hasConceptScore W4288365786C124101348 @default.
- W4288365786 hasConceptScore W4288365786C125112378 @default.
- W4288365786 hasConceptScore W4288365786C154945302 @default.
- W4288365786 hasConceptScore W4288365786C2775924081 @default.
- W4288365786 hasConceptScore W4288365786C33923547 @default.
- W4288365786 hasConceptScore W4288365786C40969351 @default.
- W4288365786 hasConceptScore W4288365786C41008148 @default.
- W4288365786 hasConceptScore W4288365786C79158427 @default.
- W4288365786 hasConceptScore W4288365786C99498987 @default.
- W4288365786 hasLocation W42883657861 @default.
- W4288365786 hasLocation W42883657862 @default.
- W4288365786 hasOpenAccess W4288365786 @default.
- W4288365786 hasPrimaryLocation W42883657861 @default.
- W4288365786 hasRelatedWork W2961085424 @default.
- W4288365786 hasRelatedWork W3046775127 @default.
- W4288365786 hasRelatedWork W3107474891 @default.
- W4288365786 hasRelatedWork W3156279460 @default.
- W4288365786 hasRelatedWork W4205958290 @default.
- W4288365786 hasRelatedWork W4285260836 @default.
- W4288365786 hasRelatedWork W4286629047 @default.
- W4288365786 hasRelatedWork W4306321456 @default.
- W4288365786 hasRelatedWork W4306674287 @default.
- W4288365786 hasRelatedWork W4224009465 @default.
- W4288365786 isParatext "false" @default.
- W4288365786 isRetracted "false" @default.
- W4288365786 workType "article" @default.