Matches in SemOpenAlex for { <https://semopenalex.org/work/W2569876941> ?p ?o ?g. }
- W2569876941 endingPage "411" @default.
- W2569876941 startingPage "396" @default.
- W2569876941 abstract "Because of the increasing volume of autonomously collected data objects, duplicate detection is an important challenge in today's data management. To evaluate the efficiency of duplicate detection algorithms with respect to big data, large test data sets are required. Existing test data generation tools, however, are either not able to produce large test data sets or are domain-dependent, which limits their usefulness to a few cases. In this paper, we describe a new framework that can be used to pollute a clean, homogeneous and large data set from an arbitrary domain with duplicates, errors and inhomogeneities. To prove its concept, we implemented a prototype which is built upon the cluster computing framework Apache Spark and evaluated its performance in several experiments." @default.
- W2569876941 created "2017-01-13" @default.
- W2569876941 creator A5008225284 @default.
- W2569876941 creator A5029450152 @default.
- W2569876941 creator A5054128312 @default.
- W2569876941 creator A5084009903 @default.
- W2569876941 date "2020-06-01" @default.
- W2569876941 modified "2023-10-06" @default.
- W2569876941 title "Large-Scale Data Pollution with Apache Spark" @default.
- W2569876941 cites W1612155886 @default.
- W2569876941 cites W1995099886 @default.
- W2569876941 cites W2013909137 @default.
- W2569876941 cites W2018616927 @default.
- W2569876941 cites W2042913039 @default.
- W2569876941 cites W2044280769 @default.
- W2569876941 cites W2053062910 @default.
- W2569876941 cites W2065398649 @default.
- W2569876941 cites W2081037581 @default.
- W2569876941 cites W2102763740 @default.
- W2569876941 cites W2137242774 @default.
- W2569876941 cites W2164637360 @default.
- W2569876941 cites W2428155754 @default.
- W2569876941 cites W3146259567 @default.
- W2569876941 cites W4242744113 @default.
- W2569876941 cites W760598031 @default.
- W2569876941 doi "https://doi.org/10.1109/tbdata.2016.2637378" @default.
- W2569876941 hasPublicationYear "2020" @default.
- W2569876941 type Work @default.
- W2569876941 sameAs 2569876941 @default.
- W2569876941 citedByCount "12" @default.
- W2569876941 countsByYear W25698769412017 @default.
- W2569876941 countsByYear W25698769412020 @default.
- W2569876941 countsByYear W25698769412021 @default.
- W2569876941 countsByYear W25698769412022 @default.
- W2569876941 countsByYear W25698769412023 @default.
- W2569876941 crossrefType "journal-article" @default.
- W2569876941 hasAuthorship W2569876941A5008225284 @default.
- W2569876941 hasAuthorship W2569876941A5029450152 @default.
- W2569876941 hasAuthorship W2569876941A5054128312 @default.
- W2569876941 hasAuthorship W2569876941A5084009903 @default.
- W2569876941 hasConcept C115903868 @default.
- W2569876941 hasConcept C120314980 @default.
- W2569876941 hasConcept C121332964 @default.
- W2569876941 hasConcept C124101348 @default.
- W2569876941 hasConcept C134306372 @default.
- W2569876941 hasConcept C154945302 @default.
- W2569876941 hasConcept C16910744 @default.
- W2569876941 hasConcept C177264268 @default.
- W2569876941 hasConcept C199360897 @default.
- W2569876941 hasConcept C20556612 @default.
- W2569876941 hasConcept C2778755073 @default.
- W2569876941 hasConcept C2781215313 @default.
- W2569876941 hasConcept C29140674 @default.
- W2569876941 hasConcept C33923547 @default.
- W2569876941 hasConcept C36503486 @default.
- W2569876941 hasConcept C41008148 @default.
- W2569876941 hasConcept C58489278 @default.
- W2569876941 hasConcept C62520636 @default.
- W2569876941 hasConcept C75684735 @default.
- W2569876941 hasConcept C77088390 @default.
- W2569876941 hasConceptScore W2569876941C115903868 @default.
- W2569876941 hasConceptScore W2569876941C120314980 @default.
- W2569876941 hasConceptScore W2569876941C121332964 @default.
- W2569876941 hasConceptScore W2569876941C124101348 @default.
- W2569876941 hasConceptScore W2569876941C134306372 @default.
- W2569876941 hasConceptScore W2569876941C154945302 @default.
- W2569876941 hasConceptScore W2569876941C16910744 @default.
- W2569876941 hasConceptScore W2569876941C177264268 @default.
- W2569876941 hasConceptScore W2569876941C199360897 @default.
- W2569876941 hasConceptScore W2569876941C20556612 @default.
- W2569876941 hasConceptScore W2569876941C2778755073 @default.
- W2569876941 hasConceptScore W2569876941C2781215313 @default.
- W2569876941 hasConceptScore W2569876941C29140674 @default.
- W2569876941 hasConceptScore W2569876941C33923547 @default.
- W2569876941 hasConceptScore W2569876941C36503486 @default.
- W2569876941 hasConceptScore W2569876941C41008148 @default.
- W2569876941 hasConceptScore W2569876941C58489278 @default.
- W2569876941 hasConceptScore W2569876941C62520636 @default.
- W2569876941 hasConceptScore W2569876941C75684735 @default.
- W2569876941 hasConceptScore W2569876941C77088390 @default.
- W2569876941 hasIssue "2" @default.
- W2569876941 hasLocation W25698769411 @default.
- W2569876941 hasOpenAccess W2569876941 @default.
- W2569876941 hasPrimaryLocation W25698769411 @default.
- W2569876941 hasRelatedWork W2043890830 @default.
- W2569876941 hasRelatedWork W2510748453 @default.
- W2569876941 hasRelatedWork W2614996766 @default.
- W2569876941 hasRelatedWork W2734587838 @default.
- W2569876941 hasRelatedWork W2890057416 @default.
- W2569876941 hasRelatedWork W2917146715 @default.
- W2569876941 hasRelatedWork W3118882535 @default.
- W2569876941 hasRelatedWork W3183776484 @default.
- W2569876941 hasRelatedWork W4292102651 @default.
- W2569876941 hasRelatedWork W4312783750 @default.
- W2569876941 hasVolume "6" @default.
- W2569876941 isParatext "false" @default.
- W2569876941 isRetracted "false" @default.
- W2569876941 magId "2569876941" @default.