Matches in SemOpenAlex for { <https://semopenalex.org/work/W26430610> ?p ?o ?g. }
Showing items 1 to 81 of 81 with 100 items per page.
- W26430610 abstract "Duplicate detection is the problem of identifying pairs of records that represent the same real world object, and could thus be merged into a single record. To avoid a prohibitively expensive comparison of all pairs of records, a common technique is to carefully partition the records into smaller subsets. If duplicate records appear in the same partition, only all pairs within each partition must be compared. Two competing approaches are often cited: Blocking methods strictly partition records into disjoint subsets, for instance using zip-codes as partitioning key. Windowing methods, in particular the Sorted-Neighborhood method, sort the data according to some key, such as zip-code, and then slide a window of fixed size across the sorted data and compare pairs only within the window. Herein we compare both approaches qualitatively and experimentally. Further, we present a new generalized algorithm, the Sorted Blocks method, with the competing methods as extreme cases. Experiments show that the windowing algorithm is better than blocking and that the generalized algorithm slightly improves upon it in terms of efficiency (detected duplicates vs. overall number of comparisons)." @default.
- W26430610 created "2016-06-24" @default.
- W26430610 creator A5053028480 @default.
- W26430610 creator A5076248342 @default.
- W26430610 date "2009-01-01" @default.
- W26430610 modified "2023-09-26" @default.
- W26430610 title "A Comparison and Generalization of Blocking and Windowing Algorithms for Duplicate Detection" @default.
- W26430610 cites W1559390933 @default.
- W26430610 cites W1569123402 @default.
- W26430610 cites W1612155886 @default.
- W26430610 cites W1845040079 @default.
- W26430610 cites W1979954747 @default.
- W26430610 cites W2024770506 @default.
- W26430610 cites W2108991785 @default.
- W26430610 cites W2111116800 @default.
- W26430610 cites W2138745909 @default.
- W26430610 cites W2166988329 @default.
- W26430610 cites W2171574281 @default.
- W26430610 hasPublicationYear "2009" @default.
- W26430610 type Work @default.
- W26430610 sameAs 26430610 @default.
- W26430610 citedByCount "18" @default.
- W26430610 countsByYear W264306102013 @default.
- W26430610 countsByYear W264306102015 @default.
- W26430610 countsByYear W264306102016 @default.
- W26430610 countsByYear W264306102018 @default.
- W26430610 countsByYear W264306102021 @default.
- W26430610 crossrefType "journal-article" @default.
- W26430610 hasAuthorship W26430610A5053028480 @default.
- W26430610 hasAuthorship W26430610A5076248342 @default.
- W26430610 hasConcept C11413529 @default.
- W26430610 hasConcept C114614502 @default.
- W26430610 hasConcept C134306372 @default.
- W26430610 hasConcept C144745244 @default.
- W26430610 hasConcept C177148314 @default.
- W26430610 hasConcept C23123220 @default.
- W26430610 hasConcept C31258907 @default.
- W26430610 hasConcept C33923547 @default.
- W26430610 hasConcept C41008148 @default.
- W26430610 hasConcept C42812 @default.
- W26430610 hasConcept C45340560 @default.
- W26430610 hasConcept C88548561 @default.
- W26430610 hasConceptScore W26430610C11413529 @default.
- W26430610 hasConceptScore W26430610C114614502 @default.
- W26430610 hasConceptScore W26430610C134306372 @default.
- W26430610 hasConceptScore W26430610C144745244 @default.
- W26430610 hasConceptScore W26430610C177148314 @default.
- W26430610 hasConceptScore W26430610C23123220 @default.
- W26430610 hasConceptScore W26430610C31258907 @default.
- W26430610 hasConceptScore W26430610C33923547 @default.
- W26430610 hasConceptScore W26430610C41008148 @default.
- W26430610 hasConceptScore W26430610C42812 @default.
- W26430610 hasConceptScore W26430610C45340560 @default.
- W26430610 hasConceptScore W26430610C88548561 @default.
- W26430610 hasLocation W264306101 @default.
- W26430610 hasOpenAccess W26430610 @default.
- W26430610 hasPrimaryLocation W264306101 @default.
- W26430610 hasRelatedWork W118209581 @default.
- W26430610 hasRelatedWork W1540269031 @default.
- W26430610 hasRelatedWork W1547612978 @default.
- W26430610 hasRelatedWork W1597164057 @default.
- W26430610 hasRelatedWork W1612155886 @default.
- W26430610 hasRelatedWork W1981590391 @default.
- W26430610 hasRelatedWork W2024770506 @default.
- W26430610 hasRelatedWork W2031250218 @default.
- W26430610 hasRelatedWork W2036216970 @default.
- W26430610 hasRelatedWork W2046020929 @default.
- W26430610 hasRelatedWork W2073471108 @default.
- W26430610 hasRelatedWork W2104511295 @default.
- W26430610 hasRelatedWork W2108991785 @default.
- W26430610 hasRelatedWork W2111116800 @default.
- W26430610 hasRelatedWork W2117974736 @default.
- W26430610 hasRelatedWork W2123561513 @default.
- W26430610 hasRelatedWork W2148019918 @default.
- W26430610 hasRelatedWork W2164456230 @default.
- W26430610 hasRelatedWork W2166988329 @default.
- W26430610 hasRelatedWork W2171574281 @default.
- W26430610 isParatext "false" @default.
- W26430610 isRetracted "false" @default.
- W26430610 magId "26430610" @default.
- W26430610 workType "article" @default.
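
The abstract above contrasts two ways of limiting record comparisons: blocking (disjoint partitions by key) and the Sorted-Neighborhood method (a fixed-size window slid over key-sorted records). The following is a minimal sketch of how each could generate candidate pairs. It is not the authors' implementation; the function names, the `key` callable, and the `window` parameter are assumptions made for illustration, and the Sorted Blocks method is not shown because the abstract does not describe it in enough detail.

```python
from collections import defaultdict
from itertools import combinations


def blocking_pairs(records, key):
    """Candidate pairs under blocking: only records sharing a key are compared."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    pairs = set()
    for members in blocks.values():
        # Indices were appended in increasing order, so each pair is (low, high).
        pairs.update(combinations(members, 2))
    return pairs


def sorted_neighborhood_pairs(records, key, window):
    """Candidate pairs under windowing: sort by key, then compare each record
    with the next (window - 1) records in sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


# Hypothetical sample data, using zip code as the partitioning/sorting key.
records = [
    {"name": "Ann Lee", "zip": "10115"},
    {"name": "Anne Lee", "zip": "10115"},
    {"name": "Bob Ray", "zip": "10117"},
    {"name": "Bob Rey", "zip": "10119"},
]
by_zip = lambda rec: rec["zip"]

print(sorted(blocking_pairs(records, by_zip)))
print(sorted(sorted_neighborhood_pairs(records, by_zip, window=2)))
```

In this sample, blocking never compares the two "Bob" records because their zip codes differ, while the window of size 2 compares them because they are adjacent after sorting. The window also compares some pairs that are not duplicates, which is the trade-off between detected duplicates and number of comparisons that the abstract mentions.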