Matches in SemOpenAlex for { <https://semopenalex.org/work/W2753804356> ?p ?o ?g. }
Showing items 1 to 91 of 91 with 100 items per page.
- W2753804356 abstract "Duplicate detection, i.e., the discovery of records that refer to the same real-world entity, is a task that usually depends on multiple input parameters by an expert. Most notably, an expert must specify some similarity measure and some threshold that declares duplicity for record pairs if their similarity surpasses it. Both are typically developed in a trial-and-error based manner with a given (sample) dataset. We posit that the similarity measure largely depends on the nature of the data and its contained errors that cause the duplicates, but that the threshold largely depends on the size of the dataset it was tested on. In consequence, configurations of duplicate detection runs work well on the test dataset, but perform worse if the size of the dataset changes. This weakness is due to the transitive nature of duplicity: In larger datasets transitivity can cause more records to enter a duplicate cluster than intended. We analyze this interesting effect extensively on four popular test datasets using different duplicate detection algorithms and report on our observations." @default.
- W2753804356 created "2017-09-15" @default.
- W2753804356 creator A5053028480 @default.
- W2753804356 creator A5076248342 @default.
- W2753804356 date "2013-01-01" @default.
- W2753804356 modified "2023-09-26" @default.
- W2753804356 title "On choosing thresholds for duplicate detection." @default.
- W2753804356 cites W1235128746 @default.
- W2753804356 cites W1518784700 @default.
- W2753804356 cites W1540269031 @default.
- W2753804356 cites W1612155886 @default.
- W2753804356 cites W1647671624 @default.
- W2753804356 cites W1973734023 @default.
- W2753804356 cites W1981590391 @default.
- W2753804356 cites W2011940398 @default.
- W2753804356 cites W2031250218 @default.
- W2753804356 cites W2046020929 @default.
- W2753804356 cites W2108991785 @default.
- W2753804356 cites W2111116800 @default.
- W2753804356 cites W2138745909 @default.
- W2753804356 cites W2148524305 @default.
- W2753804356 cites W2164456230 @default.
- W2753804356 cites W2318063279 @default.
- W2753804356 cites W26430610 @default.
- W2753804356 hasPublicationYear "2013" @default.
- W2753804356 type Work @default.
- W2753804356 sameAs 2753804356 @default.
- W2753804356 citedByCount "1" @default.
- W2753804356 countsByYear W27538043562021 @default.
- W2753804356 crossrefType "journal-article" @default.
- W2753804356 hasAuthorship W2753804356A5053028480 @default.
- W2753804356 hasAuthorship W2753804356A5076248342 @default.
- W2753804356 hasConcept C103278499 @default.
- W2753804356 hasConcept C114614502 @default.
- W2753804356 hasConcept C115961682 @default.
- W2753804356 hasConcept C119857082 @default.
- W2753804356 hasConcept C124101348 @default.
- W2753804356 hasConcept C153180895 @default.
- W2753804356 hasConcept C154945302 @default.
- W2753804356 hasConcept C162324750 @default.
- W2753804356 hasConcept C187736073 @default.
- W2753804356 hasConcept C191399111 @default.
- W2753804356 hasConcept C23123220 @default.
- W2753804356 hasConcept C2776517306 @default.
- W2753804356 hasConcept C2780009758 @default.
- W2753804356 hasConcept C2780451532 @default.
- W2753804356 hasConcept C33923547 @default.
- W2753804356 hasConcept C41008148 @default.
- W2753804356 hasConceptScore W2753804356C103278499 @default.
- W2753804356 hasConceptScore W2753804356C114614502 @default.
- W2753804356 hasConceptScore W2753804356C115961682 @default.
- W2753804356 hasConceptScore W2753804356C119857082 @default.
- W2753804356 hasConceptScore W2753804356C124101348 @default.
- W2753804356 hasConceptScore W2753804356C153180895 @default.
- W2753804356 hasConceptScore W2753804356C154945302 @default.
- W2753804356 hasConceptScore W2753804356C162324750 @default.
- W2753804356 hasConceptScore W2753804356C187736073 @default.
- W2753804356 hasConceptScore W2753804356C191399111 @default.
- W2753804356 hasConceptScore W2753804356C23123220 @default.
- W2753804356 hasConceptScore W2753804356C2776517306 @default.
- W2753804356 hasConceptScore W2753804356C2780009758 @default.
- W2753804356 hasConceptScore W2753804356C2780451532 @default.
- W2753804356 hasConceptScore W2753804356C33923547 @default.
- W2753804356 hasConceptScore W2753804356C41008148 @default.
- W2753804356 hasLocation W27538043561 @default.
- W2753804356 hasOpenAccess W2753804356 @default.
- W2753804356 hasPrimaryLocation W27538043561 @default.
- W2753804356 hasRelatedWork W1691663295 @default.
- W2753804356 hasRelatedWork W1798324170 @default.
- W2753804356 hasRelatedWork W2031451749 @default.
- W2753804356 hasRelatedWork W2055405704 @default.
- W2753804356 hasRelatedWork W2107399340 @default.
- W2753804356 hasRelatedWork W2114327002 @default.
- W2753804356 hasRelatedWork W2145503758 @default.
- W2753804356 hasRelatedWork W2894986625 @default.
- W2753804356 hasRelatedWork W2897847603 @default.
- W2753804356 hasRelatedWork W2945455575 @default.
- W2753804356 hasRelatedWork W2949648941 @default.
- W2753804356 hasRelatedWork W2951676852 @default.
- W2753804356 hasRelatedWork W2953156120 @default.
- W2753804356 hasRelatedWork W2954973147 @default.
- W2753804356 hasRelatedWork W2987871614 @default.
- W2753804356 hasRelatedWork W2989818459 @default.
- W2753804356 hasRelatedWork W3176117280 @default.
- W2753804356 hasRelatedWork W3185147224 @default.
- W2753804356 hasRelatedWork W649358261 @default.
- W2753804356 hasRelatedWork W849473654 @default.
- W2753804356 isParatext "false" @default.
- W2753804356 isRetracted "false" @default.
- W2753804356 magId "2753804356" @default.
- W2753804356 workType "article" @default.
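
The abstract above attributes the size dependence of a threshold to transitivity: once pairwise matches are closed transitively, adding records can merge clusters that were separate in a smaller sample. The following is a minimal Python sketch of that effect, not the authors' method. The function names, the `difflib`-based default similarity, and the union-find clustering are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Illustrative default measure (an assumption): difflib ratio in [0, 1]."""
    return SequenceMatcher(None, str(a), str(b)).ratio()


def duplicate_clusters(records, threshold, sim=similarity):
    """Cluster records by the transitive closure of `sim(a, b) >= threshold`.

    Hypothetical helper: every pair is compared, and matching pairs are
    merged with a small union-find structure.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if sim(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return sorted(sorted(cluster) for cluster in clusters.values())


def toy_sim(a, b):
    """Toy integer similarity (an assumption) so the arithmetic is easy to follow."""
    return 10 - abs(a - b)


# Records 0 and 4 have similarity 6, below the threshold 8: two clusters.
small = duplicate_clusters([0, 4], 8, toy_sim)

# Adding record 2 links 0-2 and 2-4 (similarity 8 each), so all three
# records end up in one cluster although 0 and 4 still do not match directly.
large = duplicate_clusters([0, 2, 4], 8, toy_sim)
```

With the same threshold, the two-record sample keeps `0` and `4` apart, while the three-record dataset puts them in one cluster through the intermediate record. This is the kind of unintended cluster growth the abstract describes for larger datasets.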