Matches in SemOpenAlex for { <https://semopenalex.org/work/W3186756857> ?p ?o ?g. }
- W3186756857 endingPage "17" @default.
- W3186756857 startingPage "1" @default.
- W3186756857 abstract "Data integration is an important component of Big Data analytics. One of the key challenges in data integration is record linkage, that is, matching records that represent the same real-world entity. Because of computational costs, methods referred to as blocking are employed as a part of the record linkage pipeline in order to reduce the number of comparisons among records. In the past decade, a range of blocking techniques have been proposed. Real-world applications require approaches that can handle heterogeneous data sources and do not rely on labelled data. We propose high-value token-blocking (HVTB), a simple and efficient approach for blocking that is unsupervised and schema-agnostic, based on a crafted use of Term Frequency-Inverse Document Frequency. We compare HVTB with multiple methods and over a range of datasets, including a novel unstructured dataset composed of titles and abstracts of scientific papers. We thoroughly discuss results in terms of accuracy, use of computational resources, and different characteristics of datasets and records. The simplicity of HVTB yields fast computations and does not harm its accuracy when compared with existing approaches. It is shown to be significantly superior to other methods, suggesting that simpler methods for blocking should be considered before resorting to more sophisticated methods." @default.
- W3186756857 created "2021-08-02" @default.
- W3186756857 creator A5004994425 @default.
- W3186756857 creator A5044425522 @default.
- W3186756857 creator A5047048537 @default.
- W3186756857 date "2021-07-21" @default.
- W3186756857 modified "2023-09-25" @default.
- W3186756857 title "High-Value Token-Blocking: Efficient Blocking Method for Record Linkage" @default.
- W3186756857 cites W1124123724 @default.
- W3186756857 cites W1134508972 @default.
- W3186756857 cites W1612155886 @default.
- W3186756857 cites W1922373164 @default.
- W3186756857 cites W1992930793 @default.
- W3186756857 cites W1997927541 @default.
- W3186756857 cites W1998600401 @default.
- W3186756857 cites W2014964486 @default.
- W3186756857 cites W2036216970 @default.
- W3186756857 cites W2073539176 @default.
- W3186756857 cites W2079649893 @default.
- W3186756857 cites W2109834209 @default.
- W3186756857 cites W2111116800 @default.
- W3186756857 cites W2114764731 @default.
- W3186756857 cites W2152502401 @default.
- W3186756857 cites W2163616098 @default.
- W3186756857 cites W2182703380 @default.
- W3186756857 cites W2399361902 @default.
- W3186756857 cites W2441805796 @default.
- W3186756857 cites W2529367823 @default.
- W3186756857 cites W2535168187 @default.
- W3186756857 cites W2733471169 @default.
- W3186756857 cites W2750964846 @default.
- W3186756857 cites W2794107983 @default.
- W3186756857 cites W2795151173 @default.
- W3186756857 cites W2946741276 @default.
- W3186756857 cites W2957204582 @default.
- W3186756857 cites W3119940851 @default.
- W3186756857 doi "https://doi.org/10.1145/3450527" @default.
- W3186756857 hasPublicationYear "2021" @default.
- W3186756857 type Work @default.
- W3186756857 sameAs 3186756857 @default.
- W3186756857 citedByCount "1" @default.
- W3186756857 countsByYear W31867568572023 @default.
- W3186756857 crossrefType "journal-article" @default.
- W3186756857 hasAuthorship W3186756857A5004994425 @default.
- W3186756857 hasAuthorship W3186756857A5044425522 @default.
- W3186756857 hasAuthorship W3186756857A5047048537 @default.
- W3186756857 hasBestOaLocation W31867568572 @default.
- W3186756857 hasConcept C105795698 @default.
- W3186756857 hasConcept C119857082 @default.
- W3186756857 hasConcept C124101348 @default.
- W3186756857 hasConcept C142210648 @default.
- W3186756857 hasConcept C144024400 @default.
- W3186756857 hasConcept C144745244 @default.
- W3186756857 hasConcept C149923435 @default.
- W3186756857 hasConcept C165064840 @default.
- W3186756857 hasConcept C199360897 @default.
- W3186756857 hasConcept C2908647359 @default.
- W3186756857 hasConcept C31258907 @default.
- W3186756857 hasConcept C33923547 @default.
- W3186756857 hasConcept C38652104 @default.
- W3186756857 hasConcept C41008148 @default.
- W3186756857 hasConcept C43521106 @default.
- W3186756857 hasConcept C48145219 @default.
- W3186756857 hasConcept C52146309 @default.
- W3186756857 hasConcept C75684735 @default.
- W3186756857 hasConcept C79158427 @default.
- W3186756857 hasConceptScore W3186756857C105795698 @default.
- W3186756857 hasConceptScore W3186756857C119857082 @default.
- W3186756857 hasConceptScore W3186756857C124101348 @default.
- W3186756857 hasConceptScore W3186756857C142210648 @default.
- W3186756857 hasConceptScore W3186756857C144024400 @default.
- W3186756857 hasConceptScore W3186756857C144745244 @default.
- W3186756857 hasConceptScore W3186756857C149923435 @default.
- W3186756857 hasConceptScore W3186756857C165064840 @default.
- W3186756857 hasConceptScore W3186756857C199360897 @default.
- W3186756857 hasConceptScore W3186756857C2908647359 @default.
- W3186756857 hasConceptScore W3186756857C31258907 @default.
- W3186756857 hasConceptScore W3186756857C33923547 @default.
- W3186756857 hasConceptScore W3186756857C38652104 @default.
- W3186756857 hasConceptScore W3186756857C41008148 @default.
- W3186756857 hasConceptScore W3186756857C43521106 @default.
- W3186756857 hasConceptScore W3186756857C48145219 @default.
- W3186756857 hasConceptScore W3186756857C52146309 @default.
- W3186756857 hasConceptScore W3186756857C75684735 @default.
- W3186756857 hasConceptScore W3186756857C79158427 @default.
- W3186756857 hasIssue "2" @default.
- W3186756857 hasLocation W31867568571 @default.
- W3186756857 hasLocation W31867568572 @default.
- W3186756857 hasOpenAccess W3186756857 @default.
- W3186756857 hasPrimaryLocation W31867568571 @default.
- W3186756857 hasRelatedWork W1565636512 @default.
- W3186756857 hasRelatedWork W1985859948 @default.
- W3186756857 hasRelatedWork W259013728 @default.
- W3186756857 hasRelatedWork W2765854211 @default.
- W3186756857 hasRelatedWork W2777139086 @default.
- W3186756857 hasRelatedWork W2868280856 @default.
- W3186756857 hasRelatedWork W3041665197 @default.
- W3186756857 hasRelatedWork W3180094802 @default.