Matches in SemOpenAlex for { <https://semopenalex.org/work/W3107132553> ?p ?o ?g. }
- W3107132553 abstract "We study the problem of discovering joinable datasets at scale. That is, how to automatically discover pairs of attributes in a massive collection of independent, heterogeneous datasets that can be joined. Exact (e.g., based on distinct values) and hash-based (e.g., based on locality-sensitive hashing) techniques require indexing the entire dataset, which is unattainable at scale. To overcome this issue, we approach the problem from a learning perspective relying on profiles. These are succinct representations that capture the underlying characteristics of the schemata and data values of datasets, which can be efficiently extracted in a distributed and parallel fashion. Profiles are then compared to predict the quality of a join operation between a pair of attributes from different datasets. In contrast to the state of the art, we define a novel notion of join quality that relies on a metric considering both the containment and cardinality proportions between candidate attributes. We implement our approach in a system called NextiaJD, and present extensive experiments to show the predictive performance and computational efficiency of our method. Our experiments show that NextiaJD obtains similar predictive performance to that of hash-based methods, yet we are able to scale up to larger volumes of data. Also, NextiaJD generates considerably fewer false positives, which is a desirable feature at scale." @default.
- W3107132553 created "2020-12-07" @default.
- W3107132553 creator A5013379001 @default.
- W3107132553 creator A5034865842 @default.
- W3107132553 creator A5067693515 @default.
- W3107132553 date "2020-12-01" @default.
- W3107132553 modified "2023-09-23" @default.
- W3107132553 title "Scalable Data Discovery Using Profiles." @default.
- W3107132553 cites W1593317353 @default.
- W3107132553 cites W1647671624 @default.
- W3107132553 cites W1999954155 @default.
- W3107132553 cites W2065259291 @default.
- W3107132553 cites W2118100588 @default.
- W3107132553 cites W2125816831 @default.
- W3107132553 cites W2132069633 @default.
- W3107132553 cites W2606791715 @default.
- W3107132553 cites W2613666425 @default.
- W3107132553 cites W2795302121 @default.
- W3107132553 cites W2810039498 @default.
- W3107132553 cites W2810954846 @default.
- W3107132553 cites W2893303656 @default.
- W3107132553 cites W2911964244 @default.
- W3107132553 cites W2948163032 @default.
- W3107132553 cites W2950817225 @default.
- W3107132553 cites W2962979766 @default.
- W3107132553 cites W2963174348 @default.
- W3107132553 cites W2970992672 @default.
- W3107132553 cites W3007024586 @default.
- W3107132553 cites W3031051334 @default.
- W3107132553 cites W3032545863 @default.
- W3107132553 cites W3099802519 @default.
- W3107132553 cites W760598031 @default.
- W3107132553 hasPublicationYear "2020" @default.
- W3107132553 type Work @default.
- W3107132553 sameAs 3107132553 @default.
- W3107132553 citedByCount "0" @default.
- W3107132553 crossrefType "posted-content" @default.
- W3107132553 hasAuthorship W3107132553A5013379001 @default.
- W3107132553 hasAuthorship W3107132553A5034865842 @default.
- W3107132553 hasAuthorship W3107132553A5067693515 @default.
- W3107132553 hasConcept C119857082 @default.
- W3107132553 hasConcept C121332964 @default.
- W3107132553 hasConcept C124101348 @default.
- W3107132553 hasConcept C154945302 @default.
- W3107132553 hasConcept C162324750 @default.
- W3107132553 hasConcept C176217482 @default.
- W3107132553 hasConcept C21547014 @default.
- W3107132553 hasConcept C2776502983 @default.
- W3107132553 hasConcept C2778755073 @default.
- W3107132553 hasConcept C38652104 @default.
- W3107132553 hasConcept C41008148 @default.
- W3107132553 hasConcept C48044578 @default.
- W3107132553 hasConcept C62520636 @default.
- W3107132553 hasConcept C64869954 @default.
- W3107132553 hasConcept C67388219 @default.
- W3107132553 hasConcept C74270461 @default.
- W3107132553 hasConcept C75165309 @default.
- W3107132553 hasConcept C77088390 @default.
- W3107132553 hasConcept C80444323 @default.
- W3107132553 hasConcept C87117476 @default.
- W3107132553 hasConcept C99138194 @default.
- W3107132553 hasConceptScore W3107132553C119857082 @default.
- W3107132553 hasConceptScore W3107132553C121332964 @default.
- W3107132553 hasConceptScore W3107132553C124101348 @default.
- W3107132553 hasConceptScore W3107132553C154945302 @default.
- W3107132553 hasConceptScore W3107132553C162324750 @default.
- W3107132553 hasConceptScore W3107132553C176217482 @default.
- W3107132553 hasConceptScore W3107132553C21547014 @default.
- W3107132553 hasConceptScore W3107132553C2776502983 @default.
- W3107132553 hasConceptScore W3107132553C2778755073 @default.
- W3107132553 hasConceptScore W3107132553C38652104 @default.
- W3107132553 hasConceptScore W3107132553C41008148 @default.
- W3107132553 hasConceptScore W3107132553C48044578 @default.
- W3107132553 hasConceptScore W3107132553C62520636 @default.
- W3107132553 hasConceptScore W3107132553C64869954 @default.
- W3107132553 hasConceptScore W3107132553C67388219 @default.
- W3107132553 hasConceptScore W3107132553C74270461 @default.
- W3107132553 hasConceptScore W3107132553C75165309 @default.
- W3107132553 hasConceptScore W3107132553C77088390 @default.
- W3107132553 hasConceptScore W3107132553C80444323 @default.
- W3107132553 hasConceptScore W3107132553C87117476 @default.
- W3107132553 hasConceptScore W3107132553C99138194 @default.
- W3107132553 hasLocation W31071325531 @default.
- W3107132553 hasOpenAccess W3107132553 @default.
- W3107132553 hasPrimaryLocation W31071325531 @default.
- W3107132553 hasRelatedWork W144174343 @default.
- W3107132553 hasRelatedWork W1782779911 @default.
- W3107132553 hasRelatedWork W2000292914 @default.
- W3107132553 hasRelatedWork W2035835285 @default.
- W3107132553 hasRelatedWork W2105016630 @default.
- W3107132553 hasRelatedWork W2175022647 @default.
- W3107132553 hasRelatedWork W2405197876 @default.
- W3107132553 hasRelatedWork W2536123479 @default.
- W3107132553 hasRelatedWork W2585538791 @default.
- W3107132553 hasRelatedWork W2625660156 @default.
- W3107132553 hasRelatedWork W2923721615 @default.
- W3107132553 hasRelatedWork W2949748389 @default.
- W3107132553 hasRelatedWork W2950111593 @default.
- W3107132553 hasRelatedWork W2963174348 @default.
- W3107132553 hasRelatedWork W2985759449 @default.
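
The abstract above mentions a join quality metric built on the containment and cardinality proportions between two candidate attributes. The following is a minimal sketch of those two quantities, assuming the common set-based definitions (containment as the share of distinct values of one attribute found in the other, cardinality proportion as the ratio of distinct-value counts). The function names and the sample values are hypothetical, and this is not the NextiaJD implementation, which predicts join quality from profiles rather than comparing values directly.

```python
def containment(a, b):
    """Share of the distinct values of `a` that also occur in `b` (assumed definition)."""
    set_a, set_b = set(a), set(b)
    if not set_a:
        return 0.0  # convention chosen here for an empty attribute
    return len(set_a & set_b) / len(set_a)


def cardinality_proportion(a, b):
    """Ratio of distinct-value counts of `a` to `b` (assumed definition)."""
    set_a, set_b = set(a), set(b)
    if not set_b:
        return 0.0  # convention chosen here to avoid division by zero
    return len(set_a) / len(set_b)


# Hypothetical attribute values from two independent datasets.
country_a = ["es", "fr", "de", "it", "es"]
country_b = ["es", "fr", "pt"]

c = containment(country_a, country_b)
k = cardinality_proportion(country_a, country_b)
print(f"containment={c:.2f}, cardinality proportion={k:.2f}")
```

Both functions work on exact distinct values, which is the kind of computation the abstract describes as hard to sustain at scale; the sketch only illustrates what the two proportions measure.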