Matches in SemOpenAlex for { <https://semopenalex.org/work/W2023703869> ?p ?o ?g. }
- W2023703869 abstract "We investigate the problem of creating and analyzing samples of relational databases to find relationships between string-valued attributes. Our focus is on identifying attribute pairs whose value sets overlap, a pre-condition for typical joins over such attributes. However, real-world data sets are often 'dirty', especially when integrating data from different sources. To deal with this issue, we propose new similarity measures between sets of strings, which not only consider set-based similarity, but also similarity between string instances. To make the measures effective, we develop efficient algorithms for distributed sample creation and similarity computation. Test results show that for dirty data our measures are more accurate for measuring value overlap than existing sample-based methods, but we also observe that there is a clear tradeoff between accuracy and speed. This motivates a two-stage filtering approach, with both measures operating on the same samples." @default.
- W2023703869 created "2016-06-24" @default.
- W2023703869 creator A5010236208 @default.
- W2023703869 creator A5011384237 @default.
- W2023703869 creator A5011393580 @default.
- W2023703869 creator A5070591850 @default.
- W2023703869 creator A5070864647 @default.
- W2023703869 date "2010-06-06" @default.
- W2023703869 modified "2023-10-08" @default.
- W2023703869 title "Sampling dirty data for matching attributes" @default.
- W2023703869 cites W1984566373 @default.
- W2023703869 cites W2008896880 @default.
- W2023703869 cites W2029554959 @default.
- W2023703869 cites W2029815943 @default.
- W2023703869 cites W2038281398 @default.
- W2023703869 cites W2041912938 @default.
- W2023703869 cites W2042558865 @default.
- W2023703869 cites W2059268928 @default.
- W2023703869 cites W2090403603 @default.
- W2023703869 cites W2108991785 @default.
- W2023703869 cites W2110686900 @default.
- W2023703869 cites W2135878030 @default.
- W2023703869 cites W2140313762 @default.
- W2023703869 cites W2151065878 @default.
- W2023703869 cites W2152565070 @default.
- W2023703869 cites W4237172715 @default.
- W2023703869 cites W4250212406 @default.
- W2023703869 doi "https://doi.org/10.1145/1807167.1807177" @default.
- W2023703869 hasPublicationYear "2010" @default.
- W2023703869 type Work @default.
- W2023703869 sameAs 2023703869 @default.
- W2023703869 citedByCount "21" @default.
- W2023703869 countsByYear W20237038692013 @default.
- W2023703869 countsByYear W20237038692014 @default.
- W2023703869 countsByYear W20237038692015 @default.
- W2023703869 countsByYear W20237038692016 @default.
- W2023703869 countsByYear W20237038692019 @default.
- W2023703869 countsByYear W20237038692020 @default.
- W2023703869 countsByYear W20237038692021 @default.
- W2023703869 crossrefType "proceedings-article" @default.
- W2023703869 hasAuthorship W2023703869A5010236208 @default.
- W2023703869 hasAuthorship W2023703869A5011384237 @default.
- W2023703869 hasAuthorship W2023703869A5011393580 @default.
- W2023703869 hasAuthorship W2023703869A5070591850 @default.
- W2023703869 hasAuthorship W2023703869A5070864647 @default.
- W2023703869 hasConcept C103278499 @default.
- W2023703869 hasConcept C105795698 @default.
- W2023703869 hasConcept C106131492 @default.
- W2023703869 hasConcept C115961682 @default.
- W2023703869 hasConcept C120665830 @default.
- W2023703869 hasConcept C121332964 @default.
- W2023703869 hasConcept C124101348 @default.
- W2023703869 hasConcept C140779682 @default.
- W2023703869 hasConcept C154945302 @default.
- W2023703869 hasConcept C157486923 @default.
- W2023703869 hasConcept C165064840 @default.
- W2023703869 hasConcept C177264268 @default.
- W2023703869 hasConcept C185592680 @default.
- W2023703869 hasConcept C192209626 @default.
- W2023703869 hasConcept C198531522 @default.
- W2023703869 hasConcept C199360897 @default.
- W2023703869 hasConcept C2778692605 @default.
- W2023703869 hasConcept C31972630 @default.
- W2023703869 hasConcept C33923547 @default.
- W2023703869 hasConcept C37914503 @default.
- W2023703869 hasConcept C41008148 @default.
- W2023703869 hasConcept C43617362 @default.
- W2023703869 hasConcept C58489278 @default.
- W2023703869 hasConceptScore W2023703869C103278499 @default.
- W2023703869 hasConceptScore W2023703869C105795698 @default.
- W2023703869 hasConceptScore W2023703869C106131492 @default.
- W2023703869 hasConceptScore W2023703869C115961682 @default.
- W2023703869 hasConceptScore W2023703869C120665830 @default.
- W2023703869 hasConceptScore W2023703869C121332964 @default.
- W2023703869 hasConceptScore W2023703869C124101348 @default.
- W2023703869 hasConceptScore W2023703869C140779682 @default.
- W2023703869 hasConceptScore W2023703869C154945302 @default.
- W2023703869 hasConceptScore W2023703869C157486923 @default.
- W2023703869 hasConceptScore W2023703869C165064840 @default.
- W2023703869 hasConceptScore W2023703869C177264268 @default.
- W2023703869 hasConceptScore W2023703869C185592680 @default.
- W2023703869 hasConceptScore W2023703869C192209626 @default.
- W2023703869 hasConceptScore W2023703869C198531522 @default.
- W2023703869 hasConceptScore W2023703869C199360897 @default.
- W2023703869 hasConceptScore W2023703869C2778692605 @default.
- W2023703869 hasConceptScore W2023703869C31972630 @default.
- W2023703869 hasConceptScore W2023703869C33923547 @default.
- W2023703869 hasConceptScore W2023703869C37914503 @default.
- W2023703869 hasConceptScore W2023703869C41008148 @default.
- W2023703869 hasConceptScore W2023703869C43617362 @default.
- W2023703869 hasConceptScore W2023703869C58489278 @default.
- W2023703869 hasLocation W20237038691 @default.
- W2023703869 hasOpenAccess W2023703869 @default.
- W2023703869 hasPrimaryLocation W20237038691 @default.
- W2023703869 hasRelatedWork W1479898003 @default.
- W2023703869 hasRelatedWork W1705322102 @default.
- W2023703869 hasRelatedWork W1892813713 @default.
- W2023703869 hasRelatedWork W1964490760 @default.
- W2023703869 hasRelatedWork W1967208734 @default.
- W2023703869 hasRelatedWork W1969411444 @default.