Matches in SemOpenAlex for { <https://semopenalex.org/work/W2736701013> ?p ?o ?g. }
- W2736701013 abstract "Set similarity join is a fundamental and well-studied database operator. It is usually studied in the exact setting, where the goal is to compute all pairs of sets that exceed a given similarity threshold (measured, e.g., as Jaccard similarity). But set similarity join is often used in settings where 100% recall may not be important, and indeed where the exact set similarity join is itself only an approximation of the desired result set. We present a new randomized algorithm for set similarity join that can achieve any desired recall up to 100%, and show theoretically and empirically that it significantly improves on existing methods. The present state-of-the-art exact methods are based on prefix filtering, the performance of which depends on the data set having many rare tokens. Our method is robust against the absence of such structure in the data. At 90% recall our algorithm is often more than an order of magnitude faster than state-of-the-art exact methods, depending on how well a data set lends itself to prefix filtering. Our experiments on benchmark data sets also show that the method is several times faster than comparable approximate methods. Our algorithm makes use of recent theoretical advances in high-dimensional sketching and indexing that we believe to be of wider relevance to the data engineering community." @default.
- W2736701013 created "2017-07-31" @default.
- W2736701013 creator A5002057389 @default.
- W2736701013 creator A5014293815 @default.
- W2736701013 creator A5083645347 @default.
- W2736701013 date "2017-07-21" @default.
- W2736701013 modified "2023-09-27" @default.
- W2736701013 title "Scalable and robust set similarity join" @default.
- W2736701013 cites W1502916507 @default.
- W2736701013 cites W158955491 @default.
- W2736701013 cites W190065572 @default.
- W2736701013 cites W1998067572 @default.
- W2736701013 cites W2011737794 @default.
- W2736701013 cites W2037972594 @default.
- W2736701013 cites W2048779798 @default.
- W2736701013 cites W2097184821 @default.
- W2736701013 cites W2097776316 @default.
- W2736701013 cites W2097865464 @default.
- W2736701013 cites W2121516976 @default.
- W2736701013 cites W2127675794 @default.
- W2736701013 cites W2129930407 @default.
- W2736701013 cites W2152565070 @default.
- W2736701013 cites W2241750177 @default.
- W2736701013 cites W2241860760 @default.
- W2736701013 cites W2261895596 @default.
- W2736701013 cites W2308071406 @default.
- W2736701013 cites W2317754117 @default.
- W2736701013 cites W2396588571 @default.
- W2736701013 cites W2416529140 @default.
- W2736701013 cites W2568511220 @default.
- W2736701013 cites W2574633002 @default.
- W2736701013 cites W2951208214 @default.
- W2736701013 cites W3098556943 @default.
- W2736701013 cites W3104626761 @default.
- W2736701013 hasPublicationYear "2017" @default.
- W2736701013 type Work @default.
- W2736701013 sameAs 2736701013 @default.
- W2736701013 citedByCount "2" @default.
- W2736701013 countsByYear W27367010132017 @default.
- W2736701013 countsByYear W27367010132018 @default.
- W2736701013 crossrefType "posted-content" @default.
- W2736701013 hasAuthorship W2736701013A5002057389 @default.
- W2736701013 hasAuthorship W2736701013A5014293815 @default.
- W2736701013 hasAuthorship W2736701013A5083645347 @default.
- W2736701013 hasConcept C103278499 @default.
- W2736701013 hasConcept C11413529 @default.
- W2736701013 hasConcept C114614502 @default.
- W2736701013 hasConcept C115961682 @default.
- W2736701013 hasConcept C116738811 @default.
- W2736701013 hasConcept C124101348 @default.
- W2736701013 hasConcept C153180895 @default.
- W2736701013 hasConcept C154945302 @default.
- W2736701013 hasConcept C177264268 @default.
- W2736701013 hasConcept C199360897 @default.
- W2736701013 hasConcept C203519979 @default.
- W2736701013 hasConcept C2776124973 @default.
- W2736701013 hasConcept C33923547 @default.
- W2736701013 hasConcept C41008148 @default.
- W2736701013 hasConcept C48044578 @default.
- W2736701013 hasConcept C4969071 @default.
- W2736701013 hasConcept C75165309 @default.
- W2736701013 hasConcept C77088390 @default.
- W2736701013 hasConcept C80444323 @default.
- W2736701013 hasConceptScore W2736701013C103278499 @default.
- W2736701013 hasConceptScore W2736701013C11413529 @default.
- W2736701013 hasConceptScore W2736701013C114614502 @default.
- W2736701013 hasConceptScore W2736701013C115961682 @default.
- W2736701013 hasConceptScore W2736701013C116738811 @default.
- W2736701013 hasConceptScore W2736701013C124101348 @default.
- W2736701013 hasConceptScore W2736701013C153180895 @default.
- W2736701013 hasConceptScore W2736701013C154945302 @default.
- W2736701013 hasConceptScore W2736701013C177264268 @default.
- W2736701013 hasConceptScore W2736701013C199360897 @default.
- W2736701013 hasConceptScore W2736701013C203519979 @default.
- W2736701013 hasConceptScore W2736701013C2776124973 @default.
- W2736701013 hasConceptScore W2736701013C33923547 @default.
- W2736701013 hasConceptScore W2736701013C41008148 @default.
- W2736701013 hasConceptScore W2736701013C48044578 @default.
- W2736701013 hasConceptScore W2736701013C4969071 @default.
- W2736701013 hasConceptScore W2736701013C75165309 @default.
- W2736701013 hasConceptScore W2736701013C77088390 @default.
- W2736701013 hasConceptScore W2736701013C80444323 @default.
- W2736701013 hasLocation W27367010131 @default.
- W2736701013 hasOpenAccess W2736701013 @default.
- W2736701013 hasPrimaryLocation W27367010131 @default.
- W2736701013 hasRelatedWork W1488303220 @default.
- W2736701013 hasRelatedWork W1583471564 @default.
- W2736701013 hasRelatedWork W2039797165 @default.
- W2736701013 hasRelatedWork W2044163187 @default.
- W2736701013 hasRelatedWork W2096598900 @default.
- W2736701013 hasRelatedWork W2120908895 @default.
- W2736701013 hasRelatedWork W2356914479 @default.
- W2736701013 hasRelatedWork W2396588571 @default.
- W2736701013 hasRelatedWork W2619410666 @default.
- W2736701013 hasRelatedWork W2798412430 @default.
- W2736701013 hasRelatedWork W2803960581 @default.
- W2736701013 hasRelatedWork W2903672378 @default.
- W2736701013 hasRelatedWork W2950817225 @default.
- W2736701013 hasRelatedWork W2963535486 @default.
- W2736701013 hasRelatedWork W2963886823 @default.
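
The abstract above describes the exact set similarity join as computing all pairs of sets whose similarity (e.g. Jaccard) passes a threshold, and uses recall to describe how much of that exact result an approximate method returns. The following is a minimal brute-force sketch of those two definitions only. It is not the randomized algorithm of the paper, nor prefix filtering. The function names, the toy records, the `>=` comparison convention, and the made-up "approximate" result are all assumptions introduced for illustration.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b|; taken as 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def exact_join(sets, threshold):
    """Brute-force exact self-join.

    Returns index pairs (i, j) with i < j whose Jaccard similarity is
    at least `threshold` (the >= convention is an assumption here).
    Runs in quadratic time; only meant to pin down the definition.
    """
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(sets), 2)
        if jaccard(a, b) >= threshold
    }


def recall(reported, exact):
    """Fraction of the exact result pairs that an approximate join reported."""
    return len(reported & exact) / len(exact) if exact else 1.0


# Toy records, invented for illustration.
records = [
    frozenset({"a", "b", "c", "d"}),
    frozenset({"a", "b", "c", "e"}),
    frozenset({"a", "b", "c", "d", "e"}),
    frozenset({"x", "y", "z"}),
]

exact = exact_join(records, threshold=0.7)
approximate = {(0, 2)}  # hypothetical partial output of some approximate join

print(sorted(exact))
print(recall(approximate, exact))
```

With these toy records, records 0 and 1 share three of five distinct tokens (similarity 0.6) and fall below the 0.7 threshold, while each of them shares four of five tokens with record 2 (similarity 0.8). The hypothetical approximate result therefore finds one of the two qualifying pairs.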