Matches in SemOpenAlex for { <https://semopenalex.org/work/W2003108016> ?p ?o ?g. }
Showing items 1 to 85 of 85 with 100 items per page.
- W2003108016 abstract "For implementing content management solutions and enabling new applications associated with data retention, regulatory compliance, and litigation issues, enterprises need to develop advanced analytics to uncover relationships among the documents, e.g., content similarity, provenance, and clustering. In this paper, we evaluate the performance of four syntactic similarity algorithms. Three algorithms are based on Broder's shingling technique while the fourth algorithm employs a more recent approach, content-based chunking. For our experiments, we use a specially designed corpus of documents that includes a set of similar documents with a controlled number of modifications. Our performance study reveals that the similarity metric of all four algorithms is highly sensitive to settings of the algorithms' parameters: sliding window size and fingerprint sampling frequency. We identify a useful range of these parameters for achieving good practical results, and compare the performance of the four algorithms in a controlled environment. We validate our results by applying these algorithms to finding near-duplicates in two large collections of HP technical support documents." @default.
- W2003108016 created "2016-06-24" @default.
- W2003108016 creator A5016669578 @default.
- W2003108016 creator A5023021993 @default.
- W2003108016 creator A5083350972 @default.
- W2003108016 creator A5088426443 @default.
- W2003108016 creator A5090701350 @default.
- W2003108016 date "2009-06-28" @default.
- W2003108016 modified "2023-09-23" @default.
- W2003108016 title "Applying syntactic similarity algorithms for enterprise information management" @default.
- W2003108016 cites W1991800036 @default.
- W2003108016 cites W1997657677 @default.
- W2003108016 cites W2007842132 @default.
- W2003108016 cites W2012833704 @default.
- W2003108016 cites W2056980397 @default.
- W2003108016 cites W2067432306 @default.
- W2003108016 cites W2085922539 @default.
- W2003108016 cites W2123767107 @default.
- W2003108016 cites W2128272588 @default.
- W2003108016 cites W2164634022 @default.
- W2003108016 doi "https://doi.org/10.1145/1557019.1557137" @default.
- W2003108016 hasPublicationYear "2009" @default.
- W2003108016 type Work @default.
- W2003108016 sameAs 2003108016 @default.
- W2003108016 citedByCount "19" @default.
- W2003108016 countsByYear W20031080162012 @default.
- W2003108016 countsByYear W20031080162013 @default.
- W2003108016 countsByYear W20031080162015 @default.
- W2003108016 countsByYear W20031080162016 @default.
- W2003108016 countsByYear W20031080162017 @default.
- W2003108016 countsByYear W20031080162019 @default.
- W2003108016 countsByYear W20031080162020 @default.
- W2003108016 countsByYear W20031080162021 @default.
- W2003108016 countsByYear W20031080162022 @default.
- W2003108016 crossrefType "proceedings-article" @default.
- W2003108016 hasAuthorship W2003108016A5016669578 @default.
- W2003108016 hasAuthorship W2003108016A5023021993 @default.
- W2003108016 hasAuthorship W2003108016A5083350972 @default.
- W2003108016 hasAuthorship W2003108016A5088426443 @default.
- W2003108016 hasAuthorship W2003108016A5090701350 @default.
- W2003108016 hasConcept C103278499 @default.
- W2003108016 hasConcept C11413529 @default.
- W2003108016 hasConcept C115961682 @default.
- W2003108016 hasConcept C119857082 @default.
- W2003108016 hasConcept C124101348 @default.
- W2003108016 hasConcept C154945302 @default.
- W2003108016 hasConcept C162324750 @default.
- W2003108016 hasConcept C176217482 @default.
- W2003108016 hasConcept C177264268 @default.
- W2003108016 hasConcept C199360897 @default.
- W2003108016 hasConcept C21547014 @default.
- W2003108016 hasConcept C23123220 @default.
- W2003108016 hasConcept C41008148 @default.
- W2003108016 hasConcept C73555534 @default.
- W2003108016 hasConceptScore W2003108016C103278499 @default.
- W2003108016 hasConceptScore W2003108016C11413529 @default.
- W2003108016 hasConceptScore W2003108016C115961682 @default.
- W2003108016 hasConceptScore W2003108016C119857082 @default.
- W2003108016 hasConceptScore W2003108016C124101348 @default.
- W2003108016 hasConceptScore W2003108016C154945302 @default.
- W2003108016 hasConceptScore W2003108016C162324750 @default.
- W2003108016 hasConceptScore W2003108016C176217482 @default.
- W2003108016 hasConceptScore W2003108016C177264268 @default.
- W2003108016 hasConceptScore W2003108016C199360897 @default.
- W2003108016 hasConceptScore W2003108016C21547014 @default.
- W2003108016 hasConceptScore W2003108016C23123220 @default.
- W2003108016 hasConceptScore W2003108016C41008148 @default.
- W2003108016 hasConceptScore W2003108016C73555534 @default.
- W2003108016 hasLocation W20031080161 @default.
- W2003108016 hasOpenAccess W2003108016 @default.
- W2003108016 hasPrimaryLocation W20031080161 @default.
- W2003108016 hasRelatedWork W1488437289 @default.
- W2003108016 hasRelatedWork W1963543573 @default.
- W2003108016 hasRelatedWork W1984733048 @default.
- W2003108016 hasRelatedWork W2033805144 @default.
- W2003108016 hasRelatedWork W2187249578 @default.
- W2003108016 hasRelatedWork W2189421535 @default.
- W2003108016 hasRelatedWork W2349125667 @default.
- W2003108016 hasRelatedWork W2356020937 @default.
- W2003108016 hasRelatedWork W2390847229 @default.
- W2003108016 hasRelatedWork W2762277149 @default.
- W2003108016 isParatext "false" @default.
- W2003108016 isRetracted "false" @default.
- W2003108016 magId "2003108016" @default.
- W2003108016 workType "article" @default.
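
The abstract above names two parameters that the similarity metric is sensitive to: sliding window size and fingerprint sampling frequency. The following is a minimal sketch of how these two parameters could enter a shingling-style comparison. It is not the paper's implementation. The function names, the word-level tokenization, the SHA-1 based fingerprint, and the "keep fingerprints divisible by `sample_mod`" sampling rule are all assumptions chosen for illustration; the abstract does not specify which variants the four evaluated algorithms use.

```python
import hashlib


def shingle_fingerprints(text, window_size=4, sample_mod=1):
    """Return the set of sampled fingerprints of the word shingles in `text`.

    window_size: number of consecutive words per shingle (the sliding window).
    sample_mod:  keep only fingerprints divisible by this value; 1 keeps all.
    Both parameter names are hypothetical, chosen for this sketch.
    """
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < window_size:
        # Document shorter than the window: treat it as a single shingle.
        shingles = [" ".join(words)]
    else:
        shingles = [
            " ".join(words[i:i + window_size])
            for i in range(len(words) - window_size + 1)
        ]
    fingerprints = set()
    for shingle in shingles:
        digest = hashlib.sha1(shingle.encode("utf-8")).digest()
        # Use the first 8 bytes of the digest as a 64-bit fingerprint.
        fp = int.from_bytes(digest[:8], "big")
        if fp % sample_mod == 0:
            fingerprints.add(fp)
    return fingerprints


def resemblance(doc_a, doc_b, window_size=4, sample_mod=1):
    """Jaccard overlap of the two documents' sampled fingerprint sets."""
    fa = shingle_fingerprints(doc_a, window_size, sample_mod)
    fb = shingle_fingerprints(doc_b, window_size, sample_mod)
    if not fa and not fb:
        # No fingerprints survived sampling; report no measurable overlap.
        return 0.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    modified = "the quick brown fox leaps over the lazy dog"
    for w in (2, 4):
        print(w, resemblance(original, modified, window_size=w))
```

In this sketch a single changed word alters every shingle that overlaps it, so a larger `window_size` lowers the score for the same edit, and a larger `sample_mod` shrinks the fingerprint sets and makes the score noisier. That is one plausible reading of the parameter sensitivity the abstract reports.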