Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287829327> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W4287829327 abstract "The most widely used internal measure for clustering evaluation is the silhouette coefficient, whose naive computation requires a quadratic number of distance calculations, which is clearly unfeasible for massive datasets. Surprisingly, there are no known general methods to efficiently approximate the silhouette coefficient of a clustering with rigorously provable high accuracy. In this paper, we present the first scalable algorithm to compute such a rigorous approximation for the evaluation of clusterings based on any metric distances. Our algorithm hinges on a Probability Proportional to Size (PPS) sampling scheme, and, for any fixed $\varepsilon, \delta \in (0,1)$, it approximates the silhouette coefficient within a mere additive error $O(\varepsilon)$ with probability $1-\delta$, using a very small number of distance calculations. We also prove that the algorithm can be adapted to obtain rigorous approximations of other internal measures of clustering quality, such as cohesion and separation. Importantly, we provide a distributed implementation of the algorithm using the MapReduce model, which runs in constant rounds and requires only sublinear local space at each worker, which makes our estimation approach applicable to big data scenarios. We perform an extensive experimental evaluation of our silhouette approximation algorithm, comparing its performance to a number of baseline heuristics on real and synthetic datasets. The experiments provide evidence that, unlike other heuristics, our estimation strategy not only provides tight theoretical guarantees but is also able to return highly accurate estimations while running in a fraction of the time required by the exact computation, and that its distributed implementation is highly scalable, thus enabling the computation of internal measures for very large datasets for which the exact computation is prohibitive." @default.
- W4287829327 created "2022-07-26" @default.
- W4287829327 creator A5029947852 @default.
- W4287829327 creator A5036587129 @default.
- W4287829327 creator A5048624224 @default.
- W4287829327 creator A5060279781 @default.
- W4287829327 date "2020-03-03" @default.
- W4287829327 modified "2023-10-17" @default.
- W4287829327 title "Scalable Distributed Approximation of Internal Measures for Clustering Evaluation" @default.
- W4287829327 doi "https://doi.org/10.48550/arxiv.2003.01430" @default.
- W4287829327 hasPublicationYear "2020" @default.
- W4287829327 type Work @default.
- W4287829327 citedByCount "0" @default.
- W4287829327 crossrefType "posted-content" @default.
- W4287829327 hasAuthorship W4287829327A5029947852 @default.
- W4287829327 hasAuthorship W4287829327A5036587129 @default.
- W4287829327 hasAuthorship W4287829327A5048624224 @default.
- W4287829327 hasAuthorship W4287829327A5060279781 @default.
- W4287829327 hasBestOaLocation W42878293271 @default.
- W4287829327 hasConcept C11413529 @default.
- W4287829327 hasConcept C126255220 @default.
- W4287829327 hasConcept C127705205 @default.
- W4287829327 hasConcept C154945302 @default.
- W4287829327 hasConcept C162324750 @default.
- W4287829327 hasConcept C176217482 @default.
- W4287829327 hasConcept C21547014 @default.
- W4287829327 hasConcept C33923547 @default.
- W4287829327 hasConcept C41008148 @default.
- W4287829327 hasConcept C45374587 @default.
- W4287829327 hasConcept C48044578 @default.
- W4287829327 hasConcept C58103923 @default.
- W4287829327 hasConcept C73555534 @default.
- W4287829327 hasConcept C77088390 @default.
- W4287829327 hasConceptScore W4287829327C11413529 @default.
- W4287829327 hasConceptScore W4287829327C126255220 @default.
- W4287829327 hasConceptScore W4287829327C127705205 @default.
- W4287829327 hasConceptScore W4287829327C154945302 @default.
- W4287829327 hasConceptScore W4287829327C162324750 @default.
- W4287829327 hasConceptScore W4287829327C176217482 @default.
- W4287829327 hasConceptScore W4287829327C21547014 @default.
- W4287829327 hasConceptScore W4287829327C33923547 @default.
- W4287829327 hasConceptScore W4287829327C41008148 @default.
- W4287829327 hasConceptScore W4287829327C45374587 @default.
- W4287829327 hasConceptScore W4287829327C48044578 @default.
- W4287829327 hasConceptScore W4287829327C58103923 @default.
- W4287829327 hasConceptScore W4287829327C73555534 @default.
- W4287829327 hasConceptScore W4287829327C77088390 @default.
- W4287829327 hasLocation W42878293271 @default.
- W4287829327 hasOpenAccess W4287829327 @default.
- W4287829327 hasPrimaryLocation W42878293271 @default.
- W4287829327 hasRelatedWork W2041508386 @default.
- W4287829327 hasRelatedWork W2067938758 @default.
- W4287829327 hasRelatedWork W2347429546 @default.
- W4287829327 hasRelatedWork W2375463041 @default.
- W4287829327 hasRelatedWork W2380074299 @default.
- W4287829327 hasRelatedWork W2724951827 @default.
- W4287829327 hasRelatedWork W2748931535 @default.
- W4287829327 hasRelatedWork W2968767342 @default.
- W4287829327 hasRelatedWork W2986775978 @default.
- W4287829327 hasRelatedWork W3090865264 @default.
- W4287829327 isParatext "false" @default.
- W4287829327 isRetracted "false" @default.
- W4287829327 workType "article" @default.
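
The abstract of W4287829327 contrasts the naive silhouette computation, which needs a quadratic number of distance calculations, with sampling-based estimates. The Python sketch below illustrates only that contrast. It is not the paper's PPS algorithm and carries none of its guarantees: `silhouette_sampled` uses plain uniform sampling per cluster, a simple stand-in heuristic chosen here for brevity. All function names and the toy data are hypothetical, and the convention that singleton clusters contribute 0 is an assumption.

```python
import math
import random


def _group(labels):
    """Map each cluster label to the list of point indices it contains."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    return clusters


def _score(a, b):
    """Silhouette of one point from its cohesion a and separation b."""
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


def silhouette_exact(points, labels):
    """Naive silhouette: quadratic number of distance calculations.

    Assumes at least two clusters.
    """
    clusters = _group(labels)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # assumed convention: singletons contribute 0
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        total += _score(a, b)
    return total / len(points)


def silhouette_sampled(points, labels, sample_size, seed=0):
    """Uniform-sampling heuristic (NOT the PPS scheme from the abstract).

    Each point is compared only with at most sample_size points per cluster.
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    rng = random.Random(seed)
    clusters = _group(labels)
    samples = {
        lab: rng.sample(members, min(sample_size, len(members)))
        for lab, members in clusters.items()
    }
    total = 0.0
    for i, p in enumerate(points):
        if len(clusters[labels[i]]) == 1:
            continue
        # Exclude the point itself from its own cluster's sample.
        own = [j for j in samples[labels[i]] if j != i]
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in s) / len(s)
            for lab, s in samples.items()
            if lab != labels[i]
        )
        total += _score(a, b)
    return total / len(points)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labs = [0, 0, 0, 1, 1, 1]
    print(silhouette_exact(pts, labs))
    print(silhouette_sampled(pts, labs, sample_size=2))
```

When `sample_size` is at least as large as every cluster, the sampled estimate coincides with the exact value up to floating-point rounding; smaller samples trade accuracy for fewer distance calculations, without any of the error bounds claimed in the abstract.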