TY - JOUR

T1 - Testing of clustering

AU - Alon, Noga

AU - Dar, Seannie

AU - Parnas, Michal

AU - Ron, Dana

PY - 2003/4

Y1 - 2003/4

N2 - A set X of points in ℛd is (k, b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k, b)-clusterable and the case that X is ε-far from being (k, b′)-clusterable for any given 0 < ε ≤ 1 and for b′ ≥ b. By ε-far from being (k, b′)-clusterable we mean that more than ε · |X| points should be removed from X so that it becomes (k, b′)-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings. Namely, these are clustering of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster to which any given point belongs.

AB - A set X of points in ℛd is (k, b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k, b)-clusterable and the case that X is ε-far from being (k, b′)-clusterable for any given 0 < ε ≤ 1 and for b′ ≥ b. By ε-far from being (k, b′)-clusterable we mean that more than ε · |X| points should be removed from X so that it becomes (k, b′)-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings. Namely, these are clustering of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster to which any given point belongs.

KW - Approximation algorithms

KW - Clustering

KW - Property testing

KW - Randomized algorithms

UR - http://www.scopus.com/inward/record.url?scp=0041384098&partnerID=8YFLogxK

U2 - 10.1137/S0895480102410973

DO - 10.1137/S0895480102410973

M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???

AN - SCOPUS:0041384098

SN - 0895-4801

VL - 16

SP - 393

EP - 417

JO - SIAM Journal on Discrete Mathematics

JF - SIAM Journal on Discrete Mathematics

IS - 3

ER -