TY - GEN
T1 - Fuzzy Clustering with Similarity Queries
AU - Huleihel, Wasim
AU - Mazumdar, Arya
AU - Pal, Soumyabrata
N1 - Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - The fuzzy or soft k-means objective is a popular generalization of the well-known kmeans problem, extending the clustering capability of the k-means to datasets that are uncertain, vague and otherwise hard to cluster. In this paper, we propose a semisupervised active clustering framework, where the learner is allowed to interact with an oracle (domain expert), asking for the similarity between a certain set of chosen items. We study the query and computational complexities of clustering in this framework. We prove that having a few of such similarity queries enables one to get a polynomial-time approximation algorithm to an otherwise conjecturally NP-hard problem. In particular, we provide algorithms for fuzzy clustering in this setting that ask O(poly(k) log n) similarity queries and run with polynomialtime- complexity, where n is the number of items. The fuzzy k-means objective is nonconvex, with k-means as a special case, and is equivalent to some other generic nonconvex problem such as non-negative matrix factorization. The ubiquitous Lloyd-type algorithms (or alternating-minimization algorithms) can get stuck at a local minima. Our results show that by making few similarity queries, the problem becomes easier to solve. Finally, we test our algorithms over real-world datasets, showing their effectiveness in real-world applications.
AB - The fuzzy or soft k-means objective is a popular generalization of the well-known kmeans problem, extending the clustering capability of the k-means to datasets that are uncertain, vague and otherwise hard to cluster. In this paper, we propose a semisupervised active clustering framework, where the learner is allowed to interact with an oracle (domain expert), asking for the similarity between a certain set of chosen items. We study the query and computational complexities of clustering in this framework. We prove that having a few of such similarity queries enables one to get a polynomial-time approximation algorithm to an otherwise conjecturally NP-hard problem. In particular, we provide algorithms for fuzzy clustering in this setting that ask O(poly(k) log n) similarity queries and run with polynomialtime- complexity, where n is the number of items. The fuzzy k-means objective is nonconvex, with k-means as a special case, and is equivalent to some other generic nonconvex problem such as non-negative matrix factorization. The ubiquitous Lloyd-type algorithms (or alternating-minimization algorithms) can get stuck at a local minima. Our results show that by making few similarity queries, the problem becomes easier to solve. Finally, we test our algorithms over real-world datasets, showing their effectiveness in real-world applications.
UR - http://www.scopus.com/inward/record.url?scp=85131839656&partnerID=8YFLogxK
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85131839656
T3 - Advances in Neural Information Processing Systems
SP - 789
EP - 801
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural information processing systems foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -