On Finding Large Conjunctive Clusters

Nina Mishra*, Dana Ron, Ram Swaminathan

*Corresponding author for this work

Research output: Contribution to journal, Conference article, peer-review

23 Scopus citations

Abstract

We propose a new formulation of the clustering problem that differs from previous work in several aspects. First, the goal is to explicitly output a collection of simple and meaningful conjunctive descriptions of the clusters. Second, the clusters might overlap, i.e., a point can belong to multiple clusters. Third, the clusters might not cover all points, i.e., not every point is clustered. Finally, we allow a point to be assigned to a conjunctive cluster description even if it does not completely satisfy all of the attributes, but rather only satisfies most. A convenient way to view our clustering problem is that of finding a collection of large bicliques in a bipartite graph. Identifying one largest conjunctive cluster is equivalent to finding a maximum edge biclique. Since this problem is NP-hard and there is evidence that it is difficult to approximate, we solve a relaxed version where the objective is to find a large subgraph that is close to being a biclique. We give a randomized algorithm that finds a relaxed biclique with almost as many edges as the maximum biclique. We then extend this algorithm to identify a good collection of large relaxed bicliques. A key property of these algorithms is that their running time is independent of the number of data points and linear in the number of attributes.
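The bipartite-graph view in the abstract can be made concrete with a small example. Points sit on one side and attributes on the other, and a conjunctive cluster is a set of points together with a conjunction of attributes. In a relaxed cluster, a point joins if it satisfies at least a (1 - eps) fraction of the conjunction, and the quality of a cluster is the number of point-attribute edges inside it. The Python sketch below illustrates only these definitions. It pairs them with a simple sampling heuristic that proposes candidate conjunctions by intersecting the attribute sets of small subsets of randomly sampled points. This heuristic is an illustrative assumption and is not the authors' algorithm. The names `relaxed_members`, `edge_count`, and `sample_large_cluster`, the dict-of-sets data layout, and the parameters `eps`, `sample_size`, and `subset_size` are all hypothetical. The sketch also scans every point to score each candidate, so unlike the algorithms described in the abstract, its running time does depend on the number of data points.

```python
import itertools
import random
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical layout: each point maps to the set of binary attributes it has,
# i.e. its neighbours in the bipartite point-attribute graph.
Data = Dict[str, FrozenSet[str]]


def relaxed_members(data: Data, conj: FrozenSet[str], eps: float) -> List[str]:
    """Points satisfying at least a (1 - eps) fraction of the conjunction."""
    if not conj:
        return []
    need = (1 - eps) * len(conj)
    return sorted(p for p, attrs in data.items() if len(attrs & conj) >= need)


def edge_count(data: Data, points: List[str], conj: FrozenSet[str]) -> int:
    """Number of point-attribute edges present inside the (relaxed) biclique."""
    return sum(len(data[p] & conj) for p in points)


def sample_large_cluster(
    data: Data, eps: float = 0.2, sample_size: int = 6,
    subset_size: int = 2, seed: int = 0,
) -> Tuple[int, List[str], FrozenSet[str]]:
    """Illustrative heuristic (not the paper's algorithm): intersect small
    subsets of a random sample to propose conjunctions, keep the best one."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(data), min(sample_size, len(data)))
    best: Tuple[int, List[str], FrozenSet[str]] = (0, [], frozenset())
    for subset in itertools.combinations(sample, subset_size):
        # Candidate conjunction: attributes shared by every point in the subset.
        conj = frozenset.intersection(*(data[p] for p in subset))
        members = relaxed_members(data, conj, eps)
        score = edge_count(data, members, conj)
        if score > best[0]:
            best = (score, members, conj)
    return best


# Toy data, invented for illustration.
data: Data = {
    "p1": frozenset({"a", "b", "c"}),
    "p2": frozenset({"a", "b", "c", "d"}),
    "p3": frozenset({"a", "b"}),
    "p4": frozenset({"d", "e"}),
    "p5": frozenset({"a", "b", "c", "e"}),
}

score, members, conj = sample_large_cluster(data, eps=0.34)
print(score, members)  # 11 ['p1', 'p2', 'p3', 'p5']
```

With `eps=0.0` the relaxation is switched off, so only an exact biclique is accepted. The best cluster is then `{p1, p2, p5}` × `{a, b, c}` with 9 edges. Allowing a point to miss one of three attributes lets `p3` join the same conjunction and raises the edge count to 11. This shows, on a toy scale, why the relaxed objective can capture more of a cluster than an exact biclique does.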

Original language: English
Pages (from-to): 448-462
Number of pages: 15
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2777
State: Published - 2003
Event: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003 - Washington, DC, United States
Duration: 24 Aug 2003 - 27 Aug 2003

Keywords

  • Conceptual Clustering
  • Max Edge Biclique
  • Unsupervised Learning
