Data driven similarity measures for k-means like clustering algorithms

Jacob Kogan, Marc Teboulle, Charles Nicholas

Research output: Contribution to journal › Article › peer-review

Abstract

We present an optimization approach that generates k-means-like clustering algorithms. The batch k-means and the incremental k-means are two well-known versions of the classical k-means clustering algorithm (Duda et al. 2000). To benefit from the speed of the batch version and the accuracy of the incremental version, we combine the two in a "ping-pong" fashion. We use a distance-like function that combines the squared Euclidean distance with relative entropy. In the extreme cases our algorithm recovers the classical k-means clustering algorithm and generalizes the Divisive Information Theoretic clustering algorithm recently reported independently by Berkhin and Becher (2002) and Dhillon et al. (2002). Results of numerical experiments that demonstrate the viability of our approach are reported.
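The abstract outlines the two ingredients of the approach: a distance-like function blending the squared Euclidean distance with relative entropy, and an alternation ("ping-pong") between batch and incremental k-means passes. The sketch below is an illustrative reading of that description, not the paper's exact formulation: it assumes a weighted sum nu·||x − a||² + mu·KL(x || a) on nonnegative vectors (e.g., normalized term-frequency vectors), uses the plain cluster mean as the representative, and the names `ping_pong_kmeans`, `nu`, and `mu` are hypothetical; the paper derives the precise distance and centroid updates.

```python
# Minimal sketch of a k-means-like algorithm driven by a distance that blends
# squared Euclidean distance with relative entropy, alternating batch passes
# with incremental sweeps.  Weights, names, and the use of the plain mean as
# the cluster representative are illustrative assumptions, not the paper's
# exact formulation.
import numpy as np

EPS = 1e-12  # guard against log(0) and division by zero


def blended_distance(x, a, nu=1.0, mu=1.0):
    """nu * squared Euclidean distance + mu * relative entropy KL(x || a)."""
    sq = np.sum((x - a) ** 2)
    kl = np.sum(x * np.log((x + EPS) / (a + EPS)) - x + a)
    return nu * sq + mu * kl


def batch_pass(X, centroids, nu, mu):
    """One batch pass: assign every point, then recompute all centroids."""
    k = len(centroids)
    labels = np.array([
        min(range(k), key=lambda j: blended_distance(x, centroids[j], nu, mu))
        for x in X
    ])
    new_centroids = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(k)
    ])
    return labels, new_centroids


def incremental_pass(X, centroids, labels, nu, mu):
    """One incremental sweep: reassign single points, updating as we go."""
    moved = False
    for i, x in enumerate(X):
        current = labels[i]
        best = min(range(len(centroids)),
                   key=lambda j: blended_distance(x, centroids[j], nu, mu))
        if best != current:
            labels[i] = best
            for j in (current, best):  # refresh only the two affected clusters
                members = X[labels == j]
                if len(members):
                    centroids[j] = members.mean(axis=0)
            moved = True
    return moved


def ping_pong_kmeans(X, k, nu=1.0, mu=1.0, max_iter=50, seed=0):
    """Alternate fast batch passes with accuracy-improving incremental sweeps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        labels, centroids = batch_pass(X, centroids, nu, mu)
        if not incremental_pass(X, centroids, labels, nu, mu):
            break  # no single-point move improves the partition
    return labels, centroids
```

Setting mu = 0 reduces the blended distance to the squared Euclidean case (classical k-means), while nu = 0 leaves only the relative-entropy term, which is the regime of divisive information-theoretic clustering; intermediate weights interpolate between the two, which mirrors the "extreme cases" described in the abstract.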

Original language: English
Pages (from-to): 331-349
Number of pages: 19
Journal: Information Retrieval
Volume: 8
Issue number: 2
DOIs
State: Published - Apr 2005

Keywords

  • Clustering algorithms
  • Entropy
  • Optimization
