Evaluating a positive attribute clustering model for data mining

Zippy Erlich*, Roy Gelbard, Israel Spiegler

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

We outline and evaluate a binary-positive clustering model. It is based on a binary representation of data records in rows where column entries, either '1' or '0', correspond to all possible data values that attributes may take. A new group similarity index (GSI) is devised which takes into account only the positive attributes as the basis for the grouping and clustering algorithm. The model is compared with standard clustering models. For the comparison we define an objective measure based on two similarity factors: within-class similarity (WCS) and between-class similarity (BCS), seeking maximum intra-group and minimum inter-group proximity, respectively. A coefficient of variation (CV) statistic is then employed to combine the two factors into a measure of relative diversity between records and groups. When applied to a common data set, our binary clustering shows significant advantages over standard clustering models.
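The binary representation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula for the GSI, WCS, or BCS, so none of them is implemented here. The function names `binarize`, `shared_positives`, and `coefficient_of_variation`, and the sample records, are assumptions made for illustration only. `shared_positives` simply counts columns where both records hold a '1', which is a stand-in showing what "positive attributes only" might look like, not the authors' index.

```python
from statistics import mean, pstdev


def binarize(records):
    """Build one column per (attribute, value) pair seen in the data.

    A row entry is 1 if the record takes that value, otherwise 0.
    """
    columns = sorted({(attr, val) for rec in records for attr, val in rec.items()})
    rows = [
        [1 if rec.get(attr) == val else 0 for attr, val in columns]
        for rec in records
    ]
    return columns, rows


def shared_positives(row_a, row_b):
    """Count columns where both rows are 1 (illustrative stand-in, not the GSI)."""
    return sum(a & b for a, b in zip(row_a, row_b))


def coefficient_of_variation(values):
    """Standard CV: population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical sample records, not taken from the paper.
records = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
]
columns, rows = binarize(records)
```

With these records, each row has exactly one '1' per original attribute, and matching '0' entries between two rows contribute nothing to `shared_positives`.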

Original language: English
Pages (from-to): 100-108
Number of pages: 9
Journal: Journal of Computer Information Systems
Volume: 43
Issue number: 3
State: Published - Mar 2003

Keywords

  • Clustering
  • Coefficient of Variation
  • Data Mining
  • Similarity
