Abstract
We outline and evaluate a binary-positive clustering model. It is based on a binary representation of data records in rows, where the column entries, either '1' or '0', correspond to all possible data values that the attributes may take. A new group similarity index (GSI) is devised which takes into account only the positive attributes as the basis for the grouping and clustering algorithm. The model is compared with standard clustering models. For the comparison we define an objective measure based on two similarity factors: within-class similarity (WCS) and between-class similarity (BCS), seeking maximum intra-group and minimum inter-group proximity, respectively. A coefficient of variation (CV) statistic is then employed to combine the two factors into a measure of relative diversity between records and groups. When applied to a common data set, our binary clustering shows significant advantages over standard clustering models.
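The binary representation and the idea of a positive-only similarity can be illustrated with a minimal Python sketch. The abstract does not give the GSI formula, so the Jaccard coefficient is used here as a stand-in that likewise ignores shared '0' entries; it is not the paper's index. The function names `binarize`, `positive_similarity`, and `coefficient_of_variation`, and the sample records, are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def binarize(records, attributes):
    """Build one column per (attribute, value) pair seen in the data.

    An entry is 1 when the record takes that value, otherwise 0.
    """
    columns = sorted({(a, r[a]) for r in records for a in attributes})
    rows = [[1 if r[a] == v else 0 for (a, v) in columns] for r in records]
    return columns, rows


def positive_similarity(x, y):
    """Jaccard coefficient (assumed stand-in, NOT the paper's GSI).

    Counts positions where both rows hold 1, divided by positions where
    at least one row holds 1. Shared 0 entries do not contribute.
    """
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


# Hypothetical sample records, for illustration only.
records = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "blue", "size": "large"},
]
columns, rows = binarize(records, ["color", "size"])
```

Averaging `positive_similarity` over pairs of rows inside a group, and over pairs drawn from different groups, would give within-class and between-class figures in the spirit of WCS and BCS. How the paper combines them through the CV statistic is not stated in the abstract, so that step is left out of the sketch.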
Original language | English |
---|---|
Pages (from-to) | 100-108 |
Number of pages | 9 |
Journal | Journal of Computer Information Systems |
Volume | 43 |
Issue number | 3 |
State | Published - Mar 2003 |
Keywords
- Clustering
- Coefficient of Variation
- Data Mining
- Similarity