Privacy-preserving data mining: A feature set partitioning approach

Nissim Matatov, Lior Rokach, Oded Maimon

Research output: Contribution to journal › Article › peer-review


In privacy-preserving data mining (PPDM), a widely used method for achieving data mining goals while preserving privacy is based on k-anonymity. This method, which protects subject-specific sensitive data by anonymizing it before it is released for data mining, demands that every tuple in the released table be indistinguishable from no fewer than k subjects. The most common approach for achieving compliance with k-anonymity is to replace certain values with less specific but semantically consistent values. In this paper, we propose a different approach for achieving k-anonymity: partitioning the original dataset into several projections such that each one of them adheres to k-anonymity. Moreover, any attempt to rejoin the projections results in a table that still complies with k-anonymity. A classifier is trained on each projection, and an unlabelled instance is subsequently classified by combining the classifications of all classifiers. Guided by classification accuracy and k-anonymity constraints, the proposed data mining privacy by decomposition (DMPD) algorithm uses a genetic algorithm to search for an optimal feature set partitioning. Ten separate datasets were evaluated with DMPD in order to compare its classification performance with other k-anonymity-based methods. The results suggest that DMPD performs better than existing k-anonymity-based algorithms and does not require domain-dependent knowledge. Using multiobjective optimization methods, we also examine the tradeoff between the two conflicting objectives in PPDM: privacy and predictive performance.
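The k-anonymity condition on a projection can be illustrated with a minimal Python sketch. This is not the DMPD algorithm itself: the helper name `is_k_anonymous`, the toy table, and the chosen partition are hypothetical, and the genetic search and classifier combination are omitted. It only shows how a feature set partition can yield projections that each satisfy k-anonymity even when the full attribute set does not.

```python
from collections import Counter


def is_k_anonymous(rows, attributes, k):
    """Return True if every combination of values over `attributes`
    appears in at least k rows (hypothetical helper, not from the paper)."""
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical toy table; attribute names and values are illustrative only.
rows = [
    {"age": "30-39", "zip": "A", "sex": "F"},
    {"age": "30-39", "zip": "B", "sex": "M"},
    {"age": "40-49", "zip": "A", "sex": "F"},
    {"age": "40-49", "zip": "B", "sex": "M"},
]

# One candidate partition of the feature set into two projections.
partition = [["age"], ["zip", "sex"]]

# Check each projection, then the unpartitioned attribute set, for k = 2.
projection_ok = [is_k_anonymous(rows, attrs, 2) for attrs in partition]
full_table_ok = is_k_anonymous(rows, ["age", "zip", "sex"], 2)

print(projection_ok, full_table_ok)
```

In this toy case each projection groups the rows into classes of size two, whereas every row is unique over the full attribute set. A search over partitions, as the abstract describes, would use a check of this kind as a constraint alongside classification accuracy.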

Original language: English
Pages (from-to): 2696-2720
Number of pages: 25
Journal: Information Sciences
Issue number: 14
State: Published - 15 Jul 2010


  • Data mining
  • Feature set partitioning
  • Genetic algorithms
  • Privacy
  • k-Anonymity

