Hierarchical data organization, clustering and denoising via localized diffusion folders

Gil David*, Amir Averbuch

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

27 Scopus citations

Abstract

Data clustering is a common technique for data analysis. It is used in many fields, including machine learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis. The proposed Localized Diffusion Folders (LDF) methodology, whose localized folders are called diffusion folders (DF), introduces consistency criteria for hierarchical folder organization, clustering and classification of high-dimensional datasets. The DF are a multi-level partitioning of the data into local neighborhoods. They are generated by several random selections of data points and DF in a diffusion graph and by redefining the local diffusion distances between them. This multi-level partitioning defines an improved localized geometry for the data and a localized Markov transition matrix that is used for the next time step in the advancement of the hierarchical diffusion process. The result of this clustering method is a bottom-up hierarchical data organization in which each level of the hierarchy contains LDF of the DF from the lower levels. This methodology preserves the local neighborhood of each point while eliminating noisy, spurious connections between points and areas in the data affinity graph. One of our goals in this paper is to illustrate the impact of the initial affinity selection on the definition of the data graph and on the robustness of the hierarchical data organization. This process is similar to filter-bank selection for signal denoising. The performance of the algorithms is demonstrated on real data and compared to existing methods. The proposed solution is generic, since it fits a large number of related problems in which the source datasets contain high-dimensional data.
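As a rough illustration of one level of the bottom-up process the abstract describes (affinities, then a Markov matrix, then diffusion distances, then folders from repeated random selections, then a coarser Markov matrix for the next level), a minimal sketch might look like the following. It is not the authors' LDF algorithm. Every function name is hypothetical, and several choices are assumptions made only for illustration: a Gaussian kernel with scale `eps`, a one-step diffusion distance, radius-based greedy folders (`radius`), "consistency" taken as the common refinement of several random partitions (`trials`), and folder-to-folder affinity taken as the sum of point affinities.

```python
import math
import random


def gaussian_affinities(points, eps):
    """Assumed kernel: dense Gaussian affinities W[i][j] = exp(-||x_i - x_j||^2 / eps)."""
    n = len(points)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / eps)
             for j in range(n)] for i in range(n)]


def markov_matrix(W):
    """Row-normalize the affinities, P = D^{-1} W, so each row sums to 1."""
    return [[w / sum(row) for w in row] for row in W]


def diffusion_distances(W):
    """One-step diffusion distance D(i, j) = sqrt(sum_k (P[i][k] - P[j][k])^2 / pi[k]),
    where pi (degree / total degree) is the stationary distribution of P."""
    P = markov_matrix(W)
    degrees = [sum(row) for row in W]
    total = sum(degrees)
    pi = [d / total for d in degrees]
    n = len(W)
    return [[math.sqrt(sum((P[i][k] - P[j][k]) ** 2 / pi[k] for k in range(n)))
             for j in range(n)] for i in range(n)]


def random_partition(dist, radius, rng):
    """Hypothetical folder rule: visit points in random order; each still-unassigned
    seed claims every unassigned point within `radius` of it in diffusion distance."""
    n = len(dist)
    labels = [-1] * n
    order = list(range(n))
    rng.shuffle(order)
    folder = 0
    for seed in order:
        if labels[seed] != -1:
            continue
        for j in range(n):
            if labels[j] == -1 and dist[seed][j] <= radius:
                labels[j] = folder
        folder += 1
    return labels


def consistent_folders(dist, radius, trials, seed=0):
    """Assumed consistency criterion: two points share a folder only if they
    were grouped together in every random trial (the common refinement)."""
    rng = random.Random(seed)
    runs = [random_partition(dist, radius, rng) for _ in range(trials)]
    index_of = {}
    folders = []
    for i in range(len(dist)):
        signature = tuple(run[i] for run in runs)
        if signature not in index_of:
            index_of[signature] = len(folders)
            folders.append([])
        folders[index_of[signature]].append(i)
    return folders


def coarsen(W, folders):
    """Assumed aggregation: affinity between two folders is the sum of their point
    affinities; row-normalizing it gives the next level's Markov matrix."""
    return [[sum(W[i][j] for i in A for j in B) for B in folders] for A in folders]


if __name__ == "__main__":
    # Toy data: two well-separated blobs of three 2-D points each.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    W = gaussian_affinities(pts, eps=1.0)
    folders = consistent_folders(diffusion_distances(W), radius=0.5, trials=5)
    print(folders)  # -> [[0, 1, 2], [3, 4, 5]]
    print(markov_matrix(coarsen(W, folders)))  # 2x2 Markov matrix for the next level
```

Repeating the same steps on the coarse affinity matrix would produce the next level of the hierarchy. The dense lists keep the sketch dependency-free, but the cost grows as O(n^3), so this is only suitable for toy inputs.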

Original language: English
Pages (from-to): 1-23
Number of pages: 23
Journal: Applied and Computational Harmonic Analysis
Volume: 33
Issue number: 1
State: Published - Jul 2012

Keywords

  • Diffusion geometry
  • Hierarchical clustering
  • Spectral graph
