Measure-based diffusion grid construction and high-dimensional data discretization

Amit Bermanis, Moshe Salhov, Guy Wolf, Amir Averbuch

Research output: Contribution to journal › Article › peer-review

Abstract

The diffusion maps framework is a kernel-based method for manifold learning and data analysis that models a Markovian process over data. Analysis of this process provides meaningful information concerning inner geometric structures in the data. Recently, it was suggested to replace the standard kernel by a measure-based kernel, which incorporates information about the density of the data. Thus, the manifold assumption is replaced by a more general measure assumption. The measure-based diffusion kernel utilizes two separate independent datasets. The first is the set by which the measure is determined. This measure correlates with a density that represents normal behaviors and patterns in the data. The second set consists of the analyzed data points that are embedded by the metastable states of the underlying diffusion process. This set can either be contiguous or discrete. In this paper, we present a data discretization methodology for analyzing a contiguous domain. The obtained discretization is achieved by constructing a uniform grid over this domain. This discretization is designed to approximate the continuous measure-based diffusion process by a discrete random walk process. This paper provides a proved criterion to determine the grid resolution that ensures a controllable approximation error for the continuous steady states by the discrete ones. Finally, the presented methodology is demonstrated on analytically generated data.
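
As a rough illustration of the discretization idea described in the abstract, the sketch below builds a uniform grid over a box, forms a symmetric kernel weighted by a sampled measure, and row-normalizes it into a random-walk transition matrix. The kernel form (Gaussian affinities to the measure samples, combined through the sample weights), the function names, the scale `eps`, and the grid `step` are assumptions made for illustration only. None of them are taken from the paper, and the paper's criterion for choosing the grid resolution is not implemented here.

```python
import numpy as np

def uniform_grid(low, high, step):
    """Uniform grid over the box [low, high] with spacing `step` (hypothetical helper)."""
    axes = [np.arange(l, h + step / 2, step) for l, h in zip(low, high)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=1)

def gaussian_affinity(a, b, eps):
    """Gaussian affinities exp(-||a_i - b_j||^2 / eps) between two point sets."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / eps)

def random_walk_on_grid(grid, measure_points, weights, eps):
    """Row-stochastic matrix from an assumed measure-weighted kernel.

    Assumption: k(x, y) = sum_r g(x, r) g(y, r) w(r), with g Gaussian and
    w the weights of the measure samples. This form is illustrative only.
    """
    g = gaussian_affinity(grid, measure_points, eps)
    kernel = (g * weights) @ g.T          # symmetric and nonnegative
    degrees = kernel.sum(axis=1)
    transition = kernel / degrees[:, None]  # each row sums to 1
    stationary = degrees / degrees.sum()    # steady state of the discrete walk
    return transition, stationary

# Synthetic example: the measure is represented by equally weighted Gaussian samples.
rng = np.random.default_rng(0)
measure_points = rng.normal(size=(200, 2))
weights = np.full(200, 1.0 / 200)

grid = uniform_grid([-2.0, -2.0], [2.0, 2.0], step=0.5)
P, pi = random_walk_on_grid(grid, measure_points, weights, eps=1.0)
```

Because the assumed kernel is symmetric, the normalized degree vector `pi` is a stationary distribution of `P`. Refining `step` gives a finer random walk over the same domain; how fine it must be for a given approximation error is the question the paper addresses.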

Original language: English
Pages (from-to): 207-228
Number of pages: 22
Journal: Applied and Computational Harmonic Analysis
Volume: 40
Issue number: 2
State: Published - 1 Mar 2016

Keywords

  • Data discretization
  • Diffusion-based kernel
  • Dimensionality reduction
  • Grid construction
  • Kernel PCA
  • Measure-based information
