TY - JOUR
T1 - Measure-based diffusion grid construction and high-dimensional data discretization
AU - Bermanis, Amit
AU - Salhov, Moshe
AU - Wolf, Guy
AU - Averbuch, Amir
N1 - Publisher Copyright: © 2015 Elsevier Inc. All rights reserved.
PY - 2016/3/1
Y1 - 2016/3/1
N2 - The diffusion maps framework is a kernel-based method for manifold learning and data analysis that models a Markovian process over data. Analysis of this process provides meaningful information concerning inner geometric structures in the data. Recently, it was suggested to replace the standard kernel by a measure-based kernel, which incorporates information about the density of the data. Thus, the manifold assumption is replaced by a more general measure assumption. The measure-based diffusion kernel utilizes two separate independent datasets. The first is the set by which the measure is determined. This measure correlates with a density that represents normal behaviors and patterns in the data. The second set consists of the analyzed data points that are embedded by the metastable states of the underlying diffusion process. This set can either be contiguous or discrete. In this paper, we present a data discretization methodology for analyzing a contiguous domain. The obtained discretization is achieved by constructing a uniform grid over this domain. This discretization is designed to approximate the continuous measure-based diffusion process by a discrete random walk process. This paper provides a proved criterion to determine the grid resolution that ensures a controllable approximation error for the continuous steady states by the discrete ones. Finally, the presented methodology is demonstrated on analytically generated data.
KW - Data discretization
KW - Diffusion-based kernel
KW - Dimensionality reduction
KW - Grid construction
KW - Kernel PCA
KW - Measure-based information
UR - http://www.scopus.com/inward/record.url?scp=84953638516&partnerID=8YFLogxK
U2 - 10.1016/j.acha.2015.02.001
DO - 10.1016/j.acha.2015.02.001
M3 - Article
AN - SCOPUS:84953638516
SN - 1063-5203
VL - 40
SP - 207
EP - 228
JO - Applied and Computational Harmonic Analysis
JF - Applied and Computational Harmonic Analysis
IS - 2
ER -