TY - JOUR
T1 - Approximately-isometric diffusion maps
AU - Salhov, Moshe
AU - Bermanis, Amit
AU - Wolf, Guy
AU - Averbuch, Amir
N1 - Publisher Copyright: © 2014 Elsevier Inc. All rights reserved.
PY - 2015/5/1
Y1 - 2015/5/1
AB - Diffusion Maps (DM), and other kernel methods, are utilized for the analysis of high dimensional datasets. The DM method uses a Markovian diffusion process to model and analyze data. A spectral analysis of the DM kernel yields a map of the data into a low dimensional space, where Euclidean distances between the mapped data points represent the diffusion distances between the corresponding high dimensional data points. Many machine learning methods, which are based on the Euclidean metric, can be applied to the mapped data points in order to take advantage of the diffusion relations between them. However, a significant drawback of the DM is the need to apply spectral decomposition to a kernel matrix, which becomes infeasible for large datasets. In this paper, we present an efficient approximation of the DM embedding. The presented approximation algorithm produces a dictionary of data points by identifying a small set of informative representatives. Then, based on this dictionary, the entire dataset is efficiently embedded into a low dimensional space. The Euclidean distances in the resulting embedded space approximate the diffusion distances. The properties of the presented embedding and its relation to the DM method are analyzed and demonstrated.
KW - Diffusion distance
KW - Diffusion maps
KW - Dimensionality reduction
KW - Distance preservation
KW - Kernel PCA
KW - Manifold learning
UR - http://www.scopus.com/inward/record.url?scp=84925289137&partnerID=8YFLogxK
U2 - 10.1016/j.acha.2014.05.002
DO - 10.1016/j.acha.2014.05.002
M3 - Article
AN - SCOPUS:84925289137
SN - 1063-5203
VL - 38
SP - 399
EP - 419
JO - Applied and Computational Harmonic Analysis
JF - Applied and Computational Harmonic Analysis
IS - 3
ER -