TY - GEN
T1 - Polar classification of nominal data
AU - Wolf, Guy
AU - Harussi, Shachar
AU - Shmueli, Yaniv
AU - Averbuch, Amir
N1 - Publisher Copyright:
© 2013 Springer Science+Business Media Dordrecht.
PY - 2013
Y1 - 2013
N2 - Many modern systems record various types of parameter values. Numerical values are relatively convenient for data analysis tools because there are many methods to measure distances and similarities between them. The application of dimensionality reduction techniques to data sets with such values is also a well-known practice. Nominal (i.e., categorical) values, on the other hand, pose some problems for current methods. Most of all, there is no meaningful distance between possible nominal values, which are either equal or unequal to each other. Since many dimensionality reduction methods rely on preserving some form of similarity or distance measure, their application to such data sets is not straightforward. We propose a method to achieve clustering of such data sets by applying the diffusion maps methodology to them. Our method is based on a distance metric that utilizes the effect of the boolean nature of similarities between nominal values (i.e., equal or unequal) on the diffusion kernel and, in turn, on the embedded space resulting from its principal components. We use a multi-view approach by analyzing small, closely related sets of parameters at a time instead of the whole data set. This way, we achieve a comprehensive understanding of the data set from many points of view.
AB - Many modern systems record various types of parameter values. Numerical values are relatively convenient for data analysis tools because there are many methods to measure distances and similarities between them. The application of dimensionality reduction techniques to data sets with such values is also a well-known practice. Nominal (i.e., categorical) values, on the other hand, pose some problems for current methods. Most of all, there is no meaningful distance between possible nominal values, which are either equal or unequal to each other. Since many dimensionality reduction methods rely on preserving some form of similarity or distance measure, their application to such data sets is not straightforward. We propose a method to achieve clustering of such data sets by applying the diffusion maps methodology to them. Our method is based on a distance metric that utilizes the effect of the boolean nature of similarities between nominal values (i.e., equal or unequal) on the diffusion kernel and, in turn, on the embedded space resulting from its principal components. We use a multi-view approach by analyzing small, closely related sets of parameters at a time instead of the whole data set. This way, we achieve a comprehensive understanding of the data set from many points of view.
KW - Clustering
KW - Diffusion maps
KW - Nominal data
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=84964253190&partnerID=8YFLogxK
U2 - 10.1007/978-94-007-5288-7_14
DO - 10.1007/978-94-007-5288-7_14
M3 - Conference contribution
AN - SCOPUS:84964253190
SN - 9789400752870
T3 - Computational Methods in Applied Sciences
SP - 253
EP - 271
BT - Numerical Methods for Differential Equations, Optimization, and Technological Problems
A2 - Repin, Sergey
A2 - Tiihonen, Timo
A2 - Tuovinen, Tero
PB - Springer Netherlands
T2 - ECCOMAS Thematic Conference Computational Analysis and Optimization, CAO 2011
Y2 - 9 June 2011 through 11 June 2011
ER -