TY - JOUR
T1 - Regularization of projection directions via best basis selection approach
AU - Stainvas, Inna
AU - Intrator, Nathan
PY - 2006
Y1 - 2006
N2 - Classification and recognition of high-dimensional data is difficult due to the "curse of dimensionality" problem, i.e. there is not enough data to robustly train an estimator. The problem may be overcome by dimensionality reduction. Many statistical models, such as linear discriminant analysis (LDA) and neural networks (NNs), include dimensionality reduction as an implicit preprocessing step. However, projection onto discriminant directions alone is not sufficient, since the number of direction parameters remains large (proportional to the dimensionality of the data); the models remain many-parameter models and still require regularization. In this work, we propose to regularize the low-dimensional structure of the projection parameter space based on compression concepts. We assume that the intrinsic dimensionality of the discriminant space spanned by the projection directions is small, so that the directions may be sufficiently well represented as a linear superposition of a small number of wavelet functions in a wavelet packet basis. We further introduce a simple incremental way to increase the dimensionality of the parameter space using hypothesis testing, and apply the technique to logistic regression and to Fisher linear discrimination. Three benchmark data sets are used to demonstrate the proposed method: triangular waveforms (Breiman 1984), the vowel data set (CMU repository), and a letter data set (DELVE). We show that this approach leads to significant classification improvement.
KW - Best-Basis wavelet packet
KW - Dimensionality reduction
KW - Projection methods
UR - http://www.scopus.com/inward/record.url?scp=80052268599&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:80052268599
SN - 0973-1377
VL - 4
SP - 1
EP - 22
JO - International Journal of Applied Mathematics and Statistics
JF - International Journal of Applied Mathematics and Statistics
IS - J06
ER -