TY - JOUR
T1 - K-Vectors
T2 - An Alternating Minimization Algorithm for Learning Regression Functions
AU - Weinberger, Nir
AU - Feder, Meir
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - The k-vectors algorithm for learning regression functions proposed here is akin to the well-known k-means algorithm. Both algorithms partition the feature space, but unlike the k-means algorithm, the k-vectors algorithm aims to reconstruct the response rather than the feature. Its partitioning rule is based on maximizing the correlation (inner product) of the feature vector with each of a set of k vectors, and it generates polyhedral cells, similar to those generated by the nearest-neighbor rule of the k-means algorithm. As with k-means, the learning algorithm alternates between two types of steps. In the first type of step, k labels are determined via a centroid-type rule (in the response space), which uses a hinge-type surrogate for the mean squared error loss function. In the second type of step, the k vectors that determine the partition are updated according to a multiclass classification rule, in the spirit of support vector machines. It is proved that both steps of the algorithm require solving only convex optimization problems, and that the algorithm is empirically consistent: as the length of the training sequence increases to infinity, fixed points of the empirical version of the algorithm tend to fixed points of the population version. Learnability of the predictor class posited by the algorithm is also established.
KW - Alternating minimization
KW - Lloyd-Max algorithm
KW - clustering
KW - convergence rates
KW - global convergence
KW - k-means
KW - nonparametric regression
KW - partitioning estimates
KW - quantization
KW - supervised learning
KW - support vector machines
UR - https://www.scopus.com/pages/publications/85094651223
U2 - 10.1109/TIT.2020.3008058
DO - 10.1109/TIT.2020.3008058
M3 - Article
AN - SCOPUS:85094651223
SN - 0018-9448
VL - 66
SP - 7196
EP - 7221
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 11
M1 - 9136791
ER -