TY - JOUR

T1 - K-Vectors

T2 - An Alternating Minimization Algorithm for Learning Regression Functions

AU - Weinberger, Nir

AU - Feder, Meir

N1 - Publisher Copyright:
© 1963-2012 IEEE.

PY - 2020/11

Y1 - 2020/11

N2 - The k-vectors algorithm for learning regression functions proposed here is akin to the well-known k-means algorithm. Both algorithms partition the feature space, but unlike k-means, the k-vectors algorithm aims to reconstruct the response rather than the feature. Its partitioning rule maximizes the correlation (inner product) of the feature vector with a set of k vectors, and generates polyhedral cells, similar to those generated by the nearest-neighbor rule of the k-means algorithm. As with k-means, the learning algorithm alternates between two types of steps. In the first, k labels are determined via a centroid-type rule (in the response space), which uses a hinge-type surrogate for the mean-squared-error loss. In the second, the k vectors that determine the partition are updated according to a multiclass classification rule, in the spirit of support vector machines. It is proved that both steps of the algorithm require only solving convex optimization problems, and that the algorithm is empirically consistent: as the length of the training sequence increases to infinity, fixed points of the empirical version of the algorithm tend to fixed points of the population version. Learnability of the predictor class posited by the algorithm is also established.

AB - The k-vectors algorithm for learning regression functions proposed here is akin to the well-known k-means algorithm. Both algorithms partition the feature space, but unlike k-means, the k-vectors algorithm aims to reconstruct the response rather than the feature. Its partitioning rule maximizes the correlation (inner product) of the feature vector with a set of k vectors, and generates polyhedral cells, similar to those generated by the nearest-neighbor rule of the k-means algorithm. As with k-means, the learning algorithm alternates between two types of steps. In the first, k labels are determined via a centroid-type rule (in the response space), which uses a hinge-type surrogate for the mean-squared-error loss. In the second, the k vectors that determine the partition are updated according to a multiclass classification rule, in the spirit of support vector machines. It is proved that both steps of the algorithm require only solving convex optimization problems, and that the algorithm is empirically consistent: as the length of the training sequence increases to infinity, fixed points of the empirical version of the algorithm tend to fixed points of the population version. Learnability of the predictor class posited by the algorithm is also established.

KW - Alternating minimization

KW - Lloyd-Max algorithm

KW - clustering

KW - convergence rates

KW - global convergence

KW - k-means

KW - nonparametric regression

KW - partitioning estimates

KW - quantization

KW - supervised learning

KW - support vector machines

UR - http://www.scopus.com/inward/record.url?scp=85094651223&partnerID=8YFLogxK

U2 - 10.1109/TIT.2020.3008058

DO - 10.1109/TIT.2020.3008058

M3 - Article

AN - SCOPUS:85094651223

SN - 0018-9448

VL - 66

SP - 7196

EP - 7221

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

IS - 11

M1 - 9136791

ER -