TY - JOUR
T1 - Faster kernel ridge regression using sketching and preconditioning
AU - Avron, Haim
AU - Clarkson, Kenneth L.
AU - Woodruff, David P.
N1 - Publisher Copyright:
© 2017 Society for Industrial and Applied Mathematics.
PY - 2017
Y1 - 2017
N2 - Kernel ridge regression is a simple yet powerful technique for nonparametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are the same as the number of data points, so direct methods are unrealistic for large-scale datasets. In this paper, we propose a preconditioning technique for accelerating the solution of the aforementioned linear system. The preconditioner is based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods, such as kernel ridge regression, by resorting to approximations. However, random feature maps only provide crude approximations to the kernel function, so delivering state-of-the-art results by directly solving the approximated system requires the number of random features to be very large. We show that random feature maps can be much more effective in forming preconditioners, since under certain conditions a not-too-large number of random features is sufficient to yield an effective preconditioner. We empirically evaluate our method and show it is highly effective for datasets of up to one million training examples.
AB - Kernel ridge regression is a simple yet powerful technique for nonparametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are the same as the number of data points, so direct methods are unrealistic for large-scale datasets. In this paper, we propose a preconditioning technique for accelerating the solution of the aforementioned linear system. The preconditioner is based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods, such as kernel ridge regression, by resorting to approximations. However, random feature maps only provide crude approximations to the kernel function, so delivering state-of-the-art results by directly solving the approximated system requires the number of random features to be very large. We show that random feature maps can be much more effective in forming preconditioners, since under certain conditions a not-too-large number of random features is sufficient to yield an effective preconditioner. We empirically evaluate our method and show it is highly effective for datasets of up to one million training examples.
KW - Kernel ridge regression
KW - Preconditioning
KW - Random features
UR - http://www.scopus.com/inward/record.url?scp=85040309792&partnerID=8YFLogxK
U2 - 10.1137/16M1105396
DO - 10.1137/16M1105396
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
AN - SCOPUS:85040309792
SN - 0895-4798
VL - 38
SP - 1116
EP - 1138
JO - SIAM Journal on Matrix Analysis and Applications
JF - SIAM Journal on Matrix Analysis and Applications
IS - 4
ER -