TY - JOUR
T1 - Hierarchically compositional kernels for scalable nonparametric learning
AU - Chen, Jie
AU - Avron, Haim
AU - Sindhwani, Vikas
N1 - Publisher Copyright:
© 2017 Jie Chen, Haim Avron and Vikas Sindhwani.
PY - 2017/8/1
Y1 - 2017/8/1
AB - We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where the Nyström method (a globally low-rank approximation) is married with a locally lossless approximation in a hierarchical fashion. The kernel maintains (strict) positive-definiteness. The corresponding kernel matrix admits a recursively off-diagonal low-rank structure, which allows for fast linear algebra computations. Suppressing the factor of data dimension, the memory and arithmetic complexities for training a regression model or a classifier are reduced from O(n^2) and O(n^3) to O(nr) and O(nr^2), respectively, where n is the number of training examples and r is the rank on each level of the hierarchy. Although other randomized approximate kernels entail a similar complexity, empirical results show that the proposed kernel achieves matching performance with a smaller r. We present comprehensive experiments that demonstrate the effective use of the proposed kernel on data sets with sizes up to the order of millions.
KW - Hierarchical kernels
KW - Nonparametric learning
UR - http://www.scopus.com/inward/record.url?scp=85030168676&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85030168676
SN - 1532-4435
VL - 18
SP - 1
EP - 42
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -