TY - JOUR
T1 - Robust subspace approximation in a stream
AU - Levin, Roie
AU - Sevekari, Anish
AU - Woodruff, David P.
N1 - Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.
PY - 2018
Y1 - 2018
AB - We study robust subspace estimation in the streaming and distributed settings. Given a set of $n$ data points $\{a_i\}_{i=1}^n$ in $\mathbb{R}^d$ and an integer $k$, we wish to find a linear subspace $S$ of dimension $k$ for which $\sum_i M(\mathrm{dist}(S, a_i))$ is minimized, where $\mathrm{dist}(S, x) := \min_{y \in S} \|x - y\|_2$, and $M(\cdot)$ is some loss function. When $M$ is the identity function, $S$ gives a subspace that is more robust to outliers than that provided by the truncated SVD. Though the problem is NP-hard, it is approximable within a $(1 + \epsilon)$ factor in polynomial time when $k$ and $\epsilon$ are constant. We give the first sublinear approximation algorithm for this problem in the turnstile streaming and arbitrary partition distributed models, achieving the same time guarantees as in the offline case. Our algorithm is the first based entirely on oblivious dimensionality reduction, and significantly simplifies prior methods for this problem, which held in neither the streaming nor distributed models.
UR - http://www.scopus.com/inward/record.url?scp=85064827762&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85064827762
SN - 1049-5258
VL - 2018-December
SP - 10683
EP - 10693
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 32nd Conference on Neural Information Processing Systems, NeurIPS 2018
Y2 - 2 December 2018 through 8 December 2018
ER -