Robust subspace approximation in a stream

Roie Levin, Anish Sevekari, David P. Woodruff

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study robust subspace estimation in the streaming and distributed settings. Given a set of n data points $\{a_i\}_{i=1}^n$ in $\mathbb{R}^d$ and an integer k, we wish to find a linear subspace S of dimension k for which $\sum_i M(\mathrm{dist}(S, a_i))$ is minimized, where $\mathrm{dist}(S, x) := \min_{y \in S} \|x - y\|_2$, and M(·) is some loss function. When M is the identity function, S gives a subspace that is more robust to outliers than that provided by the truncated SVD. Though the problem is NP-hard, it is approximable within a (1 + ϵ) factor in polynomial time when k and ϵ are constant. We give the first sublinear approximation algorithm for this problem in the turnstile streaming and arbitrary partition distributed models, achieving the same time guarantees as in the offline case. Our algorithm is the first based entirely on oblivious dimensionality reduction, and significantly simplifies prior methods for this problem, which held in neither the streaming nor distributed models.
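
As an illustration of the objective defined above (not of the paper's streaming algorithm), the following minimal sketch evaluates $\sum_i M(\mathrm{dist}(S, a_i))$ for a candidate subspace given by an orthonormal basis. The function names, the toy data set, and the choice to compare only two hand-picked one-dimensional subspaces are assumptions made for this example; it does not search for the optimal subspace.

```python
import math

def dist_to_subspace(x, basis):
    """Euclidean distance from x to span(basis).

    Assumes `basis` is a list of orthonormal vectors, so the squared
    distance is ||x||^2 minus the squared length of the projection.
    """
    sq_norm = sum(v * v for v in x)
    sq_proj = sum(sum(xi * ui for xi, ui in zip(x, u)) ** 2 for u in basis)
    return math.sqrt(max(sq_norm - sq_proj, 0.0))  # clamp rounding error

def subspace_cost(points, basis, loss=lambda t: t):
    """Sum of loss(dist(S, a_i)) over all points; identity loss by default."""
    return sum(loss(dist_to_subspace(a, basis)) for a in points)

# Hypothetical toy data: six points on the x-axis plus one far outlier.
points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (-1.0, 0.0), (-2.0, 0.0), (-3.0, 0.0),
          (0.0, 10.0)]

x_axis = [(1.0, 0.0)]  # candidate subspace through the inliers
y_axis = [(0.0, 1.0)]  # candidate subspace through the outlier

def square(t):
    return t * t

abs_x = subspace_cost(points, x_axis)          # identity loss
abs_y = subspace_cost(points, y_axis)
sq_x = subspace_cost(points, x_axis, square)   # squared loss (SVD objective)
sq_y = subspace_cost(points, y_axis, square)

print("identity loss:", abs_x, abs_y)
print("squared loss: ", sq_x, sq_y)
```

On this toy input the identity loss is lower for the line through the inliers, while the squared loss is lower for the line through the single outlier, which is the sense in which the identity-loss objective is less sensitive to outliers than the squared-distance objective minimized by the truncated SVD.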

Original language: English
Pages (from-to): 10683-10693
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - 2018
Externally published: Yes
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: 2 Dec 2018 to 8 Dec 2018
