Robust Linear Regression for General Feature Distribution

Tom Norman, Nir Weinberger, Kfir Y. Levy

Research output: Contribution to journal › Conference article › Peer-reviewed


We investigate robust linear regression where data may be contaminated by an oblivious adversary, i.e., an adversary that knows the data distribution but is otherwise oblivious to the realization of the data samples. This model has previously been analyzed only under strong assumptions. Concretely, (i) all previous works assume that the covariance matrix of the features is positive definite; and (ii) most of them assume that the features are centered. Additionally, all previous works impose further restrictive assumptions, e.g., Gaussianity of the features or symmetry of the corruption distribution. In this work, we investigate robust regression under a more general set of assumptions: (i) the covariance matrix may be either positive definite or positive semi-definite; (ii) the features may be uncentered; and (iii) beyond boundedness (or sub-Gaussianity) of the features and the measurement noise, no further assumptions are made. Under these assumptions we analyze a sequential algorithm, namely a natural SGD variant for this problem, and show that it enjoys a fast convergence rate when the covariance matrix is positive definite. In the positive semi-definite case we show that there are two regimes: if the features are centered, a standard convergence rate can be obtained; otherwise, the adversary can cause any learner to fail arbitrarily.
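The abstract does not spell out the SGD variant; as a rough illustration of the setting, the following is a minimal sketch (not the paper's exact algorithm) of SGD on a Huber-type loss, whose clipped gradient bounds the influence of any single corrupted sample. All names, constants, and the choice of Huber clipping here are illustrative assumptions; the synthetic data mimics the abstract's setting with bounded, non-Gaussian features and one-sided oblivious corruptions that are independent of the features.

```python
import numpy as np

def huber_grad(residual, delta=1.0):
    # Derivative of the Huber loss w.r.t. the residual: large residuals
    # are clipped to +/- delta, so corrupted samples have bounded influence.
    return np.clip(residual, -delta, delta)

def robust_sgd(X, y, lr=0.01, delta=1.0, epochs=5, seed=0):
    # Illustrative single-sample SGD on the Huber loss (an assumption,
    # not necessarily the algorithm analyzed in the paper).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            r = X[i] @ w - y[i]
            w -= lr * huber_grad(r, delta) * X[i]
    return w

# Synthetic example: bounded uniform (non-Gaussian) features, centered,
# with 20% oblivious corruptions added independently of the features.
rng = np.random.default_rng(1)
n, d = 5000, 3
X = rng.uniform(-1.0, 1.0, size=(n, d))
w_star = np.array([2.0, -1.0, 0.5])
noise = 0.1 * rng.standard_normal(n)
corrupt = rng.random(n) < 0.2                      # corruption mask
y = X @ w_star + noise + corrupt * rng.uniform(5.0, 10.0, size=n)
w_hat = robust_sgd(X, y, lr=0.05, delta=0.5, epochs=10)
```

Because the corruptions are independent of the centered features, their clipped gradient contribution averages out, so `w_hat` lands close to `w_star`; this is consistent with the abstract's claim that centered features admit a standard convergence rate even in the harder regime.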

Original language: English
Pages (from-to): 2405-2435
Number of pages: 31
Journal: Proceedings of Machine Learning Research
State: Published - 2023
Externally published: Yes
Event: 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023 - Valencia, Spain
Duration: 25 Apr 2023 - 27 Apr 2023


