High-Dimensional Classification by Sparse Logistic Regression

Felix Abramovich, Vadim Grinshtein

Research output: Contribution to journal › Article › peer-review

Abstract

We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive non-asymptotic bounds for its misclassification excess risk. To assess their tightness, we establish the corresponding minimax lower bounds. The bounds can be reduced under an additional low-noise condition. The proposed complexity penalty is closely related to the Vapnik-Chervonenkis (VC) dimension of a set of sparse linear classifiers. Implementing any complexity-penalty-based criterion, however, requires a combinatorial search over all possible models. To obtain a model selection procedure that is computationally feasible for high-dimensional data, we extend the Slope estimator to logistic regression and show that, under an additional weighted restricted eigenvalue condition, it is rate-optimal in the minimax sense.
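
To make the Slope idea in the abstract concrete, the sketch below evaluates a Slope-type objective for logistic regression: the average negative log-likelihood plus a sorted-L1 penalty, in which the largest coefficient (in absolute value) receives the largest weight. This is only an illustration under stated assumptions. The function names and the weight sequence in `example_weights` are hypothetical, and the abstract does not give the exact weights or constants used in the paper. The code only evaluates the objective; it does not minimize it.

```python
import numpy as np

def logistic_nll(X, y, beta):
    """Average negative log-likelihood of logistic regression, labels y in {0, 1}."""
    z = X @ beta
    # log(1 + exp(z)) computed stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def slope_penalty(beta, lambdas):
    """Sorted-L1 (Slope) penalty: sum_j lambdas[j] * |beta|_(j).

    |beta|_(1) >= |beta|_(2) >= ... are the absolute coefficients in
    decreasing order; lambdas is assumed to be nonincreasing.
    """
    abs_sorted = np.sort(np.abs(beta))[::-1]
    return float(np.dot(lambdas, abs_sorted))

def slope_objective(X, y, beta, lambdas):
    """Penalized criterion: logistic loss plus the Slope penalty."""
    return logistic_nll(X, y, beta) + slope_penalty(beta, lambdas)

def example_weights(d, n, scale=1.0):
    """Hypothetical nonincreasing weights, for illustration only
    (an assumption, not taken from the abstract)."""
    j = np.arange(1, d + 1)
    return scale * np.sqrt(np.log(2.0 * d / j) / n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 50, 10
    X = rng.normal(size=(n, d))
    y = rng.integers(0, 2, size=n)
    beta = np.zeros(d)
    beta[:2] = [1.5, -0.5]  # a sparse candidate coefficient vector
    print(slope_objective(X, y, beta, example_weights(d, n)))
```

With all weights equal, the penalty reduces to the ordinary lasso (L1) penalty. Unlike the combinatorial search over models, the Slope objective is convex in the coefficients, which is what makes it computationally feasible in high dimensions.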

Original language: English
Article number: 8561249
Pages (from-to): 3068-3079
Number of pages: 12
Journal: IEEE Transactions on Information Theory
Volume: 65
Issue number: 5
State: Published - May 2019

Keywords

  • Complexity penalty
  • VC-dimension
  • feature selection
  • high-dimensionality
  • misclassification excess risk
  • sparsity
