Abstract
We study problem-dependent rates, that is, generalization errors that scale near-optimally with the variance, effective loss, or gradient norms evaluated at the “best hypothesis.” We introduce a principled framework dubbed “uniform localized convergence” and characterize sharp problem-dependent rates for central statistical learning problems. From a methodological viewpoint, our framework resolves several fundamental limitations of existing uniform convergence and localization analysis approaches. It also provides improvements and some level of unification in the study of localized complexities, one-sided uniform inequalities, and sample-based iterative algorithms. In the so-called “slow rate” regime, we provide the first (moment-penalized) estimator that achieves the optimal variance-dependent rate for general “rich” classes; we also establish an improved loss-dependent rate for standard empirical risk minimization. In the “fast rate” regime, we establish finite-sample, problem-dependent bounds that are comparable to precise asymptotics. In addition, we show that iterative algorithms such as gradient descent and first-order expectation maximization can achieve optimal generalization error in several representative problems across the areas of nonconvex learning, stochastic optimization, and learning with missing data.
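To make the variance-penalization idea concrete, the sketch below shows a generic sample-variance-penalized selection rule in the spirit of the moment-penalized estimator mentioned in the abstract. It is a minimal illustration only: the function name `variance_penalized_objective`, the constants `c1` and `c2`, and the finite candidate set are assumptions introduced here, and the sketch does not reproduce the paper's estimator or its guarantees.

```python
import numpy as np

def variance_penalized_objective(losses: np.ndarray, c1: float = 1.0, c2: float = 1.0) -> float:
    """Empirical risk plus a sample-variance penalty for one candidate hypothesis.

    `losses` contains the per-sample losses of the hypothesis on n data points.
    The constants c1 and c2 are illustrative tuning parameters, not values from the paper.
    """
    n = len(losses)
    empirical_risk = losses.mean()
    sample_var = losses.var(ddof=1)  # unbiased sample variance of the losses
    # Penalize hypotheses whose loss is highly variable; the sqrt(var/n) scaling
    # mirrors the variance-dependent "slow rate" form discussed in the abstract.
    return empirical_risk + c1 * np.sqrt(sample_var / n) + c2 / n

# Toy usage: pick, among a few hypothetical candidates, the one minimizing the
# penalized objective rather than the plain empirical risk.
rng = np.random.default_rng(0)
candidate_losses = [rng.exponential(scale=s, size=200) for s in (0.5, 1.0, 2.0)]
best = min(range(len(candidate_losses)),
           key=lambda i: variance_penalized_objective(candidate_losses[i]))
print("selected hypothesis index:", best)
```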
| Original language | English |
|---|---|
| Pages (from-to) | 40-67 |
| Number of pages | 28 |
| Journal | Mathematics of Operations Research |
| Volume | 50 |
| Issue number | 1 |
| DOIs | |
| State | Published - Feb 2025 |
| Externally published | Yes |
Keywords
- expectation maximization
- iterative algorithms
- nonconvex learning
- problem-dependent generalization error bounds
- statistical learning theory
- stochastic optimization
- uniform convergence and localization
- variance penalization