Testing Distributional Assumptions of Learning Algorithms

Ronitt Rubinfeld, Arsen Vasilyan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs (A,T), such that if the distribution on examples in the data passes the tester T then one can safely trust the output of the agnostic learner A on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. Previously it was known that halfspaces are well-approximated with low-degree polynomials relative to the Gaussian distribution. A key step in our analysis is showing that this is the case even relative to distributions whose low-degree moments approximately match those of a Gaussian. We also go beyond spherically-symmetric distributions, and give a tester-learner pair for halfspaces under the uniform distribution on $\{0,1\}^n$ with combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This is achieved using polynomial approximation theory and critical index machinery of [Diakonikolas, Gopalan, Jaiswal, Servedio, and Viola 2009]. Can one design agnostic learning algorithms under distributional assumptions and count on future technical work to produce, as a matter of course, tester-learner pairs with similar run-time?
Our answer is a resounding no, as we show there exist some well-studied settings for which $2^{\tilde{O}(\sqrt{n})}$ run-time agnostic learning algorithms are available, yet the combined run-times of tester-learner pairs must be as high as $2^{\Omega(n)}$. On that account, the design of tester-learner pairs is a research direction in its own right independent of standard agnostic learning. To be specific, our lower bounds apply to the problems of agnostically learning convex sets under the Gaussian distribution and monotone Boolean functions under the uniform distribution over $\{0,1\}^n$.
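
The abstract refers to distributions whose low-degree moments approximately match those of a Gaussian. As a rough illustration of what such a check could look like, the sketch below compares the empirical moments of a sample against the standard Gaussian moments, for every monomial up to a chosen degree. It is not the tester from the paper: the function names `gaussian_moment` and `moment_tester`, the `degree` and `tol` parameters, and the simple accept/reject rule are all assumptions made for this example.

```python
import itertools
import math
from collections import Counter


def gaussian_moment(k):
    """E[z^k] for z ~ N(0, 1): 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    return float(math.prod(range(1, k, 2)))


def moment_tester(samples, degree, tol):
    """Hypothetical moment-matching check (illustrative only).

    Accepts iff, for every monomial of total degree 1..degree, the
    empirical mean over `samples` is within `tol` of the value that
    monomial has in expectation under the standard Gaussian.
    """
    n = len(samples[0])
    for d in range(1, degree + 1):
        # Each multiset of d coordinates corresponds to one monomial.
        for coords in itertools.combinations_with_replacement(range(n), d):
            empirical = sum(
                math.prod(x[i] for i in coords) for x in samples
            ) / len(samples)
            # Coordinates are independent under N(0, I), so the expected
            # value factors into one-dimensional Gaussian moments.
            expected = math.prod(
                gaussian_moment(k) for k in Counter(coords).values()
            )
            if abs(empirical - expected) > tol:
                return False
    return True


# The four corners of {-1, +1}^2, i.e. the uniform distribution on the cube.
cube = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(moment_tester(cube, degree=2, tol=0.01))
print(moment_tester(cube, degree=4, tol=0.01))
```

On the cube example, the check accepts at degree 2 and rejects at degree 4, because the fourth moment of a ±1 coordinate is 1 while the Gaussian fourth moment is 3. The number of monomials grows roughly as n^degree, so a check of this kind is only practical for small degree.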

Original language: English
Title of host publication: STOC 2023 - Proceedings of the 55th Annual ACM Symposium on Theory of Computing
Editors: Barna Saha, Rocco A. Servedio
Publisher: Association for Computing Machinery
Pages: 1643-1656
Number of pages: 14
ISBN (Electronic): 9781450399135
DOIs
State: Published - 2 Jun 2023
Externally published: Yes
Event: 55th Annual ACM Symposium on Theory of Computing, STOC 2023 - Orlando, United States
Duration: 20 Jun 2023 to 23 Jun 2023

Publication series

Name: Proceedings of the Annual ACM Symposium on Theory of Computing
ISSN (Print): 0737-8017

Conference

Conference: 55th Annual ACM Symposium on Theory of Computing, STOC 2023
Country/Territory: United States
City: Orlando
Period: 20/06/23 to 23/06/23

Funding

Funder: NSF
Funder numbers: DMS-2022448, CCF-2006664, CCF-1565235, CCF-1955217

    Keywords

    • agnostic learning
    • distribution testing
    • learning theory
