TY - GEN

T1 - Finding skewed subcubes under a distribution

AU - Gopalan, Parikshit

AU - Levin, Roie

AU - Wieder, Udi

N1 - Publisher Copyright:
© Parikshit Gopalan, Roie Levin, and Udi Wieder.

PY - 2020/1

Y1 - 2020/1

N2 - Say that we are given samples from a distribution ψ over an n-dimensional space. We expect or desire ψ to behave like a product distribution (or a k-wise independent distribution over its marginals for small k). We propose the problem of enumerating/list-decoding all large subcubes where the distribution ψ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in ψ. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection. In this work we focus on the special but important case where the space is the Boolean hypercube, and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition of a minimal skewed subcube, which are subcubes whose skew cannot be attributed to a larger subcube that contains it. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality. On the lower bounds side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved on substantially without a breakthrough on this problem which is believed to be intractable. Motivated by this, we study alternate models allowing query access to ψ where finding skewed subcubes might be easier.

AB - Say that we are given samples from a distribution ψ over an n-dimensional space. We expect or desire ψ to behave like a product distribution (or a k-wise independent distribution over its marginals for small k). We propose the problem of enumerating/list-decoding all large subcubes where the distribution ψ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in ψ. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection. In this work we focus on the special but important case where the space is the Boolean hypercube, and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition of a minimal skewed subcube, which are subcubes whose skew cannot be attributed to a larger subcube that contains it. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality. On the lower bounds side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved on substantially without a breakthrough on this problem which is believed to be intractable. Motivated by this, we study alternate models allowing query access to ψ where finding skewed subcubes might be easier.

KW - Algorithmic fairness

KW - Anomaly detection

KW - Fourier analysis

KW - Probability

KW - Unsupervised learning

UR - http://www.scopus.com/inward/record.url?scp=85078059072&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.ITCS.2020.84

DO - 10.4230/LIPIcs.ITCS.2020.84

M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???

AN - SCOPUS:85078059072

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 11th Innovations in Theoretical Computer Science Conference, ITCS 2020

A2 - Vidick, Thomas

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

T2 - 11th Innovations in Theoretical Computer Science Conference, ITCS 2020

Y2 - 12 January 2020 through 14 January 2020

ER -