In the general submatrix detection problem, the task is to detect the presence of a small k × k submatrix with entries sampled from a distribution P, planted in an n × n matrix whose other entries are sampled from Q. This formulation includes a number of well-studied problems. Examples are Gaussian biclustering, where P and Q are Gaussian, and the planted dense subgraph formulation of community detection, where the submatrix is a principal minor and P and Q are Bernoulli distributions. These problems all seem to exhibit a universal phenomenon: there is a statistical-computational gap, depending on P and Q, between the minimum k at which the task can be solved and the minimum k at which it can be solved in polynomial time. Our main result tightly characterizes this computational barrier as a tradeoff between k and the KL divergences between P and Q, through average-case reductions from the planted clique conjecture. These computational lower bounds hold under mild assumptions on P and Q that arise naturally from classical binary hypothesis testing. In particular, our results recover and generalize the planted clique lower bounds for Gaussian biclustering in Ma and Wu (2015); Brennan et al. (2018) and for the sparse and general regimes of planted dense subgraph in Hajek et al. (2015); Brennan et al. (2018). This yields the first universality principle for computational lower bounds obtained through average-case reductions. To reduce from planted clique to submatrix detection for a specific pair P and Q, we introduce two techniques for average-case reductions: (1) multivariate rejection kernels, which perform an algorithmic change of measure and lift to a larger submatrix while achieving an optimal tradeoff in KL divergence, and (2) a technique for embedding adjacency matrices of graphs as principal minors in larger matrices that handles distributional issues arising from their diagonal entries and from the matching row and column supports of the k × k submatrix.
We suspect that these techniques have applications in average-case reductions to other problems and are likely of independent interest. We also characterize the statistical barrier in our general formulation of submatrix detection.
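To make the formulation and the change-of-measure idea concrete, here is a minimal Python sketch under stated assumptions. The function `sample_instance` draws an n × n matrix under either hypothesis: all entries from Q, or a random k × k submatrix overwritten with entries from P. With `principal=True`, the row and column supports coincide, as in the principal-minor case. The function `rejection_kernel` illustrates what an "algorithmic change of measure" means using a simplified single-variable rejection sampler. For p > q, it maps Bern(p) approximately to N(mu, 1) and Bern(q) approximately to N(0, 1). All names and parameters are hypothetical. Entries are drawn independently, so no symmetric adjacency matrix is modeled. The kernel is a simplified illustration, not the multivariate rejection kernel described above.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical sampler type: takes an RNG and returns one matrix entry.
Sampler = Callable[[random.Random], float]


def sample_instance(n: int, k: int, sample_p: Sampler, sample_q: Sampler,
                    planted: bool, principal: bool, rng: random.Random
                    ) -> Tuple[List[List[float]], List[int], List[int]]:
    """Draw an n x n matrix of i.i.d. Q entries; if planted, overwrite a
    random k x k submatrix with i.i.d. P entries. Entries are independent,
    so this does not model a symmetric graph adjacency matrix."""
    matrix = [[sample_q(rng) for _ in range(n)] for _ in range(n)]
    rows: List[int] = []
    cols: List[int] = []
    if planted:
        rows = sorted(rng.sample(range(n), k))
        # Principal-minor case: the row and column supports coincide.
        cols = list(rows) if principal else sorted(rng.sample(range(n), k))
        for i in rows:
            for j in cols:
                matrix[i][j] = sample_p(rng)
    return matrix, rows, cols


def rejection_kernel(bit: int, p: float, q: float, mu: float,
                     rng: random.Random, max_iters: int = 60) -> float:
    """Simplified single-variable rejection kernel (illustrative only).

    Assumes 0 < q < p < 1. With f = N(mu, 1) and g = N(0, 1), input 1
    targets ((1-q)f - (1-p)g)/(p-q) and input 0 targets (p*g - q*f)/(p-q),
    so Bern(p) -> f and Bern(q) -> g whenever both targets are
    nonnegative. Acceptance probabilities are clipped to [0, 1], which is
    where the approximation comes from."""
    for _ in range(max_iters):
        if bit == 1:
            z = rng.gauss(mu, 1.0)  # propose from f
            g_over_f = math.exp(-mu * z + mu * mu / 2)
            accept = 1 - (1 - p) / (1 - q) * g_over_f
        else:
            z = rng.gauss(0.0, 1.0)  # propose from g
            f_over_g = math.exp(mu * z - mu * mu / 2)
            accept = 1 - (q / p) * f_over_g
        if rng.random() < min(1.0, max(0.0, accept)):
            return z
    return 0.0  # fallback if no proposal is accepted


if __name__ == "__main__":
    rng = random.Random(0)
    m, rows, cols = sample_instance(
        8, 3, lambda r: r.gauss(2.0, 1.0), lambda r: r.gauss(0.0, 1.0),
        planted=True, principal=True, rng=rng)
    print("planted support:", rows, cols)
    print("kernel outputs:", [rejection_kernel(b, 0.5, 0.25, 0.1, rng)
                              for b in (0, 1, 1, 0)])
```

The kernel is nearly exact only when mu is small relative to p − q. In that regime, both target densities are nonnegative except far in the tails. Larger mu makes the clipping matter and the output drifts from the intended Gaussians.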
- average-case reductions
- community detection
- planted clique conjecture
- statistical-computational gaps
- submatrix detection