TY - JOUR

T1 - The Sample Complexity of Sparse Multireference Alignment and Single-Particle Cryo-Electron Microscopy

AU - Bendory, T

AU - Edidin, D

PY - 2024

Y1 - 2024

N2 - Multireference alignment (MRA) is the problem of recovering a signal from multiple noisy copies of it, each acted upon by a random group element. MRA is mainly motivated by single-particle cryo-electron microscopy (cryo-EM), which has recently joined X-ray crystallography as one of the two leading technologies for reconstructing biological molecular structures. Previous papers have shown that, in the high-noise regime, the sample complexity of MRA and cryo-EM is $n=\omega(\sigma^{2d})$, where $n$ is the number of observations, $\sigma^2$ is the variance of the noise, and $d$ is the order of the lowest moment of the observations that uniquely determines the signal. In particular, it was shown that, in many cases, $d=3$ for generic signals, and thus the sample complexity is $n=\omega(\sigma^6)$. In this paper, we analyze the second moment of the MRA and cryo-EM models. First, we show that, in both models, the second moment determines the signal up to a set of unitary matrices whose dimension is governed by the decomposition of the space of signals into irreducible representations of the group. Second, we derive sparsity conditions under which a signal can be recovered from the second moment, implying a sample complexity of $n=\omega(\sigma^4)$. Notably, we show that the sample complexity of cryo-EM is $n=\omega(\sigma^4)$ if at most one-third of the coefficients representing the molecular structure are nonzero; this bound is near-optimal. The analysis is based on tools from representation theory and algebraic geometry. We also derive bounds on recovering a sparse signal from its power spectrum, which is the main computational problem of X-ray crystallography.

AB - Multireference alignment (MRA) is the problem of recovering a signal from multiple noisy copies of it, each acted upon by a random group element. MRA is mainly motivated by single-particle cryo-electron microscopy (cryo-EM), which has recently joined X-ray crystallography as one of the two leading technologies for reconstructing biological molecular structures. Previous papers have shown that, in the high-noise regime, the sample complexity of MRA and cryo-EM is $n=\omega(\sigma^{2d})$, where $n$ is the number of observations, $\sigma^2$ is the variance of the noise, and $d$ is the order of the lowest moment of the observations that uniquely determines the signal. In particular, it was shown that, in many cases, $d=3$ for generic signals, and thus the sample complexity is $n=\omega(\sigma^6)$. In this paper, we analyze the second moment of the MRA and cryo-EM models. First, we show that, in both models, the second moment determines the signal up to a set of unitary matrices whose dimension is governed by the decomposition of the space of signals into irreducible representations of the group. Second, we derive sparsity conditions under which a signal can be recovered from the second moment, implying a sample complexity of $n=\omega(\sigma^4)$. Notably, we show that the sample complexity of cryo-EM is $n=\omega(\sigma^4)$ if at most one-third of the coefficients representing the molecular structure are nonzero; this bound is near-optimal. The analysis is based on tools from representation theory and algebraic geometry. We also derive bounds on recovering a sparse signal from its power spectrum, which is the main computational problem of X-ray crystallography.

KW - X-ray crystallography

KW - cryo-EM

KW - Multireference alignment

KW - Representation theory

KW - Signal processing

KW - Sparsity

UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=tau-cris-version-2&SrcAuth=WosAPI&KeyUT=WOS:001197903300002&DestLinkType=FullRecord&DestApp=WOS_CPL

U2 - 10.1137/23M155685X

DO - 10.1137/23M155685X

M3 - Article

SN - 2577-0187

VL - 6

SP - 254

EP - 282

JO - SIAM Journal on Mathematics of Data Science

JF - SIAM Journal on Mathematics of Data Science

IS - 2

ER -