A randomized least squares solver for terabyte-sized dense overdetermined systems

Chander Iyer*, Haim Avron, Georgios Kollias, Yves Ineichen, Christopher Carothers, Petros Drineas

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


We present a fast randomized least-squares solver for distributed-memory platforms. Our solver is based on the Blendenpik algorithm, but employs multiple random projection schemes to construct a sketch of the input matrix. These sketching schemes, and in particular the randomized Discrete Cosine Transform, enable our algorithm to scale the vanilla distributed-memory implementation of Blendenpik to terabyte-sized matrices and provide up to a 7.5× speedup over a state-of-the-art scalable least-squares solver based on the classic QR algorithm. Experimental evaluations on terabyte-scale matrices demonstrate excellent speedups on up to 16,384 cores of a Blue Gene/Q supercomputer.
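The sketch-and-precondition idea behind Blendenpik-style solvers can be illustrated on a single node. The following is a minimal sketch under stated assumptions, not the authors' distributed implementation: the function name `sketch_precond_lstsq`, the oversampling factor `gamma`, and the use of NumPy/SciPy are all choices made here for illustration. It mixes the rows of the matrix with random sign flips and an orthonormal DCT, samples a subset of the mixed rows uniformly, takes the R factor of that sample as a preconditioner, and runs LSQR on the preconditioned system.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr


def sketch_precond_lstsq(A, b, gamma=4, seed=0):
    """Solve min ||Ax - b||_2 for a tall dense A (hypothetical helper).

    gamma is an assumed oversampling factor: the sketch keeps about
    gamma * n of the m mixed rows.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Randomized DCT: flip row signs at random, then apply an
    # orthonormal DCT down each column to spread out row information.
    signs = rng.choice([-1.0, 1.0], size=m)
    mixed = dct(signs[:, None] * A, axis=0, norm="ortho")

    # Uniformly sample rows of the mixed matrix to form the sketch.
    k = min(m, gamma * n)
    rows = rng.choice(m, size=k, replace=False)

    # The R factor of the sketch serves as a right preconditioner.
    R = np.linalg.qr(mixed[rows], mode="r")

    # LSQR on A R^{-1}, applied implicitly via triangular solves.
    op = LinearOperator(
        (m, n),
        matvec=lambda y: A @ solve_triangular(R, y),
        rmatvec=lambda z: solve_triangular(R, A.T @ z, trans="T"),
        dtype=np.float64,
    )
    y = lsqr(op, b, atol=1e-12, btol=1e-12, iter_lim=200)[0]

    # Undo the preconditioning to recover x = R^{-1} y.
    return solve_triangular(R, y)
```

On a well-behaved test problem the result should agree with a direct solver such as `np.linalg.lstsq` to within the LSQR tolerances. The distributed-memory setting described in the abstract additionally has to handle data layout and communication, which this single-node example ignores.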

Original language: English
Article number: 100547
Journal: Journal of Computational Science
State: Published - Sep 2019


Funders (funder number where given)
National Science Foundation: IIS-1302231
U.S. Department of Energy
Defense Advanced Research Projects Agency
International Business Machines Corporation
AT&T
Air Force Research Laboratory: FA8750-12-C-0323
Army Research Laboratory


    • Dense least squares regression
    • High-performance computing
    • Randomized numerical linear algebra

