The design, implementation, and evaluation of a banded linear solver for distributed-memory parallel computers

Anshul Gupta, Fred G. Gustavson, Mahesh Joshi, Sivan Toledo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper describes the design, implementation, and evaluation of a parallel algorithm for the Cholesky factorization of banded matrices. The algorithm is part of IBM's Parallel Engineering and Scientific Subroutine Library version 1.2 and is compatible with ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed-memory parallel computer, show that the algorithm efficiently factors banded matrices with wide bandwidth. For example, a 31-node SP2 factors a large matrix more than 16 times faster than a single node would factor it using the best sequential algorithm, and more than 20 times faster than a single node would using LAPACK's DPBTRF. The algorithm uses novel ideas in the area of distributed dense matrix computations that include the use of a dynamic schedule for a blocked systolic-like algorithm and the separation of the input and output data layouts from the layout the algorithm uses internally. The algorithm also uses known techniques such as blocking to improve its communication-to-computation ratio and its data-cache behavior.
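For readers unfamiliar with the underlying operation, the following is a minimal sequential sketch of Cholesky factorization in lower band storage. It is not the parallel, blocked algorithm the paper describes, and it does not reproduce the interface of any library routine; the function name `banded_cholesky` and the storage convention `ab[d][j] = A[j+d][j]` are assumptions chosen for illustration.

```python
import math


def banded_cholesky(ab):
    """Factor a symmetric positive definite band matrix A as L * L^T.

    Assumed (illustrative) storage: ab has k+1 rows and n columns, and
    ab[d][j] holds A[j+d][j] for d = 0..k. Entries that would fall
    outside the matrix are padding and are never read or written.
    Returns L in the same band layout; the input is left unchanged.
    """
    k = len(ab) - 1           # half-bandwidth (number of subdiagonals)
    n = len(ab[0])            # matrix order
    L = [row[:] for row in ab]

    for j in range(n):
        pivot = L[0][j]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[0][j] = math.sqrt(pivot)

        # Number of subdiagonal entries that exist in column j.
        m = min(k, n - 1 - j)

        # Scale the column below the diagonal.
        for d in range(1, m + 1):
            L[d][j] /= L[0][j]

        # Rank-1 update of the trailing m-by-m window inside the band:
        # A[j+r][j+c] -= L[j+r][j] * L[j+c][j] for 1 <= c <= r <= m.
        for c in range(1, m + 1):
            for r in range(c, m + 1):
                L[r - c][j + c] -= L[r][j] * L[c][j]

    return L
```

Each column step touches only a window of at most k rows and columns, which is why a band factorization costs on the order of n·k² operations rather than the n³ of a dense one. A blocked variant would apply the same three steps to square blocks instead of scalars.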

Original language: English
Title of host publication: Applied Parallel Computing
Subtitle of host publication: Industrial Computation and Optimization - 3rd International Workshop, PARA 1996, Proceedings
Editors: Jerzy Waśniewski, Dorte Olesen, Jack Dongarra, Kaj Madsen
Publisher: Springer Verlag
Pages: 328-340
Number of pages: 13
ISBN (Print): 3540620958, 9783540620952
DOIs
State: Published - 1996
Externally published: Yes
Event: 3rd International Workshop on Applied Parallel Computing in Industrial Problems and Optimization, PARA 1996 - Lyngby, Denmark
Duration: 18 Aug 1996 - 21 Aug 1996

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1184
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Workshop on Applied Parallel Computing in Industrial Problems and Optimization, PARA 1996
Country/Territory: Denmark
City: Lyngby
Period: 18/08/96 - 21/08/96
