TY - JOUR
T1 - Parallel unsymmetric-pattern multifrontal sparse LU with column preordering
AU - Avron, Haim
AU - Shklarski, Gil
AU - Toledo, Sivan
PY - 2008/3/1
Y1 - 2008/3/1
N2 - We present a new parallel sparse LU factorization algorithm and code. The algorithm uses a column-preordering partial-pivoting unsymmetric-pattern multifrontal approach. Our baseline sequential algorithm is based on UMFPACK 4, but is somewhat simpler and is often somewhat faster than UMFPACK version 4.0. Our parallel algorithm is designed for shared-memory machines with a small or moderate number of processors (we tested it on up to 32 processors). We experimentally compare our algorithm with SuperLU_MT, an existing shared-memory sparse LU factorization with partial pivoting. SuperLU_MT scales better than our new algorithm, but our algorithm is more reliable and is usually faster. More specifically, on matrices that are costly to factor, our algorithm is always faster on up to 4 processors, and is usually faster on 8 and 16. We were not able to run SuperLU_MT on 32. The main contribution of this article is showing that the column-preordering partial-pivoting unsymmetric-pattern multifrontal approach, developed as a sequential algorithm by Davis in several recent versions of UMFPACK, can be effectively parallelized.
AB - We present a new parallel sparse LU factorization algorithm and code. The algorithm uses a column-preordering partial-pivoting unsymmetric-pattern multifrontal approach. Our baseline sequential algorithm is based on UMFPACK 4, but is somewhat simpler and is often somewhat faster than UMFPACK version 4.0. Our parallel algorithm is designed for shared-memory machines with a small or moderate number of processors (we tested it on up to 32 processors). We experimentally compare our algorithm with SuperLU_MT, an existing shared-memory sparse LU factorization with partial pivoting. SuperLU_MT scales better than our new algorithm, but our algorithm is more reliable and is usually faster. More specifically, on matrices that are costly to factor, our algorithm is always faster on up to 4 processors, and is usually faster on 8 and 16. We were not able to run SuperLU_MT on 32. The main contribution of this article is showing that the column-preordering partial-pivoting unsymmetric-pattern multifrontal approach, developed as a sequential algorithm by Davis in several recent versions of UMFPACK, can be effectively parallelized.
KW - Gaussian elimination
KW - Multifrontal
KW - Unsymmetric
UR - http://www.scopus.com/inward/record.url?scp=41149119930&partnerID=8YFLogxK
U2 - 10.1145/1326548.1326550
DO - 10.1145/1326548.1326550
M3 - Article
AN - SCOPUS:41149119930
SN - 0098-3500
VL - 34
JO - ACM Transactions on Mathematical Software
JF - ACM Transactions on Mathematical Software
IS - 2
M1 - 8
ER -