TY - GEN

T1 - Parallel algorithms for evaluating matrix polynomials

AU - Toledo, Sivan

AU - Waisel, Amit

N1 - Publisher Copyright: © 2019 ACM.

PY - 2019/8/5

Y1 - 2019/8/5

N2 - We develop and evaluate parallel algorithms for a fundamental problem in numerical computing, namely the evaluation of a polynomial of a matrix. The algorithm consists of many building blocks that can be assembled in several ways. We investigate parallelism in individual building blocks, develop parallel implementations, and assemble them into an overall parallel algorithm. We analyze the effects of both the dimension of the matrix and the degree of the polynomial on both arithmetic complexity and parallelism, and we consequently propose which variants to use in different cases. Our theoretical results indicate that one variant of the algorithm, based on applying the Paterson-Stockmeyer method to the entire matrix, parallelizes very effectively on virtually any matrix dimension and polynomial degree. However, it is not the most efficient from the arithmetic complexity viewpoint. Another algorithm, based on the Davies-Higham block recurrence, is much more efficient from the arithmetic complexity viewpoint, but one of its building blocks is serial. Experimental results on a dual-socket 28-core server show that the first algorithm can effectively use all the cores, but that on high-degree polynomials the second algorithm is often faster, in spite of the sequential phase. This indicates that our parallel algorithms for the other phases are indeed effective.

AB - We develop and evaluate parallel algorithms for a fundamental problem in numerical computing, namely the evaluation of a polynomial of a matrix. The algorithm consists of many building blocks that can be assembled in several ways. We investigate parallelism in individual building blocks, develop parallel implementations, and assemble them into an overall parallel algorithm. We analyze the effects of both the dimension of the matrix and the degree of the polynomial on both arithmetic complexity and parallelism, and we consequently propose which variants to use in different cases. Our theoretical results indicate that one variant of the algorithm, based on applying the Paterson-Stockmeyer method to the entire matrix, parallelizes very effectively on virtually any matrix dimension and polynomial degree. However, it is not the most efficient from the arithmetic complexity viewpoint. Another algorithm, based on the Davies-Higham block recurrence, is much more efficient from the arithmetic complexity viewpoint, but one of its building blocks is serial. Experimental results on a dual-socket 28-core server show that the first algorithm can effectively use all the cores, but that on high-degree polynomials the second algorithm is often faster, in spite of the sequential phase. This indicates that our parallel algorithms for the other phases are indeed effective.

KW - Matrix Polynomials

KW - Parallel Algorithms

KW - Polynomial Evaluation

UR - http://www.scopus.com/inward/record.url?scp=85071102161&partnerID=8YFLogxK

U2 - 10.1145/3337821.3337871

DO - 10.1145/3337821.3337871

M3 - Conference contribution

AN - SCOPUS:85071102161

T3 - ACM International Conference Proceeding Series

BT - Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019

PB - Association for Computing Machinery

T2 - 48th International Conference on Parallel Processing, ICPP 2019

Y2 - 5 August 2019 through 8 August 2019

ER -