TY - JOUR
T1 - Portable parallel FFT for MIMD multiprocessors
AU - Averbuch, Amir
AU - Gabber, Eran
PY - 1998/7
Y1 - 1998/7
N2 - A portable parallelization of the Cooley-Tukey FFT algorithm for MIMD multiprocessors is presented. The implementation uses the virtual machine for multiprocessors (VMMP) and PVM portable software packages. Since VMMP provides the same set of services on all target machines, a single version of the parallel FFT code was used for shared-memory (25-processor Sequent Symmetry), shared-bus (MOS-running distributed UNIX) and distributed-memory (transputer network and 64-processor IBM SP2) multiprocessors. A detailed performance analysis of the implementations is included. The algorithm achieved high efficiencies on all target machines. The analysis indicates that most overheads are caused by the target architecture and not by VMMP or PVM inefficiencies. The portability analysis of the FFT provides several important insights. On the message-passing architecture, the parallel FFT algorithm can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in the problem size. The parallel FFT can be executed by any number of processors, but generally the number of processors is much less than the length of the input data. The results indicate that the parallel FFT is portable: it achieves very good speedups on either a shared-memory multiprocessor with high memory bandwidth or a message-passing multiprocessor, without any change to the programs.
AB - A portable parallelization of the Cooley-Tukey FFT algorithm for MIMD multiprocessors is presented. The implementation uses the virtual machine for multiprocessors (VMMP) and PVM portable software packages. Since VMMP provides the same set of services on all target machines, a single version of the parallel FFT code was used for shared-memory (25-processor Sequent Symmetry), shared-bus (MOS-running distributed UNIX) and distributed-memory (transputer network and 64-processor IBM SP2) multiprocessors. A detailed performance analysis of the implementations is included. The algorithm achieved high efficiencies on all target machines. The analysis indicates that most overheads are caused by the target architecture and not by VMMP or PVM inefficiencies. The portability analysis of the FFT provides several important insights. On the message-passing architecture, the parallel FFT algorithm can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in the problem size. The parallel FFT can be executed by any number of processors, but generally the number of processors is much less than the length of the input data. The results indicate that the parallel FFT is portable: it achieves very good speedups on either a shared-memory multiprocessor with high memory bandwidth or a message-passing multiprocessor, without any change to the programs.
UR - http://www.scopus.com/inward/record.url?scp=0032119930&partnerID=8YFLogxK
U2 - 10.1002/(SICI)1096-9128(199807)10:8<583::AID-CPE327>3.0.CO;2-3
DO - 10.1002/(SICI)1096-9128(199807)10:8<583::AID-CPE327>3.0.CO;2-3
M3 - Article
AN - SCOPUS:0032119930
SN - 1040-3108
VL - 10
SP - 583
EP - 605
JO - Concurrency Practice and Experience
JF - Concurrency Practice and Experience
IS - 8
ER -