A portable parallelization of the Cooley-Tukey FFT algorithm for MIMD multiprocessors is presented. The implementation uses two portable software packages: the Virtual Machine for MultiProcessors (VMMP) and PVM. Because VMMP provides the same set of services on all target machines, a single version of the parallel FFT code ran on three classes of machine: a shared-memory multiprocessor (a 25-processor Sequent Symmetry), a shared-bus system running the MOS distributed UNIX, and distributed-memory multiprocessors (a transputer network and a 64-processor IBM SP2). The implementation is accompanied by a detailed performance analysis. The algorithm achieved high efficiencies on all target machines, and the analysis indicates that most of the overhead comes from the target architecture rather than from inefficiencies in VMMP or PVM. The portability analysis of the FFT yields several important insights. On message-passing architectures, the parallel FFT obtains speedup that grows linearly with the number of processors, provided the problem size is increased only moderately. The parallel FFT can run on any number of processors, although in practice the number of processors is much smaller than the length of the input data. The results indicate that the parallel FFT is portable: without any change to the program, it achieves very good speedups both on shared-memory multiprocessors with high memory bandwidth and on message-passing multiprocessors.
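The abstract does not say how the work is divided among processors. As a rough illustration only, the Python sketch below shows one common way to split a Cooley-Tukey FFT across P processors when P divides the input length N. Each simulated processor transforms a decimated subsequence of length N/P with no communication, and a combine step then applies twiddle factors. The function names `local_fft` and `parallel_fft` are hypothetical. The processors are simulated serially, and this is not claimed to be the paper's VMMP/PVM code.

```python
import cmath


def local_fft(x):
    """Sequential FFT run by each (simulated) processor.

    Uses recursive radix-2 Cooley-Tukey while the length is even and falls
    back to a direct DFT for odd lengths, so any length is accepted.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        # Odd length: direct O(n^2) DFT.
        return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
                for k in range(n)]
    even = local_fft(x[0::2])
    odd = local_fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor W_n^k
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def parallel_fft(x, p):
    """Hypothetical P-way split of a length-N DFT (simulated serially).

    With N = P * M and n = m*P + q, X[k] = sum_q W_N^(q*k) * Y_q[k mod M],
    where Y_q is the length-M FFT of the subsequence x[q], x[q+P], ...
    """
    n = len(x)
    if n % p:
        raise ValueError("number of processors must divide the input length")
    m = n // p
    # Phase 1: processor q transforms its decimated subsequence independently.
    partial = [local_fft(x[q::p]) for q in range(p)]
    # Phase 2: every output X[k] needs one value from every processor. On a
    # message-passing machine this is an all-to-all exchange.
    return [sum(cmath.exp(-2j * cmath.pi * q * k / n) * partial[q][k % m]
                for q in range(p))
            for k in range(n)]
```

For example, `parallel_fft(x, 4)` on a 24-point input yields the same spectrum as a direct DFT of `x`. The combine step is written naively here for clarity. A real distributed implementation would assign each processor a block of outputs and perform the data exchange explicitly.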
Number of pages: 23
Journal: Concurrency: Practice and Experience
Status: Published, July 1998