A Study of Scalar Compilation Techniques for Pipelined Supercomputers

Shlomo Weiss, James E. Smith

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


This paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study the impact of the architecture and of the hardware (size of instruction buffer) on the efficiency of loop unrolling. We also develop a methodology for classifying software pipelining techniques. For loop unrolling, a straightforward scheduling algorithm is shown to produce near-optimal results when not inhibited by recurrences or memory hazards. Our study indicates that the performance produced with a modified CRAY-1S scalar architecture and a code scheduler utilizing loop unrolling is comparable to the performance achieved by the CRAY-1S with a vector unit and the CFT vectorizing compiler. Finally, we show that the combination of loop unrolling and dynamic software pipelining, as implemented by a decoupled computer, substantially outperforms the vector CRAY-1S.

Original language: English
Pages (from-to): 223-245
Number of pages: 23
Journal: ACM Transactions on Mathematical Software
Issue number: 3
State: Published - 9 Jan 1990
Externally published: Yes

