TY - JOUR
T1 - A Simulation Study of Decoupled Architecture Computers
AU - Smith, James E.
AU - Weiss, Shlomo
AU - Pang, Nicholas Y.
PY - 1986/8
Y1 - 1986/8
AB - Decoupled architectures achieve high scalar performance by cleanly splitting instruction processing into memory access and execution tasks. Several decoupled architectures have been proposed, and they all have two characteristics in common: 1) they have two separate sets of instructions, one for accessing memory and one for performing function execution; 2) the memory-accessing task and the execution task communicate via architectural queues. These characteristics lead to pipelined computers that have the following advantages: 1) they can issue more than one instruction per clock period; 2) they can dynamically schedule instructions at runtime; 3) they are less sensitive to memory access delays than conventional architectures. We present a simulation study of decoupled architectures. The simulation models are very detailed, with timing resolution to the clock period. The Lawrence Livermore Loops are used as the workload. We first describe a decoupled architecture based on the CRAY-1 scalar architecture. Sensitivity to memory access delays is studied by varying memory access time over a wide range of values. We show that the performance improvement over the scalar CRAY-1 increases linearly as the memory access paths of both machines are lengthened. We then study queue lengths in decoupled machines and show the effect of queue length on performance. Relatively short queues are shown to give optimum or near-optimum performance.
KW - Decoupled architectures
KW - performance evaluation
KW - pipelined processors
KW - scientific computers
KW - supercomputers
UR - http://www.scopus.com/inward/record.url?scp=0022769348&partnerID=8YFLogxK
U2 - 10.1109/TC.1986.1676820
DO - 10.1109/TC.1986.1676820
M3 - Article
AN - SCOPUS:0022769348
SN - 0018-9340
VL - C-35
SP - 692
EP - 702
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 8
ER -