DDMR: Dynamic and scalable dual modular redundancy with short validation intervals

Amit Golander*, Shlomo Weiss, Ronny Ronen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

To address the problem of soft errors in chip multiprocessors, we propose Dynamic Dual Modular Redundancy (DDMR). DDMR uses known techniques and components to construct a novel multicore architecture that provides soft error detection and recovery. DDMR may be easily integrated with CMP architectures. DDMR replaces pairwise links connected at manufacturing with a new ring architecture that supports runtime linking of redundant cores. Requiring minimal area and power resources, the ring prevents loading the general-purpose and more expensive CMP interconnect with transfers needed to coordinate redundant processing. DDMR uses signatures to reduce bandwidth requirements. Signatures are exchanged after short validation intervals, an approach that saves resources needed to buffer uncommitted data and reduces latencies in parallel programs. DDMR scales with the number of cores and may be used in large multicore architectures.
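The signature exchange described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the abstract does not name a signature function, so CRC32 stands in for it, and the recovery policy (discard the interval and re-execute from the last validated state) is likewise assumed. The names `interval_signature`, `run_redundant`, and `fault_at` are hypothetical.

```python
import zlib


def interval_signature(committed_values):
    """Fold one interval's committed results into a 32-bit CRC.

    CRC32 is an assumed stand-in for the signature function.
    """
    sig = 0
    for value in committed_values:
        sig = zlib.crc32(value.to_bytes(8, "little", signed=True), sig)
    return sig


def run_redundant(ops, interval, fault_at=None):
    """Run `ops` on two simulated cores, validating every `interval` ops.

    fault_at: optional op index at which the second core's result gets a
    single bit flipped once (a simulated soft error).
    Returns (final_state, number_of_recoveries).
    """
    state = 0        # last validated (checkpointed) state
    recoveries = 0
    start = 0
    while start < len(ops):
        chunk = ops[start:start + interval]
        results = []
        for core in (0, 1):
            acc, committed = state, []
            for offset, op in enumerate(chunk):
                acc = op(acc)
                if core == 1 and fault_at == start + offset:
                    acc ^= 1 << 3     # flip one bit in the result
                    fault_at = None   # transient: gone on re-execution
                committed.append(acc)
            results.append((acc, interval_signature(committed)))
        (acc0, sig0), (_, sig1) = results
        if sig0 == sig1:
            # Only the two signatures were compared, not the full data.
            state = acc0
            start += len(chunk)
        else:
            # Mismatch: drop the unvalidated interval and redo it.
            recoveries += 1
    return state, recoveries


ops = [lambda x: x + 1] * 8
print(run_redundant(ops, interval=4))              # fault-free run
print(run_redundant(ops, interval=4, fault_at=3))  # one injected bit flip
```

In this model a shorter `interval` means less uncommitted work is held back and less is redone after a mismatch, at the cost of more frequent signature comparisons, which mirrors the trade-off the abstract attributes to short validation intervals.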

Original language: English
Article number: 4564436
Pages (from-to): 65-68
Number of pages: 4
Journal: IEEE Computer Architecture Letters
Volume: 7
Issue number: 2
State: Published - Feb 2008
