Distributed error confinement

Yossi Azar*, Shay Kutten, Boaz Patt-Shamir

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

We study error confinement in distributed applications, which can be viewed as an extreme case of various fault locality notions studied in the past. Error confinement means that to the external observer, only nodes that were directly hit by a fault may deviate from their specified correct behavior, and only temporarily. The externally observable behavior of all other nodes must remain impeccable, even though their internal state may be affected. Error confinement is impossible if an adversary is allowed to inflict arbitrary transient faults on the system, since the faults might completely wipe out input values. We introduce a new fault-tolerance measure we call agility, which quantifies the fault tolerance of an algorithm that disseminates information against state-corrupting faults. We then propose broadcast algorithms that guarantee error confinement with optimal agility to within a constant factor in synchronous networks. These algorithms can serve as building blocks in more general reactive systems. Previous results in exploring locality in reactive systems were not error-confined, or allowed a wide range of behaviors to be considered correct. Our results also include a new technique that can be used to analyze the "cow path" problem.

Original language: English
Article number: 48
Journal: ACM Transactions on Algorithms
Volume: 6
Issue number: 3
State: Published - 1 Jun 2010

Keywords

  • Distributed algorithms
  • Persistence
  • Self-stabilization
  • Voting
