Distributed Error Confinement

Yossi Azar*, Shay Kutten, Boaz Patt-Shamir

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

17 Scopus citations

Abstract

We initiate the study of error confinement in distributed applications, where the goal is that only nodes that were directly hit by a fault may deviate from their correct external behavior, and only temporarily. The external behavior of all other nodes must remain impeccable, even though their internal state may be affected. Error confinement is impossible if an adversary is allowed to inflict arbitrary transient faults on the system, since the faults might completely wipe out input values. We introduce a new fault tolerance measure we call agility, which quantifies the strength of an algorithm that disseminates information against state-corrupting faults. We study the basic problem of broadcast, and propose algorithms that guarantee error confinement with agility that is optimal to within a constant factor, even in asynchronous networks where the topology is unknown. These algorithms can serve as building blocks in more general reactive systems. Previous results exploring locality in reactive systems were not error confined, and relied on the assumption (not used in the current paper) that the errors hitting each node are probabilistic, so that a faulty node itself, or its neighbor, can detect that the node is faulty. The main algorithm uses the novel core bootstrapping technique, which seems inherent for voting in reactive networks; its analysis leads to an interesting combinatorial problem. The technique and the analysis may be of independent interest.

Original language: English
Title of host publication: PODC '03: Proceedings of the twenty-second annual symposium on Principles of distributed computing
Publisher: Association for Computing Machinery (ACM)
Pages: 33-42
Number of pages: 10
ISBN (Print): 978-1-58113-708-8
State: Published - 2003
Event: Twenty-Second Annual ACM Symposium on Principles of Distributed Computing, PODC 2003 - Boston, MA, United States
Duration: 13 Jul 2003 to 16 Jul 2003

Conference

Conference: Twenty-Second Annual ACM Symposium on Principles of Distributed Computing, PODC 2003
Country/Territory: United States
City: Boston, MA
Period: 13/07/03 to 16/07/03
