Abstract
We study the scenario where a transient fault hits f of the n nodes of a distributed system, corrupting their state. We consider the basic problem of persistent bit, where the system is required to maintain a value in the face of transient failures by means of replication. We give an algorithm that recovers the value quickly: the value of the bit is recovered at all nodes in O(f) time units for any unknown value of f < n/2. Moreover, complete state quiescence occurs in O(diam) time units, where diam denotes the diameter of the network. This means that the value persists indefinitely, so long as any f < n/2 faults are followed by Ω(diam) fault-free time units. We prove lower bounds which show that both time bounds are asymptotically optimal. Using the algorithm for persistent bit, we present a general transformer which takes a distributed non-reactive, non-stabilizing protocol P and produces a self-stabilizing protocol P′ that solves the same problem as P, with the additional property that if the number of faults that hit the system after stabilization is f, for any unknown f < n/2, then the output of P′ regains stability in O(f) time units, and the state stabilizes in O(diam) time units.
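The role of the f < n/2 threshold can be illustrated with a small sketch. It is not the algorithm of the paper: it uses a plain global majority vote, which ignores network topology and does not achieve the O(f) recovery time claimed above. The function names `corrupt`, `majority`, and `recover` are hypothetical and chosen only for this illustration; the sketch merely shows that a replicated bit survives when fewer than half of the replicas are corrupted, and can be lost otherwise.

```python
import random


def corrupt(replicas, f, rng):
    """Return a copy of replicas with f distinct entries flipped (worst-case corruption)."""
    out = list(replicas)
    for i in rng.sample(range(len(out)), f):
        out[i] = 1 - out[i]
    return out


def majority(replicas):
    """Return 1 if strictly more than half of the replicas hold 1, else 0."""
    return 1 if 2 * sum(replicas) > len(replicas) else 0


def recover(replicas):
    """Illustrative recovery step: every node adopts the global majority value."""
    bit = majority(replicas)
    return [bit] * len(replicas)


rng = random.Random(0)
n, bit = 9, 1
replicas = [bit] * n

# f = 4 < n/2: the original bit is still the majority value and is restored.
restored = recover(corrupt(replicas, 4, rng))

# f = 5 > n/2: the corrupted value now forms the majority, so the bit is lost.
lost = recover(corrupt(replicas, 5, rng))
```

In this toy setting `restored` holds the original bit at every node, while `lost` holds the flipped bit, which is why the recovery guarantees in the abstract are stated only for f < n/2.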
Original language | English |
---|---|
Pages | 149-158 |
Number of pages | 10 |
State | Published - 1997 |
Externally published | Yes |
Event | Proceedings of the 1997 16th Annual ACM Symposium on Principles of Distributed Computing - Santa Barbara, CA, USA; Duration: 21 Aug 1997 → 24 Aug 1997 |
Conference
Conference | Proceedings of the 1997 16th Annual ACM Symposium on Principles of Distributed Computing |
---|---|
City | Santa Barbara, CA, USA |
Period | 21/08/97 → 24/08/97 |