## Abstract

We study the scenario where a transient fault hits f of the n nodes of a distributed system by corrupting their state. We consider the basic persistent bit problem, in which the system is required to maintain a value in the face of transient failures by means of replication. We give an algorithm that recovers the value quickly: the value of the bit is recovered at all nodes in O(f) time units for any unknown value of f < n/2. Moreover, complete state quiescence occurs in O(diam) time units, where diam denotes the diameter of the network. This means that the value persists indefinitely so long as any f < n/2 faults are followed by Ω(diam) fault-free time units. We prove lower bounds which show that both time bounds are asymptotically optimal. Using the algorithm for persistent bit, we present a general transformer which takes a distributed non-reactive, non-stabilizing protocol P and produces a self-stabilizing protocol P′ which solves the problem P solves, with the additional property that if the number of faults that hit the system after stabilization is f, for any unknown f < n/2, then the output of P′ regains stability in O(f) time units, and the state stabilizes in O(diam) time units.
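To illustrate why the bound f < n/2 is the natural limit for a replicated bit, the following is a minimal sketch that assumes a centralized view of all n replicas and a plain majority vote. It is not the algorithm of the paper, and it says nothing about the O(f) recovery time or the O(diam) quiescence time; the function names `worst_case_fault` and `majority` are hypothetical and introduced only for this illustration.

```python
def worst_case_fault(replicas, f):
    """Hypothetical adversary: flip the first f replicas.

    Flipping is the worst case for voting, since every corrupted
    replica then votes against the original bit.
    """
    return [1 - b for b in replicas[:f]] + replicas[f:]


def majority(replicas):
    """Return 1 if strictly more than half of the replicas hold 1, else 0."""
    return 1 if 2 * sum(replicas) > len(replicas) else 0


n = 9
for bit in (0, 1):
    # (n + 1) // 2 is ceil(n / 2), so this loop covers every f with f < n/2.
    for f in range((n + 1) // 2):
        corrupted = worst_case_fault([bit] * n, f)
        assert majority(corrupted) == bit
```

With f < n/2, more than half of the replicas still hold the original bit, so the vote returns it. Once f reaches n/2 or more, the flipped replicas can tie or outvote the correct ones and the original value is no longer determined by the replicas alone.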

Original language | English |
---|---|
Pages | 149-158 |
Number of pages | 10 |
State | Published - 1997 |
Externally published | Yes |
Event | Proceedings of the 1997 16th Annual ACM Symposium on Principles of Distributed Computing - Santa Barbara, CA, USA. Duration: 21 Aug 1997 → 24 Aug 1997 |

### Conference

Conference | Proceedings of the 1997 16th Annual ACM Symposium on Principles of Distributed Computing |
---|---|
City | Santa Barbara, CA, USA |
Period | 21/08/97 → 24/08/97 |