TY - GEN

T1 - Fault tolerant gradient clock synchronization

AU - Bund, Johannes

AU - Lenzen, Christoph

AU - Rosenbaum, Will

N1 - Publisher Copyright: © 2019 Association for Computing Machinery. All rights reserved.

PY - 2019/7/16

Y1 - 2019/7/16

N2 - Synchronizing clocks in distributed systems is well-understood, both in terms of fault-tolerance in fully connected systems, and the optimal achievable local skew in general fault-free networks. However, so far nothing non-trivial is known about the local skew that can be achieved in non-fully-connected topologies even under a single Byzantine fault. In this work, we show that asymptotically optimal local skew can be achieved in the presence of Byzantine faults. Our approach combines the Lynch-Welch algorithm [19] for synchronizing a clique of n nodes with up to f < n/3 Byzantine faults, and the gradient clock synchronization (GCS) algorithm by Lenzen et al. [15] in order to render the latter resilient to faults. This is not possible on general graphs, so we augment an arbitrary input graph G by replacing each node with a fully connected cluster of 3f + 1 copies, and execute an instance of the Lynch-Welch algorithm within each cluster. We interpret the clusters as supernodes executing the GCS algorithm on G, where each node in the cluster maintains an estimate of the logical clock of its supernode. By also fully connecting clusters corresponding to neighbors in G, supernodes maintain estimates of neighboring clusters' logical clocks. We achieve asymptotically optimal local skew, assuming that no cluster contains more than f faulty nodes. This construction yields factors of O(f) and O(f^2) overheads in terms of nodes and edges, respectively. Since tolerating f faulty neighbors trivially requires degrees larger than f, these overheads are asymptotically optimal.

AB - Synchronizing clocks in distributed systems is well-understood, both in terms of fault-tolerance in fully connected systems, and the optimal achievable local skew in general fault-free networks. However, so far nothing non-trivial is known about the local skew that can be achieved in non-fully-connected topologies even under a single Byzantine fault. In this work, we show that asymptotically optimal local skew can be achieved in the presence of Byzantine faults. Our approach combines the Lynch-Welch algorithm [19] for synchronizing a clique of n nodes with up to f < n/3 Byzantine faults, and the gradient clock synchronization (GCS) algorithm by Lenzen et al. [15] in order to render the latter resilient to faults. This is not possible on general graphs, so we augment an arbitrary input graph G by replacing each node with a fully connected cluster of 3f + 1 copies, and execute an instance of the Lynch-Welch algorithm within each cluster. We interpret the clusters as supernodes executing the GCS algorithm on G, where each node in the cluster maintains an estimate of the logical clock of its supernode. By also fully connecting clusters corresponding to neighbors in G, supernodes maintain estimates of neighboring clusters' logical clocks. We achieve asymptotically optimal local skew, assuming that no cluster contains more than f faulty nodes. This construction yields factors of O(f) and O(f^2) overheads in terms of nodes and edges, respectively. Since tolerating f faulty neighbors trivially requires degrees larger than f, these overheads are asymptotically optimal.

KW - Clock synchronization

KW - Fault tolerance

KW - Gradient clock synchronization

KW - Local skew

UR - http://www.scopus.com/inward/record.url?scp=85071029482&partnerID=8YFLogxK

U2 - 10.1145/3293611.3331637

DO - 10.1145/3293611.3331637

M3 - Conference contribution

AN - SCOPUS:85071029482

T3 - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing

SP - 357

EP - 365

BT - PODC 2019 - Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing

PB - Association for Computing Machinery

T2 - 38th ACM Symposium on Principles of Distributed Computing, PODC 2019

Y2 - 29 July 2019 through 2 August 2019

ER -