Asynchronous Fully-Decentralized SGD in the Cluster-Based Model

Hagit Attiya*, Noa Schiller

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function Q, a core part of optimization and learning algorithms. Our algorithms are designed for the cluster-based model, which combines message-passing and shared-memory communication layers. Processes may fail by crashing, and the algorithm inside each cluster is wait-free, using only reads and writes. For a strongly convex Q, our algorithm can withstand partitions of the system. It provides a convergence rate that achieves the maximal distributed acceleration over the optimal convergence rate of sequential SGD. For arbitrary smooth functions, the convergence rate has an additional term that depends on the maximal difference between the parameters at the same iteration; this holds under standard assumptions on Q. In this case, the algorithm obtains the same convergence rate as sequential SGD, up to a logarithmic factor. This is achieved by using, at each iteration, a multidimensional approximate agreement algorithm tailored for the cluster-based model. The general algorithm communicates with nonfaulty processes belonging to clusters that include a majority of all processes. We prove that this condition is necessary when optimizing some non-convex functions.
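The abstract does not spell out the algorithm, so the following Python sketch is only a rough illustration of the general pattern it describes: every process takes a local SGD step, and then the processes run an agreement step on their parameter vectors. It is not the authors' algorithm. It simulates the processes in lockstep in a single thread, ignores crashes, clusters, and shared memory, and replaces the paper's multidimensional approximate agreement with a simple hypothetical rule (`agreement_round`): each process averages the vectors of a random majority of processes. The cost function Q(x) = 0.5 * ||x - target||^2, the noise level `sigma`, the step size 1/(t+1), and all function names are assumptions chosen for illustration.

```python
import random
from typing import List

Vector = List[float]


def stochastic_gradient(x: Vector, target: Vector, rng: random.Random, sigma: float) -> Vector:
    """Noisy gradient of the assumed Q(x) = 0.5 * ||x - target||^2 (1-strongly convex)."""
    return [xi - ti + rng.gauss(0.0, sigma) for xi, ti in zip(x, target)]


def agreement_round(values: List[Vector], rng: random.Random) -> List[Vector]:
    """Hypothetical agreement step, not the paper's protocol.

    Each process reads the vectors of a random majority of processes (always
    including its own) and adopts their coordinate-wise mean. The mean is a
    convex combination, so every output lies in the convex hull of the inputs.
    """
    n = len(values)
    majority = n // 2 + 1
    new_values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        heard = [i] + rng.sample(others, majority - 1)
        dim = len(values[i])
        mean = [sum(values[j][k] for j in heard) / len(heard) for k in range(dim)]
        new_values.append(mean)
    return new_values


def decentralized_sgd(n: int, target: Vector, iterations: int,
                      sigma: float = 0.1, seed: int = 0) -> List[Vector]:
    """Each iteration: every process takes a local SGD step, then one agreement round."""
    rng = random.Random(seed)
    dim = len(target)
    params = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n)]
    for t in range(iterations):
        eta = 1.0 / (t + 1)  # decreasing step size, a common choice for 1-strongly convex Q
        for i in range(n):
            g = stochastic_gradient(params[i], target, rng, sigma)
            params[i] = [xi - eta * gi for xi, gi in zip(params[i], g)]
        params = agreement_round(params, rng)
    return params
```

For example, `decentralized_sgd(n=5, target=[1.0, -2.0], iterations=200)` returns five parameter vectors, one per simulated process, each close to the minimizer `[1.0, -2.0]`. Averaging over a majority is used here only because it is short and keeps each process inside the convex hull of the values it read; the paper's cluster-based agreement algorithm and its convergence guarantees are not reproduced by this sketch.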

Original language: English
Title of host publication: Algorithms and Complexity - 13th International Conference, CIAC 2023, Proceedings
Editors: Marios Mavronicolas
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 52-66
Number of pages: 15
ISBN (Print): 9783031304477
State: Published, 2023
Externally published: Yes
Event: 13th International Symposium on Algorithms and Complexity, CIAC 2023 - Larnaca, Cyprus
Duration: 13 Jun 2023 to 16 Jun 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13898 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Symposium on Algorithms and Complexity, CIAC 2023
Country/Territory: Cyprus
City: Larnaca
Period: 13/06/23 to 16/06/23

Keywords

  • Asynchronous computing
  • Cluster-based model
  • Distributed learning
  • Multi-dimensional approximate agreement
  • Stochastic gradient descent
