TY - GEN
T1 - Asynchronous Fully-Decentralized SGD in the Cluster-Based Model
AU - Attiya, Hagit
AU - Schiller, Noa
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function Q, a core part of optimization and learning algorithms. Our algorithms are designed for the cluster-based model, which combines message-passing and shared-memory communication layers. Processes may fail by crashing, and the algorithm inside each cluster is wait-free, using only reads and writes. For a strongly convex Q, our algorithm can withstand partitions of the system. It provides a convergence rate that is the maximal distributed acceleration over the optimal convergence rate of sequential SGD. For arbitrary smooth functions, the convergence rate has an additional term that depends on the maximal difference between the parameters at the same iteration (this holds under standard assumptions on Q). In this case, the algorithm obtains the same convergence rate as sequential SGD, up to a logarithmic factor. This is achieved by using, at each iteration, a multidimensional approximate agreement algorithm tailored for the cluster-based model. The general algorithm communicates with nonfaulty processes belonging to clusters that include a majority of all processes. We prove that this condition is necessary when optimizing some non-convex functions.
AB - This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function Q, a core part of optimization and learning algorithms. Our algorithms are designed for the cluster-based model, which combines message-passing and shared-memory communication layers. Processes may fail by crashing, and the algorithm inside each cluster is wait-free, using only reads and writes. For a strongly convex Q, our algorithm can withstand partitions of the system. It provides a convergence rate that is the maximal distributed acceleration over the optimal convergence rate of sequential SGD. For arbitrary smooth functions, the convergence rate has an additional term that depends on the maximal difference between the parameters at the same iteration (this holds under standard assumptions on Q). In this case, the algorithm obtains the same convergence rate as sequential SGD, up to a logarithmic factor. This is achieved by using, at each iteration, a multidimensional approximate agreement algorithm tailored for the cluster-based model. The general algorithm communicates with nonfaulty processes belonging to clusters that include a majority of all processes. We prove that this condition is necessary when optimizing some non-convex functions.
KW - Asynchronous computing
KW - Cluster-based model
KW - Distributed learning
KW - Multi-dimensional approximate agreement
KW - Stochastic gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85161465908&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-30448-4_5
DO - 10.1007/978-3-031-30448-4_5
M3 - Conference contribution
AN - SCOPUS:85161465908
SN - 9783031304477
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 52
EP - 66
BT - Algorithms and Complexity - 13th International Conference, CIAC 2023, Proceedings
A2 - Mavronicolas, Marios
PB - Springer Science and Business Media Deutschland GmbH
T2 - 13th International Conference on Algorithms and Complexity, CIAC 2023
Y2 - 13 June 2023 through 16 June 2023
ER -