TY - JOUR
T1 - On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes
AU - Kolosov, Oleg
AU - Yadgar, Gala
AU - Liram, Matan
AU - Tamo, Itzhak
AU - Barg, Alexander
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/6
Y1 - 2020/6
N2 - Erasure codes in large-scale storage systems allow recovery of data from a failed node. A recently developed class of codes, locally repairable codes (LRCs), offers tradeoffs between storage overhead and repair cost. LRCs facilitate efficient recovery scenarios by adding parity blocks to the system. However, these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing LRCs differ in their use of the parity blocks, in their locality semantics, and in their parameter space. Thus, existing theoretical models cannot directly compare different LRCs to determine which code offers the best recovery performance, and at what cost. We perform the first systematic comparison of existing LRC approaches. We analyze Xorbas, Azure's LRCs, and Optimal-LRCs in light of two new metrics: average degraded read cost and normalized repair cost. We show the tradeoff between these costs and the code's fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster further demonstrates the different effects of realistic system bottlenecks on the benefit from each LRC approach. Despite these differences, the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each setup.
KW - Erasure codes
KW - local repair
UR - http://www.scopus.com/inward/record.url?scp=85086803310&partnerID=8YFLogxK
U2 - 10.1145/3381832
DO - 10.1145/3381832
M3 - Article
AN - SCOPUS:85086803310
SN - 1553-3077
VL - 16
JO - ACM Transactions on Storage
JF - ACM Transactions on Storage
IS - 2
M1 - 11
ER -