Sketching volume capacities in deduplicated storage

Danny Harnik, Moshik Hershcovitch, Yosef Shatsky, Amir Epstein, Ronen Kat

Research output: Contribution to journal › Article › peer-review


The adoption of deduplication in storage systems has introduced significant new challenges for storage management. Specifically, the physical capacities associated with volumes are no longer readily available. In this work, we introduce a new approach to analyzing capacities in deduplicated storage environments. We provide sketch-based estimations of fundamental capacity measures required for managing a storage system: How much physical space would be reclaimed if a volume or group of volumes were to be removed from a system (the reclaimable capacity) and how much of the physical space should be attributed to each of the volumes in the system (the attributed capacity). Our methods also support capacity queries for volume groups across multiple storage systems, e.g., how much capacity would a volume group consume after being migrated to another storage system? We provide analytical accuracy guarantees for our estimations as well as empirical evaluations. Our technology is integrated into a prominent all-flash storage array and exhibits high performance even for very large systems. We also demonstrate how this method opens the door for performing placement decisions at the data-center level and obtaining insights on deduplication in the field.
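To make the two capacity measures concrete, here is a minimal Python sketch of the definitions above. It is an illustration under stated assumptions, not the authors' implementation: each volume is modelled as a set of chunk fingerprints, every chunk has the same hypothetical size `CHUNK_SIZE`, a shared chunk's space is split equally among the volumes that reference it, and the "sketch" is a simple fingerprint sample (keep fingerprints whose low bits are zero, then scale the result up). All function names are invented for this example.

```python
from collections import defaultdict

CHUNK_SIZE = 8192  # assumed fixed chunk size in bytes (hypothetical)


def sketch(fingerprints, sample_bits):
    """Sample a fingerprint set: keep values whose low `sample_bits` bits are zero.

    With uniformly distributed fingerprints this keeps roughly 1 / 2**sample_bits
    of them, so estimates computed on the sample are scaled by 2**sample_bits.
    """
    mask = (1 << sample_bits) - 1
    return {fp for fp in fingerprints if fp & mask == 0}


def reclaimable(volumes, group, scale=1):
    """Bytes freed if every volume in `group` is removed.

    A chunk is reclaimable only if no volume outside the group references it.
    """
    inside, outside = set(), set()
    for name, fps in volumes.items():
        (inside if name in group else outside).update(fps)
    return len(inside - outside) * CHUNK_SIZE * scale


def attributed(volumes, scale=1):
    """Bytes attributed to each volume.

    Assumption: a chunk's size is divided equally among the volumes that
    reference it, so the attributed capacities sum to the physical capacity.
    """
    owners = defaultdict(int)
    for fps in volumes.values():
        for fp in fps:
            owners[fp] += 1
    return {
        name: sum(CHUNK_SIZE / owners[fp] for fp in fps) * scale
        for name, fps in volumes.items()
    }


def estimate_reclaimable(volumes, group, sample_bits):
    """Estimate reclaimable capacity from per-volume sketches only."""
    sampled = {name: sketch(fps, sample_bits) for name, fps in volumes.items()}
    return reclaimable(sampled, group, scale=1 << sample_bits)


# Toy example: integers stand in for chunk fingerprints.
volumes = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}}
print(reclaimable(volumes, {"A"}))       # chunks 1 and 2 are unique to A
print(reclaimable(volumes, {"A", "B"}))  # chunk 3 becomes reclaimable as well
print(attributed(volumes))
```

In the toy example, removing volume A alone frees only chunks 1 and 2, while removing A and B together also frees chunk 3, which shows why reclaimable capacity of a group is not the sum of its members' reclaimable capacities. With only a handful of chunks a sampled estimate is meaningless; sampling pays off when volumes hold millions of fingerprints.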

Original language: English
Article number: a24
Journal: ACM Transactions on Storage
Issue number: 4
State: Published - Dec 2019
Externally published: Yes


  • Capacity management
  • Deduplication
  • Estimation

