Efficiently Archiving Photos under Storage Constraints

Susan B. Davidson, Shay Gershtein, Tova Milo, Slava Novgorodov, May Shoshan

Research output: Contribution to journal › Conference article › peer-review


Our ability to collect data is rapidly outstripping our ability to effectively store and use it. Organizations are therefore facing tough decisions about what data to archive (or dispose of) to effectively meet their business goals. We address this general problem in the context of image data (photos) by proposing which photos to archive to meet an online storage budget. The decision is based on factors such as usage patterns and their relative importance, the quality and size of a photo, the relevance of a photo for a usage pattern, the similarity between different photos, as well as policy requirements of what photos must be retained. We formalize the photo archival problem, analyze its complexity, and give two approximation algorithms. One algorithm comes with an optimal approximation guarantee; the other, more scalable, algorithm comes with both worst-case and data-dependent guarantees. Based on these algorithms we implemented an end-to-end system, PHOcus, and discuss how to automatically derive the inputs for this system in many settings. An extensive experimental study based on public as well as private datasets demonstrates the effectiveness and efficiency of PHOcus. Furthermore, a user study using business analysts in a real e-commerce application shows that it can save a tremendous amount of human effort and yield unexpected insights.
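The abstract does not spell out the formal model or either algorithm, so the following is only an illustrative sketch of the kind of budgeted selection it describes, not the method used in PHOcus. It assumes a simplified objective (for each usage pattern, the weighted best relevance among the retained photos, so a near-duplicate adds little) and uses a plain gain-per-size greedy heuristic, which carries none of the guarantees claimed in the paper. All names (`Photo`, `coverage`, `greedy_select`, `must_keep`) and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    name: str
    size: int        # storage cost (assumed positive), e.g. in MB
    relevance: dict  # hypothetical: usage pattern -> score in [0, 1]


def coverage(selected, weights):
    """Sum over usage patterns of weight * best relevance among retained photos."""
    return sum(
        w * max((p.relevance.get(pattern, 0.0) for p in selected), default=0.0)
        for pattern, w in weights.items()
    )


def greedy_select(photos, weights, budget, must_keep=frozenset()):
    """Choose photos to retain within the budget, by marginal gain per unit size."""
    # Policy-mandated photos are retained first, regardless of their score.
    kept = [p for p in photos if p.name in must_keep]
    used = sum(p.size for p in kept)
    if used > budget:
        raise ValueError("policy-mandated photos alone exceed the budget")
    candidates = [p for p in photos if p.name not in must_keep]
    while True:
        base = coverage(kept, weights)
        best, best_ratio = None, 0.0
        for p in candidates:
            if used + p.size > budget:
                continue  # does not fit in the remaining budget
            ratio = (coverage(kept + [p], weights) - base) / p.size
            if ratio > best_ratio:
                best, best_ratio = p, ratio
        if best is None:  # nothing fits, or nothing adds value
            return kept
        kept.append(best)
        used += best.size
        candidates.remove(best)


# Usage with made-up data: "d" must be retained by policy.
photos = [
    Photo("a", 4, {"search": 0.9}),
    Photo("b", 2, {"search": 0.8}),
    Photo("c", 3, {"catalog": 0.7}),
    Photo("d", 1, {"legal": 0.1}),
]
weights = {"search": 1.0, "catalog": 0.5, "legal": 0.2}
result = greedy_select(photos, weights, budget=6, must_keep={"d"})
print([p.name for p in result])  # ['d', 'b', 'c']
```

In this toy run, photo "a" is slightly more relevant than "b" but twice as large, so the heuristic retains "b" and spends the remaining budget on "c", which serves a different usage pattern.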

Original language: English
Pages (from-to): 591-603
Number of pages: 13
Journal: Advances in Database Technology - EDBT
Issue number: 3
State: Published - 20 Mar 2023
Event: 26th International Conference on Extending Database Technology, EDBT 2023 - Ioannina, Greece
Duration: 28 Mar 2023 to 31 Mar 2023

