Practical data-dependent metric compression with provable guarantees

Piotr Indyk, Ilya Razenshteyn, Tal Wagner

Research output: Contribution to journal, Conference article, peer-review

9 Scopus citations

Abstract

We introduce a new distance-preserving compact representation of multidimensional point-sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O(d log(dB/ε) + log n) bits per point from which one can approximate the distances up to a factor of 1 ± ε. Our algorithm almost matches the recent bound of [6] while being much simpler. We compare our algorithm to Product Quantization (PQ) [7], a state-of-the-art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT (used in [7]), MNIST [11], New York City taxi time series [4], and a synthetic one-dimensional data set embedded in a high-dimensional space. With appropriately tuned parameters, our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.
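To make the size bound in the abstract concrete, the following is a minimal sketch that evaluates the expression d log(dB/ε) + log n next to the uncompressed cost dB. It is not the paper's algorithm. The function names are hypothetical, and the constant hidden by the O(·) notation is set to 1 purely for illustration, so the absolute numbers are not meaningful; only the scaling behaviour is.

```python
import math


def raw_bits_per_point(d: int, B: int) -> int:
    """Uncompressed size: d coordinates, B bits each."""
    return d * B


def bound_bits_per_point(d: int, B: int, n: int, eps: float, c: float = 1.0) -> float:
    """Evaluate c * (d * log2(d * B / eps) + log2(n)).

    ASSUMPTION: c stands in for the unknown constant hidden by the
    O(.) notation; c = 1 is an illustrative choice, not a value
    taken from the paper.
    """
    return c * (d * math.log2(d * B / eps) + math.log2(n))


if __name__ == "__main__":
    d, n, eps = 128, 10**6, 0.1  # illustrative parameters only
    for B in (32, 64):
        raw = raw_bits_per_point(d, B)
        bound = bound_bits_per_point(d, B, n, eps)
        print(f"B={B}: raw={raw} bits, bound expression={bound:.1f} bits")
```

Under these assumptions, doubling B doubles the raw size, while the bound expression grows by only d bits per point, because log(2x) = log(x) + 1. This is the sense in which the representation depends on the coordinate precision only logarithmically.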

Original language: English
Pages (from-to): 2618-2627
Number of pages: 10
Journal: Advances in Neural Information Processing Systems
Volume: 2017-December
State: Published - 2017
Externally published: Yes
Event: 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States
Duration: 4 Dec 2017 - 9 Dec 2017
