Abstract
We consider the (1 + ε)-approximate nearest neighbor search problem: given a set X of n points in a d-dimensional space, build a data structure that, given any query point y, finds a point x ∈ X whose distance to y is at most (1 + ε) min_{x'∈X} ‖x' − y‖ for an accuracy parameter ε ∈ (0, 1). Our main result is a data structure that occupies only O(ε^{-2} n log(n) log(1/ε)) bits of space, assuming all point coordinates are integers in the range {−n^{O(1)}, …, n^{O(1)}}, i.e., the coordinates have O(log n) bits of precision. This improves over the best previously known space bound of O(ε^{-2} n log^2(n)), obtained via the randomized dimensionality reduction method of Johnson and Lindenstrauss (1984). We also consider the more general problem of estimating all distances from a collection of query points to all data points X, and provide almost tight upper and lower bounds for the space complexity of this problem.
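To make the problem definition concrete, here is a minimal brute-force sketch (not the paper's data structure, which uses compressed sketches): a returned point x is a valid (1 + ε)-approximate nearest neighbor of y exactly when its distance to y is within a (1 + ε) factor of the true minimum distance.

```python
import math

def nearest_distance(X, y):
    """Exact distance from query y to its nearest point in X (brute force)."""
    return min(math.dist(x, y) for x in X)

def is_valid_answer(X, y, x, eps):
    """Check the (1 + eps)-approximate nearest neighbor guarantee:
    x must lie within (1 + eps) times the true nearest distance to y."""
    return math.dist(x, y) <= (1 + eps) * nearest_distance(X, y)

# Toy instance: (0, 0) is the exact nearest neighbor of y.
X = [(0, 0), (3, 4), (6, 8)]
y = (1, 1)
print(is_valid_answer(X, y, (0, 0), eps=0.5))  # exact answer always qualifies
print(is_valid_answer(X, y, (6, 8), eps=0.5))  # far point fails the guarantee
```

The point of the paper is that this guarantee can be served from a data structure using far fewer bits than storing X exactly.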
| Original language | English |
|---|---|
| Pages (from-to) | 2012-2036 |
| Number of pages | 25 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 75 |
| State | Published - 2018 |
| Externally published | Yes |
| Event | 31st Annual Conference on Learning Theory, COLT 2018 - Stockholm, Sweden. Duration: 6 Jul 2018 → 9 Jul 2018 |
Keywords
- dimension reduction
- distance estimation
- distance sketches
- metric compression
- nearest neighbor
- quantization