TY - GEN
T1 - Reporting neighbors in high-dimensional Euclidean space
AU - Aiger, Dror
AU - Kaplan, Haim
AU - Sharir, Micha
PY - 2013
Y1 - 2013
N2 - We consider the following problem, which arises in many database and web-based applications: Given a set P of n points in a high-dimensional space $\mathbb{R}^d$ and a distance r, we want to report all pairs of points of P at Euclidean distance at most r. We present two randomized algorithms, one based on randomly shifted grids, and the other on randomly shifted and rotated grids. The running time of both algorithms is of the form $C(d)(n + k)\log n$, where k is the output size and $C(d)$ is a constant that depends on the dimension d. The $\log n$ factor is needed to guarantee, with high probability, that all neighbor pairs are reported, and can be dropped if it suffices to report, in expectation, an arbitrarily large fraction of the pairs. When only translations are used, $C(d)$ is of the form $(a\sqrt{d})^d$, for some (small) absolute constant $a \approx 0.484$; this bound is worst-case tight, up to an exponential factor of about $2^d$. When both rotations and translations are used, $C(d)$ can be improved to roughly $6.74^d$, getting rid of the super-exponential factor $\sqrt{d}^d$. When the input set (lies in a subset of d-space that) has low doubling dimension $\delta$, the performance of the first algorithm improves to $C(d,\delta)(n + k)\log n$ (or to $C(d,\delta)(n + k)$), where $C(d,\delta) = O((ed/\delta)^\delta)$, for $\delta \le \sqrt{d}$. Otherwise, $C(d,\delta) = O(e^{\sqrt{d}}\sqrt{\delta}^\delta)$. We also present experimental results on several large datasets, demonstrating that our algorithms run significantly faster than all the leading existing algorithms for reporting neighbors.
AB - We consider the following problem, which arises in many database and web-based applications: Given a set P of n points in a high-dimensional space $\mathbb{R}^d$ and a distance r, we want to report all pairs of points of P at Euclidean distance at most r. We present two randomized algorithms, one based on randomly shifted grids, and the other on randomly shifted and rotated grids. The running time of both algorithms is of the form $C(d)(n + k)\log n$, where k is the output size and $C(d)$ is a constant that depends on the dimension d. The $\log n$ factor is needed to guarantee, with high probability, that all neighbor pairs are reported, and can be dropped if it suffices to report, in expectation, an arbitrarily large fraction of the pairs. When only translations are used, $C(d)$ is of the form $(a\sqrt{d})^d$, for some (small) absolute constant $a \approx 0.484$; this bound is worst-case tight, up to an exponential factor of about $2^d$. When both rotations and translations are used, $C(d)$ can be improved to roughly $6.74^d$, getting rid of the super-exponential factor $\sqrt{d}^d$. When the input set (lies in a subset of d-space that) has low doubling dimension $\delta$, the performance of the first algorithm improves to $C(d,\delta)(n + k)\log n$ (or to $C(d,\delta)(n + k)$), where $C(d,\delta) = O((ed/\delta)^\delta)$, for $\delta \le \sqrt{d}$. Otherwise, $C(d,\delta) = O(e^{\sqrt{d}}\sqrt{\delta}^\delta)$. We also present experimental results on several large datasets, demonstrating that our algorithms run significantly faster than all the leading existing algorithms for reporting neighbors.
UR - http://www.scopus.com/inward/record.url?scp=84876048939&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973105.56
DO - 10.1137/1.9781611973105.56
M3 - Conference contribution
AN - SCOPUS:84876048939
SN - 9781611972511
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 784
EP - 803
BT - Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
PB - Association for Computing Machinery
T2 - 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
Y2 - 6 January 2013 through 8 January 2013
ER -