TY - GEN

T1 - Locality sensitive hashing for set-queries, motivated by group recommendations

AU - Kaplan, Haim

AU - Tenenbaum, Jay

N1 - Publisher Copyright: © Haim Kaplan and Jay Tenenbaum; licensed under Creative Commons License CC-BY

PY - 2020/6/1

Y1 - 2020/6/1

N2 - Locality Sensitive Hashing (LSH) is an effective method to index a set of points such that we can efficiently find the nearest neighbors of a query point. We extend this method to our novel Set-query LSH (SLSH), such that it can find the nearest neighbors of a set of points, given as a query. Let s(x, y) be the similarity between two points x and y. We define a similarity between a set Q and a point x by aggregating the similarities s(p, x) for all p ∈ Q. For example, we can take s(p, x) to be the angular similarity between p and x (i.e., 1 − ∠(x, p)/π), and aggregate by arithmetic or geometric averaging, or by taking the lowest similarity. We develop locality sensitive hash families and data structures for a large set of such arithmetic and geometric averaging similarities, and analyze their collision probabilities. We also establish an analogous framework and hash families for distance functions. Specifically, we give a structure for the Euclidean distance aggregated by either averaging or taking the maximum. We leverage SLSH to solve a geometric extension of the approximate near neighbors problem. In this version, we consider a metric for which the unit ball is an ellipsoid and its orientation is specified with the query. An important application that motivates our work is group recommendation systems. Such a system embeds movies and users in the same feature space, and the task of recommending a movie for a group to watch together translates to a set-query Q using an appropriate similarity.

KW - Distance functions

KW - Ellipsoid

KW - Group recommendations

KW - Locality sensitive hashing

KW - Nearest neighbors

KW - Similarity functions

KW - Similarity search

UR - http://www.scopus.com/inward/record.url?scp=85090382195&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.SWAT.2020.28

DO - 10.4230/LIPIcs.SWAT.2020.28

M3 - Conference contribution

AN - SCOPUS:85090382195

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 17th Scandinavian Symposium and Workshops on Algorithm Theory, SWAT 2020

A2 - Albers, Susanne

PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing

T2 - 17th Scandinavian Symposium and Workshops on Algorithm Theory, SWAT 2020

Y2 - 22 June 2020 through 24 June 2020

ER -