TY - JOUR

T1 - Finding Similar Users in Social Networks

AU - Nisgav, Aviv

AU - Patt-Shamir, Boaz

PY - 2011/11

N2 - We consider a system where users wish to find similar users. To model similarity, we assume the existence of a set of queries, and two users are deemed similar if their answers to these queries are (mostly) identical. Technically, each user has a vector of preferences (answers to queries), and two users are similar if their preference vectors differ in only a few coordinates. The preferences are unknown to the system initially, and the goal of the algorithm is to classify the users into classes of roughly the same preferences by asking each user to answer the least possible number of queries. We prove nearly matching lower and upper bounds on the maximal number of queries required to solve the problem. Specifically, we present an "anytime" algorithm that asks each user at most one query in each round, while maintaining a partition of the users. The quality of the partition improves over time: for n users and time T, groups of Õ(n/T) users with the same preferences will be separated (with high probability) if they differ in sufficiently many queries. We present a lower bound that matches the upper bound, up to a constant factor, for nearly all possible distances between user groups.

KW - Collaborative filtering

KW - Market segmentation

KW - Randomized algorithms

KW - Recommendation systems

KW - User classification

UR - http://www.scopus.com/inward/record.url?scp=80053445700&partnerID=8YFLogxK

DO - 10.1007/s00224-010-9307-2

M3 - Article

AN - SCOPUS:80053445700

SN - 1432-4350

VL - 49

SP - 720

EP - 737

JO - Theory of Computing Systems

JF - Theory of Computing Systems

IS - 4

ER -