TY - GEN

T1 - The Communication Complexity of Set Intersection Under Product Distributions

AU - Oshman, Rotem

AU - Roth, Tal

N1 - Publisher Copyright: © Rotem Oshman and Tal Roth.

PY - 2023/7

Y1 - 2023/7

N2 - We consider a multiparty setting where $k$ parties have private inputs $X_1, \ldots, X_k \subseteq [n]$ and wish to compute the intersection $\bigcap_{\ell=1}^{k} X_\ell$ of their sets, using as little communication as possible. This task generalizes the well-known problem of set disjointness, where the parties are required only to determine whether the intersection is empty or not. In the worst case, it is known that the communication complexity of finding the intersection is the same as that of solving set disjointness, regardless of the size of the intersection: the cost of both problems is $\Omega(n \log k + k)$ bits in the shared blackboard model, and $\Omega(nk)$ bits in the coordinator model. In this work we consider a realistic setting where the parties' inputs are independent of one another, that is, the input is drawn from a product distribution. We show that this makes finding the intersection significantly easier than in the worst case: only $\tilde{\Theta}(n^{1-1/k} (H(S) + 1)^{1/k} + k)$ bits of communication are required, where $H(S)$ is the Shannon entropy of the intersection $S$. We also show that the parties do not need to know the exact underlying input distribution; if we are given in advance $O(n^{1/k})$ samples from the underlying distribution $\mu$, we can learn enough about $\mu$ to allow us to compute the intersection of an input drawn from $\mu$ using expected communication $\tilde{\Theta}(n^{1-1/k} \, \mathbb{E}[|S|]^{1/k} + k)$, where $|S|$ is the size of the intersection.

AB - We consider a multiparty setting where $k$ parties have private inputs $X_1, \ldots, X_k \subseteq [n]$ and wish to compute the intersection $\bigcap_{\ell=1}^{k} X_\ell$ of their sets, using as little communication as possible. This task generalizes the well-known problem of set disjointness, where the parties are required only to determine whether the intersection is empty or not. In the worst case, it is known that the communication complexity of finding the intersection is the same as that of solving set disjointness, regardless of the size of the intersection: the cost of both problems is $\Omega(n \log k + k)$ bits in the shared blackboard model, and $\Omega(nk)$ bits in the coordinator model. In this work we consider a realistic setting where the parties' inputs are independent of one another, that is, the input is drawn from a product distribution. We show that this makes finding the intersection significantly easier than in the worst case: only $\tilde{\Theta}(n^{1-1/k} (H(S) + 1)^{1/k} + k)$ bits of communication are required, where $H(S)$ is the Shannon entropy of the intersection $S$. We also show that the parties do not need to know the exact underlying input distribution; if we are given in advance $O(n^{1/k})$ samples from the underlying distribution $\mu$, we can learn enough about $\mu$ to allow us to compute the intersection of an input drawn from $\mu$ using expected communication $\tilde{\Theta}(n^{1-1/k} \, \mathbb{E}[|S|]^{1/k} + k)$, where $|S|$ is the size of the intersection.

KW - Communication complexity

KW - intersection

KW - set disjointness

UR - http://www.scopus.com/inward/record.url?scp=85167350571&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.ICALP.2023.95

DO - 10.4230/LIPIcs.ICALP.2023.95

M3 - Conference contribution

AN - SCOPUS:85167350571

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 50th International Colloquium on Automata, Languages, and Programming, ICALP 2023

A2 - Etessami, Kousha

A2 - Feige, Uriel

A2 - Puppis, Gabriele

PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing

T2 - 50th International Colloquium on Automata, Languages, and Programming, ICALP 2023

Y2 - 10 July 2023 through 14 July 2023

ER -