TY - GEN
T1 - Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques in Bounded Arboricity Graphs
AU - Eden, Talya
AU - Ron, Dana
AU - Rosenbaum, Will
N1 - Publisher Copyright: © Talya Eden, Dana Ron, and Will Rosenbaum; licensed under Creative Commons License CC-BY 4.0
PY - 2022/7/1
Y1 - 2022/7/1
N2 - Counting and sampling small subgraphs are fundamental algorithmic tasks. Motivated by the need to handle massive datasets efficiently, recent theoretical work has examined these problems in the sublinear-time regime. In this work, we consider the problem of sampling a k-clique in a graph from an almost uniform distribution. Specifically, the algorithm should output each k-clique with probability $(1 \pm \epsilon)/n_k$, where $n_k$ denotes the number of k-cliques in the graph and $\epsilon$ is a given approximation parameter. To this end, the algorithm may perform degree, neighbor, and pair queries. We focus on the class of graphs with arboricity at most $\alpha$, and prove that the query complexity of the problem is (Equation presented), where $n$ is the number of vertices in the graph and $\Theta^*(\cdot)$ suppresses dependencies on $(\log n/\epsilon)^{O(k)}$. Our upper bound is based on defining a special auxiliary graph $H_k$, such that sampling edges almost uniformly in $H_k$ translates to sampling k-cliques almost uniformly in the original graph $G$. We then build on a known edge-sampling algorithm (Eden, Ron and Rosenbaum, ICALP19) to sample edges in $H_k$. The challenge is simulating queries to $H_k$ while being given query access only to $G$. Our lower bound follows from a construction of a family of graphs with arboricity $\alpha$ such that in each graph there are $n_k$ k-cliques, where one of these cliques is “hidden” and hence hard to sample.
KW - arboricity
KW - cliques
KW - graph algorithms
KW - sublinear time algorithms
KW - uniform sampling
UR - http://www.scopus.com/inward/record.url?scp=85133416157&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ICALP.2022.56
DO - 10.4230/LIPIcs.ICALP.2022.56
M3 - Conference contribution
AN - SCOPUS:85133416157
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022
A2 - Bojańczyk, Mikołaj
A2 - Merelli, Emanuela
A2 - Woodruff, David P.
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022
Y2 - 4 July 2022 through 8 July 2022
ER -