## Abstract

A bottom-k sketch is a summary of a set of items with nonnegative weights that supports approximate query processing. A sketch is obtained by associating with each item in a ground set an independent random rank drawn from a probability distribution that depends on the weight of the item, and including the k items with the smallest rank values. Bottom-k sketches are an alternative to k-mins sketches [9], which consist of the k minimum-ranked items in k independent rank assignments, and to min-hash sketches [5], where hash functions replace random rank assignments. Sketches support approximate aggregations, including the weight and selectivity of a subpopulation. Coordinated sketches of multiple subsets over the same ground set support subset-relation queries such as Jaccard similarity or the weight of the union. All-distances sketches are applicable to datasets where items lie in some metric space, such as data streams (time) or networks. These sketches compactly encode the respective plain sketches of all neighborhoods of a location, and they support queries posed over time windows or neighborhoods as well as time-decaying and spatially decaying aggregates. An important advantage of bottom-k sketches, established in a line of recent work, is much tighter estimators for several basic aggregates. To materialize this benefit, we must adapt traditional k-mins applications to use bottom-k sketches. We propose all-distances bottom-k sketches, and we develop and analyze data structures that incrementally construct bottom-k sketches and all-distances bottom-k sketches. Another advantage of bottom-k sketches is that when the data is represented explicitly, they can be obtained much more efficiently than k-mins sketches. We show that k-mins sketches can be derived from the respective bottom-k sketches, which enables the use of bottom-k sketches with off-the-shelf k-mins estimators. (In fact, we obtain tighter estimators, since each bottom-k sketch is a distribution over k-mins sketches.)
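The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's data structure. It makes several assumptions that the abstract does not state: ranks are exponentially distributed with rate equal to the item weight (one common choice of weight-dependent distribution), ranks are derived from a SHA-256 hash of the item key so that sketches of different subsets are coordinated, and the Jaccard estimator shown applies to uniform weights only. The names `rank`, `bottom_k`, and `jaccard_estimate` are hypothetical.

```python
import hashlib
import heapq
import math


def rank(key, weight, seed=0):
    """Rank for `key`, exponentially distributed with rate `weight` (assumed choice).

    The rank is a deterministic function of (seed, key), so the same item
    receives the same rank in every subset: the sketches are coordinated.
    """
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    # Map the first 8 bytes of the hash to a value u in (0, 1].
    u = (int.from_bytes(digest[:8], "big") + 1) / 2**64
    return -math.log(u) / weight


def bottom_k(items, k, seed=0):
    """Return the k (rank, key) pairs with the smallest ranks, in ascending order.

    `items` maps each key to a positive weight.
    """
    ranked = ((rank(key, w, seed), key) for key, w in items.items())
    return heapq.nsmallest(k, ranked)


def jaccard_estimate(sketch_a, sketch_b, k):
    """Estimate Jaccard similarity from two coordinated sketches (uniform weights).

    The k smallest ranks among both sketches form the bottom-k sketch of the
    union; the estimate is the fraction of those items present in both sketches.
    """
    keys_a = {key for _, key in sketch_a}
    keys_b = {key for _, key in sketch_b}
    union_sketch = heapq.nsmallest(k, set(sketch_a) | set(sketch_b))
    if not union_sketch:
        return 0.0
    shared = sum(1 for _, key in union_sketch if key in keys_a and key in keys_b)
    return shared / len(union_sketch)


if __name__ == "__main__":
    a = {f"item{i}": 1.0 for i in range(0, 1000)}
    b = {f"item{i}": 1.0 for i in range(500, 1500)}
    k = 200
    estimate = jaccard_estimate(bottom_k(a, k), bottom_k(b, k), k)
    print(f"estimated Jaccard: {estimate:.3f} (exact: {500 / 1500:.3f})")
```

When k is at least the size of a set, the sketch holds every item and the estimate is exact. For smaller k, the estimate is a random approximation whose accuracy improves as k grows.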

Original language | English
---|---
Title of host publication | PODC'07
Subtitle of host publication | Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing
Pages | 225-234
Number of pages | 10
DOIs | |
State | Published - 2007
Event | PODC'07: 26th Annual ACM Symposium on Principles of Distributed Computing - Portland, OR, United States; Duration: 12 Aug 2007 → 15 Aug 2007

### Publication series

Name | Proceedings of the Annual ACM Symposium on Principles of Distributed Computing
---|---

### Conference

Conference | PODC'07: 26th Annual ACM Symposium on Principles of Distributed Computing
---|---
Country/Territory | United States
City | Portland, OR
Period | 12 Aug 2007 → 15 Aug 2007

## Keywords

- All-distances sketches
- Bottom-k sketches
- Data streams