Spectral Bloom Filters

Saar Cohen*, Yossi Matias

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

349 Scopus citations

Abstract

A Bloom Filter is a space-efficient randomized data structure allowing membership queries over sets with certain allowable errors. It is widely used in many applications which take advantage of its ability to compactly represent a set, and filter out effectively any element that does not belong to the set, with small error probability. This paper introduces the Spectral Bloom Filter (SBF), an extension of the original Bloom Filter to multi-sets, allowing the filtering of elements whose multiplicities are below a threshold given at query time. Using memory only slightly larger than that of the original Bloom Filter, the SBF supports queries on the multiplicities of individual keys with a guaranteed, small error probability. The SBF also supports insertions and deletions over the data set. We present novel methods for reducing the probability and magnitude of errors. We also present an efficient data structure and algorithms to build it incrementally and maintain it over streaming data, as well as over materialized data with arbitrary insertions and deletions. The SBF does not assume any a priori filtering threshold and effectively and efficiently maintains information over the entire data-set, allowing for ad-hoc queries with arbitrary parameters and enabling a range of new applications.
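The abstract describes the SBF's behavior but not its internals. As a rough illustration only, the following Python sketch assumes a plain array of integer counters in place of the Bloom Filter's bits, and estimates a key's multiplicity as the minimum over its counters. The class name `SpectralBloomSketch`, the parameters `m` and `k`, and the salted SHA-256 hashing are hypothetical choices made for this example and are not taken from the paper; the compact encoding and error-reduction methods the abstract mentions are not modeled.

```python
import hashlib


class SpectralBloomSketch:
    """Illustrative counter-based filter; not the paper's data structure."""

    def __init__(self, m=1024, k=4):
        self.m = m                # number of counters (assumed parameter)
        self.k = k                # number of hash functions (assumed parameter)
        self.counters = [0] * m

    def _positions(self, key):
        # Derive up to k distinct counter positions from salted SHA-256 digests.
        positions = set()
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, key, count=1):
        # Increase every counter the key maps to.
        for p in self._positions(key):
            self.counters[p] += count

    def delete(self, key, count=1):
        # Decrease the same counters; only valid for occurrences that were
        # actually inserted earlier.
        for p in self._positions(key):
            self.counters[p] -= count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum never falls
        # below the true multiplicity (given valid deletions).
        return min(self.counters[p] for p in self._positions(key))

    def at_least(self, key, threshold):
        # The threshold is supplied at query time, not at build time.
        return self.estimate(key) >= threshold


if __name__ == "__main__":
    sbf = SpectralBloomSketch()
    for word in ["apple", "apple", "apple", "pear"]:
        sbf.insert(word)
    print(sbf.estimate("apple"), sbf.at_least("pear", 2))
```

In this sketch an estimate can exceed the true multiplicity when other keys share all of a key's counters, but it cannot fall below it, which mirrors the one-sided error of an ordinary Bloom Filter.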

Original language: English
Pages (from-to): 241-252
Number of pages: 12
Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
State: Published - 2003
Event: 2003 ACM SIGMOD International Conference on Management of Data - San Diego, CA, United States
Duration: 9 Jun 2003 to 12 Jun 2003
