Approximating and testing k-histogram distributions in sub-linear time

Piotr Indyk*, Reut Levi, Ronitt Rubinfeld

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ₂ distance to the distribution p. We give time- and sample-efficient algorithms for this problem. We further provide algorithms that distinguish distributions that have the property of being a k-histogram from distributions that are ε-far from any k-histogram in the ℓ₁ distance and ℓ₂ distance, respectively.
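To make the definition concrete, the following minimal Python sketch shows one possible way to store a k-histogram as a list of intervals with values and to measure its ℓ₁ and ℓ₂ distance to an explicitly given distribution. The names `Piece`, `histogram_to_pmf`, `l1_distance`, and `l2_distance` are hypothetical and chosen for illustration only. This is not the sub-linear algorithm of the paper, which works from samples rather than from a fully known p.

```python
from typing import List, Tuple

# Hypothetical representation (an assumption for illustration): each piece is
# (start, end, value), covering the integers start..end inclusive, and every
# element in that interval has probability `value`.
Piece = Tuple[int, int, float]


def histogram_to_pmf(pieces: List[Piece], n: int) -> List[float]:
    """Expand a k-histogram over [n] = {1, ..., n} into an explicit pmf."""
    pmf = [0.0] * n
    for start, end, value in pieces:
        for i in range(start, end + 1):
            pmf[i - 1] = value  # elements are 1-indexed, the list is 0-indexed
    return pmf


def l1_distance(p: List[float], q: List[float]) -> float:
    """Sum of absolute differences between two pmfs of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p: List[float], q: List[float]) -> float:
    """Euclidean distance between two pmfs of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# A 2-histogram over [4]: mass 0.4 on each of {1, 2}, mass 0.1 on each of {3, 4}.
h = [(1, 2, 0.4), (3, 4, 0.1)]
h_pmf = histogram_to_pmf(h, 4)

# Compare against the uniform distribution on [4], itself a 1-histogram.
uniform = [0.25] * 4
d1 = l1_distance(h_pmf, uniform)
d2 = l2_distance(h_pmf, uniform)
```

Expanding the histogram takes time linear in n, which is acceptable for a small illustration but is exactly the cost that a sub-linear algorithm avoids.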

Original language: English
Title of host publication: PODS '12 - Proceedings of the 31st Symposium on Principles of Database Systems
Pages: 15-21
Number of pages: 7
State: Published - 2012
Event: 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '12 - Scottsdale, AZ, United States
Duration: 21 May 2012 → 23 May 2012

Publication series

Name: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

Conference

Conference: 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '12
Country/Territory: United States
City: Scottsdale, AZ
Period: 21/05/12 → 23/05/12

Keywords

  • distribution
  • histogram
  • property testing
