TY - JOUR
T1 - The complexity of approximating the entropy
AU - Batu, Tuğkan
AU - Dasgupta, Sanjoy
AU - Kumar, Ravi
AU - Rubinfeld, Ronitt
PY - 2006
Y1 - 2006
N2 - We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a $\gamma$-multiplicative approximation to the entropy can be obtained in $O(n^{(1+\eta)/\gamma^2}\log n)$ time for distributions with entropy $\Omega(\gamma/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of $\Omega(n^{1/(2\gamma^2)})$. We next consider a combined oracle model in which the algorithm has access to both the generation and the evaluation oracles of the distribution. In this model, significantly greater efficiency can be achieved: we present an algorithm for $\gamma$-multiplicative approximation to the entropy that runs in $O((\gamma^2\log^2 n)/(h^2(\gamma-1)^2))$ time for distributions with entropy $\Omega(h)$; for such distributions, we also show a lower bound of $\Omega((\log n)/(h(\gamma^2-1)+\gamma^2))$. Finally, we consider two special families of distributions: those in which the probabilities of the elements decrease monotonically with respect to a known ordering of the domain, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
AB - We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a $\gamma$-multiplicative approximation to the entropy can be obtained in $O(n^{(1+\eta)/\gamma^2}\log n)$ time for distributions with entropy $\Omega(\gamma/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of $\Omega(n^{1/(2\gamma^2)})$. We next consider a combined oracle model in which the algorithm has access to both the generation and the evaluation oracles of the distribution. In this model, significantly greater efficiency can be achieved: we present an algorithm for $\gamma$-multiplicative approximation to the entropy that runs in $O((\gamma^2\log^2 n)/(h^2(\gamma-1)^2))$ time for distributions with entropy $\Omega(h)$; for such distributions, we also show a lower bound of $\Omega((\log n)/(h(\gamma^2-1)+\gamma^2))$. Finally, we consider two special families of distributions: those in which the probabilities of the elements decrease monotonically with respect to a known ordering of the domain, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
KW - Entropy estimation
KW - Properties of distributions
KW - Property testing
KW - Sublinear algorithms
UR - http://www.scopus.com/inward/record.url?scp=33644591301&partnerID=8YFLogxK
U2 - 10.1137/S0097539702403645
DO - 10.1137/S0097539702403645
M3 - Article
AN - SCOPUS:33644591301
SN - 0097-5397
VL - 35
SP - 132
EP - 150
JO - SIAM Journal on Computing
JF - SIAM Journal on Computing
IS - 1
ER -