TY - JOUR

T1 - The minimum-entropy set cover problem

AU - Halperin, Eran

AU - Karp, Richard M.

N1 - Funding Information:
Corresponding author. Tel.: +1 510 666 2973; fax: +1 510 666 2956. E-mail addresses: heran@cs.princeton.edu (E. Halperin), karp@icsi.berkeley.edu (R.M. Karp). Some of this work was done while the author was at UC Berkeley and ICSI, Berkeley, CA. The research was partly supported by NSF ITR Grant CCR-0121555.

PY - 2005/12/8

Y1 - 2005/12/8

N2 - We consider the minimum entropy principle for learning data generated by a random source and observed with random noise. In our setting we have a sequence of observations of objects drawn uniformly at random from a population. Each object in the population belongs to one class. We perform an observation for each object which determines that it belongs to one of a given set of classes. Given these observations, we are interested in assigning the most likely class to each of the objects. This scenario is a very natural one that appears in many real-life situations. We show that under reasonable assumptions finding the most likely assignment is equivalent to the following variant of the set cover problem. Given a universe U and a collection S = (S_1, ..., S_t) of subsets of U, we wish to find an assignment f : U → S such that u ∈ f(u) and the entropy of the distribution defined by the values |f^{-1}(S_i)| is minimized. We show that this problem is NP-hard and that the greedy algorithm for set cover finds a cover with an additive constant error with respect to the optimal cover. This sheds new light on the behavior of the greedy set cover algorithm. We further enhance the greedy algorithm and show that the problem admits a polynomial time approximation scheme (PTAS). Finally, we demonstrate how this model and the greedy algorithm can be useful in real-life scenarios, and in particular, in problems arising naturally in computational biology.

AB - We consider the minimum entropy principle for learning data generated by a random source and observed with random noise. In our setting we have a sequence of observations of objects drawn uniformly at random from a population. Each object in the population belongs to one class. We perform an observation for each object which determines that it belongs to one of a given set of classes. Given these observations, we are interested in assigning the most likely class to each of the objects. This scenario is a very natural one that appears in many real-life situations. We show that under reasonable assumptions finding the most likely assignment is equivalent to the following variant of the set cover problem. Given a universe U and a collection S = (S_1, ..., S_t) of subsets of U, we wish to find an assignment f : U → S such that u ∈ f(u) and the entropy of the distribution defined by the values |f^{-1}(S_i)| is minimized. We show that this problem is NP-hard and that the greedy algorithm for set cover finds a cover with an additive constant error with respect to the optimal cover. This sheds new light on the behavior of the greedy set cover algorithm. We further enhance the greedy algorithm and show that the problem admits a polynomial time approximation scheme (PTAS). Finally, we demonstrate how this model and the greedy algorithm can be useful in real-life scenarios, and in particular, in problems arising naturally in computational biology.

UR - http://www.scopus.com/inward/record.url?scp=33644555811&partnerID=8YFLogxK

U2 - 10.1016/j.tcs.2005.09.015

DO - 10.1016/j.tcs.2005.09.015

M3 - Article

AN - SCOPUS:33644555811

VL - 348

SP - 240

EP - 250

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

IS - 2-3 SPEC. ISS.

ER -