TY - GEN

T1 - Compressing and Teaching for Low VC-Dimension

AU - Moran, Shay

AU - Shpilka, Amir

AU - Wigderson, Avi

AU - Yehudayoff, Amir

N1 - Publisher Copyright:
© 2015 IEEE.

PY - 2015/12/11

Y1 - 2015/12/11

N2 - In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching, namely the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let C be a binary concept class of size m and VC-dimension d. Prior to this work, the best known upper bounds for both parameters were log(m), while the best lower bounds are linear in d. We present significantly better upper bounds on both as follows. We construct sample compression schemes of size exp(d) for C. This resolves a question of Littlestone and Warmuth (1986). Roughly speaking, we show that given an arbitrary set of labeled examples from an unknown concept in C, one can retain only a subset of exp(d) of them, in a way that allows one to recover the labels of all other examples in the set, using an additional exp(d) bits of information. We further show that there always exists a concept c in C with a teaching set (i.e., a list of c-labeled examples uniquely identifying c in C) of size exp(d) log log(m). This problem was studied by Kuhlmann (1999). Our construction also implies that the recursive teaching (RT) dimension of C is at most exp(d) log log(m) as well. The RT-dimension was suggested by Zilles et al. and Doliwa et al. (2010). The same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (2013). An upper bound on this parameter that depends only on d is known just for the very simple case d=1, and is open even for d=2. We also make small progress towards this seemingly modest goal.

AB - In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching, namely the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let C be a binary concept class of size m and VC-dimension d. Prior to this work, the best known upper bounds for both parameters were log(m), while the best lower bounds are linear in d. We present significantly better upper bounds on both as follows. We construct sample compression schemes of size exp(d) for C. This resolves a question of Littlestone and Warmuth (1986). Roughly speaking, we show that given an arbitrary set of labeled examples from an unknown concept in C, one can retain only a subset of exp(d) of them, in a way that allows one to recover the labels of all other examples in the set, using an additional exp(d) bits of information. We further show that there always exists a concept c in C with a teaching set (i.e., a list of c-labeled examples uniquely identifying c in C) of size exp(d) log log(m). This problem was studied by Kuhlmann (1999). Our construction also implies that the recursive teaching (RT) dimension of C is at most exp(d) log log(m) as well. The RT-dimension was suggested by Zilles et al. and Doliwa et al. (2010). The same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (2013). An upper bound on this parameter that depends only on d is known just for the very simple case d=1, and is open even for d=2. We also make small progress towards this seemingly modest goal.

KW - PAC learning

KW - VC dimension

KW - recursive teaching dimension

KW - sample compression schemes

UR - http://www.scopus.com/inward/record.url?scp=84960380269&partnerID=8YFLogxK

U2 - 10.1109/FOCS.2015.12

DO - 10.1109/FOCS.2015.12

M3 - Conference contribution

AN - SCOPUS:84960380269

T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

SP - 40

EP - 51

BT - Proceedings - 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015

PB - IEEE Computer Society

Y2 - 17 October 2015 through 20 October 2015

ER -