TY - JOUR
T1 - On the boosting ability of top-down decision tree learning algorithms
AU - Kearns, Michael
AU - Mansour, Yishay
N1 - Funding Information:
Y. Mansour was supported in part by the Israel Science Foundation, administered by the Israel Academy of Science and Humanities, and by a grant of the Israeli Ministry of Science and Technology.
PY - 1999/2
Y1 - 1999/2
N2 - We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions that label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error 1/2 - γ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size (1/ε)^O(1/γ²ε²) and (1/ε)^O(log(1/ε)/γ²) (respectively) suffice to drive the error below ε. Thus (for example), a small constant advantage over random guessing is amplified to any larger constant advantage with trees of constant size. For a new splitting criterion suggested by our analysis, the much stronger bound of (1/ε)^O(1/γ²) (which is polynomial in 1/ε) is obtained, which is provably optimal for decision tree algorithms. The differing bounds have a natural explanation in terms of concavity properties of the splitting criterion. The primary contribution of this work is in proving that some popular and empirically successful heuristics that are based on first principles meet the criteria of an independently motivated theoretical model.
AB - We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions that label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error 1/2 - γ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size (1/ε)^O(1/γ²ε²) and (1/ε)^O(log(1/ε)/γ²) (respectively) suffice to drive the error below ε. Thus (for example), a small constant advantage over random guessing is amplified to any larger constant advantage with trees of constant size. For a new splitting criterion suggested by our analysis, the much stronger bound of (1/ε)^O(1/γ²) (which is polynomial in 1/ε) is obtained, which is provably optimal for decision tree algorithms. The differing bounds have a natural explanation in terms of concavity properties of the splitting criterion. The primary contribution of this work is in proving that some popular and empirically successful heuristics that are based on first principles meet the criteria of an independently motivated theoretical model.
UR - http://www.scopus.com/inward/record.url?scp=0033075132&partnerID=8YFLogxK
U2 - 10.1006/jcss.1997.1543
DO - 10.1006/jcss.1997.1543
M3 - Article
AN - SCOPUS:0033075132
SN - 0022-0000
VL - 58
SP - 109
EP - 128
JO - Journal of Computer and System Sciences
JF - Journal of Computer and System Sciences
IS - 1
ER -