TY - JOUR
T1 - An Experimental and Theoretical Comparison of Model Selection Methods
AU - Kearns, Michael
AU - Mansour, Yishay
AU - Ng, Andrew Y.
AU - Ron, Dana
N1 - Funding Information:
We give warm thanks to Yoav Freund and Ronitt Rubinfeld for their collaboration on various portions of the work presented here, and for their insightful comments. Thanks to Sebastian Seung and Vladimir Vapnik for interesting and helpful conversations. Y. Mansour would like to acknowledge the support of The Israel Science Foundation, administered by The Israel Academy of Science and Humanities, and a grant from the Israeli Ministry of Science and Technology. D. Ron would like to acknowledge the support of the Eshkol Fellowship and the National Science Foundation Postdoctoral Research Fellowship.
PY - 1997
Y1 - 1997
AB - We investigate the problem of model selection in the setting of supervised learning of boolean functions from independent random examples. More precisely, we compare methods for finding a balance between the complexity of the hypothesis chosen and its observed error on a random training sample of limited size, when the goal is that of minimizing the resulting generalization error. We undertake a detailed comparison of three well-known model selection methods - a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV). We introduce a general class of model selection methods (called penalty-based methods) that includes both GRM and MDL, and provide general methods for analyzing such rules. We provide both controlled experimental evidence and formal theorems to support the following conclusions: • Even on simple model selection problems, the behavior of the methods examined can be both complex and incomparable. Furthermore, no amount of "tuning" of the rules investigated (such as introducing constant multipliers on the complexity penalty terms, or a distribution-specific "effective dimension") can eliminate this incomparability. • It is possible to give rather general bounds on the generalization error, as a function of sample size, for penalty-based methods. The quality of such bounds depends in a precise way on the extent to which the method considered automatically limits the complexity of the hypothesis selected. • For any model selection problem, the additional error of cross validation compared to any other method can be bounded above by the sum of two terms. The first term is large only if the learning curve of the underlying function classes experiences a "phase transition" between (1 - γ)m and m examples (where γ is the fraction saved for testing in CV). The second and competing term can be made arbitrarily small by increasing γ. • The class of penalty-based methods is fundamentally handicapped in the sense that there exist two types of model selection problems for which every penalty-based method must incur large generalization error on at least one, while CV enjoys small generalization error on both.
KW - Complexity regularization
KW - Cross validation
KW - Minimum description length principle
KW - Model selection
KW - Structural risk minimization
KW - VC dimension
UR - http://www.scopus.com/inward/record.url?scp=0031122049&partnerID=8YFLogxK
U2 - 10.1023/A:1007344726582
DO - 10.1023/A:1007344726582
M3 - Article
AN - SCOPUS:0031122049
SN - 0885-6125
VL - 27
SP - 7
EP - 50
JO - Machine Learning
JF - Machine Learning
IS - 1
ER -