TY - JOUR
T1 - Ranking-based evaluation of regression models
AU - Rosset, Saharon
AU - Perlich, Claudia
AU - Zadrozny, Bianca
PY - 2007/8
Y1 - 2007/8
AB - We suggest the use of ranking-based evaluation measures for regression models, as a complement to the commonly used residual-based evaluation. We argue that in some cases, such as the case study we present, ranking can be the main underlying goal in building a regression model, and ranking performance is the correct evaluation metric. However, even when ranking is not the contextually correct performance metric, the measures we explore still have significant advantages: they are robust against extreme outliers in the evaluation set, and they are interpretable. The two measures we consider correspond closely to non-parametric correlation coefficients commonly used in data analysis (Spearman's ρ and Kendall's τ), and both have interesting graphical representations which, similarly to ROC curves, offer various useful views of model performance in addition to a one-number summary in the area under the curve. An interesting extension which we explore is to evaluate models on their performance in "partially" ranking the data, which we argue can better represent the utility of the model in many cases. We illustrate our methods on a case study of evaluating IT Wallet size estimation models for IBM's customers.
KW - Evaluation robustness
KW - Model evaluation
KW - Performance visualization
KW - Ranking correlation
KW - Regression
UR - http://www.scopus.com/inward/record.url?scp=34547732000&partnerID=8YFLogxK
U2 - 10.1007/s10115-006-0037-3
DO - 10.1007/s10115-006-0037-3
M3 - Article
AN - SCOPUS:34547732000
VL - 12
SP - 331
EP - 353
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
SN - 0219-1377
IS - 3
ER -