TY - JOUR
T1 - Cross-Validation for Correlated Data
AU - Rabinowicz, Assaf
AU - Rosset, Saharon
N1 - Publisher Copyright: © 2020 American Statistical Association.
PY - 2022
Y1 - 2022
N2 - K-fold cross-validation (CV) with squared error loss is widely used for evaluating predictive models, especially when strong distributional assumptions cannot be made. However, CV with squared error loss is not free from distributional assumptions, in particular in cases involving non-iid data. This article analyzes CV for correlated data. We present a criterion for suitability of standard CV in the presence of correlations. When this criterion does not hold, we introduce a bias-corrected CV estimator which we term (Formula presented.) that yields an unbiased estimate of prediction error in many settings where standard CV is invalid. We also demonstrate our results numerically, and find that introducing our correction substantially improves both model evaluation and model selection in simulations and real data studies. Supplementary materials for this article are available online.
AB - K-fold cross-validation (CV) with squared error loss is widely used for evaluating predictive models, especially when strong distributional assumptions cannot be made. However, CV with squared error loss is not free from distributional assumptions, in particular in cases involving non-iid data. This article analyzes CV for correlated data. We present a criterion for suitability of standard CV in the presence of correlations. When this criterion does not hold, we introduce a bias-corrected CV estimator which we term (Formula presented.) that yields an unbiased estimate of prediction error in many settings where standard CV is invalid. We also demonstrate our results numerically, and find that introducing our correction substantially improves both model evaluation and model selection in simulations and real data studies. Supplementary materials for this article are available online.
KW - Dependent data
KW - Gaussian process regression
KW - Linear mixed model
KW - Model selection
KW - Prediction error estimation
UR - http://www.scopus.com/inward/record.url?scp=85090976022&partnerID=8YFLogxK
U2 - 10.1080/01621459.2020.1801451
DO - 10.1080/01621459.2020.1801451
M3 - Article
AN - SCOPUS:85090976022
SN - 0162-1459
VL - 117
SP - 718
EP - 731
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 538
ER -