Cross-Validation for Correlated Data

Assaf Rabinowicz, Saharon Rosset*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

K-fold cross-validation (CV) with squared error loss is widely used for evaluating predictive models, especially when strong distributional assumptions cannot be made. However, CV with squared error loss is not free of distributional assumptions; in particular, it can fail for non-iid data. This article analyzes CV for correlated data. We present a criterion for the suitability of standard CV in the presence of correlation. When this criterion does not hold, we introduce a bias-corrected CV estimator, which we term (Formula presented.), that yields an unbiased estimate of prediction error in many settings where standard CV is invalid. We also demonstrate our results numerically, and find that introducing our correction substantially improves both model evaluation and model selection in simulations and real data studies. Supplementary materials for this article are available online.
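To make the setting concrete, the following minimal sketch shows standard K-fold CV with squared error loss for ordinary least squares, using NumPy only. The function name and toy data are illustrative, not from the article; as the abstract notes, this standard estimator is unbiased under iid sampling but can be biased when the observations are correlated.

```python
import numpy as np

def kfold_cv_mse(X, y, k=5, seed=0):
    """Standard K-fold CV estimate of squared-error prediction loss
    for ordinary least squares. Implicitly assumes iid samples; for
    correlated data (the article's setting) the estimate can be biased."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)            # random fold assignment
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)  # all indices outside the held-out fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errs))

# toy iid data: linear signal plus unit-variance noise
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
cv = kfold_cv_mse(X, y)
```

Under iid noise with variance 1, the CV estimate should land near 1; the article's contribution is a correction for the case where the noise (or random effects) induces correlation across the train/test split.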

Original language: English
Pages (from-to): 718-731
Number of pages: 14
Journal: Journal of the American Statistical Association
Volume: 117
Issue number: 538
DOIs
State: Published - 2022

Funding

Funder: Israel Science Foundation
Funder number: 1804/16

Keywords

• Dependent data
• Gaussian process regression
• Linear mixed model
• Model selection
• Prediction error estimation
