TY - JOUR
T1 - Better-than-chance classification for signal detection
AU - Rosenblatt, Jonathan D.
AU - Benjamini, Yuval
AU - Gilron, Roee
AU - Mukamel, Roy
AU - Goeman, Jelle J.
N1 - Publisher Copyright:
© The Author 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
PY - 2021/4/10
Y1 - 2021/4/10
N2 - The estimated accuracy of a classifier is a random quantity with variability. A common practice in supervised machine learning is thus to test whether the estimated accuracy is significantly better than chance level. This method of signal detection is particularly popular in neuroimaging and genetics. We provide evidence that using a classifier's accuracy as a test statistic can be an underpowered strategy for finding differences between populations, compared to a bona fide statistical test. It is also computationally more demanding than a statistical test. Via simulation, we compare test statistics based on classification accuracy to others based on multivariate test statistics. We find that the probability of detecting differences between two distributions is lower for accuracy-based statistics. We examine several candidate causes for the low power of accuracy tests, including the discrete nature of the accuracy-test statistic, the type of signal accuracy tests are designed to detect, their inefficient use of the data, and their suboptimal regularization. When the purpose of the analysis is the evaluation of a particular classifier rather than signal detection, we suggest several improvements to increase power; in particular, replacing V-fold cross-validation with the leave-one-out bootstrap.
KW - High dimension
KW - Multivariate testing
KW - Neuroimaging
KW - Statistical genetics
KW - Supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85104209820&partnerID=8YFLogxK
U2 - 10.1093/biostatistics/kxz035
DO - 10.1093/biostatistics/kxz035
M3 - Article
C2 - 31612223
AN - SCOPUS:85104209820
SN - 1465-4644
VL - 22
SP - 365
EP - 380
JO - Biostatistics
JF - Biostatistics
IS - 2
ER -