TY - JOUR
T1 - Many Phenotypes Without Many False Discoveries
T2 - Error Controlling Strategies for Multitrait Association Studies
AU - Peterson, Christine B.
AU - Bogomolov, Marina
AU - Benjamini, Yoav
AU - Sabatti, Chiara
N1 - Publisher Copyright: © 2016 Wiley Periodicals, Inc.
PY - 2016/1/1
Y1 - 2016/1/1
N2 - The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated to some phenotypes. We show that applying FDR-controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the expected value of the average proportion of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.
KW - Error control
KW - False discovery rate
KW - Genetic association study
KW - Multiple hypothesis testing
KW - Multiple phenotypes
UR - http://www.scopus.com/inward/record.url?scp=84951746644&partnerID=8YFLogxK
U2 - 10.1002/gepi.21942
DO - 10.1002/gepi.21942
M3 - Article
AN - SCOPUS:84951746644
SN - 0741-0395
VL - 40
SP - 45
EP - 56
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 1
ER -