TY - JOUR
T1 - Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data
AU - Heller, Ruth
AU - Chatterjee, Nilanjan
AU - Krieger, Abba
AU - Shi, Jianxin
N1 - Publisher Copyright: © 2018 American Statistical Association.
PY - 2018/10/2
Y1 - 2018/10/2
N2 - In many genomic applications, hypothesis tests are performed for powerful identification of signals by aggregating test statistics across units within naturally defined classes. Following class-level testing, it is naturally of interest to identify the lower-level units that contain true signals. Testing the individual units within a class without taking into account the fact that the class was selected using an aggregate-level test statistic will produce biased inference. We develop a hypothesis testing framework that guarantees control of false positive rates conditional on the fact that the class was selected. Specifically, we develop procedures for calculating unit-level p-values that allow rejection of null hypotheses while controlling two types of conditional error rates, one relating to the family-wise error rate and the other relating to the false discovery rate. We use simulation studies to illustrate the validity and power of the proposed procedure in comparison to several possible alternatives. We illustrate the power of the method in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project. Supplementary materials for this article are available online.
KW - Conditional p-value
KW - False discovery rate
KW - Multiple testing
KW - Selective inference
UR - http://www.scopus.com/inward/record.url?scp=85049152891&partnerID=8YFLogxK
U2 - 10.1080/01621459.2017.1375933
DO - 10.1080/01621459.2017.1375933
M3 - Article
AN - SCOPUS:85049152891
VL - 113
SP - 1770
EP - 1783
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
SN - 0162-1459
IS - 524
ER -