TY - JOUR
T1 - A comparison of phasing algorithms for trios and unrelated individuals
AU - Marchini, Jonathan
AU - Cutler, David
AU - Patterson, Nick
AU - Stephens, Matthew
AU - Eskin, Eleazar
AU - Halperin, Eran
AU - Lin, Shin
AU - Qin, Zhaohui S.
AU - Munro, Heather M.
AU - Abecasis, Gonçalo R.
AU - Donnelly, Peter
N1 - Funding Information: We are grateful to Steve Schaffner for help and advice in using a sophisticated coalescent-based simulator, which allowed us to generate haplotype data with complex demographics. J.M. was supported by the Wellcome Trust. P.D. was supported by the Wellcome Trust, the National Institutes of Health (NIH), The SNP Consortium, the Wolfson Foundation, the Nuffield Trust, and the Engineering and Physical Sciences Research Council. M.S. is supported by NIH grant 1RO1HG/LM02585-01. N.P. is a recipient of a K-01 NIH career-transition award. G.R.A. is supported by NIH National Human Genome Research Institute grant HG02651. E.E. is supported by the California Institute for Telecommunications and Information Technology, Calit2. Computational resources for HAP were provided by Calit2 and National Biomedical Computational Resource grant P41 RR08605 (National Center for Research Resources, NIH).
PY - 2006/3
Y1 - 2006/3
N2 - Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
AB - Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
UR - http://www.scopus.com/inward/record.url?scp=33344458848&partnerID=8YFLogxK
U2 - 10.1086/500808
DO - 10.1086/500808
M3 - Article
AN - SCOPUS:33344458848
SN - 0002-9297
VL - 78
SP - 437
EP - 450
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 3
ER -