TY - JOUR
T1 - HaploPool
T2 - Improving haplotype frequency estimation through DNA pools and phylogenetic modeling
AU - Kirkpatrick, Bonnie
AU - Armendariz, Carlos Santos
AU - Karp, Richard M.
AU - Halperin, Eran
N1 - Funding Information: B.K. was supported by the DOE Computational Science Graduate Fellowship under grant number DE-FG02-97ER25308. E.H. and R.K. were supported by NSF grant IIS-0513599. We thank Gad Kimmel for the software to simulate haplotypes using recombination. In addition, we are grateful for the helpful suggestions made by the anonymous reviewers.
PY - 2007/11
Y1 - 2007/11
N2 - Motivation: The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one method of finding the linked variants is to look for case-control associations between the haplotypes and disease. Finding these associations requires a high-quality estimation of the haplotype frequencies in the population. To this end, we present HaploPool, a method of estimating haplotype frequencies from blocks of consecutive SNPs. Results: HaploPool leverages the efficiency of DNA pools and estimates the population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals as compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can be used to phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm to three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster), and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors and long haplotype blocks (of between 5 and 25 SNPs).
AB - Motivation: The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one method of finding the linked variants is to look for case-control associations between the haplotypes and disease. Finding these associations requires a high-quality estimation of the haplotype frequencies in the population. To this end, we present HaploPool, a method of estimating haplotype frequencies from blocks of consecutive SNPs. Results: HaploPool leverages the efficiency of DNA pools and estimates the population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals as compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can be used to phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm to three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster), and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors and long haplotype blocks (of between 5 and 25 SNPs).
UR - http://www.scopus.com/inward/record.url?scp=36448956040&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btm435
DO - 10.1093/bioinformatics/btm435
M3 - Article
AN - SCOPUS:36448956040
SN - 1367-4803
VL - 23
SP - 3048
EP - 3055
JO - Bioinformatics
JF - Bioinformatics
IS - 22
ER -