Large scale reconstruction of haplotypes from genotype data

Eleazar Eskin*, Eran Halperin, Richard M. Karp

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

61 Scopus citations

Abstract

Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize an individual's variation, we must determine the individual's haplotype, that is, which nucleotide base occurs at each position of these common SNPs for each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes which shows that SNPs are organized in highly correlated "blocks". The majority of individuals have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks and, for each block, predicts the common haplotypes and each individual's haplotype. We evaluate our method over biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (0.47%) when taking into account the predictions for the uncommon haplotypes. Our method is extremely efficient compared to previous methods (a matter of seconds where previous methods needed hours). Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data, and to work with large data sets such as genotypes for thousands of SNPs for hundreds of individuals. The algorithm is available via a web server at http://www.cs.columbia.edu/compbio/hap/.
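
To make the resolution problem in the abstract concrete, the sketch below enumerates the haplotype pairs that are consistent with a single genotype and then keeps only the pairs drawn from a given set of common haplotypes. It is an illustration of the problem setting, not the authors' algorithm. The encoding (0 and 1 for homozygous sites, 2 for a heterozygous site) and the function names `compatible_pairs` and `resolve_with_common` are assumptions introduced here.

```python
from itertools import product


def compatible_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with a genotype.

    Assumed encoding (not from the paper): one character per SNP,
    '0' or '1' for a homozygous site, '2' for a heterozygous site.
    """
    het = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    # Try every assignment of alleles to the heterozygous sites.
    for bits in product("01", repeat=len(het)):
        h1 = list(genotype)
        h2 = list(genotype)
        for pos, b in zip(het, bits):
            h1[pos] = b
            h2[pos] = "1" if b == "0" else "0"  # complementary allele
        # Sort so that (a, b) and (b, a) count as the same pair.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def resolve_with_common(genotype, common):
    """Keep only the pairs whose two haplotypes are both in `common`.

    `common` is a hypothetical set of common haplotypes for one block.
    """
    return [p for p in compatible_pairs(genotype)
            if p[0] in common and p[1] in common]


if __name__ == "__main__":
    # Two heterozygous sites give two candidate pairs.
    print(compatible_pairs("22"))
    # Restricting to common haplotypes removes the ambiguity.
    print(resolve_with_common("22", {"00", "11"}))
```

A genotype with k heterozygous sites has 2^(k-1) candidate pairs for k ≥ 1, which is why restricting attention to a small set of common haplotypes within a block narrows the search so sharply.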

Original language: English
Pages: 104-113
Number of pages: 10
State: Published - 2003
Externally published: Yes
Event: Seventh Annual International Conference on Research in Computational Molecular Biology - Berlin, Germany
Duration: 10 Apr 2003 → 13 Apr 2003

Conference

Conference: Seventh Annual International Conference on Research in Computational Molecular Biology
Country/Territory: Germany
City: Berlin
Period: 10/04/03 → 13/04/03
