TY - JOUR

T1 - Computational problems in perfect phylogeny haplotyping

T2 - Typing without calling the allele

AU - Barzuza, Tamar

AU - Beckmann, Jacques S.

AU - Shamir, Ron

AU - Pe'er, Itsik

N1 - Funding Information:
R. Shamir was supported in part by the German Israeli Fund (Grant 237/2005) and by the Israeli Science Foundation (Grant 309/02). I. Pe’er was supported in part by the National Institutes of Health (Grant F32 DK070527-01).

PY - 2008

Y1 - 2008

N2 - A haplotype is an m-long binary vector. The XOR-genotype of two haplotypes is the m-vector of their coordinate-wise XOR. We study the following problem: Given a set of XOR-genotypes, reconstruct their haplotypes so that the set of resulting haplotypes can be mapped onto a perfect phylogeny (PP) tree. The question is motivated by studying population evolution in human genetics and is a variant of the PP haplotyping problem that has received intensive attention recently. Unlike the latter problem, in which the input is "full" genotypes, here we assume less informative input, which may be more economical to obtain experimentally. Building on ideas of Gusfield, we show how to solve the problem in polynomial time by a reduction to the graph realization problem. The actual haplotypes are not uniquely determined by the tree they map onto and the tree itself may or may not be unique. We show that tree uniqueness implies uniquely determined haplotypes, up to inherent degrees of freedom, and give a sufficient condition for the uniqueness. To actually determine the haplotypes given the tree, additional information is necessary. We show that two or three full genotypes suffice to reconstruct all the haplotypes and present a linear algorithm for identifying those genotypes.

AB - A haplotype is an m-long binary vector. The XOR-genotype of two haplotypes is the m-vector of their coordinate-wise XOR. We study the following problem: Given a set of XOR-genotypes, reconstruct their haplotypes so that the set of resulting haplotypes can be mapped onto a perfect phylogeny (PP) tree. The question is motivated by studying population evolution in human genetics and is a variant of the PP haplotyping problem that has received intensive attention recently. Unlike the latter problem, in which the input is "full" genotypes, here we assume less informative input, which may be more economical to obtain experimentally. Building on ideas of Gusfield, we show how to solve the problem in polynomial time by a reduction to the graph realization problem. The actual haplotypes are not uniquely determined by the tree they map onto and the tree itself may or may not be unique. We show that tree uniqueness implies uniquely determined haplotypes, up to inherent degrees of freedom, and give a sufficient condition for the uniqueness. To actually determine the haplotypes given the tree, additional information is necessary. We show that two or three full genotypes suffice to reconstruct all the haplotypes and present a linear algorithm for identifying those genotypes.

KW - Graph realization

KW - Haplotypes

KW - Perfect phylogeny

KW - XOR-genotypes

UR - http://www.scopus.com/inward/record.url?scp=38949115579&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2007.1063

DO - 10.1109/TCBB.2007.1063

M3 - Article

AN - SCOPUS:38949115579

SN - 1545-5963

VL - 5

SP - 101

EP - 109

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

IS - 1

ER -