Perfect phytogeny and haplotype assignment

Eran Halperin*, Richard M. Karp

*Corresponding author for this work

Research output: Contribution to conferencePaperpeer-review

41 Scopus citations

Abstract

This paper is concerned with the reconstruction of perfect phytogenies from binary character data with missing values, and related problems of inferring complete haplotypes from haplotypes or genotypes with missing data. In cases where the problems considered are NP-hard we assume a rich data hypothesis under which they become tractable. Natural probabilistic models are introduced for the generation of character vectors, haplotypes or genotypes with missing data, and it is shown that these models support the rich data hypothesis. The principal results include: A near-linear time algorithm for inferring a perfect phylogeny from binary character data (or haplotype data) with missing values, under the rich data hypothesis; A quadratic-time algorithm for inferring a perfect phylogeny from genotype data with missing values with high probability, under certain distributional assumptions; Demonstration that the problems of maximum-likelihood inference of complete haplotypes from partial haplotypes or partial genotypes can be cast as minimum-entropy disjoint set cover problems; In the case where the haplotypes come from a perfect phylogeny, a representation of the set cover problem as minimum-entropy covering of subtrees of a tree by nodes; An exact algorithm for minimum-entropy subtree covering, and demonstration that it runs in polynomial time when the subtrees have small diameter; Demonstration that a simple greedy approximation algorithm solves the minimum-entropy subtree covering problem with relative error tending to zero when the number of partial haplotypes per complete haplotype is large; An asymptotically consistent method of estimating the frequencies of the complete haplotypes in a perfect phylogeny, under an iid model for the distribution of missing data; Computational results on real data demonstrating the effectiveness of a the greedy algorithm for inferring haplotypes from genotypes with missing data, even in the absence of a perfect phylogeny.

Original languageEnglish
Pages10-19
Number of pages10
StatePublished - 2004
Externally publishedYes
EventRECOMB 2004 - Proceedings of the Eight Annual International Conference on Research in Computational Molecular Biology - San Diego, CA., United States
Duration: 27 Mar 200431 Mar 2004

Conference

ConferenceRECOMB 2004 - Proceedings of the Eight Annual International Conference on Research in Computational Molecular Biology
Country/TerritoryUnited States
CitySan Diego, CA.
Period27/03/0431/03/04

Keywords

  • Entropy
  • Haplotypes
  • Perfect phytogeny
  • Phasing

Fingerprint

Dive into the research topics of 'Perfect phytogeny and haplotype assignment'. Together they form a unique fingerprint.

Cite this