Haplotyping with missing data via perfect path phylogenies

Jens Gramm, Till Nierhoff, Roded Sharan, Till Tantau*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Computational methods for inferring haplotype information from genotype data are used in studying the association between genomic variation and medical conditions. Recently, Gusfield proposed a haplotype inference method that is based on perfect phylogeny principles. A fundamental problem arises when one tries to apply this approach in the presence of missing genotype data, a situation that is common in practice. We show that the resulting theoretical problem is NP-hard even in very restricted cases. To cope with missing data, we introduce a variant of haplotyping via perfect phylogeny in which a path phylogeny is sought. Searching for perfect path phylogenies is strongly motivated by the characteristics of human genotype data: 70% of real instances that admit a perfect phylogeny also admit a perfect path phylogeny. Our main result is a fixed-parameter algorithm for haplotyping with missing data via perfect path phylogenies. We also present a simple linear-time algorithm for the problem on complete data.
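
As an illustration of the notions named in the abstract, the following minimal Python sketch checks (a) whether a pair of haplotypes explains a genotype that may contain missing entries, and (b) whether a given set of already resolved haplotypes fits a perfect phylogeny or a perfect path phylogeny. It is not the fixed-parameter algorithm or the linear-time algorithm of the article, neither of which is described on this page. Everything specific here is an assumption made for the example: the encoding (`0` and `1` for homozygous sites, `2` for heterozygous sites, `?` for missing entries), the all-zero root haplotype, the reading of "path phylogeny" as a tree with at most two branches leaving the root, and all function names.

```python
from itertools import combinations

MISSING = "?"  # assumed symbol for a missing genotype entry


def explains(h1, h2, genotype):
    """Return True if haplotypes h1 and h2 are consistent with the genotype.

    Assumed encoding: '0'/'1' = homozygous site, '2' = heterozygous site,
    '?' = missing entry (any pair of alleles is accepted there).
    """
    if not (len(h1) == len(h2) == len(genotype)):
        raise ValueError("all three strings must cover the same sites")
    for a, b, g in zip(h1, h2, genotype):
        if g == MISSING:
            continue
        if g == "2":
            if a == b:  # heterozygous sites need two different alleles
                return False
        elif not (a == b == g):  # homozygous sites need two copies of g
            return False
    return True


def admits_perfect_phylogeny(haplotypes):
    """Three-gamete test, assuming the root is the all-zero haplotype.

    A non-empty list of equal-length 0/1 strings passes if no pair of
    sites shows all three patterns 01, 10 and 11.
    """
    forbidden = {("0", "1"), ("1", "0"), ("1", "1")}
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        seen = {(h[i], h[j]) for h in haplotypes}
        if forbidden <= seen:
            return False
    return True


def admits_perfect_path_phylogeny(haplotypes):
    """Check for a perfect phylogeny with at most two branches at the root.

    Each site is represented by the set of rows carrying a 1. The sets must
    split into at most two chains under inclusion, and sets from different
    chains must be disjoint.
    """
    n_sites = len(haplotypes[0])
    one_sets = {
        frozenset(r for r, h in enumerate(haplotypes) if h[s] == "1")
        for s in range(n_sites)
    }
    one_sets.discard(frozenset())  # all-zero sites impose no constraint

    tips = []  # smallest set placed so far on each branch
    tops = []  # largest set on each branch
    for s in sorted(one_sets, key=len, reverse=True):
        for k in range(len(tips)):
            if s <= tips[k]:  # s extends branch k further from the root
                tips[k] = s
                break
        else:
            # s must open a new branch that is disjoint from the others
            if len(tips) == 2 or any(s & top for top in tops):
                return False
            tips.append(s)
            tops.append(s)
    return True
```

For example, under these assumptions the haplotypes `000`, `100`, `110`, `001` pass both tests, whereas `100`, `010`, `001` pass the perfect phylogeny test but would need three branches at the root and therefore fail the path test.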

Original language: English
Pages (from-to): 788-805
Number of pages: 18
Journal: Discrete Applied Mathematics
Volume: 155
Issue number: 6-7
State: Published - 1 Apr 2007

Keywords

  • Fixed-parameter algorithms
  • Genotypes
  • Haplotypes
  • Haplotyping
  • Incomplete data
  • Missing data
  • Path phylogenies
  • Perfect phylogenies
  • Phylogenetics
