Computational problems in noisy SNP and haplotype analysis: Block scores, block identification, and population stratification

Gad Kimmel*, Roded Sharan, Ron Shamir

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The study of haplotypes and their diversity in a population is central to disease-association research. We study several problems arising in haplotype block partitioning. Our objective function is the total number of distinct haplotypes in blocks. We show that the problem is NP-hard when there are errors or missing data, and provide approximation algorithms for several of its variants. We also give an algorithm that solves the problem with high probability under a probabilistic model that allows noise and missing data. In addition, we study the multipopulation case, where one has to partition the haplotypes into populations and seek a different block partition in each one. We provide a heuristic for that problem and use it to analyze simulated and real data. On simulated data, our blocks resemble the true partition more closely than the blocks generated by the LD-based algorithm of Gabriel et al. (2002). On single-population real data, we generate a more concise block description than do extant approaches, with better average LD within blocks. The algorithm also gives promising results on real two-population genotype data.
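To make the objective concrete, the sketch below scores a block as the number of distinct haplotypes restricted to its SNP columns and minimizes the total score over all partitions of the SNPs into contiguous blocks. It assumes complete, error-free haplotypes encoded as equal-length 0/1 strings and no constraint on block length; in that noise-free setting the score is additive over blocks, so a simple dynamic program over prefixes finds an optimal partition. This is an illustration of the objective, not the paper's algorithm: the abstract states that the problem becomes NP-hard once errors or missing data are allowed, and this exact DP does not carry over to that case. The function names `block_score` and `min_total_distinct` are hypothetical.

```python
from typing import List, Tuple


def block_score(haplotypes: List[str], start: int, end: int) -> int:
    """Number of distinct haplotypes restricted to SNP columns start..end-1.

    Assumes error-free haplotypes with no missing data (hypothetical helper).
    """
    return len({h[start:end] for h in haplotypes})


def min_total_distinct(haplotypes: List[str]) -> Tuple[int, List[Tuple[int, int]]]:
    """Partition SNPs into contiguous blocks minimizing the total number of
    distinct haplotypes summed over blocks (noise-free case only).

    Returns the optimal total score and the blocks as half-open (start, end) pairs.
    """
    if not haplotypes or len({len(h) for h in haplotypes}) != 1:
        raise ValueError("need a non-empty set of equal-length haplotypes")
    m = len(haplotypes[0])
    best = [0.0] + [float("inf")] * m  # best[j]: optimal score for SNPs 0..j-1
    back = [0] * (m + 1)               # back[j]: start of the last block ending at j
    for j in range(1, m + 1):
        for i in range(j):
            cand = best[i] + block_score(haplotypes, i, j)
            if cand < best[j]:
                best[j] = cand
                back[j] = i
    # Walk the back-pointers to recover the block boundaries.
    blocks: List[Tuple[int, int]] = []
    j = m
    while j > 0:
        i = back[j]
        blocks.append((i, j))
        j = i
    blocks.reverse()
    return int(best[m]), blocks


if __name__ == "__main__":
    # Two 2-SNP blocks: the first has 2 haplotype types, the second has 3.
    haps = ["0000", "0001", "0011", "1100", "1101", "1111"]
    print(min_total_distinct(haps))  # (5, [(0, 2), (2, 4)])
```

In this toy input, treating all four SNPs as one block costs 6 distinct haplotypes and splitting every SNP costs 8, while the two-block partition costs 2 + 3 = 5. The naive version recomputes each interval score, so it runs in roughly O(n·m³) time for n haplotypes and m SNPs; that is acceptable for a sketch but would need incremental scoring for real data.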

Original language: English
Pages (from-to): 360-370
Number of pages: 11
Journal: INFORMS Journal on Computing
Volume: 16
Issue number: 4
State: Published - 2004

Keywords

  • Algorithm
  • Block
  • Complexity
  • Genotype
  • Haplotype
  • SNP
  • Stratification
  • Subpopulation

