TY - JOUR
T1 - On the inference of ancestries in admixed populations
AU - Sankararaman, Sriram
AU - Kimmel, Gad
AU - Halperin, Eran
AU - Jordan, Michael I.
PY - 2008
Y1 - 2008/4
N2 - Inference of ancestral information in recently admixed populations, in which every individual is composed of a mixed ancestry (e.g., African Americans in the United States), is a challenging problem. Several previous model-based approaches to admixture have been based on hidden Markov models (HMMs) and Markov hidden Markov models (MHMMs). We present an augmented form of these models that can be used to predict historical recombination events and can model background linkage disequilibrium (LD) more accurately. We also study some of the computational issues that arise in using such Markovian models on realistic data sets. In particular, we present an effective initialization procedure that, when combined with expectation-maximization (EM) algorithms for parameter estimation, yields high accuracy at significantly decreased computational cost relative to the Markov chain Monte Carlo (MCMC) algorithms that have generally been used in earlier studies. We present experiments exploring these modeling and algorithmic issues in two scenarios: the inference of locus-specific ancestries in a population that is assumed to originate from two unknown ancestral populations, and the inference of allele frequencies in one ancestral population given those in another.
AB - Inference of ancestral information in recently admixed populations, in which every individual is composed of a mixed ancestry (e.g., African Americans in the United States), is a challenging problem. Several previous model-based approaches to admixture have been based on hidden Markov models (HMMs) and Markov hidden Markov models (MHMMs). We present an augmented form of these models that can be used to predict historical recombination events and can model background linkage disequilibrium (LD) more accurately. We also study some of the computational issues that arise in using such Markovian models on realistic data sets. In particular, we present an effective initialization procedure that, when combined with expectation-maximization (EM) algorithms for parameter estimation, yields high accuracy at significantly decreased computational cost relative to the Markov chain Monte Carlo (MCMC) algorithms that have generally been used in earlier studies. We present experiments exploring these modeling and algorithmic issues in two scenarios: the inference of locus-specific ancestries in a population that is assumed to originate from two unknown ancestral populations, and the inference of allele frequencies in one ancestral population given those in another.
UR - http://www.scopus.com/inward/record.url?scp=41649121004&partnerID=8YFLogxK
U2 - 10.1101/gr.072751.107
DO - 10.1101/gr.072751.107
M3 - Article
C2 - 18353809
AN - SCOPUS:41649121004
SN - 1088-9051
VL - 18
SP - 668
EP - 675
JO - Genome Research
JF - Genome Research
IS - 4
ER -