TY - JOUR
T1 - Estimating Local Ancestry in Admixed Populations
AU - Sankararaman, Sriram
AU - Sridhar, Srinath
AU - Kimmel, Gad
AU - Halperin, Eran
N1 - Funding Information:
G.K., E.H., and S. Sankararaman were supported by National Science Foundation (NSF) grant IIS-0513599S. S. Sankararaman was also partially supported by grant R33 HG003070. G.K. was also partially supported by the Rothschild fellowship. S. Sridhar was supported by NSF grant IIS-0612099. E.H. was also partially supported by NSF grant IIS-0713254.
PY - 2008/2/8
Y1 - 2008/2/8
N2 - Large-scale genotyping of SNPs has shown great promise in identifying markers that could be linked to diseases. One of the major obstacles involved in performing these studies is that the underlying population substructure could produce spurious associations. Population substructure can be caused by the presence of two distinct subpopulations or a single pool of admixed individuals. In this work, we focus on the latter, which is significantly harder to detect in practice. New advances in this research direction are expected to play a key role in identifying loci that are different among different populations and are still associated with a disease. We evaluated current methods for inference of population substructure in such cases and show that they might be quite inaccurate even in relatively simple scenarios. We therefore introduce a new method, LAMP (Local Ancestry in adMixed Populations), which infers the ancestry of each individual at every single-nucleotide polymorphism (SNP). LAMP computes the ancestry structure for overlapping windows of contiguous SNPs and combines the results with a majority vote. Our empirical results show that LAMP is significantly more accurate and more efficient than existing methods for inferring locus-specific ancestries, enabling it to handle large-scale datasets. We further show that LAMP can be used to estimate the individual admixture of each individual. Our experimental evaluation indicates that this extension yields a considerably more accurate estimate of individual admixture than state-of-the-art methods such as STRUCTURE or EIGENSTRAT, which are frequently used for the correction of population stratification in association studies.
AB - Large-scale genotyping of SNPs has shown great promise in identifying markers that could be linked to diseases. One of the major obstacles involved in performing these studies is that the underlying population substructure could produce spurious associations. Population substructure can be caused by the presence of two distinct subpopulations or a single pool of admixed individuals. In this work, we focus on the latter, which is significantly harder to detect in practice. New advances in this research direction are expected to play a key role in identifying loci that are different among different populations and are still associated with a disease. We evaluated current methods for inference of population substructure in such cases and show that they might be quite inaccurate even in relatively simple scenarios. We therefore introduce a new method, LAMP (Local Ancestry in adMixed Populations), which infers the ancestry of each individual at every single-nucleotide polymorphism (SNP). LAMP computes the ancestry structure for overlapping windows of contiguous SNPs and combines the results with a majority vote. Our empirical results show that LAMP is significantly more accurate and more efficient than existing methods for inferring locus-specific ancestries, enabling it to handle large-scale datasets. We further show that LAMP can be used to estimate the individual admixture of each individual. Our experimental evaluation indicates that this extension yields a considerably more accurate estimate of individual admixture than state-of-the-art methods such as STRUCTURE or EIGENSTRAT, which are frequently used for the correction of population stratification in association studies.
UR - http://www.scopus.com/inward/record.url?scp=40749114839&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2007.09.022
DO - 10.1016/j.ajhg.2007.09.022
M3 - Article
C2 - 18252211
AN - SCOPUS:40749114839
SN - 0002-9297
VL - 82
SP - 290
EP - 303
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 2
ER -