TY - GEN
T1 - Learning natural selection from the site frequency spectrum
AU - Ronen, Roy
AU - Udpa, Nitin
AU - Halperin, Eran
AU - Bafna, Vineet
PY - 2013
Y1 - 2013
AB - Genetic adaptation to external stimuli occurs through the combined action of mutation and selection. A central problem in genetics is to identify loci responsive to specific selective pressures. Over the last two decades, many tests have been proposed to identify genomic signatures of natural selection. However, the power of these tests changes unpredictably from one dataset to another, with no single dominant method. We build upon recent work that connects many of these tests in a common framework, by describing how positive selection strongly impacts the observed site frequency spectrum (SFS). Many of the proposed tests quantify the skew in SFS to predict selection. Here, we show that the skew depends on many parameters, including the selection coefficient and the time since selection. Moreover, for each of the different regimes of positive selection, informative features of the scaled SFS can be learned from simulated data and applied to population-scale variation data. Using support vector machines, we develop a test that is effective over all selection regimes. On simulated datasets, our test outperforms existing ones over the entire parameter space. We apply our test to variation data from Drosophila melanogaster populations adapted to hypoxia, and identify new loci that were missed by previous approaches but that strengthen the role of the Notch pathway in hypoxia tolerance.
UR - http://www.scopus.com/inward/record.url?scp=84875502754&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-37195-0_19
DO - 10.1007/978-3-642-37195-0_19
M3 - Conference contribution
AN - SCOPUS:84875502754
SN - 9783642371943
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 230
EP - 233
BT - Research in Computational Molecular Biology - 17th Annual International Conference, RECOMB 2013, Proceedings
T2 - 17th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2013
Y2 - 7 April 2013 through 10 April 2013
ER -