The Southwest Asian, circum-Mediterranean, and Southern European populations (collectively, SWAMSE), together with Northern European populations, form one of five "continental" groups of global populations in many analyses of population relationships. This region is of great anthropological and forensic interest, but the relationships among the many populations within it have not been cleanly resolved with autosomal genetic markers. To examine the genetic boundaries of the SWAMSE region and whether internal structure can be detected, we assembled data on a total of 151 autosomal genetic markers for a global set of 95 populations from this region and other parts of the world. The markers comprise 83 ancestry-informative SNPs typed as singletons and 68 microhaplotype loci defined by 204 SNPs. The 151 loci are ancestry informative on a global scale, identifying at least five biogeographic clusters. One of these clusters is a clear grouping of 37 populations containing the SWAMSE plus Northern European populations, to the exclusion of populations in South Central Asia and populations farther east. A refined analysis of these 37 populations shows the Northern European populations clustering separately from the SWAMSE populations. Within Southwest Asia, the Samaritans and Shabaks are distinct outliers. The Yemenite Jews, Saudi, Kuwaiti, and Palestinian Arabs, and Southern Tunisians cluster together loosely, while the remaining populations from Northern Iraq, Mediterranean Europe, the Caucasus region, and Iran cluster in a more complex, graded fashion. The majority of the SWAMSE populations from the mainland of Southwest Asia form a cluster with little internal structure, reflecting a very complex history of endogamy and migration. The set of 151 DNA polymorphisms not only distinguishes major geographical regions globally but can also distinguish ancestry, to a small degree, within geographical regions such as SWAMSE.
We discuss the forensic characteristics of the polymorphisms and identify those that rank highest by Rosenberg's I_n measure, both for the SWAMSE region populations and for the global set of populations analyzed.

Data availability: Genotypes on all 151 markers for all 3790 individuals typed in the Kidd Lab on the 72 Kidd Lab populations have been deposited in the Zenodo archive and can be freely accessed at https://doi.org/10.5281/zenodo.4658892. Some of the data have been made public previously as supplemental files appended to publications. Data for the additional individuals included in the analyses were taken from already public datasets, as indicated in the text.
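Rosenberg's I_n (informativeness for assignment), used above to rank the markers, is named but not defined here. The following is a minimal sketch that assumes Rosenberg's published definition with natural logarithms. For K populations and allele frequencies p_ij (allele j in population i), it is I_n = Σ_j [ −p̄_j ln p̄_j + Σ_i (p_ij / K) ln p_ij ], where p̄_j is the mean frequency of allele j across the K populations. The function names `rosenberg_in` and `rank_loci`, the locus names, and the input layout (one row of frequencies per population) are hypothetical illustrations, not the authors' software. Treating each microhaplotype haplotype as a distinct allele is likewise an assumption made for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rosenberg_in(freqs: Sequence[Sequence[float]]) -> float:
    """Rosenberg's I_n for one locus (sketch; natural log assumed).

    freqs[i][j] is the frequency of allele j (or, by assumption, of
    microhaplotype haplotype j) in population i; each row sums to 1.
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    total = 0.0
    for j in range(n_alleles):
        # Mean frequency of allele j across all K populations.
        p_mean = sum(row[j] for row in freqs) / k
        if p_mean > 0:
            total -= p_mean * math.log(p_mean)
        # Average within-population term; 0 * ln(0) is taken as 0.
        for row in freqs:
            if row[j] > 0:
                total += row[j] * math.log(row[j]) / k
    return total


def rank_loci(locus_freqs: Dict[str, List[List[float]]]) -> List[Tuple[str, float]]:
    """Return (locus, I_n) pairs sorted from most to least informative."""
    scores = [(name, rosenberg_in(f)) for name, f in locus_freqs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-population example with made-up frequencies.
    loci = {
        "locus_A": [[0.5, 0.5], [0.5, 0.5]],  # identical populations
        "locus_B": [[1.0, 0.0], [0.0, 1.0]],  # fixed for different alleles
    }
    for name, score in rank_loci(loci):
        print(f"{name}\t{score:.4f}")
```

As a sanity check, two populations fixed for different alleles give I_n = ln 2 ≈ 0.693, which is the maximum for K = 2. Identical frequencies give 0, so a locus carries no assignment information in that case.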
- Population genetics