A Note on the Relations between Spatio-Genetic Models

Yael Baran, Eran Halperin

Research output: Contribution to journal › Article › peer-review

Abstract

Modeling human genetic variation across continuous geographic space is a new research direction that has stirred interest in the community over the past few years. Multiple recent works have suggested different probabilistic models for the relation between geography and genetic sequence, and applied them to geographic localization, detection of selection, and correction of confounding in Genome-Wide Association Studies (GWAS). Prior to these developments, continuous representations of genetic structure were produced almost exclusively using dimensionality reduction techniques, mostly principal component analysis (PCA). Although fast and effective in some tasks, PCA suffers from multiple disadvantages, primarily stemming from the lack of an explicit underlying genetic model. We begin this note by explaining the implicit spatio-genetic model that underlies PCA. Our presentation provides insights into some of the recently proposed spatial models; in particular, we show that two of these models can be formulated as modifications of PCA, each removing one of PCA's limitations in the context of genetic analysis. We build on one of the models to derive a nonsupervised procedure for the inference of spatial structure, and empirically demonstrate that it outperforms PCA in spatial inference. We then review a few additional recent works from this unifying perspective.

Original language: English
Pages (from-to): 905-917
Number of pages: 13
Journal: Journal of Computational Biology
Volume: 22
Issue number: 10
State: Published - 1 Oct 2015
