CONFINED: Distinguishing biological from technical sources of variation by leveraging multiple methylation datasets

Mike Thompson, Zeyuan Johnson Chen, Elior Rahmani, Eran Halperin*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Methylation datasets are affected by innumerable sources of variability, both biological (cell-type composition, genetics) and technical (batch effects). Here, we propose a reference-free method based on sparse canonical correlation analysis to separate the biological from technical sources of variability. We show through simulations and real data that our method, CONFINED, is not only more accurate than the state-of-the-art reference-free methods for capturing known, replicable biological variability, but it is also considerably more robust to dataset-specific technical variability than previous approaches. CONFINED is available as an R package as detailed at https://github.com/cozygene/CONFINED.
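
To make the idea of using a second dataset to isolate shared structure concrete, here is a minimal sketch in Python. It is not the CONFINED implementation and does not use the R package's interface: it runs ordinary (non-sparse) canonical correlation analysis on simulated data, and it assumes both datasets are measured on the same sites so that sites act as the shared observations. All names (`cca`, `shared`, `batch`, the matrix sizes and noise levels) are hypothetical choices for illustration.

```python
import numpy as np

def cca(X, Y, k):
    """Classical CCA via QR + SVD.

    Rows of X and Y are the shared observations (here: sites);
    columns are the samples of each dataset.
    NOTE: plain CCA for illustration only, with no sparsity penalty.
    """
    Xc = X - X.mean(axis=0)          # center each column
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)         # orthonormal basis of each column space
    Qy, _ = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # s holds the canonical correlations; the products are the
    # canonical variables, one value per site.
    return s[:k], Qx @ U[:, :k], Qy @ Vt[:k].T

rng = np.random.default_rng(0)
n_sites, n1, n2 = 2000, 30, 40       # assumed sizes

# Simulated site-level signal present in BOTH datasets ("biological").
shared = rng.normal(size=(n_sites, 2))
X1 = shared @ rng.normal(size=(2, n1)) + 0.5 * rng.normal(size=(n_sites, n1))
X2 = shared @ rng.normal(size=(2, n2)) + 0.5 * rng.normal(size=(n_sites, n2))

# Simulated component present ONLY in dataset 1 ("technical").
batch = rng.normal(size=(n_sites, 1))
X1 = X1 + batch @ rng.normal(size=(1, n1))

corrs, a, b = cca(X1, X2, k=3)
print(np.round(corrs, 2))
```

In this simulation the first two canonical correlations come out high because two components are shared between the datasets, while the third drops sharply: the dataset-specific component has no counterpart in the second dataset and so is not picked up as shared structure.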

Original language: English
Article number: 138
Journal: Genome Biology
Volume: 20
Issue number: 1
State: Published - 12 Jul 2019
Externally published: Yes

Funding

Funders                                                  Funder number
Israel Science Foundation                                5851425/13
National Science Foundation                              1705197
National Institutes of Health
National Institute of Mental Health                      R01MH115979
Edmond J. Safra Center for Ethics, Harvard University
