Evaluating supervised and unsupervised background noise correction in human gut microbiome data

Leah Briscoe*, Brunilda Balliu, Sriram Sankararaman, Eran Halperin*, Nandita R. Garud*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The ability to predict human phenotypes and identify biomarkers of disease from metagenomic data is crucial for the development of therapeutics for microbiome-associated diseases. However, metagenomic data is commonly affected by technical variables unrelated to the phenotype of interest, such as sequencing protocol, which can make it difficult to predict phenotype and find biomarkers of disease. Supervised methods to correct for background noise, originally designed for gene expression and RNA-seq data, are commonly applied to microbiome data but may be limited because they cannot account for unmeasured sources of variation. Unsupervised approaches address this issue, but current methods are ill-equipped to deal with the unique aspects of microbiome data, which is compositional, highly skewed, and sparse. We perform a comparative analysis of different denoising transformations in combination with supervised correction methods, as well as an unsupervised principal component correction approach that is used in other domains but has not previously been applied to microbiome data. We find that the unsupervised principal component correction approach is comparable to the supervised approaches in reducing false discovery of biomarkers, with the added benefit of not requiring the sources of variation to be known a priori. However, in prediction tasks, it appears to improve prediction only when technical variables account for the majority of variance in the data. As new and larger metagenomic datasets become increasingly available, background noise correction will become essential for generating reproducible microbiome analyses.
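
As a rough illustration of what an unsupervised principal component correction can look like, the sketch below applies a centered log-ratio (CLR) transformation to a samples-by-taxa count matrix and then projects out the leading principal components. This is a minimal sketch under stated assumptions, not the authors' implementation: the choice of CLR as the transformation, the pseudocount of 1, the function names `clr_transform` and `remove_top_pcs`, and the number of components removed are all hypothetical choices made for the example.

```python
import numpy as np


def clr_transform(counts, pseudocount=1.0):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumption of this sketch) avoids log(0) in sparse data.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtract each sample's mean log abundance so every row sums to zero.
    return logged - logged.mean(axis=1, keepdims=True)


def remove_top_pcs(X, n_remove):
    """Project the top n_remove principal components out of X (samples x features).

    Feature means are restored afterwards, so only the variation along the
    leading components is removed.
    """
    mean = X.mean(axis=0, keepdims=True)
    centered = X - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    top = Vt[:n_remove]
    corrected = centered - centered @ top.T @ top
    return corrected + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(5.0, size=(20, 8))  # synthetic counts, illustration only
    X = clr_transform(counts)
    Y = remove_top_pcs(X, n_remove=2)
    print(X.shape, Y.shape)
```

How many components to remove is the key design choice in such an approach. Removing too many risks discarding phenotype-related signal along with technical variation, which is consistent with the abstract's observation that prediction improves only when technical variables dominate the variance.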

Original language: English
Article number: e1009838
Journal: PLoS Computational Biology
Volume: 18
Issue number: 2
State: Published - Feb 2022
Externally published: Yes

Funding

Funder: National Human Genome Research Institute
Funder number: U01HG012079
