Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

Marcus Alvarez, Elior Rahmani, Brandon Jew, Kristina M. Garske, Zong Miao, Jihane N. Benhammou, Chun Jimmie Ye, Joseph R. Pisegna, Kirsi H. Pietiläinen, Eran Halperin, Päivi Pajukanta*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

51 Scopus citations

Abstract

Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. We observe that snRNA-seq is commonly subject to contamination by high amounts of ambient RNA, which, if overlooked, can bias downstream analyses, for example by leading to the identification of spurious cell types. We present a novel approach to quantify contamination and filter droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and cell types, which are estimated using expectation maximization (EM). We evaluated DIEM using three snRNA-seq data sets: (1) human differentiating preadipocytes in vitro, (2) fresh mouse brain tissue, and (3) human frozen adipose tissue (AT) from six individuals. All three data sets showed evidence of extranuclear RNA contamination, and we observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher-quality clusters. Although DIEM was designed for snRNA-seq, our clustering strategy also successfully filtered single-cell RNA-seq data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
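
The abstract does not spell out the model, so the following is only a minimal sketch of how a semi-supervised EM classifier of this general kind could work. It is not DIEM's implementation. It assumes a two-component multinomial mixture (one debris profile, one nucleus profile), whereas the abstract describes debris plus multiple cell types. The function name `semi_supervised_em`, the `debris_mask` input, and the pseudocount are all hypothetical.

```python
import numpy as np


def semi_supervised_em(counts, debris_mask, n_iter=50, pseudocount=1e-2):
    """Fit a two-component multinomial mixture with EM (illustrative only).

    counts: (n_droplets, n_genes) array of non-negative counts.
    debris_mask: boolean array; True marks droplets assumed to be debris.
        Their label is held fixed, which is the semi-supervised part.
    Returns the posterior probability that each droplet is debris.
    """
    counts = np.asarray(counts, dtype=float)
    debris_mask = np.asarray(debris_mask, dtype=bool)

    # Responsibilities: column 0 = debris, column 1 = nucleus.
    # Unlabelled droplets start out as nuclei.
    resp = np.zeros((counts.shape[0], 2))
    resp[debris_mask, 0] = 1.0
    resp[~debris_mask, 1] = 1.0

    for _ in range(n_iter):
        # M-step: mixing weights and per-component gene proportions.
        weights = resp.mean(axis=0)
        profiles = resp.T @ counts + pseudocount
        profiles /= profiles.sum(axis=1, keepdims=True)

        # E-step: multinomial log-likelihood, dropping the term that is
        # identical for both components, plus the log mixing weight.
        log_post = counts @ np.log(profiles).T + np.log(weights + 1e-12)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # Keep the labelled debris droplets fixed.
        resp[debris_mask] = [1.0, 0.0]

    return resp[:, 0]
```

In this sketch, a droplet would be filtered when its returned debris probability exceeds a chosen cutoff such as 0.5. How the fixed debris set is chosen (for example, droplets with very low total counts) is an assumption of the sketch and is not stated in the abstract.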

Original language: English
Article number: 11019
Journal: Scientific Reports
Volume: 10
Issue number: 1
State: Published - 1 Dec 2020
Externally published: Yes

Funding

Funders (funder number where given):
DDRC: DKP3041301
National Science Foundation: 1705197
National Institutes of Health: HL-095056, U01 DK105561, HL-28481
Howard Hughes Medical Institute: DK41301
National Human Genome Research Institute: 1R56MD013312, 5UL1TR001881, F31HL142180, DGE-1650604, T32HG002536, 1R01MH115979, HG010505-02
National Institute of General Medical Sciences: R25GM112625
American Heart Association: 19PRE34430112
National Center for Advancing Translational Sciences
Indiana Clinical and Translational Sciences Institute: ULTR001881
Helsingin Yliopisto
Helsingin ja Uudenmaan Sairaanhoitopiiri
Suomen Lääketieteen Säätiö
Academy of Finland: 272376, 315035, 266286, 314383
Signe ja Ane Gyllenbergin Säätiö
Sigrid Juséliuksen Säätiö
Novo Nordisk Fonden
Diabetestutkimussäätiö
