Synthetic DNA barcodes identify singlets in scRNA-seq datasets and evaluate doublet algorithms

Ziyang Zhang, Madeline E. Melzer*, Keerthana M. Arun, Hanxiao Sun, Carl Johan Eriksson, Itai Fabian, Sagi Shaashua, Karun Kiani, Yaara Oren, Yogesh Goyal*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Single-cell RNA sequencing (scRNA-seq) datasets contain true single cells, or singlets, in addition to cells that coalesce during the protocol, or doublets. Identifying singlets with high fidelity in scRNA-seq is necessary to avoid false negative and false positive discoveries. Although several methodologies have been proposed, they are typically tested on highly heterogeneous datasets and lack a priori knowledge of true singlets. Here, we leveraged datasets with synthetically introduced DNA barcodes for a hitherto unexplored application: to extract ground-truth singlets. We demonstrated the feasibility of our framework, “singletCode,” to evaluate existing doublet detection methods across a range of contexts. We also leveraged our ground-truth singlets to train a proof-of-concept machine learning classifier, which outperformed other doublet detection algorithms. Our integrative framework can identify ground-truth singlets and enable robust doublet detection in non-barcoded datasets.
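The core idea in the abstract, that a cell carrying a single synthetic barcode can be treated as a ground-truth singlet, can be illustrated with a minimal sketch. This is not the singletCode implementation. The function name `call_singlets`, the `min_umis` threshold, the input layout of one `(cell_id, barcode)` pair per UMI, and the three output labels are all assumptions made for illustration.

```python
from collections import Counter


def call_singlets(barcode_reads, min_umis=2):
    """Label each cell by how many distinct barcodes pass a UMI threshold.

    barcode_reads: iterable of (cell_id, barcode) pairs, one pair per UMI.
    min_umis: hypothetical cutoff below which a barcode is treated as noise.
    """
    # Count UMIs per barcode within each cell.
    counts = {}
    for cell_id, barcode in barcode_reads:
        counts.setdefault(cell_id, Counter())[barcode] += 1

    labels = {}
    for cell_id, per_barcode in counts.items():
        kept = [b for b, n in per_barcode.items() if n >= min_umis]
        if len(kept) == 1:
            labels[cell_id] = "singlet"        # exactly one confident barcode
        elif len(kept) > 1:
            labels[cell_id] = "multi-barcode"  # candidate doublet
        else:
            labels[cell_id] = "undetermined"   # no barcode passed the cutoff
    return labels


# Toy input: cellA has one barcode, cellB has two, cellC has too few UMIs.
reads = [
    ("cellA", "BC1"), ("cellA", "BC1"), ("cellA", "BC1"),
    ("cellB", "BC1"), ("cellB", "BC1"), ("cellB", "BC2"), ("cellB", "BC2"),
    ("cellC", "BC3"),
]
labels = call_singlets(reads)
```

In this sketch, cells labeled "singlet" would form the reference set against which a doublet detection method's calls could be compared. A real pipeline would also have to handle cases such as a single cell legitimately carrying more than one barcode, which this toy rule does not address.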

Original language: English
Article number: 100592
Journal: Cell Genomics
Volume: 4
Issue number: 7
State: Published - 10 Jul 2024

Funding

Funders | Funder number
Northwestern University |
Michael Ratz |
LARRY |
Caleb Weinreb and Allon Klein |
Kunal Jindal and Samantha Morris |
Aurelia Leona |
TREX |
National Institute for Theory and Mathematics |
Burroughs Wellcome Fund |
University of Pennsylvania |
National Science Foundation | DMS-2235451
Simons Foundation | MPTMPS-00005320

    Keywords

    • barcoding
    • benchmarking
    • doublet detection
    • lineage tracing
    • machine learning
    • scRNA-seq
    • single-cell genomics
    • singletCode
    • singlets
