CRISPR detection from short reads using partial overlap graphs

Ilan Ben-Bassat*, Benny Chor

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes, which are part of an adaptive immune system against phages. CRISPRs are important for many microbial studies and are playing an essential role in current gene editing techniques. As such, they attract substantial research interest. The exponential growth in the amount of bacterial sequence data in recent years enables the exploration of CRISPR loci in more and more species. Most of the automated tools that detect CRISPR loci rely on fully assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads that are long enough to contain two copies of the same repeat. We present a method to identify CRISPR repeats from raw sequence data of short reads. The algorithm is based on an observation differentiating CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. This enables us to avoid many of the difficulties that assemblers face, as we merely aim to identify the repeats that belong to CRISPR loci. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other existing tools fail to do so.
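The abstract does not spell out the algorithm, but the keywords list k-mer counting and filtering as ingredients. As a rough illustration only, and not the authors' implementation, the sketch below counts k-mers across a set of short reads and keeps those that occur at least a given number of times, which is one generic way to flag candidate repeated sequences before any graph construction. The function names, the value of `k`, and the `min_count` threshold are all hypothetical choices made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) over all reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(reads, k, min_count):
    """Return the set of k-mers seen at least min_count times.

    Illustrative filter only: the threshold is an assumption,
    not a value taken from the paper.
    """
    counts = count_kmers(reads, k)
    return {kmer for kmer, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "TTACGTAA", "GGGGCCCC"]
    print(frequent_kmers(toy_reads, k=4, min_count=3))
```

On real data a filter like this would flag every kind of repeat, not only CRISPR repeats; according to the abstract, the method separates the two using an additional observation and partial overlap graphs, which this sketch does not attempt to reproduce.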

Original language: English
Pages (from-to): 461-471
Number of pages: 11
Journal: Journal of Computational Biology
Volume: 23
Issue number: 6
State: Published - 1 Jun 2016

Keywords

  • CRISPR detection
  • filtering
  • k-mer counting
  • partial overlap graph
  • sampling
