Comparative testing of DNA segmentation algorithms using benchmark simulations

Eran Elhaik*, Dan Graur, Krešimir Josić

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performance of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains and differ in the variability of guanine and cytosine (GC) content between domains. Sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms from the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
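To make the idea of divergence-based segmentation concrete, the following is a minimal sketch of a single split step on a simulated two-domain sequence. It is not the authors' benchmark, nor any of the seven algorithms tested. The function names (`shannon_entropy`, `js_divergence`, `best_split`, `simulate_domain`), the four-letter alphabet, the domain lengths, and the GC contents of 0.30 and 0.70 are all illustrative assumptions, and no stopping criterion is implemented.

```python
import math
import random
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the nucleotide frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def js_divergence(seq, i):
    """Length-weighted Jensen-Shannon divergence between seq[:i] and seq[i:]."""
    n = len(seq)
    left, right = seq[:i], seq[i:]
    return (shannon_entropy(seq)
            - (len(left) / n) * shannon_entropy(left)
            - (len(right) / n) * shannon_entropy(right))


def best_split(seq, min_len=10):
    """Return (position, divergence) for the split with maximal divergence.

    min_len is a hypothetical guard that keeps both sides non-trivial.
    """
    candidates = range(min_len, len(seq) - min_len + 1)
    return max(((i, js_divergence(seq, i)) for i in candidates),
               key=lambda pair: pair[1])


def simulate_domain(length, gc, rng):
    """Draw a random sequence whose expected GC content is gc (assumed model)."""
    weights = [(1 - gc) / 2, (1 - gc) / 2, gc / 2, gc / 2]  # A, T, G, C
    return "".join(rng.choices("ATGC", weights=weights, k=length))


# Two hypothetical 500-bp domains with a sharp GC boundary at position 500.
rng = random.Random(0)
sequence = simulate_domain(500, 0.30, rng) + simulate_domain(500, 0.70, rng)
position, divergence = best_split(sequence)
print(position, round(divergence, 4))
```

A recursive variant would apply `best_split` again to each resulting segment until some stopping criterion is met. The abstract identifies the choice of that criterion as the weak point of these methods, so it is deliberately left out of this sketch.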

Original language: English
Pages (from-to): 1015-1024
Number of pages: 10
Journal: Molecular Biology and Evolution
Volume: 27
Issue number: 5
State: Published - May 2010
Externally published: Yes

Keywords

  • Benchmark simulations
  • Entropy
  • GC content
  • Genome composition
  • Isochores
  • Jensen-Shannon divergence statistic
  • Segmentation algorithms
