TY - JOUR
T1 - Comparative testing of DNA segmentation algorithms using benchmark simulations
AU - Elhaik, Eran
AU - Graur, Dan
AU - Josić, Krešimir
PY - 2010
Y1 - 2010/5
AB - Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
KW - Benchmark simulations
KW - Entropy
KW - GC content
KW - Genome composition
KW - Isochores
KW - Jensen-Shannon divergence statistic
KW - Segmentation algorithms
UR - http://www.scopus.com/inward/record.url?scp=77951536244&partnerID=8YFLogxK
U2 - 10.1093/molbev/msp307
DO - 10.1093/molbev/msp307
M3 - Article
C2 - 20018981
AN - SCOPUS:77951536244
SN - 0737-4038
VL - 27
SP - 1015
EP - 1024
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 5
ER -