Alignment errors strongly impact likelihood-based tests for comparing topologies

Eli Levy Karin, Edward Susko, Tal Pupko

Research output: Contribution to journal; Article; peer-reviewed

Abstract

Estimating phylogenetic trees from sequence data is an extremely challenging and important statistical task. Within the maximum-likelihood paradigm, the best tree is a point estimate. To determine how strongly the data support such an evolutionary scenario, a hypothesis testing methodology is required. To this end, the Kishino-Hasegawa (KH) test was developed to determine whether one topology is significantly better supported by the sequence data than another. This test and its derivatives are widely used in phylogenetics and phylogenomics. Here, we show that the KH test is biased in the presence of alignment error and can lead to erroneous conclusions. Using simulations, we demonstrate that, due to alignment errors, the KH test often rejects one of the competing topologies even though both are equally supported by the data. Specifically, we show that the KH test favors the guide tree used to align the analyzed sequences. Further, branch length optimization renders the test too conservative. We propose two possible corrections for these biases. First, we evaluate the impact of removing unreliable alignment columns and find that it decreases the bias at the cost of substantially reducing the test's power. Second, we develop a parametric test that entirely abolishes the biases without data filtering. This test incorporates the alignment construction step into the test's hypothesis, thus removing the guide tree effect described above. We extend this methodology to the case of multiple-topology comparisons and demonstrate the applicability of the new methodology on an exemplary data set.
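
For readers unfamiliar with the KH test discussed in the abstract, the following is a minimal sketch of its common normal-approximation form, not the authors' code and not the corrected parametric test they propose. It assumes that per-site log-likelihoods for the two topologies have already been computed by some phylogenetics program on the same alignment; the function name `kh_test` and its arguments are hypothetical and chosen for illustration only.

```python
import math
from statistics import variance


def kh_test(site_lnl_a, site_lnl_b):
    """Two-sided KH-style test (normal approximation) on per-site log-likelihoods.

    site_lnl_a, site_lnl_b: per-site log-likelihoods of the same alignment
    under topology A and topology B (assumed to be precomputed elsewhere).
    Returns (delta, z, p_value), where delta = lnL(A) - lnL(B).
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both topologies must be scored on the same sites")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("at least two sites are needed to estimate a variance")

    # Per-site differences in log-likelihood between the two topologies.
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)

    # Variance of the summed difference, treating sites as independent:
    # n times the sample variance of the per-site differences.
    var_delta = n * variance(diffs)
    if var_delta == 0:
        raise ValueError("per-site differences have zero variance")

    # Under the null hypothesis the expected difference is zero.
    z = delta / math.sqrt(var_delta)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return delta, z, p_value
```

Note that this sketch takes the alignment as given. The abstract's central point is that the alignment itself depends on a guide tree, so a test of this form can favor that tree; nothing in the sketch corrects for that.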

Original language: English
Pages (from-to): 3057-3067
Number of pages: 11
Journal: Molecular Biology and Evolution
Volume: 31
Issue number: 11
State: Published - 1 Nov 2014

Keywords

  • KH test
  • SOWH test
  • alignment
  • alignment uncertainty
  • branch length optimization
  • likelihood
  • phylogeny
  • tree comparisons
