Analytic solutions for three taxon ML trees with variable rates across sites

Benny Chor, Michael Hendy, David Penny

Research output: Contribution to journal › Article › peer-review

Abstract

We consider the problem of finding the maximum likelihood rooted tree of three species under a molecular clock symmetric model of substitution of 2-state characters. For identically distributed rates per site this is probably the simplest phylogenetic estimation problem, and it is readily solved numerically. Analytic solutions, on the other hand, were obtained only recently by Yang [Complexity of the simplest phylogenetic estimation problem, Proc. Roy. Soc. London Ser. B 267 (2000) 109-119]. In this work we provide analytic solutions for any distribution of rates across sites, provided the moment generating function of the distribution is strictly increasing over the negative real numbers. This class of distributions includes, among others, identical rates across sites, as well as the Gamma, the uniform, and the inverse Gaussian distributions. Our work therefore generalizes Yang's solution, and our derivation of the analytic solution is substantially simpler. We use the Hadamard conjugation to prove a general statement about the edge lengths of any neighboring pair of leaves in any phylogenetic tree (on three or more taxa). We then employ this relation, in conjunction with the convexity of an entropy-like function, to derive the analytic solution.
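
The role of the moment generating function condition can be illustrated with a small sketch. It is not taken from the paper. It assumes the standard 2-state symmetric model, in which two taxa separated by path length d (expected substitutions per site) differ at a site of rate r with probability (1 - exp(-2rd))/2. Averaging over the rate distribution gives (1 - M(-2d))/2, where M is the moment generating function. The function names, the mean-1 Gamma parameterization, and the bisection bounds are all illustrative assumptions.

```python
import math


def mgf_equal_rates(t):
    """MGF of a point mass at rate 1, i.e. identical rates across sites."""
    return math.exp(t)


def mgf_gamma(t, shape):
    """MGF of a Gamma distribution with mean 1 (scale = 1/shape); valid for t < shape."""
    return (1.0 - t / shape) ** (-shape)


def prob_differ(d, mgf):
    """Probability that two taxa at path length d show different states at a site.

    Assumes the 2-state symmetric model with site rates drawn from the
    distribution whose moment generating function is `mgf`.
    """
    return 0.5 * (1.0 - mgf(-2.0 * d))


def distance_from_prob(p, mgf, d_max=50.0, tol=1e-12):
    """Invert prob_differ by bisection on [0, d_max].

    The inversion is well defined because `mgf` is strictly increasing on the
    negative reals, so prob_differ is strictly increasing in d.
    """
    if not 0.0 <= p <= prob_differ(d_max, mgf):
        raise ValueError("p is outside the range reachable on [0, d_max]")
    lo, hi = 0.0, d_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_differ(mid, mgf) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_shape_2(t):
    """Example rate distribution: Gamma with shape 2 and mean 1 (an arbitrary choice)."""
    return mgf_gamma(t, 2.0)


if __name__ == "__main__":
    p = prob_differ(0.3, gamma_shape_2)
    print(p, distance_from_prob(p, gamma_shape_2))
```

Because the moment generating function is strictly increasing on the negative reals, each expected difference probability corresponds to exactly one path length. This is why the round trip in the example recovers the distance it started from.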

Original language: English
Pages (from-to): 750-758
Number of pages: 9
Journal: Discrete Applied Mathematics
Volume: 155
Issue number: 6-7
State: Published - 1 Apr 2007

Keywords

  • 2-state model
  • Hadamard conjugation
  • Maximum likelihood
  • Molecular clock
  • Phylogenetic trees
  • Unequal rates across sites
