A structural EM algorithm for phylogenetic inference

N. Friedman, M. Ninio, I. Pe'er, T. Pupko

Research output: Contribution to conference (paper, peer-reviewed)

Abstract

A central task in the study of evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. A well-supported approach to tree reconstruction performs maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally expensive. In this paper, we describe a new algorithm that uses Structural-EM for learning maximum likelihood trees. This algorithm is similar to the standard EM method for estimating branch lengths, except that during iterations of this algorithm the topology is improved as well as the branch lengths. The algorithm performs iterations of two steps. In the E-Step, we use the current tree topology and branch lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. As we show, searching for better topologies inside the M-step can be done efficiently, as opposed to standard search over topologies. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. We evaluate our new algorithm on both synthetic and real sequence data, and show that it is both dramatically faster and finds more plausible trees than standard search for maximum likelihood phylogenies.
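To make the M-Step described above more concrete, here is a minimal sketch of how a topology search over expected sufficient statistics might look. It rests on assumptions the abstract does not state: that the expected log-likelihood decomposes into independent per-edge terms, that the substitution model is Jukes-Cantor, and that the statistics reduce to expected counts of agreeing and differing sites per node pair. Under those assumptions a maximum-weight spanning tree is one way to pick the topology. The names `jc_edge_score` and `m_step_topology` are hypothetical, and the sketch ignores the bookkeeping needed to keep observed taxa at the leaves.

```python
import math


def jc_edge_score(same, diff):
    """Expected log-likelihood of one edge at its best Jukes-Cantor branch length.

    same, diff: expected counts of sites where the two endpoints agree / differ
    (hypothetical sufficient statistics, as an E-step might produce them).
    """
    n = same + diff
    if n == 0:
        return 0.0
    # ML probability of a difference; Jukes-Cantor keeps it below 3/4.
    p = min(max(diff / n, 1e-9), 0.75 - 1e-9)
    # Each of the three possible differing states has probability p / 3.
    return same * math.log(1.0 - p) + diff * math.log(p / 3.0)


def m_step_topology(nodes, stats):
    """Maximum-weight spanning tree over per-edge scores (Kruskal + union-find).

    stats maps a node pair (u, v) to its (same, diff) expected counts.
    Returns the chosen edges as a list of (u, v) pairs.
    """
    scored = sorted(
        ((jc_edge_score(s, d), u, v) for (u, v), (s, d) in stats.items()),
        reverse=True,
    )
    parent = {x: x for x in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _score, u, v in scored:
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge does not create a cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

Calling `m_step_topology(nodes, stats)` with one `(same, diff)` entry per candidate pair returns the edges of the selected tree. In a full loop, an E-Step would recompute `stats` from the new tree before the next call.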

Original language: English
Pages: 132-140
Number of pages: 9
State: Published - 2001
Externally published: Yes
Event: 5th Annual International Conference on Computational Biology - Montreal, Que., Canada
Duration: 22 May 2001 to 26 May 2001
