TY - JOUR
T1 - Maximum likelihood of evolutionary trees
T2 - Hardness and approximation
AU - Chor, Benny
AU - Tuller, Tamir
N1 - Funding Information: We wish to thank Tal Pupko, Metsada Pasmanik-Chor, and Mike Steel for helpful discussions. This research was supported by ISF grant 418/00.
PY - 2005/6
Y1 - 2005/6
AB - Motivation: Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees. Yet the computational complexity of ML was open for over 20 years, and was only recently resolved by the authors for the Jukes-Cantor model of substitution and its generalizations. It was proved that reconstructing the ML tree is computationally intractable (NP-hard). In this work we explore three directions that extend that result. Results: (1) We show that ML under the assumption of a molecular clock is still computationally intractable (NP-hard). (2) We show that not only is it computationally intractable to find the exact ML tree, but even approximating the logarithm of the ML within any multiplicative factor smaller than 1.00175 is computationally intractable. (3) We develop an algorithm for approximating the log-likelihood under the condition that the input sequences are sparse. It employs any approximation algorithm for parsimony, and asymptotically achieves the same approximation ratio. We note that ML reconstruction remains hard under this sparsity condition, and that many real datasets satisfy it.
UR - http://www.scopus.com/inward/record.url?scp=29144484019&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bti1027
DO - 10.1093/bioinformatics/bti1027
M3 - Article
AN - SCOPUS:29144484019
VL - 21
SP - i97-i106
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - SUPPL. 1
ER -