TY - JOUR
T1 - Comparison of site-specific rate-inference methods for protein sequences
T2 - Empirical Bayesian methods are superior
AU - Mayrose, Itay
AU - Graur, Dan
AU - Ben-Tal, Nir
AU - Pupko, Tal
PY - 2004/9
Y1 - 2004/9
N2 - The degree to which an amino acid site is free to vary is strongly dependent on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluation of its importance in maintaining the structure/function of the protein. When using probabilistic methods for site-specific rate inference, few alternatives are possible. In this study we use simulations to compare the maximum-likelihood and Bayesian paradigms. We study the dependence of inference accuracy on such parameters as number of sequences, branch lengths, the shape of the rate distribution, and sequence length. We also study the possibility of simultaneously estimating branch lengths and site-specific rates. Our results show that a Bayesian approach is superior to maximum-likelihood under a wide range of conditions, indicating that the prior that is incorporated into the Bayesian computation significantly improves performance. We show that when branch lengths are unknown, it is better first to estimate branch lengths and then to estimate site-specific rates. This procedure was found to be superior to estimating both the branch lengths and site-specific rates simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site-conservation for the apoptosis regulator protein Bcl-xL.
AB - The degree to which an amino acid site is free to vary is strongly dependent on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluation of its importance in maintaining the structure/function of the protein. When using probabilistic methods for site-specific rate inference, few alternatives are possible. In this study we use simulations to compare the maximum-likelihood and Bayesian paradigms. We study the dependence of inference accuracy on such parameters as number of sequences, branch lengths, the shape of the rate distribution, and sequence length. We also study the possibility of simultaneously estimating branch lengths and site-specific rates. Our results show that a Bayesian approach is superior to maximum-likelihood under a wide range of conditions, indicating that the prior that is incorporated into the Bayesian computation significantly improves performance. We show that when branch lengths are unknown, it is better first to estimate branch lengths and then to estimate site-specific rates. This procedure was found to be superior to estimating both the branch lengths and site-specific rates simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site-conservation for the apoptosis regulator protein Bcl-xL.
KW - Bcl-X
KW - Bioinformatics
KW - Empirical Bayesian methods
KW - Evolutionary conservation
KW - Rate variation among sites
UR - http://www.scopus.com/inward/record.url?scp=4143051195&partnerID=8YFLogxK
U2 - 10.1093/molbev/msh194
DO - 10.1093/molbev/msh194
M3 - Article
C2 - 15201400
AN - SCOPUS:4143051195
SN - 0737-4038
VL - 21
SP - 1781
EP - 1791
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 9
ER -