TY - JOUR
T1 - Ancestral sequence reconstruction: Accounting for structural information by averaging over replacement matrices
AU - Moshe, Asher
AU - Pupko, Tal
N1 - Publisher Copyright: © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
PY - 2019/8/1
Y1 - 2019/8/1
AB - Motivation: Ancestral sequence reconstruction (ASR) is widely used to understand protein evolution, structure and function. Current ASR methodologies do not fully consider differences in evolutionary constraints among positions imposed by the three-dimensional (3D) structure of the protein. Here, we developed an ASR algorithm that allows different protein sites to evolve according to different mixtures of replacement matrices. We show that assigning replacement matrices to protein positions based on their solvent accessibility leads to ASR with higher log-likelihoods compared to naïve models that assume a single replacement matrix for all sites. Improved ASR log-likelihoods are also demonstrated when solvent accessibility is predicted from protein sequences rather than inferred from a known 3D structure. Finally, we show that using such structure-aware mixture models results in substantial differences in the inferred ancestral sequences. Availability and implementation: http://fastml.tau.ac.il. Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=85070702176&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty1031
DO - 10.1093/bioinformatics/bty1031
M3 - Article
AN - SCOPUS:85070702176
SN - 1367-4803
VL - 35
SP - 2562
EP - 2568
JO - Bioinformatics
JF - Bioinformatics
IS - 15
ER -