TY - JOUR
T1 - The Tree Reconstruction Game
T2 - Phylogenetic Reconstruction Using Reinforcement Learning
AU - Azouri, Dana
AU - Granit, Oz
AU - Alburquerque, Michael
AU - Mansour, Yishay
AU - Pupko, Tal
AU - Mayrose, Itay
N1 - Publisher Copyright:
© 2024 The Author(s). Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
PY - 2024/6/1
Y1 - 2024/6/1
N2 - The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms might return a tree that is only a local optimum, not the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree by approximating long-term gains of likelihood rather than maximizing the likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement learning-based agent was 0.969 or higher compared with that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, thus potentially allowing for an exponential reduction in runtime. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement learning-based method is roughly three times faster than the state-of-the-art software. This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.
KW - artificial intelligence
KW - evolution
KW - machine learning
KW - molecular biology
KW - phylogenetics
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85196325657&partnerID=8YFLogxK
U2 - 10.1093/molbev/msae105
DO - 10.1093/molbev/msae105
M3 - Article
C2 - 38829798
AN - SCOPUS:85196325657
SN - 0737-4038
VL - 41
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 6
M1 - msae105
ER -