TY - JOUR
T1 - A machine-learning-based alternative to phylogenetic bootstrap
AU - Ecker, Noa
AU - Huchon, Dorothée
AU - Mansour, Yishay
AU - Mayrose, Itay
AU - Pupko, Tal
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press.
PY - 2024/7/1
Y1 - 2024/7/1
N2 - Motivation: Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein’s bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance. Results: Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation. To this end, we simulated thousands of realistic phylogenetic trees and the corresponding multiple sequence alignments. Each of the obtained alignments was used to infer the phylogeny using state-of-the-art phylogenetic inference software, which was then compared to the true tree. Using these extensive data, we trained machine-learning algorithms to estimate branch support values for each bipartition within the maximum-likelihood trees obtained by each software. Our results demonstrate that our model provides fast and more accurate probability-based branch support values than commonly used procedures. We demonstrate the applicability of our approach on empirical datasets.
UR - http://www.scopus.com/inward/record.url?scp=85197158908&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btae255
DO - 10.1093/bioinformatics/btae255
M3 - Article
C2 - 38940166
AN - SCOPUS:85197158908
SN - 1367-4803
VL - 40
SP - i208
EP - i217
JO - Bioinformatics
JF - Bioinformatics
ER -