TY - CONF
T1 - Multiple Sequence Alignment as a Sequence-to-Sequence Learning Problem
AU - Dotan, Edo
AU - Belinkov, Yonatan
AU - Avram, Oren
AU - Wygoda, Elya
AU - Ecker, Noa
AU - Alburquerque, Michael
AU - Keren, Omri
AU - Loewenthal, Gil
AU - Pupko, Tal
N1 - Publisher Copyright: © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.
PY - 2023
Y1 - 2023
AB - The sequence alignment problem is one of the most fundamental problems in bioinformatics, and a plethora of methods have been devised to tackle it. Here we introduce BetaAlign, a methodology for aligning sequences using an NLP approach. BetaAlign accounts for the possible variability of the evolutionary process among different datasets by using an ensemble of transformers, each trained on millions of samples generated from a different evolutionary model. Our approach leads to alignment accuracy that is similar to, and often better than, that of commonly used methods such as MAFFT, DIALIGN, ClustalW, T-Coffee, PRANK, and MUSCLE.
UR - http://www.scopus.com/inward/record.url?scp=85199894991&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85199894991
T2 - 11th International Conference on Learning Representations, ICLR 2023
Y2 - 1 May 2023 through 5 May 2023
ER -