TY - JOUR
T1 - A faster algorithm for simultaneous alignment and folding of RNA
AU - Ziv-Ukelson, Michal
AU - Gat-Viks, Irit
AU - Wexler, Ydo
AU - Shamir, Ron
PY - 2010/8/1
Y1 - 2010/8/1
AB - The current pairwise RNA (secondary) structural alignment algorithms are based on Sankoff's dynamic programming algorithm from 1985. Sankoff's algorithm requires O(N^6) time and O(N^4) space, where N denotes the length of the compared sequences, and thus its applicability is very limited. The current literature offers many heuristics for speeding up Sankoff's alignment process, some making restrictive assumptions on the length or the shape of the RNA substructures. We show how to speed up Sankoff's algorithm in practice via non-heuristic methods, without compromising optimality. Our analysis shows that the expected time complexity of the new algorithm is O(N^4 ζ(N)), where ζ(N) converges to O(N), assuming a standard polymer folding model which was supported by experimental analysis. Hence, our algorithm speeds up Sankoff's algorithm by a linear factor on average. In simulations, our algorithm speeds up computation by a factor of 3-12 for sequences of length 25-250. Code and data sets are available upon request.
KW - Algorithms
KW - RNA
KW - computational molecular biology
KW - secondary structure
KW - sequence analysis
UR - http://www.scopus.com/inward/record.url?scp=77956077195&partnerID=8YFLogxK
U2 - 10.1089/cmb.2009.0197
DO - 10.1089/cmb.2009.0197
M3 - Article
AN - SCOPUS:77956077195
SN - 1066-5277
VL - 17
SP - 1051
EP - 1065
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 8
ER -