TY - JOUR
T1 - Using a VOM model for reconstructing potential coding regions in EST sequences
AU - Shmilovici, Armin
AU - Ben-Gal, Irad
PY - 2007
Y1 - 2007/04//
AB - This paper presents a method for annotating coding and noncoding DNA regions by using variable order Markov (VOM) models. A main advantage in using VOM models is that their order may vary for different sequences, depending on the sequences' statistics. As a result, VOM models are more flexible with respect to model parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper presents a modified VOM model for detecting and correcting insertion and deletion sequencing errors that are commonly found in ESTs. In a series of experiments the proposed method is found to be robust to random errors in these sequences.
KW - Coding and noncoding DNA
KW - Context tree
KW - Gene annotation
KW - Sequencing error detection and correction
KW - Variable order Markov model
UR - http://www.scopus.com/inward/record.url?scp=34247160350&partnerID=8YFLogxK
U2 - 10.1007/s00180-007-0021-8
DO - 10.1007/s00180-007-0021-8
M3 - Article
AN - SCOPUS:34247160350
SN - 0943-4062
VL - 22
SP - 49
EP - 69
JO - Computational Statistics
JF - Computational Statistics
IS - 1
ER -