TY - CONF
T1 - Motif extraction and protein classification
AU - Kunik, Vered
AU - Solan, Zach
AU - Edelman, Shimon
AU - Ruppin, Eytan
AU - Horn, David
PY - 2005
Y1 - 2005
N2 - We present a novel unsupervised method for extracting meaningful motifs from biological sequence data. This de novo motif extraction (MEX) algorithm is data driven, finding motifs that are not necessarily over-represented in the data. Applying MEX to the oxidoreductases class of enzymes, containing approximately 7000 enzyme sequences, a relatively small set of motifs is obtained. This set spans a motif-space that is used for functional classification of the enzymes by an SVM classifier. The classification based on MEX motifs surpasses that of two other SVM based methods: SVMProt, a method based on the analysis of physical-chemical properties of a protein generated from its sequence of amino acids, and SVM applied to a Smith-Waterman distances matrix. Our findings demonstrate that the MEX algorithm extracts relevant motifs, supporting a successful sequence-to-function classification.
KW - Enzyme classification
KW - Motif extraction
UR - http://www.scopus.com/inward/record.url?scp=33745495992&partnerID=8YFLogxK
U2 - 10.1109/CSB.2005.39
DO - 10.1109/CSB.2005.39
M3 - Conference contribution
C2 - 16447965
AN - SCOPUS:33745495992
SN - 0769523447
SN - 9780769523446
T3 - Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
SP - 80
EP - 85
BT - Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
T2 - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
Y2 - 8 August 2005 through 11 August 2005
ER -