TY - JOUR
T1 - Identification of DNA-binding Proteins Using Structural, Electrostatic and Evolutionary Features
AU - Nimrod, Guy
AU - Szilágyi, András
AU - Leslie, Christina
AU - Ben-Tal, Nir
N1 - Funding Information:
We thank Gilad Wainreb, Matan Kalman, Yanay Ofran, Eran Bacharach and Phaedra Agius for helpful discussions. We thank Roman Laskowski for conducting the ProFunc calculations on the dataset. A.S. was supported by grant PD73096 from the Hungarian Scientific Research Fund. This work was supported by the BLOOMNET ERA-PG grant.
PY - 2009/4/10
Y1 - 2009/4/10
N2 - DNA-binding proteins (DBPs) participate in various crucial processes in the life cycle of cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features such as the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all of them correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
KW - DNA-binding proteins
KW - DNA-binding sites
KW - PatchFinder
KW - random forests
KW - structural genomics
UR - http://www.scopus.com/inward/record.url?scp=62649160355&partnerID=8YFLogxK
U2 - 10.1016/j.jmb.2009.02.023
DO - 10.1016/j.jmb.2009.02.023
M3 - Article
AN - SCOPUS:62649160355
SN - 0022-2836
VL - 387
SP - 1040
EP - 1053
JO - Journal of Molecular Biology
JF - Journal of Molecular Biology
IS - 4
ER -