TY - JOUR
T1 - RAP
T2 - Accurate and fast motif finding based on protein-binding microarray data
AU - Orenstein, Yaron
AU - Mick, Eran
AU - Shamir, Ron
PY - 2013/5/1
Y1 - 2013/5/1
AB - The novel high-throughput technology of protein-binding microarrays (PBMs) measures binding intensity of a transcription factor to thousands of DNA probe sequences. Several algorithms have been developed to extract binding-site motifs from these data. Such motifs are commonly represented by positional weight matrices. Previous studies have shown that the motifs produced by these algorithms are either accurate in predicting in vitro binding or similar to previously published motifs, but not both. In this work, we present a new simple algorithm to infer binding-site motifs from PBM data. It outperforms prior art both in predicting in vitro binding and in producing motifs similar to literature motifs. Our results challenge previous claims that motifs with lower information content are better models for transcription-factor binding specificity. Moreover, we tested the effect of motif length and side positions flanking the "core" motif in the binding site. We show that side positions have a significant effect and should not be removed, as commonly done. A large drop in the results quality of all methods is observed between in vitro and in vivo binding prediction. The software is available on acgt.cs.tau.ac.il/rap.
KW - Motif finding
KW - Protein-binding microarray
KW - Protein-binding site
UR - http://www.scopus.com/inward/record.url?scp=84879971887&partnerID=8YFLogxK
U2 - 10.1089/cmb.2012.0253
DO - 10.1089/cmb.2012.0253
M3 - Article
AN - SCOPUS:84879971887
SN - 1066-5277
VL - 20
SP - 375
EP - 382
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 5
ER -