TY - JOUR
T1 - Analysis and refinement of the targeted QSPR method
AU - Kahrs, Olaf
AU - Brauner, Neima
AU - Cholakov, Georgi St
AU - Stateva, Roumiana P.
AU - Marquardt, Wolfgang
AU - Shacham, Mordechai
PY - 2008/7/24
Y1 - 2008/7/24
AB - The targeted quantitative structure-property relationship (TQSPR) method of Brauner et al. [Brauner, N., Stateva, R. P., Cholakov, G. St., & Shacham, M. (2006). A structurally "targeted" QSPR method for property prediction. Industrial & Engineering Chemistry Research, 45, 8430-8437] is analyzed in this study with respect to its various algorithmic steps. It is shown that accurate QSPRs for predicting the critical temperature can be developed using a training set of 10 compounds that exhibit the highest level of similarity with the target compound (the compound for which a property has to be predicted). Alternative methods to compute the similarity of compounds and to assemble the training set are compared. The potential of a principal component analysis of the molecular descriptor data to improve the TQSPR performance is assessed and a new stopping criterion for QSPR refinement based on the discrepancy principle is introduced. It is shown that collinearity between molecular descriptors and the increase of the number of compounds and descriptors in the database do not have adverse effects on the performance of the TQSPR method.
KW - Cluster analysis
KW - Computational chemistry
KW - PCA
KW - Property prediction
KW - QSPR
KW - Stepwise regression
UR - http://www.scopus.com/inward/record.url?scp=42949173370&partnerID=8YFLogxK
U2 - 10.1016/j.compchemeng.2007.06.006
DO - 10.1016/j.compchemeng.2007.06.006
M3 - Article
AN - SCOPUS:42949173370
SN - 0098-1354
VL - 32
SP - 1397
EP - 1410
JO - Computers and Chemical Engineering
JF - Computers and Chemical Engineering
IS - 7
ER -