TY - JOUR
T1 - Investigation of the relationships between molecular structure, molecular descriptors, and physical properties
AU - Paster, Inga
AU - Shacham, Mordechai
AU - Brauner, Neima
PY - 2009
Y1 - 2009
N2 - The use of databases containing thousands of molecular descriptors, including 3-D descriptors, for predicting physical properties is discussed. It is shown that using 3-D descriptors for property prediction via quantitative structure-property relations (QSPRs) considerably limits the applicability of the QSPRs, because 3-D structure files must be obtained from the same reliable source for all predictive and target compounds. A modified targeted QSPR (TQSPR) algorithm is presented, which includes a new technique for selecting training sets that belong to the homologous series of the target compound (if such compounds are available in the database). The method is employed to predict seven properties for five homologous series. It is shown that most properties can be predicted at the experimental error level, using training sets of 10 compounds and a maximum of two (non-3-D) descriptors. Excluding the 3-D descriptors considerably enhances the applicability of the TQSPRs, and using a small number of descriptors reduces the probability of "chance correlations".
AB - The use of databases containing thousands of molecular descriptors, including 3-D descriptors, for predicting physical properties is discussed. It is shown that using 3-D descriptors for property prediction via quantitative structure-property relations (QSPRs) considerably limits the applicability of the QSPRs, because 3-D structure files must be obtained from the same reliable source for all predictive and target compounds. A modified targeted QSPR (TQSPR) algorithm is presented, which includes a new technique for selecting training sets that belong to the homologous series of the target compound (if such compounds are available in the database). The method is employed to predict seven properties for five homologous series. It is shown that most properties can be predicted at the experimental error level, using training sets of 10 compounds and a maximum of two (non-3-D) descriptors. Excluding the 3-D descriptors considerably enhances the applicability of the TQSPRs, and using a small number of descriptors reduces the probability of "chance correlations".
UR - http://www.scopus.com/inward/record.url?scp=71649096025&partnerID=8YFLogxK
U2 - 10.1021/ie801318y
DO - 10.1021/ie801318y
M3 - Article
AN - SCOPUS:71649096025
SN - 0888-5885
VL - 48
SP - 9723
EP - 9734
JO - Industrial and Engineering Chemistry Research
JF - Industrial and Engineering Chemistry Research
IS - 21
ER -