TY - JOUR
T1 - Property prediction by similarity of molecular structures - Practical application and consistency analysis
AU - Brauner, Neima
AU - Shacham, Mordechai
AU - St. Cholakov, Georgi
AU - Stateva, Roumiana P.
PY - 2005/10
Y1 - 2005/10
N2 - Linear dependency between vectors of molecular descriptors of various compounds is exploited to obtain high-precision structure-structure correlations between a target compound and several predictive compounds. The linear structure-structure correlation is used for property prediction and consistency analysis of property data. Solid-, liquid- and gas-phase properties can be predicted within the experimental error level. This method was applied to straight and branched alkane structures of increasing complexity. Adding more predictive compounds to the model makes it more accurate. However, there is a trade-off between the number of predictive compounds in the structure-structure correlation and the error accumulation in the property prediction stage. The number of predictive compounds needed increases with the complexity of the target structure. For simple targets, such as n-tetradecane, two predictive compounds are sufficient to obtain a high-precision structure-structure correlation, whereas for complex targets, such as pristane, seven aliphatic predictive compounds were required to obtain a medium-precision correlation. If property data are available for both target and predictive compounds, the prediction error can serve as a measure of the consistency of the data. In most of the cases studied, the consistency levels obtained for the data taken from the DIPPR and NIST databases were higher than the reliability assigned by these sources. A few examples of inconsistent data are shown and potential causes for the inconsistency are provided. It is believed that the techniques presented will considerably advance property prediction and property consistency analysis and will help in understanding the complex relationship between the molecular structure and the properties of pure compounds.
AB - Linear dependency between vectors of molecular descriptors of various compounds is exploited to obtain high-precision structure-structure correlations between a target compound and several predictive compounds. The linear structure-structure correlation is used for property prediction and consistency analysis of property data. Solid-, liquid- and gas-phase properties can be predicted within the experimental error level. This method was applied to straight and branched alkane structures of increasing complexity. Adding more predictive compounds to the model makes it more accurate. However, there is a trade-off between the number of predictive compounds in the structure-structure correlation and the error accumulation in the property prediction stage. The number of predictive compounds needed increases with the complexity of the target structure. For simple targets, such as n-tetradecane, two predictive compounds are sufficient to obtain a high-precision structure-structure correlation, whereas for complex targets, such as pristane, seven aliphatic predictive compounds were required to obtain a medium-precision correlation. If property data are available for both target and predictive compounds, the prediction error can serve as a measure of the consistency of the data. In most of the cases studied, the consistency levels obtained for the data taken from the DIPPR and NIST databases were higher than the reliability assigned by these sources. A few examples of inconsistent data are shown and potential causes for the inconsistency are provided. It is believed that the techniques presented will considerably advance property prediction and property consistency analysis and will help in understanding the complex relationship between the molecular structure and the properties of pure compounds.
KW - Computational chemistry
KW - Materials
KW - Parameter identification
KW - Process design
KW - Product design
KW - Property prediction
UR - http://www.scopus.com/inward/record.url?scp=22244442427&partnerID=8YFLogxK
U2 - 10.1016/j.ces.2005.03.069
DO - 10.1016/j.ces.2005.03.069
M3 - Article
AN - SCOPUS:22244442427
SN - 0009-2509
VL - 60
SP - 5458
EP - 5471
JO - Chemical Engineering Science
JF - Chemical Engineering Science
IS - 20
ER -