Abstract
A large data set of more than 30,000 organic compounds containing carbon, nitrogen, oxygen, fluorine, and hydrogen was collected, and the density of each compound was predicted from 1D descriptors derived from its molecular formula and 2D descriptors derived from its constitutional structural features. The 2D descriptors comprise Benson’s groups, corrected groups, and structural features of the whole molecule. All descriptors were extracted by an in-house Java program that includes a function ensuring that each atom (or bond) of a molecule is represented by a Benson’s group exactly once for atom-based (or bond-based) descriptors. Partial least squares (PLS) and random forest (RF) methods were used separately to build models to predict the density, and descriptor selection was performed using RF variable importance. For PLS, the combination of the models built from atom-based and bond-based descriptors achieved the best results in this paper: for cross-validation of the training set, the Pearson correlation coefficient (R) = 0.9270, mean absolute error (MAE) = 0.0270 g·cm−3, and root mean squared error (RMSE) = 0.0426 g·cm−3; for prediction of the test set, R = 0.9454, MAE = 0.0263 g·cm−3, and RMSE = 0.0375 g·cm−3.
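The three statistics reported in the abstract (R, MAE, and RMSE) have standard definitions, and a minimal Java sketch of them may help when comparing results. The class name DensityMetrics, its method names, and the sample densities are hypothetical and chosen only for illustration; this is not the authors' in-house program, and it assumes paired arrays of observed and predicted densities in g·cm−3.

```java
import java.util.Locale;

// Hypothetical helper class: the names here are illustrative, not from the paper.
final class DensityMetrics {

    private DensityMetrics() {}

    // Mean absolute error: average of |observed - predicted|.
    static double mae(double[] observed, double[] predicted) {
        checkLengths(observed, predicted);
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            sum += Math.abs(observed[i] - predicted[i]);
        }
        return sum / observed.length;
    }

    // Root mean squared error: square root of the mean squared residual.
    static double rmse(double[] observed, double[] predicted) {
        checkLengths(observed, predicted);
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double residual = observed[i] - predicted[i];
            sum += residual * residual;
        }
        return Math.sqrt(sum / observed.length);
    }

    // Pearson correlation coefficient between observed and predicted values.
    static double pearsonR(double[] observed, double[] predicted) {
        checkLengths(observed, predicted);
        int n = observed.length;
        double meanObs = 0.0;
        double meanPred = 0.0;
        for (int i = 0; i < n; i++) {
            meanObs += observed[i];
            meanPred += predicted[i];
        }
        meanObs /= n;
        meanPred /= n;

        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = observed[i] - meanObs;
            double dy = predicted[i] - meanPred;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Both arrays must be non-empty and of equal length.
    private static void checkLengths(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        // Invented densities in g·cm^-3, for illustration only (not data from the paper).
        double[] observed = {1.20, 1.45, 1.60, 1.82};
        double[] predicted = {1.25, 1.40, 1.63, 1.80};
        System.out.printf(Locale.ROOT, "R = %.4f, MAE = %.4f g/cm3, RMSE = %.4f g/cm3%n",
                pearsonR(observed, predicted), mae(observed, predicted), rmse(observed, predicted));
    }
}
```

Because RMSE squares each residual before averaging, it is never smaller than MAE on the same data and reacts more strongly to a few large errors, which is consistent with the RMSE values in the abstract exceeding the MAE values.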
Data availability
The Cambridge Crystallographic Data Centre (CCDC) holds density data for various compounds. These data can be obtained from https://www.ccdc.cam.ac.uk/.
Funding
This research work was financially supported by the Open Research Fund Program of Science and Technology on Aerospace Chemical Power Laboratory (No.: 120201B01) and the National Natural Science Foundation of China (Nos.: 21875061 and 21975066).
Author information
Contributions
Data collection and analysis were performed by Xianlan Li, Dingling Kong, Yue Luan, Lili Guo, Yanhua Lu, Wei Li and Meng Tang. The first draft of the manuscript was written by Qingyou Zhang and Aimin Pang. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable.
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Li, X., Kong, D., Luan, Y. et al. The prediction of crystal densities of a big data set using 1D and 2D structure features. Struct Chem (2024). https://doi.org/10.1007/s11224-024-02279-4
DOI: https://doi.org/10.1007/s11224-024-02279-4