The prediction of crystal densities of a big data set using 1D and 2D structure features

  • Research
  • Structural Chemistry

Abstract

A large data set of more than 30,000 organic compounds containing carbon, nitrogen, oxygen, fluorine, and hydrogen was collected, and the density of each compound was predicted from 1D descriptors derived from its molecular formula and 2D descriptors derived from its constitutional structural features. The 2D structural features comprise Benson’s groups, corrected groups, and 2D structural features of the whole molecular structure. All descriptors were extracted by an in-house Java program, which includes a function ensuring that each atom (or bond) of a molecule is represented by Benson’s groups once in the atom-based (or bond-based) descriptors. Partial least squares (PLS) and random forest (RF) methods were used separately to build models for predicting the density. In addition, descriptor selection was performed using the variable importance of RF. For PLS, the combination of the models built from the atom-based and the bond-based descriptors achieved the best results in this paper: for the cross-validation of the training set, the Pearson correlation coefficient (R) = 0.9270, mean absolute error (MAE) = 0.0270 g·cm⁻³, and root mean squared error (RMSE) = 0.0426 g·cm⁻³; for the prediction of the test set, R = 0.9454, MAE = 0.0263 g·cm⁻³, and RMSE = 0.0375 g·cm⁻³.
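The three statistics reported in the abstract (R, MAE, and RMSE) can be reproduced for any pair of observed and predicted density lists. The following is a minimal sketch using the standard textbook definitions of these metrics; it is not the authors' code, and the function names and the four toy density values are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average of the absolute prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared prediction error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical toy densities in g·cm⁻³ (illustrative only, not from the paper).
observed = [1.2, 1.4, 1.6, 1.8]
predicted = [1.25, 1.35, 1.65, 1.75]

r = pearson_r(observed, predicted)
mae = mean_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
print(f"R = {r:.4f}, MAE = {mae:.4f} g/cm^3, RMSE = {rmse:.4f} g/cm^3")
```

Because RMSE squares each error before averaging, it is never smaller than MAE and it weights large deviations more heavily. This is consistent with the reported RMSE values exceeding the corresponding MAE values.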

Data availability

The density data used in this work are held by the Cambridge Crystallographic Data Centre (CCDC) and can be obtained from https://www.ccdc.cam.ac.uk/.

Funding

This research was financially supported by the Open Research Fund Program of Science and Technology on Aerospace Chemical Power Laboratory (No. 120201B01) and the National Natural Science Foundation of China (Nos. 21875061 and 21975066).

Author information

Contributions

Data collection and analysis were performed by Xianlan Li, Dingling Kong, Yue Luan, Lili Guo, Yanhua Lu, Wei Li and Meng Tang. The first draft of the manuscript was written by Qingyou Zhang and Aimin Pang. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Qingyou Zhang or Aimin Pang.

Ethics declarations

Ethics approval

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (XLSX 101685 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, X., Kong, D., Luan, Y. et al. The prediction of crystal densities of a big data set using 1D and 2D structure features. Struct Chem (2024). https://doi.org/10.1007/s11224-024-02279-4

  • DOI: https://doi.org/10.1007/s11224-024-02279-4
