
Group Contribution-Based Graph Convolution Network: Pure Property Estimation Model

International Journal of Thermophysics

Abstract

Property data for chemical compounds are essential for the design and operation of chemical processes. Experimental values are reported in the literature, but they are too scarce to meet the rapidly growing demand for data. When data are not available, various estimation methods are employed. The group contribution method is one of the standard, simple techniques in use today. However, such methods are inherently inaccurate because they represent the molecular structure in a simplified way. More advanced methods are emerging that improve the molecular representation and the handling of experimental data. However, these methods also suffer from a lack of valid data for fitting their many parameters. In this contribution, we propose a compromise between a complex machine learning algorithm and a linear group contribution method. Instead of representing a molecule as a graph of atoms, we use bulkier building blocks: a graph of functional groups. The new approach dramatically reduces the number of adjustable parameters for machine learning. The results show higher accuracy than the conventional methods. The whole process was also examined from several angles: incorporating uncertainties in the data, the robustness of the fitting process, and the detection of outlier data.



Acknowledgements

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) [grant numbers NRF-2021R1A5A6002853 and NRF-2019M3E6A1064876].

Funding

None.

Author information

Contributions

SYH contributed to method implementation, data preparation, and machine learning and wrote the main manuscript. JWK contributed to supervision, manuscript preparation, and manuscript review.

Corresponding author

Correspondence to Jeong Won Kang.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Description of the Molecular Graph Convolution Network (M-gcn) used in this work.

Similar to the proposed GC-gcn model, the compound is represented as a molecular graph \({\varvec{G}}'(d', V', E')\), where \({\varvec{d}}'\) is the dimension of the graph, \({\varvec{V}}'\) is the set of vertices, and \({\varvec{E}}'\) is the set of edges. A vertex need not be connected to another vertex in every dimension, but it must be connected at least once across all dimensions \({\varvec{d}}'\). The atom feature (also called the node feature) is represented by \({\varvec{G}}'(V')\), while the edge feature is represented by \({\varvec{G}}'(d', E')\).

The rows of \({\varvec{V}}'\) correspond to the vertices, and the columns correspond to the atom types. For consistency of model inputs, the shape of the atom feature matrix is fixed to \(({N}_{max}\times {N}_{A})\), where \({N}_{max}\) is the maximum number of atoms and \({N}_{A}\) is the number of atom descriptors. The dimension \({\varvec{d}}'\) indexes the type of connection between vertices. The descriptors of atoms and bonds are summarized in Table 15. To account for the atom itself, the identity matrix is added to \({\varvec{G}}'(d', E')\). Because the functions of multi-dimensional graph convolution networks are designed to take values between -1 and 1, the matrix must also be normalized symmetrically, in the manner of the symmetric normalized Laplacian:

$${\varvec{B}}'={{\varvec{D}}'}^{-1/2}\left({\varvec{G}}'(d', E')+{\varvec{I}}\right){{\varvec{D}}'}^{-1/2} \tag{11}$$

where \({\varvec{B}}'\) is the bond feature matrix, \({\varvec{D}}'\) is the degree matrix of \({\varvec{G}}'(d', E')+{\varvec{I}}\), and \({\varvec{I}}\) is the identity matrix.
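The normalization in Eq. 11 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `normalize_adjacency` and the three-atom chain used as input are assumptions made for the example.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2} for one bond-type adjacency matrix A (Eq. 11)."""
    a_hat = adj + np.eye(adj.shape[0])   # add the identity so each atom sees itself
    degree = a_hat.sum(axis=1)           # diagonal of the degree matrix of A + I
    d_inv_sqrt = 1.0 / np.sqrt(degree)   # every degree is >= 1 because of the identity
    # Scale row j by d_j^{-1/2} and column k by d_k^{-1/2}.
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Hypothetical three-atom chain 0-1-2 with a single bond type (illustration only).
chain = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)
b_chain = normalize_adjacency(chain)
print(np.round(b_chain, 2))
```

Because the identity is added before the degrees are computed, an atom with no bond of the given type still has degree 1, so the division is always defined and that atom simply keeps its own contribution.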

Table 15 Descriptors of the molecule used in M-gcn

1.1 An Example of a Molecular Graph Interpretation of a Molecule—Glycolonitrile

The maximum number of atoms \({N}_{max}\) was set to 4, and the descriptors used in this example are listed in Table 16.

Table 16 Simplified description used in M-gcn example

First, atoms are numbered. The process is shown in Fig. 6.

Fig. 6 The atom labeling result of glycolonitrile

From Fig. 6, we can obtain the atom feature matrix as:

$${\varvec{V}}'=\begin{array}{c|ccc}& C& O& N\\ \hline \{1\}& 0& 0& 1\\ \{2\}& 1& 0& 0\\ \{3\}& 1& 0& 0\\ \{4\}& 0& 1& 0\end{array}$$
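As a rough sketch, the atom feature matrix can be built as a one-hot encoding over the atom types. The column order C, O, N and the atom order {1} N, {2} C, {3} C, {4} O are read off the matrix above. The helper name `atom_feature_matrix` and the zero-padding of molecules with fewer than \({N}_{max}\) atoms are assumptions made for illustration.

```python
import numpy as np

ATOM_TYPES = ["C", "O", "N"]  # column order of the simplified descriptor

def atom_feature_matrix(symbols, n_max):
    """One-hot atom features with one row per atom, zero-padded to n_max rows."""
    if len(symbols) > n_max:
        raise ValueError("molecule has more atoms than n_max")
    features = np.zeros((n_max, len(ATOM_TYPES)))
    for row, symbol in enumerate(symbols):
        features[row, ATOM_TYPES.index(symbol)] = 1.0
    return features

# Glycolonitrile, atoms listed in labeling order {1}..{4}.
v0 = atom_feature_matrix(["N", "C", "C", "O"], n_max=4)
print(v0)
```

Fixing the number of rows to `n_max` keeps the input shape identical for every molecule, which is the consistency requirement stated for the model inputs.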

\({\varvec{V}}'\) can also be written as \({{\varvec{V}}'}^{0}\), where the superscript 0 indicates that no processing has been applied yet. The bond feature matrix \({\varvec{B}}'(d', j, k)\), where \(d'\) is the dimension and \(j\) and \(k\) are atom indices, is obtained from Eq. 11 as:

$${\varvec{G}}'\left(1,E'\right)=\left(\begin{array}{cccc}0& 0& 0& 0\\ 0& 0& 1& 0\\ 0& 1& 0& 1\\ 0& 0& 1& 0\end{array}\right), \quad {{\varvec{B}}'}_{(1)}=\left(\begin{array}{cccc}1.00& 0& 0& 0\\ 0& 0.50& 0.41& 0\\ 0& 0.41& 0.33& 0.41\\ 0& 0& 0.41& 0.50\end{array}\right)$$
$${\varvec{G}}'\left(3,E'\right)=\left(\begin{array}{cccc}0& 1& 0& 0\\ 1& 0& 0& 0\\ 0& 0& 0& 0\\ 0& 0& 0& 0\end{array}\right), \quad {{\varvec{B}}'}_{(3)}=\left(\begin{array}{cccc}0.5& 0.5& 0& 0\\ 0.5& 0.5& 0& 0\\ 0& 0& 1& 0\\ 0& 0& 0& 1\end{array}\right)$$

Because glycolonitrile has no double bonds or rings, dimensions 2, 4, and 5 of \({\varvec{G}}'(d', E')\) are zero matrices, so the corresponding bond feature matrices reduce to the identity matrix.

$${\varvec{G}}'\left(2,E'\right)={\varvec{G}}'\left(4,E'\right)={\varvec{G}}'\left(5,E'\right)=\left(\begin{array}{cccc}0& 0& 0& 0\\ 0& 0& 0& 0\\ 0& 0& 0& 0\\ 0& 0& 0& 0\end{array}\right)$$
$${{\varvec{B}}'}_{(2)}={{\varvec{B}}'}_{(4)}={{\varvec{B}}'}_{(5)}=\left(\begin{array}{cccc}1& 0& 0& 0\\ 0& 1& 0& 0\\ 0& 0& 1& 0\\ 0& 0& 0& 1\end{array}\right)$$
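The glycolonitrile example can be reproduced numerically by applying Eq. 11 to each dimension and stacking the results into a tensor indexed as \({\varvec{B}}'(d', j, k)\). The sketch below is illustrative only: the function names and the bond-list input format are assumptions, and the bond list is read off the adjacency matrices above (single bonds between atoms {2}-{3} and {3}-{4} in dimension 1, a triple bond between atoms {1}-{2} in dimension 3).

```python
import numpy as np

def normalize_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2} for one dimension (Eq. 11)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def bond_feature_tensor(bonds, n_atoms, n_dims):
    """Stack the normalized matrices of all dimensions.

    bonds: iterable of (dimension, atom_j, atom_k), 1-based as in the text.
    """
    adjacency = np.zeros((n_dims, n_atoms, n_atoms))
    for dim, j, k in bonds:
        adjacency[dim - 1, j - 1, k - 1] = 1.0
        adjacency[dim - 1, k - 1, j - 1] = 1.0  # bonds are undirected
    return np.stack([normalize_adjacency(a) for a in adjacency])

glycolonitrile_bonds = [(3, 1, 2), (1, 2, 3), (1, 3, 4)]
b = bond_feature_tensor(glycolonitrile_bonds, n_atoms=4, n_dims=5)
print(np.round(b[0], 2))  # dimension 1
print(np.round(b[2], 2))  # dimension 3
```

In dimension 1, for instance, atoms {2} and {3} have degrees 2 and 3 after the identity is added, so their shared entry is \(1/\sqrt{2\cdot 3}\approx 0.41\), and the diagonal entry of atom {3} is \(1/3\approx 0.33\).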

1.2 Multi-Dimensional Graph Convolution

In the multi-dimensional graph convolution process, or message passing phase, the contributions of atoms are updated according to their neighboring atoms and bond types. Unlike in an ordinary graph convolution network, this procedure consists of two steps. First, the atoms are updated within each dimension. The result of this step is the message passing function \({{\varvec{H}}'}^{k+1}\), which can be expressed as follows:

$${{\varvec{H}}'}^{k+1}_{(i)}=\mathrm{ReLU}\left({{\varvec{B}}'}_{(i)}\times {{\varvec{V}}'}^{k}\times {{\varvec{W}}'}^{k+1}+{{\varvec{b}}'}^{k+1}\right) \tag{12}$$

where \({{\varvec{W}}'}^{k}\) is the weight and \({{\varvec{b}}'}^{k}\) is the bias of layer \(k\), and ReLU is the rectified linear unit function. In this work, we assume that there is no interaction across the graph dimensions. Second, the contribution of atoms \({{\varvec{V}}'}^{k}\) is updated as:

$${{\varvec{V}}'}^{k+1}=\mathrm{ReLU}\left({{\varvec{W}}''}^{k+1}\times \mathrm{Concat}_{i=1}^{{\varvec{D}}'}\,{{\varvec{H}}'}^{k+1}_{(i)}\right) \tag{13}$$

where \(\mathrm{Concat}\) concatenates the matrices \({{\varvec{H}}'}^{k+1}_{(i)}\) over all graph dimensions, \({{\varvec{W}}''}^{k}\) is a weight, and ReLU is the rectified linear unit function.
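The two steps of Eqs. 12 and 13 can be sketched as a single NumPy function. Several choices here are assumptions rather than details given in the text: the weight \({{\varvec{W}}'}^{k+1}\) is shared across dimensions, the concatenation runs along the feature axis, and \({{\varvec{W}}''}^{k+1}\) is applied on the right of the concatenated matrix so that the shapes agree when each row corresponds to one atom. All names, sizes, and the random weights are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def md_graph_conv_layer(b, v, w_dim, bias, w_mix):
    """One multi-dimensional graph convolution step (sketch of Eqs. 12 and 13).

    b:     (n_dims, n_atoms, n_atoms) bond feature matrices, one per dimension
    v:     (n_atoms, f_in) atom contributions of layer k
    w_dim: (f_in, f_hid) weight of Eq. 12, shared across dimensions (assumption)
    bias:  (f_hid,) bias of Eq. 12
    w_mix: (n_dims * f_hid, f_out) weight of Eq. 13, applied on the right (assumption)
    """
    # Eq. 12: update the atoms within each dimension independently.
    h = relu(b @ v @ w_dim + bias)            # (n_dims, n_atoms, f_hid)
    # Concatenate the per-dimension results along the feature axis.
    h_cat = np.concatenate(list(h), axis=1)   # (n_atoms, n_dims * f_hid)
    # Eq. 13: mix the concatenated features into the new atom contributions.
    return relu(h_cat @ w_mix)                # (n_atoms, f_out)

rng = np.random.default_rng(0)
n_dims, n_atoms, f_in, f_hid, f_out = 5, 4, 3, 8, 3
b = np.stack([np.eye(n_atoms)] * n_dims)      # placeholder bond features
v0 = np.array([[0, 0, 1], [1, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
v1 = md_graph_conv_layer(
    b, v0,
    rng.normal(size=(f_in, f_hid)),
    np.zeros(f_hid),
    rng.normal(size=(n_dims * f_hid, f_out)),
)
print(v1.shape)  # (4, 3)
```

Keeping the dimensions separate until the concatenation reflects the stated assumption that there is no interaction across graph dimensions during message passing; the dimensions only meet in the second step.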

Cite this article

Hwang, S.Y., Kang, J.W. Group Contribution-Based Graph Convolution Network: Pure Property Estimation Model. Int J Thermophys 43, 136 (2022). https://doi.org/10.1007/s10765-022-03060-7

  • DOI: https://doi.org/10.1007/s10765-022-03060-7
