An artificial neural network model for the unary description of pure substances and its application on the thermodynamic modelling of pure iron

The aim of this work is to introduce a novel approach for the universal description of the thermodynamic functions of pure substances on the basis of artificial neural networks. The proposed approximation method is able to describe the thermodynamic functions (C_p(T), S(T), H(T) - H(T_ref), G(T)) of the different phases of unary material systems in a wide temperature range (between 0 and 6000 K). Phase transition temperatures and the respective enthalpies of transformation, which are computationally determined by the minimization of the Gibbs free energy, are also approximated. This is achieved by using artificial neural networks as models for the thermodynamic functions of the individual phases and by expressing the thermodynamic quantities in terms of the free network parameters. The resulting expressions are then optimized with machine learning algorithms on the basis of measurement data. A physical basis for the resulting approximation is provided by using, among others, Planck–Einstein functions as activation functions of the neurons of the network. This article describes the method and, as an example of a specific application, the approximation of the thermodynamic functions of the different phases of pure iron. The article focuses on the problem of the representation of thermodynamic data and their practical application.


Introduction
The approximation of thermodynamic functions is an important task in material science and engineering. A consistent description of thermodynamic data helps to understand, improve and develop materials. With methods and formalisms from the field of computational thermodynamics, e.g. the CALPHAD method (Lukas et al. 2007; Saunders and Miodownik 1998), the numerical calculation of stable phases, phase equilibria, phase transitions, whole phase diagrams and phase diagram-related data is possible. The Gibbs energy G is the central quantity of such calculations, and an exact description of G in terms of the temperature T and the system's composition is the key to a reliable description of material systems and their properties. G and other thermodynamic quantities are related to one another, and the different quantities of interest can be expressed as sets of partial derivatives of G. Expressions for G are usually derived by approximating the temperature-dependent isobaric heat capacity C_p with suitable models (Dinsdale 1991; Chase et al. 1995; Chen and Sundman 2001; Roslyakova et al. 2016), where the free model parameters are optimized to fit measurement data. G is then calculated by a subsequent integration of the optimized model for C_p.
Artificial neural networks (ANNs) can be used for function regression. Being universal function approximators, as described in Hornik (1991), ANNs can map a set of independent input variables to a set of dependent output variables and thus detect and model any linear or nonlinear relationship between them. In recent years ANNs have been used to solve tasks in science and engineering in many fields, among them material science. ANNs have been used extensively to model physical properties of materials (Hemmat 2017; Hemmat Esfe et al. 2016). The recent work of Avrutskiy (2017) delivers a framework for approximating functions and their derivatives simultaneously and can therefore be applied to this specific task. In the present work it is therefore investigated whether thermodynamic functions can be modelled on the basis of ANNs. To answer this question, a neural network model for the approximation of thermodynamic functions is introduced. As a challenging example, the thermodynamic functions of iron between 0 and 6000 K are approximated and validated.

Modelling of thermodynamic functions
Unary systems consist of only one component. The thermodynamic state variables therefore depend only on the pressure p and the temperature T. The Gibbs energy G(T, p) is the central quantity when calculating phase diagrams or phase diagram-related data. It is given in Eq. (2.1) by

G = H - TS.  (2.1)

A relationship between G and the entropy S can be established by differentiating G w.r.t. T at constant pressure, as given in Eq. (2.2) by

S = -(∂G/∂T)_p.  (2.2)

A relationship between G and the enthalpy H can also be derived and is known as the Gibbs–Helmholtz equation, as given in Eq. (2.3) by

H = G - T(∂G/∂T)_p.  (2.3)

The isobaric heat capacity C_p, in the following denoted as C, can also be derived from the Gibbs energy and is given in Eq. (2.4) by

C = (∂H/∂T)_p = -T(∂²G/∂T²)_p.  (2.4)

Equations (2.1)-(2.4) can be used to characterize a material system. In the present work this resulting set of partial derivatives is modelled and solved on the basis of artificial neural networks.
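As an illustration of these relations, the following sketch checks Eqs. (2.1)-(2.4) numerically via central finite differences. The closed form chosen for G(T) is arbitrary and purely illustrative, not a model from this work:

```python
import math

def G(T):                        # hypothetical Gibbs energy, J/mol (illustration only)
    return -5.0 * T * math.log(T) + 3.0 * T + 100.0

def dG(T, h=1e-3):               # central difference for (dG/dT)_p
    return (G(T + h) - G(T - h)) / (2 * h)

def d2G(T, h=1e-2):              # central difference for (d^2G/dT^2)_p
    return (G(T + h) - 2 * G(T) + G(T - h)) / h**2

T = 800.0
S = -dG(T)                       # Eq. (2.2)
H = G(T) - T * dG(T)             # Eq. (2.3), Gibbs-Helmholtz
C = -T * d2G(T)                  # Eq. (2.4)

assert abs(G(T) - (H - T * S)) < 1e-6   # Eq. (2.1): G = H - T*S
assert abs(C - 5.0) < 1e-3              # analytic C_p for this G is 5 J/(mol K)
```

For this particular G the heat capacity is constant, which makes the consistency of the four relations easy to verify by hand.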
G(T, p) as a thermodynamic potential can be used to calculate the stable phase for a chosen state, phase equilibria and whole phase diagrams. A detailed description of the underlying physical principles is beyond the scope of this work.

Neural networks for the approximation of functions and their derivatives
Artificial neural networks (ANNs) consist of a large number of elementary processing units, the so-called neurons. The neurons are interconnected, and each connection is weighted individually. Neural networks can be trained to a desired behaviour by trial-and-error procedures based on training data. The purpose of the training is to adjust the individual weights of the network so that the network exhibits the desired behaviour. The neural network model presented in this work is a feedforward network and is trained with the resilient propagation (rProp) learning algorithm. Expressions for the derivatives of the network's output w.r.t. its inputs, the calculation of the different thermodynamic quantities and the training procedure itself are discussed in the following sections.

Notation
The proposed model can be used with an arbitrary number of layers but is fixed here to three layers consisting of one input layer (1), one hidden layer (2) and one output layer (3). Neurons of the different layers of a network are denoted by the Greek letters α for layer 1 and β for layer 2. The general structure of an arbitrary neuron β is shown in Fig. 1. The number of neurons of the output layer is fixed to 1. Network parameters carry numerical indices because they are used multiple times. The weight matrices connecting the neurons of different layers are in this sense referred to as W_{21,βα}, which connects a neuron α of layer 1 with a neuron β of layer 2, and W_{32,1β}, respectively. Vector-valued network parameters apply to only one layer and are addressed by the number of the layer and one Greek letter for the neuron, like b_{2,β}. The value a neuron receives before applying its activation function is denoted as the net input s. The net input of each layer is organized as a vector s^t_{2,β}, where the superscript t runs over the training examples. The cardinality of a set of iterable objects is denoted by vertical bars, like |γ| or |t|. The activation functions of the input and output layers are linear. The activation function of the hidden layer is denoted as f in the general description and will be specified later. The activation functions are applied to every neuron of a layer. Neural network representations of the different thermodynamic quantities are denoted in the following by the subscript N. The overall input pattern is denoted as x^t_α and the overall output pattern as y^t.

Representation of thermodynamic functions with neural network variables
Feedforward networks are suitable for solving regression tasks (Hornik 1991). Every layer is fully connected with its adjacent layer, and the information flows monodirectionally from the input through the hidden layer to the output of the network. In the following the net input of an arbitrary layer, e.g. s^t_{2,β}, is defined as the weighted sum of all input values plus an additional threshold value b_{2,β}, as given in Eq. (2.5) by

s^t_{2,β} = Σ_α W_{21,βα} x^t_α + b_{2,β}.  (2.5)

Using this definition, the overall output of the network y^t as a function of its input x^t for any input pattern t is calculated as given in Eq. (2.6) by

y^t = Σ_β W_{32,1β} f(s^t_{2,β}) + b_3.  (2.6)

Expressions for the first and the second derivative of y^t w.r.t. x^t are derived by application of the chain rule, as given in Eqs. (2.7) and (2.8) by

∂y^t/∂x^t_α = Σ_β W_{32,1β} f'(s^t_{2,β}) W_{21,βα}  (2.7)

and, respectively,

∂²y^t/∂(x^t_α)² = Σ_β W_{32,1β} f''(s^t_{2,β}) W²_{21,βα}.  (2.8)

Using Eqs. (2.6)-(2.8), the Gibbs energy and the related quantities given by Eqs. (2.1)-(2.4) can be expressed solely in neural network variables. In the proposed model the input of the network is directly given by the temperature T, and the output of the network y^t is directly identified with the Gibbs energy G. The neural network representation G^t_N of the Gibbs energy G^t at a given temperature T is therefore calculated as given in Eq. (2.9) by

G^t_N = y^t.  (2.9)

The entropy as defined in Eq. (2.2) is the negative first derivative of G w.r.t. T. The neural network representation of the entropy S_N is therefore given in Eq. (2.10) by

S^t_N = -∂y^t/∂x^t.  (2.10)

The neural network representation of the enthalpy from Eq. (2.3) is given in Eq. (2.11) by

H^t_N = y^t - x^t ∂y^t/∂x^t,  (2.11)

and finally the neural network representation of the isobaric heat capacity from Eq. (2.4) is given in Eq. (2.12) by

C^t_N = -x^t ∂²y^t/∂(x^t)².  (2.12)

Equations (2.9)-(2.12) now consist solely of neural network parameters and at the same time reflect the physical relations between the different thermodynamic quantities. As described in the work of Avrutskiy (2017), the expressions for the derivatives of a network output w.r.t.
its inputs can be considered as standalone neural networks and could be trained individually. Because each of the expressions above is formed from the same network parameters, the calculation of G_N, S_N and H_N is also possible when the training data only contain values for C. This results in two major advantages: firstly, all available measurement data can be used for the modelling process and, secondly, the self-consistency of the resulting networks for the different thermodynamic quantities is always fulfilled.

[Fig. 1: The general structure of a neuron: inputs, weights W_{21,βα}, threshold, adding node yielding the net input s_{2,β}, and the activation function f.]
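As a self-contained illustration of Eqs. (2.5)-(2.12), the sketch below evaluates a 1-N-1 network and its derivatives, using tanh as a generic stand-in for the activation f (the actual model uses the functions f_a and f_b introduced later), and checks the analytic first derivative against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                  # hidden neurons (illustrative size)
W21 = rng.normal(size=N)               # weights input -> hidden, W_{21,beta 1}
b2  = rng.normal(size=N)               # hidden thresholds b_{2,beta}
W32 = rng.normal(size=N)               # weights hidden -> output, W_{32,1 beta}
b3  = rng.normal()                     # output threshold

f   = np.tanh                          # stand-in for the activation f
fp  = lambda s: 1 - np.tanh(s)**2                   # f'
fpp = lambda s: -2*np.tanh(s)*(1 - np.tanh(s)**2)   # f''

def outputs(x):
    """Return G_N, S_N, H_N, C_N at scalar input x (Eqs. 2.5-2.12)."""
    s   = W21 * x + b2                 # Eq. (2.5), net input of the hidden layer
    y   = W32 @ f(s) + b3              # Eq. (2.6), network output = G_N
    dy  = W32 @ (fp(s) * W21)          # Eq. (2.7)
    d2y = W32 @ (fpp(s) * W21**2)      # Eq. (2.8)
    return y, -dy, y - x * dy, -x * d2y   # Eqs. (2.9)-(2.12)

x = 1.3
G, S, H, C = outputs(x)
h = 1e-5                               # finite-difference check of S_N
Gp, *_ = outputs(x + h)
Gm, *_ = outputs(x - h)
assert abs(-(Gp - Gm) / (2*h) - S) < 1e-6
assert abs(H - (G + x * S)) < 1e-9     # consistency with H = G + T*S
```

The last assertion mirrors the self-consistency property stated above: all four quantities are built from the same weights, so the thermodynamic identities hold by construction.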

Activation function
The activation function used within neural network regression predefines the shape of the function approximation. In general any shape can be approximated by a neural network using the standard sigmoid activation function and a sufficient number of layers and neurons, as proven in Hornik (1991). A major disadvantage is the limited extrapolation ability of a neural network. The only information about the function to be approximated is provided by the training data. Without further assumptions and constraints a reliable prediction of function values outside the range of the training data is not possible. When approximating thermodynamic functions, and especially when calculating stable phases, phase transitions or phase equilibria, a model must deliver reasonable values for the thermodynamic functions of the different phases even outside their stable temperature regimes. The extrapolation abilities of the proposed model are provided by the use of two different activation functions f_a and f_b, also shown in Fig. 2:

f_a(s) = E_0 + (3/2) R θ_E + 3 R s ln(1 - e^(-θ_E/s)) - (a/2) s² - (b/20) s⁵,  (2.13)

f_b(s) = ln(1 + e^s).  (2.14)

The expression in Eq. (2.13) is derived by integrating the Chen and Sundman model for the isobaric heat capacity, where each term describes a different physical contribution. For a detailed description of this model the reader is referred to Chen and Sundman (2001). By using Eq. (2.13) as an activation function, a physical basis for the approximation of thermodynamic functions is provided which describes the general shape of the approximated thermodynamic functions. In Eq. (2.13) R stands for the universal gas constant.
The parameters E_0, θ_E, a and b are treated as additional network parameters and are optimized during the learning process. In many material systems, such as iron, which is approximated in this work, effects such as magnetic ordering can occur locally and affect the thermodynamic functions. These local deviations are approximated by a second network with the activation function f_b given in Eq. (2.14). f_b is based on the so-called softplus activation function (Nair and Hinton 2010). The reason for using Eq. (2.14) lies in the bell shape of the second derivative of f_b w.r.t. s, which allows local, peak-shaped effects in the heat capacity curve to be modelled.
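The role of the softplus in f_b can be illustrated numerically. The sketch below (the exact scaling of f_b in the model may differ) confirms that the second derivative of the softplus is bell-shaped, which is what makes it suitable for peak-like heat-capacity contributions:

```python
import math

def softplus(s):
    """Numerically stable softplus ln(1 + e^s), the basis of f_b."""
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# d(softplus)/ds = sigmoid(s); d^2(softplus)/ds^2 = sigmoid(s)*(1 - sigmoid(s)),
# a bell-shaped curve with its maximum 1/4 at s = 0, decaying symmetrically.
d2_softplus = lambda s: sigmoid(s) * (1.0 - sigmoid(s))

assert abs(softplus(0.0) - math.log(2.0)) < 1e-12
assert abs(d2_softplus(0.0) - 0.25) < 1e-12      # peak at the origin
assert d2_softplus(0.0) > d2_softplus(2.0) > d2_softplus(4.0)   # decays outward
assert abs(d2_softplus(3.0) - d2_softplus(-3.0)) < 1e-12        # symmetric
```

Translated and scaled copies of this bump, one per hidden neuron, can therefore be superimposed on the smooth base contribution of f_a to capture features such as the magnetic peak of BCC iron.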

Training neural networks and derivatives with resilient propagation
The training data consist of sets of pairs {x^t, G^t} defining a desired network output G^t at a given input x^t. When dealing with partial derivatives, additional data have to be provided for every unknown quantity of the system, i.e. the sets {x^t, S^t}, {x^t, H^t} and {x^t, C^t}. The main goal of the training process is to find an optimal set of parameters such that the error defined in Eq. (2.15),

E = q_G Σ_t (G^t_N - G^t)² + q_S Σ_t (S^t_N - S^t)² + q_H Σ_t (H^t_N - H^t)² + q_C Σ_t (C^t_N - C^t)²,  (2.15)

reaches a minimum. By repeated application of the chain rule, ∂E/∂p_α can be calculated for every network parameter p_α. A step in the opposite gradient direction is then performed to move towards the minimum of the error surface. The factors q_G, q_S, q_H and q_C are introduced to equalize the contributions of the different thermodynamic quantities, which differ by orders of magnitude, to the error expression. Usually the magnitude and the sign of the different partial derivatives are used for the optimization of the network parameters. This can be a problem when the error surface described by E is too rugged and nonlinear. Under such conditions the magnitude of ∂E/∂p_α can vary by orders of magnitude from one iteration to the next, resulting in an unstable learning process and even in the failure of the optimization routine.

[Fig. 4: a The neural network representation of the isobaric heat capacity of Fe BCC and the underlying training data (Keesom and Kurrelmeyer 1939; Eucken and Werth 1930; Stepakoff and Kaufman 1968; Kelley 1943; Simon and Swain 1922; Griffiths and Griffiths 1914; Reddy and Reddy 1974; Kollie et al. 1969; Awbery and Griffiths 1940; Pallister 1949; Lederman et al. 1974; Kraftmakher and Romashina 1965; Rodebush and Michalek 1925); b relative differences ΔC/C = (C_N − C)/C_N of the training data from the neural network approximation for Fe BCC.]
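A minimal sketch of such a weighted multi-quantity error is given below. All names are illustrative stand-ins, not the implementation of this work; the point is only how the q factors equalize contributions of very different magnitude:

```python
import numpy as np

def system_error(pred, data, q):
    """Weighted squared error over all quantities, in the spirit of Eq. (2.15).

    pred/data map quantity names ('G', 'S', 'H', 'C') to arrays of network
    outputs and measurements; q holds the equalizing factors q_G ... q_C.
    """
    return sum(q[k] * np.sum((pred[k] - data[k])**2) for k in data)

# toy numbers: C is of order 1e1, H of order 1e4, so q rebalances the two
data = {'C': np.array([25.0, 30.0]), 'H': np.array([1.2e4, 1.5e4])}
pred = {'C': np.array([24.0, 31.0]), 'H': np.array([1.21e4, 1.49e4])}
q = {'C': 1.0, 'H': 1e-4}
E = system_error(pred, data, q)
assert E > 0.0   # both quantities now contribute comparably to E
```

Without the q factors, the enthalpy residuals (of order 1e4 squared) would dominate the error and the heat-capacity data would effectively be ignored during training.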
The resilient propagation algorithm (rProp) (Riedmiller and Braun 1993) was developed to become independent of the magnitude of ∂E/∂p_α. rProp adapts local step sizes based only on the sign of ∂E/∂p_α in the current iteration k and the previous iteration k − 1.
The rProp learning algorithm introduces an individual update value η_α for each of the p_α network parameters. The individual η_α are adapted during the learning process according to

η_α^(k) = min(η⁺ η_α^(k−1), η_max)  if  (∂E/∂p_α)^(k) (∂E/∂p_α)^(k−1) > 0,  (2.16)

η_α^(k) = max(η⁻ η_α^(k−1), η_min)  if  (∂E/∂p_α)^(k) (∂E/∂p_α)^(k−1) < 0,  (2.17)

with 0 < η⁻ < 1 < η⁺, and η_α^(k) = η_α^(k−1) otherwise. As in regular gradient descent optimization procedures, the update is made in the opposite gradient direction. The updates are applied as given in Eq. (2.18) by

p_α^(k+1) = p_α^(k) − sign((∂E/∂p_α)^(k)) η_α^(k).  (2.18)

Learning algorithms can be used in online or batch mode. During online learning only one training example per iteration is used for updating the free network parameters, while batch learning uses a mean error over more than one training example per iteration. One cycle through all the available training data is called an epoch. It is worth mentioning that the rProp algorithm only works in batch mode and with a large batch size.
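The sign-based step-size adaptation described above can be sketched as follows. This is a minimal rProp variant without weight backtracking; the parameter values are typical defaults from the literature, not necessarily those used in this work:

```python
import numpy as np

def rprop_minimize(grad, p0, n_iter=200,
                   eta0=0.1, eta_plus=1.2, eta_minus=0.5,
                   eta_min=1e-6, eta_max=50.0):
    """Minimal rProp sketch: step sizes adapt on the gradient sign only,
    in the spirit of Eqs. (2.16)-(2.18)."""
    p = np.asarray(p0, dtype=float)
    eta = np.full_like(p, eta0)          # one step size per parameter
    g_prev = np.zeros_like(p)
    for _ in range(n_iter):
        g = grad(p)
        same = g * g_prev                # sign agreement with previous iteration
        eta = np.where(same > 0, np.minimum(eta * eta_plus, eta_max), eta)
        eta = np.where(same < 0, np.maximum(eta * eta_minus, eta_min), eta)
        p -= np.sign(g) * eta            # Eq.-(2.18)-style update
        g_prev = g
    return p

# toy batch error E(p) = (p0 - 3)^2 + 10*(p1 + 2)^2 with analytic gradient
grad = lambda p: np.array([2 * (p[0] - 3.0), 20 * (p[1] + 2.0)])
p = rprop_minimize(grad, [0.0, 0.0])
assert np.allclose(p, [3.0, -2.0], atol=1e-2)
```

Note that the two coordinates have gradients differing by an order of magnitude, yet both converge at a similar rate; this magnitude independence is exactly why rProp suits the rugged error surface described above.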

A neural network for the approximation of thermodynamic functions
The proposed model for the approximation of thermodynamic functions consists of two interconnected subnetworks, as shown in Fig. 3. The first subnetwork is a 1−1−1 ANN and has f_a (Eq. 2.13) as the activation function of its hidden neuron. As described in the previous section, this first subnetwork provides a base level for the approximation. The second subnetwork has a 1−N−1 structure and uses f_b (Eq. 2.14) as the activation function of its hidden neurons. The number of hidden neurons depends on the case. The proposed method is implemented in Python and uses Theano (Theano Development Team 2016) as tensor library, as well as Theano's built-in automatic differentiation for the gradients needed for the calculation of the derived thermodynamic quantities and during the learning procedure. rProp is used in batch mode with a batch size equal to the number of available measurement data. Each phase of a chemical element is approximated with a separate neural network. The optimization of the network parameters is carried out simultaneously for all phases of the system. This is achieved by formulating the error for every phase separately and adding up the phase-wise error expressions to an overall system error. The reason for this approach lies in the calculation of thermodynamic quantities like the Gibbs energy difference ΔG_{a→b}(T_tr) or the enthalpy change ΔH_{a→b}(T_tr) at the transition from an arbitrary phase a to a phase b at the transition temperature T_tr.
ΔH_{a→b}(T_tr) = H_b(T_tr) − H_a(T_tr), for example, depends on the values of H_a(T_tr) and H_b(T_tr). The optimization of the neural network representation is based on measurements for H_a(T_tr), H_b(T_tr) and ΔH_{a→b}(T_tr), and it is more than likely that measurements from different sources are not fully consistent. The goal of the optimization is to minimize the error between the network output and its underlying training data, which leads to an error E > 0 even for a fully optimized network. Optimizing H_a(T_tr) in a first step and H_b(T_tr) and ΔH_{a→b}(T_tr) in a second step would attribute the error E > 0 due to inconsistencies of the measurement data only to the second phase and would reduce the quality of its approximation. By optimizing all the different phases of a system simultaneously on the basis of an overall system error, the error due to inconsistencies of the measurement data is distributed evenly among the different phases of the system.

[Fig. 8: The thermodynamic functions of Fe GAS between 3000 K < T < 6000 K together with the training data from Chase (1998).]

Approximation of the thermodynamic functions of pure iron
The thermodynamic functions of Fe were considered to evaluate the performance of the proposed neural network model. The thermodynamic functions of each phase of Fe are approximated in the temperature range 0 < T < 6000 K in both the stable and the metastable regime. Between 0 and 1184 K Fe has a BCC crystal structure and is denoted α-Fe in the literature. At 1184 K there is a phase transition Fe BCC → Fe FCC. The FCC phase is denoted γ-Fe and is stable between 1184 and 1665 K. At 1665 K a second phase transition occurs in the solid state, Fe FCC → Fe BCC. The second BCC phase is referred to as δ-Fe, but since its crystal structure is the same as for α-Fe, the two BCC phases are modelled by a single ANN. Fe has its melting point at 1809 K and the transition to the gaseous phase at 3134 K. The approximation of the thermodynamic functions of Fe is challenging due to the strong magnetic peak in the C_p curve of BCC iron at 1042 K and the four phase transitions between 0 and 6000 K. Reviews of the available measurements for Fe exist; among these, Desai (1986) and Chen and Sundman (2001) are used in this work to gather the training data for the neural network model. Unless mentioned explicitly, the training data consist solely of raw measurement data.
Published values from other, already optimized models are not taken as training data but are used for comparison with the obtained results. The calculation of phase transitions is based on a bisection method implemented in Python. Figure 4a shows the ANN representation of the specific heat C_N of Fe BCC together with the training data used for the approximation and the results from the FactSage 7.0 FactPS database (Bale et al. 2016). Figure 4b shows the respective relative differences between C_N and the training data.
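The bisection search for a transition temperature can be sketched as follows, with hypothetical closed-form Gibbs energy curves standing in for the trained per-phase networks (slopes and the bracketing interval are chosen purely for illustration):

```python
def transition_temperature(G_a, G_b, T_lo, T_hi, tol=1e-6):
    """Bisection on dG(T) = G_b(T) - G_a(T); assumes a single sign change
    in [T_lo, T_hi]. G_a and G_b stand in for the per-phase models."""
    dG = lambda T: G_b(T) - G_a(T)
    lo, hi = T_lo, T_hi
    assert dG(lo) * dG(hi) < 0, "no bracketed transition in the interval"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dG(lo) * dG(mid) <= 0:   # sign change in the lower half
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# hypothetical linearized G curves crossing at 1184 K (illustrative slopes)
G_bcc = lambda T: -50.0 * (T - 1184.0)
G_fcc = lambda T: -52.0 * (T - 1184.0)
T_tr = transition_temperature(G_bcc, G_fcc, 1000.0, 1400.0)
assert abs(T_tr - 1184.0) < 1e-4
```

At the returned temperature the two Gibbs energies are equal, i.e. the phase with the lower G changes, which is exactly the criterion used to locate the transitions listed in Table 1.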

BCC
The calculated thermodynamic function is in good agreement with the measurement data over wide ranges. The overall standard deviation of the neural network approximation from the training data is σ_BCC = 2.9 J/(mol K). Between 500 and 700 K the calculated curve tends to predict slightly lower heat capacity values than the pure training data would suggest. This behaviour can be explained by the nature of the neural network regression itself. The error expression combines the errors of different quantities of the different phases.
The overall system error has a minimum, which can be seen as the best compromise between the individual errors the system error consists of. The calculated heat capacity function from this work is in good agreement with the curve from the FactSage 7.0 FactPS database (Bale et al. 2016). Both approximations show deviations from the measurement data between 500 and 700 K. The values near the magnetic peak at 1042 K are represented better by the neural network than by the FactSage 7.0 FactPS database (Bale et al. 2016). Furthermore, the FactSage 7.0 FactPS database (Bale et al. 2016) representation has a jump at the transition between Fe BCC and Fe LIQ at 1809 K. This jump is avoided in the results obtained in this work, and the transition into the metastable regime is smooth and continuous. In addition, Fig. 5a shows the results for the ANN representation of the enthalpy H_N together with the training data and the representation from the FactSage 7.0 FactPS database (Bale et al. 2016). Figure 5b shows the relative difference between H_N and the training data. The calculated values of H_N are in good agreement with the measurement data and the results from Bale et al. (2016). Another interesting aspect is the model's ability to approximate the different thermodynamic quantities, especially C_N, between 0 and 298.15 K, which is a problem for the polynomial-based models, as reported in Roslyakova et al. (2016). Figure 6a shows the ANN representation of the specific heat C_N of Fe FCC together with the underlying training data and the results obtained from the FactSage 7.0 FactPS database (Bale et al. 2016). The relative differences between C_N and the training data for Fe FCC are shown in Fig. 6b. The available measurement data are scattered. For example the difference between C_p(1600 K) from Bendick and Pepperhoff (1982) and C_p(1600 K) from Rogez and Le Coze (1980) is ΔC_p = 11.44 J/(mol K).
The neural network approximation is in good agreement with the measurement data in view of their strong scattering. The overall standard deviation of the neural network approximation from the training data is σ_FCC = 2.1 J/(mol K). The slope of the neural network approximation of the heat capacity in its stable regime between 1184 and 1665 K is lower than that of the representation from the FactSage 7.0 FactPS database (Bale et al. 2016). As in the Fe BCC phase, the FactPS database approximation has a jump at the melting temperature. This jump does not occur in the neural network approximation of this work. Figure 5a shows the enthalpy of Fe BCC and Fe FCC near the transition Fe BCC → Fe FCC at 1184.08 K together with the underlying training data and with the SGTE approximation as a comparison. The relative differences between the training data and H_N are shown in Fig. 5b. The neural network approximations obtained in this work are in good agreement with the measurement data. For the Fe BCC phase the neural network approximation and the SGTE approximation are almost identical. For the Fe FCC phase the results from this work predict lower values in the metastable regime below 1184.08 K than the SGTE approximation from Bale et al. (2016), and the difference increases with decreasing temperature.

Table 1  Comparison between calculated and experimental temperatures of phase transformations for iron

  Transition     T_tr in K   Source
  Fe BCC→FCC     1184        Bendick and Pepperhoff (1982)
                 1184        Tsuchiya et al. (1971)
                 1184        Anderson and Hultgren (1962)
                 1184        Vollmer et al. (1966)
                 1184.08     This work
  Fe FCC→BCC     1665        Bendick and Pepperhoff (1982)
                             Vollmer et al. (1966)
                 1665.09     This work
  Fe BCC→LIQ     1811        Desai (1986)
                 1809        Anderson and Hultgren (1962)
                 1809        Vollmer et al. (1966)
                 1808.99     This work
  Fe LIQ→GAS     3134        Haynes (2015)
                 3134.0      This work

Liquid
The neural network approximation for liquid iron Fe LIQ is based on the work of Desai (1986), who suggests a value of 46.632 J/(mol K) for the isobaric heat capacity of liquid iron, and additionally on measurement values for the enthalpy. The optimization procedure for liquid iron differs slightly from that of the solid phases. The isobaric heat capacity of a liquid is constant. For the learning algorithm the stable temperature regime of liquid iron between 1809 and 3134 K is represented by 100 points in the interval [1809, 3134]. The target value for each of the 100 points is the 46.632 J/(mol K) from Desai (1986). Calculations have shown that this condition alone is not sufficient to enforce a slope of 0 J/(mol K²) for C in the liquid phase. The error expression for liquid iron is therefore extended by the constraint given in Eq. (3.1) by

dC/dT = 0.  (3.1)

Figure 7a shows the results for the enthalpy of Fe BCC and Fe LIQ near the transition Fe BCC → Fe LIQ at 1808.99 K together with the results from the FactSage 7.0 FactPS database (Bale et al. 2016). The relative differences between H_N and the measurement data for Fe BCC and Fe LIQ are shown in Fig. 7b. The approximation for Fe LIQ above the melting temperature is in good agreement with the available measurement values and almost identical to the approximation from the FactPS database (Bale et al. 2016). In the metastable temperature regimes, for Fe LIQ below and for Fe BCC above the melting temperature, the neural network approximation predicts slightly lower values for C than the approximation from Bale et al. (2016).
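The constraint of Eq. (3.1) can be incorporated as a penalty term evaluated at sampled temperatures. The sketch below uses illustrative stand-ins (C_model, q_slope) rather than the implementation of this work:

```python
import numpy as np

def slope_penalty(C_model, T_samples, q_slope=1.0, h=1e-3):
    """Extra squared-error term pushing dC/dT towards zero, in the spirit
    of Eq. (3.1); the derivative is estimated by central differences."""
    dCdT = (C_model(T_samples + h) - C_model(T_samples - h)) / (2 * h)
    return q_slope * np.sum(dCdT**2)

T_samples = np.linspace(1809.0, 3134.0, 100)    # 100 points as in the paper
flat   = lambda T: np.full_like(T, 46.632)      # constant C, target behaviour
tilted = lambda T: 46.632 + 0.01 * (T - 1809.0) # hypothetical sloped fit

assert slope_penalty(flat, T_samples) < 1e-12   # constant C is not penalized
assert slope_penalty(tilted, T_samples) > slope_penalty(flat, T_samples)
```

A model whose heat capacity drifts with temperature accumulates a positive penalty, so the optimizer is driven towards the constant value suggested by Desai (1986).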

Gaseous phase
The approximation of the gaseous phase is based on the NIST JANAF thermochemical tables (Chase 1998). This means the basis for the approximation of the thermodynamic functions does not consist of measurement values, as for the other phases of iron, but of already optimized values. Nevertheless, and for the sake of completeness, the gaseous phase is still incorporated in the proposed results. An important aspect of the approximation of the gaseous phase is that the NIST JANAF thermochemical tables provide values for G, H, S and C. Figure 8a-d shows the approximated thermodynamic functions of gaseous iron. The results for the gaseous phase show that all available thermodynamic quantities can be used for the approximation of the thermodynamic functions.

Further remarks
For the optimization of the neural network model, values for H, C and S were used. Values for G were used indirectly to approximate transition temperatures. This demonstrates the model's ability to use any of these quantities for the approximation of thermodynamic functions, which is a clear advantage over the polynomial-based models. Figure 8a-d shows the curves of G, H, S and C for the whole considered temperature range and for all of the different phases of iron. The dashed lines indicate the metastable regimes of the respective phase. Table 1 lists the calculated transition temperatures of iron for the different phase transitions together with available literature data. The calculated values from this work are in good agreement with the literature data. Additionally, Table 2 lists the enthalpies of transformation for the different phase transformations of iron. The calculated values from this work agree well with the data from the literature. It is worth mentioning that these values are also part of the training data and are learned by the network during the optimization phase. Nevertheless, these values clearly show that the proposed method can be used for the approximation of the thermodynamic functions of unary systems and that the obtained results are self-consistent.

Summary and outlook
This work presents a new model for the approximation of thermodynamic functions of unary systems based on artificial neural networks. For iron, as a complex example, the different thermodynamic functions were successfully approximated. The comparison with literature data and already existing approximations of the thermodynamic functions of pure iron shows the suitability of the proposed method. The proposed method solves the underlying optimization problem almost automatically and with minimal user input. One major disadvantage of the proposed method is the black-box character of ANNs. As a consequence, the ability of the network to deliver correct values can only be verified through trial and error and not by investigating the optimized network itself. The extension of the proposed method to material systems with more than one constituent and the investigation of how far its beneficial properties carry over are the main questions for further investigations (Fig. 9).