1 Introduction

The approximation of thermodynamic functions is an important task in materials science and engineering. A consistent description of thermodynamic data helps to understand, improve and develop materials. With methods and formalisms from the field of computational thermodynamics, e.g. the CALPHAD method (Lukas et al. 2007; Saunders and Miodownik 1998), the numerical calculation of stable phases, phase equilibria, phase transitions, whole phase diagrams and phase diagram-related data is possible. The Gibbs energy G is the central quantity of such calculations, and an exact description of G in terms of the temperature T and the system's composition is the key to a reliable description of material systems and their properties. G and other thermodynamic quantities are related to one another, and the different quantities of interest can be expressed as sets of partial derivatives of G. Expressions for G are usually derived by approximating the temperature-dependent isobaric heat capacity \(C_{{p}}\) based on suitable models (Dinsdale 1991; Chase et al. 1995; Chen and Sundman 2001; Roslyakova et al. 2016), where the free model parameters are optimized to fit measurement data. G is then calculated by a subsequent integration of the optimized model for \(C_{{p}}\).

Artificial neural networks (ANNs) can be used for function regression. Being universal function approximators, as described in Hornik (1991), ANNs have the ability to map a set of independent input variables to a set of dependent output variables and can thus detect and model any linear or nonlinear relationship between them. In recent years ANNs have been used to solve tasks in many fields of science and engineering, among them materials science. ANNs have been used extensively to model physical properties of materials (Hemmat 2017; Hemmat Esfe et al. 2016). The work of Avrutskiy (2017) delivers a framework for approximating functions and their derivatives simultaneously, and it can therefore be used to solve this specific task. In the present work it is therefore investigated whether thermodynamic functions can be modelled on the basis of ANNs. To answer this question, a neural network model for the approximation of thermodynamic functions is introduced. The thermodynamic functions of iron between 0 and 6000 K are approximated and validated as a challenging example.

2 Methods

2.1 Modelling of thermodynamic functions

Unary systems consist of only one compound. The thermodynamic state variables therefore depend only on the pressure p and the temperature T. The Gibbs energy G(T, p) is the central quantity when calculating phase diagrams or phase diagram-related data. It is given in Eq. (2.1) by

$$\begin{aligned} G=H-TS. \end{aligned}$$
(2.1)

A relationship between G and the entropy S can be established by the derivative of G w.r.t. T at constant pressure as given in Eq. (2.2) by

$$\begin{aligned} S = -\left. \frac{\partial G}{\partial T}\right| _p. \end{aligned}$$
(2.2)

A relationship between G and the enthalpy H can also be derived and is known as the Gibbs–Helmholtz equation as given in Eq. (2.3) by

$$\begin{aligned} H = -T^2\left. \frac{\partial }{\partial T}\left( \frac{G}{T}\right) \right| _p. \end{aligned}$$
(2.3)

The isobaric heat capacity \(C_{{p}}\), in the following denoted as C, can also be derived from the Gibbs energy and is given in Eq. (2.4) by

$$\begin{aligned} C=-T\left. \frac{\partial ^2G}{\partial T^2}\right| _p. \end{aligned}$$
(2.4)

Equations (2.1)–(2.4) can be used to characterize a material system. In the present work this resulting set of partial derivatives is modelled and solved on the basis of artificial neural networks.
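As a minimal illustration of how S, H and C follow from a given G(T) via Eqs. (2.2)–(2.4), the following sketch derives them symbolically for a simple, purely hypothetical expression for G(T); in the present work such a closed-form expression is replaced by a neural network.

```python
# Minimal sketch: deriving S, H and C from a given G(T) via Eqs. (2.2)-(2.4).
# The expression for G(T) and its coefficients are hypothetical placeholders, not fitted values.
import sympy as sp

T = sp.symbols("T", positive=True)
G = -5.0 * T * sp.log(T) + 1e-3 * T**2 - 2.0 * T + 1000.0  # hypothetical G(T) at constant p

S = -sp.diff(G, T)          # Eq. (2.2): S = -dG/dT
H = G + T * S               # Eq. (2.1), equivalent to the Gibbs-Helmholtz relation of Eq. (2.3)
C = -T * sp.diff(G, T, 2)   # Eq. (2.4): C = -T d^2G/dT^2

print(sp.simplify(S), sp.simplify(H), sp.simplify(C))
```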

G(T, p) as a thermodynamic potential can be used to calculate the stable phase for a chosen state, phase equilibria and whole phase diagrams. A detailed description of the underlying physical principles is beyond the scope of this work.

2.2 Neural networks for the approximation of functions and their derivatives

Artificial neural networks (ANNs) consist of a large number of elementary processing units, the so-called neurons. The different neurons are interconnected, and the connections are weighted individually. Neural networks can be trained to a desired behaviour by trial-and-error procedures based on training data. The purpose of the training is to adjust the individual weights of the network so that the network exhibits the desired behaviour. The neural network model presented in this work is a feedforward network and is trained with the resilient propagation (rProp) learning algorithm. Expressions for the derivatives of the network's output w.r.t. its inputs, the calculation of the different thermodynamic quantities as well as the training procedure itself will be discussed in the following sections.

2.2.1 Notation

The proposed model can be used with an arbitrary number of layers but will be fixed to three, consisting of one input layer (1), one hidden layer (2) and one output layer (3). Neurons of the different layers of a network will be denoted by the Greek letters \(\alpha \) for layer 1 and \(\beta \) for layer 2. The general structure of an arbitrary neuron \(\beta \) is shown in Fig. 1. The number of neurons of the output layer will be fixed to 1. Network parameters are indexed because of their multiple use. The weight matrices connecting the neurons of different layers will in this sense be referred to as \(W_{21,\beta \alpha }\), which connects a neuron \(\alpha \) from layer 1 with a neuron \(\beta \) from layer 2, and as \(W_{32,1\beta }\), respectively. Vector-valued network parameters apply only to one layer and are addressed by the number of the layer and one Greek letter for the neuron, like \(b_{2,\beta }\). The value a neuron receives before applying its activation function will be denoted as its net input. The net input of each layer is organized as a vector \(s_{2,\beta }^t\), where the superscript t runs through the number of training examples. The absolute number of iterable objects will be denoted by vertical bars, like \(|\gamma |\) or |t|. The activation function of the input and output layer is linear. The activation function of the hidden layer will be denoted as f for the general description and will be specified later. The activation functions are applied to every neuron of a layer. Expressions for the neural network representations of the different thermodynamic quantities are in the following denoted by the subscript N. The overall input pattern is denoted as \(x_\alpha ^t\) and the overall output pattern as \(y^t\).

2.2.2 Representation of thermodynamic functions with neural network variables

Feedforward networks are suitable for solving regression tasks (Hornik 1991). Every layer is fully connected with its adjacent layer, and the information flows unidirectionally from the input through the hidden layer to the output of the network. In the following, the net input of an arbitrary layer, e.g. \(s_{2,\beta }^t\), is defined as the weighted sum of all input values and an additional threshold value \(b_{2,\beta }\) as given in Eq. (2.5) by

$$\begin{aligned} s_{2,\beta }^t=\sum _{\alpha =1}^{|\alpha |}W_{21,\beta \alpha }\cdot x_\alpha ^t+b_{2,\beta } \end{aligned}$$
(2.5)

Using this definition, the overall output of the network \(y^t\) as a function of its input \(x^t\) for any input pattern t is calculated as given in Eq. (2.6) by

$$\begin{aligned} y^t(x)&=W_{32,1\beta } \cdot f\left( \sum _{\alpha =1}^{|\alpha |} W_{21,\beta \alpha } \cdot x_\alpha ^t + b_{2,\beta } \right) +b_{3,1} \nonumber \\&=W_{32,1\beta } \cdot f\left( s_{2,\beta }^t \right) +b_{3,1}. \end{aligned}$$
(2.6)

Expressions for the first and the second derivative of \(y^t\) w.r.t. \(x^t\) are derived by the application of the chain rule as given in Eqs. (2.7) and (2.8) by

$$\begin{aligned} \frac{\partial y^t}{\partial x_{{\tilde{\alpha }}}}=W_{32,1\beta }\cdot f'\left( \sum _{\alpha =1}^{|\alpha |} W_{21,\beta \alpha } \cdot x_\alpha ^t + b_{2,\beta } \right) \cdot W_{21,\beta {\tilde{\alpha }}} \end{aligned}$$
(2.7)

and, respectively,

$$\begin{aligned} \frac{\partial ^2 y^t}{\partial x_{{\tilde{\alpha }}}\,\partial x_{\overset{\approx }{\alpha }}}&=W_{32,1\beta }\cdot f''\left( \sum _{\alpha =1}^{|\alpha |} W_{21,\beta \alpha } \cdot x_\alpha ^t + b_{2,\beta } \right) \cdot W_{21,\beta {\tilde{\alpha }}}\nonumber \\&\quad \cdot W_{21,\beta \overset{\approx }{\alpha }}. \end{aligned}$$
(2.8)
Fig. 1 Structure of an arbitrary artificial neuron \(\beta \)

Fig. 2 Normalized prototype functions for the different thermodynamic quantities for the ANN approximation: a \(g_a(x)=f_a(s_a(x))\) with \(s_a(x)=x\) and b \(g_b(x)=f_b(s_b(x))\) with \(s_b(x)=x-5\)

Using Eqs. (2.6)–(2.8) one can express the Gibbs energy and the related quantities given by Eqs. (2.1)–(2.4) solely by neural network variables. In the proposed model, the input to the network is directly given by the temperature T and the output of the network \(y^t\) is directly connected to the Gibbs energy G. Therefore, the neural network representation \(G_\mathrm{N}^t\) of the Gibbs energy \(G^t\) at a given temperature T is calculated as given in Eq. (2.9) by

$$\begin{aligned} G_\mathrm{N}^t=y^t =W_{32,1\beta } \cdot f\left( s_{2,\beta }^t \right) +b_{3,1}. \end{aligned}$$
(2.9)

The entropy as defined in Eq. (2.2) is calculated as the first derivative of G w.r.t. T. The neural network representation of the entropy \(S_\mathrm{N}\) is therefore given as in Eq. (2.10) by

$$\begin{aligned} S_\mathrm{N}^t&=-\frac{\mathrm {d} y^t}{\mathrm {d} x^t}\nonumber \\&=-W_{32,1\beta }\cdot f'\left( s_{2,\beta }^t \right) \cdot W_{21,\beta 1}. \end{aligned}$$
(2.10)

The neural network representation of the enthalpy from Eq. (2.3) is given as in Eq. (2.11) by

$$\begin{aligned} H_\mathrm{N}^t&=G_\mathrm{N}^t+x^tS_\mathrm{N}^t \nonumber \\&= W_{32,1\beta } \cdot f\left( s_{2,\beta }^t \right) +b_{3,1}-x^t\left( W_{32,1\beta }\cdot f'\left( s_{2,\beta }^t \right) \cdot W_{21,\beta 1} \right) . \end{aligned}$$
(2.11)

And finally the neural network representation of the isobaric heat capacity from Eq. (2.4) is given as in Eq. 2.12 by

$$\begin{aligned} C_\mathrm{N}^t&=-x^t\frac{\mathrm {d}^2 y^t}{\mathrm {d} \left( x^t \right) ^2} \nonumber \\&=-x^t\cdot W_{32,1\beta }\cdot f''\left( s_{2,\beta }^t \right) \cdot \left( W_{21,\beta 1} \right) ^2. \end{aligned}$$
(2.12)

Equations (2.9)–(2.12) now consist solely of neural network parameters and at the same time reflect the physical relations between the different thermodynamic quantities. As described in the work of Avrutskiy (2017), the expressions for the derivatives of a neural network's output w.r.t. its inputs can be considered as standalone neural networks and could be trained individually. Due to the fact that each of the expressions above is formed from the same network parameters, the calculation of \(G_\mathrm{N}\), \(S_\mathrm{N}\) and \(H_\mathrm{N}\) is also possible when the training data only contain values for C. This results in two major advantages: firstly, all available measurement data can be used for the modelling process and, secondly, the self-consistency of the resulting network for the different thermodynamic quantities is always fulfilled.
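A minimal numerical sketch of Eqs. (2.6)–(2.12) for a network with one input (the temperature) and a single hidden layer is given below; the weights are random placeholders and tanh merely stands in for the activation functions introduced in Sect. 2.2.3.

```python
# Sketch of Eqs. (2.6)-(2.12): one input (T), one hidden layer, one output (G).
# All weights are random placeholders; tanh stands in for the activation f.
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 8
W21 = rng.normal(size=n_hidden)   # W_{21,beta 1}: input -> hidden
b2 = rng.normal(size=n_hidden)    # b_{2,beta}
W32 = rng.normal(size=n_hidden)   # W_{32,1 beta}: hidden -> output
b3 = 0.0                          # b_{3,1}

f = np.tanh
f1 = lambda s: 1.0 - np.tanh(s) ** 2                          # f'
f2 = lambda s: -2.0 * np.tanh(s) * (1.0 - np.tanh(s) ** 2)    # f''

def thermo(x):
    """Return (G_N, S_N, H_N, C_N) at temperature(s) x, following Eqs. (2.9)-(2.12)."""
    s = np.outer(x, W21) + b2        # net input s_{2,beta}^t, Eq. (2.5)
    y = f(s) @ W32 + b3              # network output, Eq. (2.6)
    dy = (f1(s) * W21) @ W32         # dy/dx, Eq. (2.7)
    d2y = (f2(s) * W21 ** 2) @ W32   # d2y/dx2, Eq. (2.8)
    G = y                            # Eq. (2.9)
    S = -dy                          # Eq. (2.10)
    H = G + x * S                    # Eq. (2.11)
    C = -x * d2y                     # Eq. (2.12)
    return G, S, H, C

print(thermo(np.array([300.0, 1000.0, 1800.0])))
```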

2.2.3 Activation function

The activation function used in a neural network regression predefines the shape of the function approximation. In general, any shape can be approximated by a neural network using the standard sigmoid activation function and a sufficient number of layers and neurons, as proven in Hornik (1991). A major disadvantage of neural networks is their limited extrapolation ability. The only information about the function to be approximated is provided by the training data. Without further assumptions and constraints, a reliable prediction of function values outside the borders of the training data is not possible. When approximating thermodynamic functions, and especially when calculating stable phases, phase transitions or phase equilibria, a model must deliver reasonable values for the thermodynamic functions of the different phases even outside their stable temperature regimes. The extrapolation abilities of the proposed model are provided by the use of two different activation functions \(f_a\) and \(f_b\), as also shown in Fig. 2:

$$\begin{aligned} f_a\left( s \right)&=E_0+\frac{3}{2}R\Theta _E+3Rs{\mathrm {log}}\left( 1- \mathrm {e}^{-\Theta _E/s} \right) \nonumber \\&\quad -\,\frac{1}{2}a s^2-\frac{1}{6}b s^3 \end{aligned}$$
(2.13)
$$\begin{aligned} f_b\left( s\right)&=\mathrm {log}\left( \mathrm {e}^s+1\right) \end{aligned}$$
(2.14)

The expression from Eq. (2.13) is derived by the integration of the Chen and Sundman model for the isobaric heat capacity, where each term describes a different physical contribution. For a detailed description of this model the reader is referred to Chen and Sundman (2001). By using Eq. (2.13) as an activation function, a physical basis for the approximation of thermodynamic functions is provided which describes the general shape of the approximated thermodynamic functions. In Eq. (2.13) R stands for the universal gas constant. The parameters \(E_0\), \(\Theta _E\), a and b are treated as additional network parameters and are optimized during the learning process. In many material systems, such as iron, which is approximated in this work, local effects, for example magnetic ordering, can occur and affect the thermodynamic functions. These local deviations are approximated by a second network with the activation function \(f_b\) as given in Eq. (2.14). \(f_b\) is based on the so-called softplus activation function (Nair and Hinton 2010). The reason for using Eq. (2.14) lies in the bell shape of the second derivative of \(f_b\) w.r.t. s, which allows modelling local, peak-shaped effects in the heat capacity curve.
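The following sketch illustrates the two activation functions of Eqs. (2.13) and (2.14) and the bell-shaped second derivative of the softplus; the values chosen for \(E_0\), \(\Theta _E\), a and b are arbitrary placeholders, since in the model they are trainable parameters.

```python
# Sketch of the activation functions of Eqs. (2.13) and (2.14).
# E0, theta_E, a and b are arbitrary placeholders; in the model they are trainable parameters.
import numpy as np

R = 8.314462618  # universal gas constant in J/(mol K)

def f_a(s, E0=0.0, theta_E=300.0, a=1e-5, b=1e-9):
    """Integrated Chen-Sundman expression, Eq. (2.13); valid for s > 0."""
    return (E0 + 1.5 * R * theta_E + 3.0 * R * s * np.log(1.0 - np.exp(-theta_E / s))
            - 0.5 * a * s ** 2 - (1.0 / 6.0) * b * s ** 3)

def f_b(s):
    """Softplus-based activation, Eq. (2.14)."""
    return np.log(np.exp(s) + 1.0)

def f_b_dd(s):
    """Second derivative of the softplus: a bell-shaped curve used to model local peaks."""
    sig = 1.0 / (1.0 + np.exp(-s))
    return sig * (1.0 - sig)

print(f_a(np.array([300.0, 1000.0])))        # base-level contribution
print(f_b_dd(np.linspace(-6.0, 6.0, 5)))     # bell shape centred at s = 0
```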

2.2.4 Training neural networks and derivatives with resilient propagation

The training data consist of sets of pairs \(\lbrace \left( x^t,G^t\right) \rbrace \) defining a desired network output \(G^t\) at given \(x^t\). When dealing with partial derivatives, additional data need to be provided for every unknown quantity of the system. The sets \(\lbrace \left( x^u, S^u \right) \rbrace \), \(\lbrace \left( x^v, H^v\right) \rbrace \) and \(\lbrace \left( x^w,C^w\right) \rbrace \) provide these additional training data for each of the derived thermodynamic quantities approximated in this work. The different indices t, u, v and w reflect the fact that the absolute number of training examples and the locations \(x^t\), \(x^u\), \(x^v\) and \(x^w\) do not necessarily need to be the same. To measure the error between the resulting output from the network and the desired output defined by the training data, a cost function \(E=e\left( p^\alpha \right) \) with \(p^\alpha =\left[ W^{\gamma \beta }, W^{\delta \gamma }, b^\gamma , \ldots \right] ^\mathrm{T}\) that depends on all free network parameters is defined. In the present work the least-squares sum is chosen as cost function as given in Eq. (2.15) by

$$\begin{aligned} E&=\frac{q_G}{|t|}\sum _tE_G^t+\frac{q_S}{|u|}\sum _uE_S^u+\frac{q_H}{|v|}\sum _vE_H^v+\frac{q_C}{|w|}\sum _wE_C^{w}\nonumber \\&=\frac{q_G}{|t|}\sum _t \left( G_\mathrm{N}^t\left( x^t \right) -G^t \right) ^2 +\frac{q_S}{|u|}\sum _u \left( S_\mathrm{N}^u\left( x^u \right) -S^u \right) ^2\nonumber \\&\quad +\frac{q_H}{|v|}\sum _v \left( H_\mathrm{N}^v\left( x^v \right) -H^v \right) ^2 +\frac{q_C}{|w|}\sum _w \left( C_\mathrm{N}^w\left( x^w \right) -C^w \right) ^2. \end{aligned}$$
(2.15)

The main goal of the training process is now to find an optimal set of parameters so that the error defined in Eq. (2.15) reaches a minimum. By repeated application of the chain rule \(\partial E/\partial p^\alpha \) can be calculated. A step in the opposite gradient direction is then performed to move towards the minimum of the error surface. The factors \(q_G\), \(q_S\), \(q_H\) and \(q_C\) are introduced to equalize the contributions of the different thermodynamic quantities, which differ in orders of magnitude, to the error expression.
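A minimal sketch of the weighted least-squares cost of Eq. (2.15) is given below; the numerical values and weighting factors q are purely hypothetical.

```python
# Sketch of the weighted least-squares cost of Eq. (2.15).
# Each entry pairs network predictions with training targets for one quantity; q values are placeholders.
import numpy as np

def system_error(terms):
    """terms: list of (q, predictions, targets) tuples, one per thermodynamic quantity."""
    E = 0.0
    for q, pred, target in terms:
        pred, target = np.asarray(pred), np.asarray(target)
        E += q / len(target) * np.sum((pred - target) ** 2)
    return E

# Hypothetical example with enthalpy and heat-capacity contributions only:
E = system_error([
    (1e-6, [13100.0, 34500.0], [13050.0, 34800.0]),   # q_H scaled down because H is of order J/mol
    (1e-2, [25.1, 34.7], [24.9, 35.2]),               # q_C for heat-capacity values in J/(mol K)
])
print(E)
```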

Fig. 3 Structure of the ANN model for the approximation of thermodynamic functions

Fig. 4 a The neural network representation of the isobaric heat capacity of \({\text {Fe}}_{\mathrm{BCC}}\) and the underlying training data; b relative differences \(\Delta C / C=(C_\mathrm{N}-C)/C_\mathrm{N}\) of the training data from the neural network approximation for \({\text {Fe}}_{\mathrm{BCC}}\)

Usually the magnitude and the sign of the different partial derivatives are used for the optimization of the network parameters. This can be a problem when the error surface described by E is too rugged and nonlinear. Under such conditions the magnitude of \(\partial E/\partial p^\alpha \) can vary by orders of magnitude from one iteration to the next, resulting in an unstable learning process or even in a failure of the optimization routine. The resilient propagation algorithm (rProp) (Riedmiller and Braun 1993) was developed to become independent of the magnitude of \(\partial E/\partial p^\alpha \). rProp adapts local step sizes based only on the sign of \(\partial E/\partial p^\alpha \) of the current iteration k and of the previous iteration \(k-1\).

Fig. 5 a The neural network representation of the enthalpy of \({\text {Fe}}_{\mathrm{BCC}}\) and \({\text {Fe}}_{\mathrm{FCC}}\) together with their underlying training data for \(300\,\text {K}<T<1600\,\text {K}\); b relative differences \(\Delta H / H=(H_\mathrm{N}-H)/H_\mathrm{N}\) of the training data from the neural network approximation for \({\text {Fe}}_{\mathrm{BCC}}\) and \({\text {Fe}}_{\mathrm{FCC}}\)

Fig. 6 a The neural network representation of the isobaric heat capacity of \({\text {Fe}}_{\mathrm{FCC}}\) and the underlying training data; b relative differences \(\Delta C / C=(C_\mathrm{N}-C)/C_\mathrm{N}\) of the training data from the neural network approximation for \({\text {Fe}}_{\mathrm{FCC}}\)

Fig. 7 a The neural network representation of the enthalpy of \({\text {Fe}}_{\mathrm{BCC}}\) and \({\text {Fe}}_{\mathrm{LIQ}}\) together with their underlying training data for \(1600\,\text {K}<T<2200\,\text {K}\); b relative differences \(\Delta H / H=(H_\mathrm{N}-H)/H_\mathrm{N}\) of the training data from the neural network approximation for \({\text {Fe}}_{\mathrm{BCC}}\) and \({\text {Fe}}_{\mathrm{LIQ}}\)

The rProp learning algorithm introduces an individual update value \(\eta ^\alpha \) for each of the \(p^\alpha \) network parameters. The individual \(\eta ^\alpha \) are changed during the learning process, and their evolution is influenced only by the sign of \(\partial E/\partial p^\alpha \). According to Riedmiller and Braun (1993) the change of the different \(\eta ^\alpha \) is given by

$$\begin{aligned} \eta ^\alpha _k={\left\{ \begin{array}{ll} \eta ^+\cdot \eta ^\alpha _{k-1}, &{}\quad \text {if}\,\left( \frac{\partial E}{\partial p^\alpha }\right) _{k-1}\cdot \left( \frac{\partial E}{\partial p^\alpha }\right) _k>0 \\ \eta ^-\cdot \eta ^\alpha _{k-1}, &{}\quad \text {if}\,\left( \frac{\partial E}{\partial p^\alpha }\right) _{k-1}\cdot \left( \frac{\partial E}{\partial p^\alpha }\right) _k<0 \\ \eta ^\alpha _{k-1}, &{}\quad \text {else}. \end{array}\right. } \end{aligned}$$
(2.16)

After the update values have been adapted, the individual \(p^\alpha \) are updated as given in Eq. (2.17) by

$$\begin{aligned} \Delta p^{\alpha }_k={\left\{ \begin{array}{ll} -\eta ^\alpha _k, &{}\quad \text {if}\,\left( \frac{\partial E}{\partial p^\alpha }\right) _k>0 \\ \eta ^\alpha _k, &{}\quad \text {if}\,\left( \frac{\partial E}{\partial p^\alpha }\right) _k<0 \\ 0, &{}\quad \text {else}. \end{array}\right. } \end{aligned}$$
(2.17)

As in regular gradient descent optimization procedures, the update is made in the opposite gradient direction. The updates are applied as given in Eq. (2.18) by

$$\begin{aligned} p^{\alpha }_k=p^{\alpha }_{k-1}+\Delta p^{\alpha }_k \end{aligned}$$
(2.18)

Learning algorithms can be used in online or batch mode. In online learning only one training example is used per iteration for updating the free network parameters, while batch learning uses a mean error over more than one training example per iteration. One cycle through all the available training data is called an epoch. It is worth mentioning that the rProp algorithm only works in batch mode and with a large batch size.
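A minimal sketch of the update rules of Eqs. (2.16)–(2.18) is given below; the hyperparameters \(\eta ^+=1.2\), \(\eta ^-=0.5\) and the step-size bounds are the values suggested by Riedmiller and Braun (1993), and the gradient function is assumed to be supplied by the caller, e.g. via automatic differentiation.

```python
# Sketch of the rProp update rules, Eqs. (2.16)-(2.18).
# grad_fn(params) must return dE/dp for all parameters (e.g. via automatic differentiation).
import numpy as np

def rprop(params, grad_fn, n_epochs=100, eta_plus=1.2, eta_minus=0.5,
          eta_init=0.1, eta_min=1e-6, eta_max=50.0):
    params = np.asarray(params, dtype=float).copy()
    eta = np.full_like(params, eta_init)      # individual update values eta^alpha
    grad_prev = np.zeros_like(params)

    for _ in range(n_epochs):
        grad = grad_fn(params)
        same_sign = grad_prev * grad
        eta = np.where(same_sign > 0, np.clip(eta * eta_plus, eta_min, eta_max), eta)   # Eq. (2.16)
        eta = np.where(same_sign < 0, np.clip(eta * eta_minus, eta_min, eta_max), eta)
        params -= np.sign(grad) * eta          # Eqs. (2.17) and (2.18): step against the gradient sign
        grad_prev = grad
    return params

# Hypothetical usage: minimize E(p) = (p0 - 3)^2 + (p1 + 1)^2
print(rprop([0.0, 0.0], lambda p: 2.0 * (p - np.array([3.0, -1.0]))))
```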

Fig. 8 The thermodynamic functions of Fe\(_{\mathrm{GAS}}\) between \(3000\,\text {K}<T<6000\,\text {K}\) together with the training data from Chase (1998): a G(T), b H(T), c S(T), d C(T)

3 Results

3.1 A neural network for the approximation of thermodynamic functions

The proposed model for the approximation of thermodynamic functions consists of two interconnected subnetworks as shown in Fig. 3. The first subnetwork is a \(1-1-1\) ANN and has \(f_a\) (Eq. 2.13) as the activation function of its hidden neuron. As described in the previous section, this first subnetwork provides a base level for the approximation. The second subnetwork has a \(1-N-1\) structure and uses \(f_b\) (Eq. 2.14) as the activation function of its hidden neurons. The number of hidden neurons depends on the case.

The proposed method is implemented in Python and uses Theano (Theano Development Team 2016) as tensor library as well as Theano's built-in automatic differentiation for the calculation of the gradients needed for the derived thermodynamic quantities and during the learning procedure. rProp is used in batch mode with a batch size equal to the number of available measurement data. The different phases of a chemical element are each approximated with a separate neural network. The optimization of the network parameters is carried out simultaneously for all phases of the system. This is achieved by formulating the error for every phase separately and adding up the different phase-wise error expressions to an overall system error. The reason for this approach lies in the calculation of thermodynamic quantities like the Gibbs energy \(G_{a\rightarrow b}(T_{\mathrm{tr}})\) or the enthalpy change \(\Delta H_{a\rightarrow b}(T_{\mathrm{tr}})\) at the transition from an arbitrary phase a to a phase b at the transition temperature \(T_{\mathrm{tr}}\). \(\Delta H_{a\rightarrow b}(T_{\mathrm{tr}})=H_b(T_{\mathrm{tr}})-H_a(T_{\mathrm{tr}})\), for example, depends on the values for \(H_a(T_{\mathrm{tr}})\) and \(H_b(T_{\mathrm{tr}})\). The optimization of the neural network representation is based on measurements for \(H_a(T_{\mathrm{tr}})\), \(H_b(T_{\mathrm{tr}})\) and \(\Delta H_{a\rightarrow b}(T_{\mathrm{tr}})\), and it is more than likely that measurements from different sources are not fully consistent. The goal of the optimization is to minimize the error between the network output and its underlying training data, leading to an error \(E>0\) even for a fully optimized network. Optimizing \(H_a(T_{\mathrm{tr}})\) in a first step and \(H_b(T_{\mathrm{tr}})\) and \(\Delta H_{a\rightarrow b}(T_{\mathrm{tr}})\) in a second step would distribute the error \(E>0\) due to inconsistencies of the measurement data only onto the second phase and would lead to a reduced quality of its approximation. By optimizing all the different phases of a system simultaneously on the basis of an overall system error, the error due to inconsistencies of the measurement data is distributed evenly among the different phases of the system.
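A minimal sketch of how phase-wise errors and phase-coupling terms, such as a transition enthalpy, could be summed into one overall system error is given below; all function names and numbers are hypothetical.

```python
# Sketch: coupling two phase networks through a transition-enthalpy term and summing
# everything into one overall system error. All names and numbers are hypothetical.

def transition_enthalpy_error(H_a, H_b, T_tr, dH_measured, q=1.0):
    """Squared error of dH_{a->b}(T_tr) = H_b(T_tr) - H_a(T_tr) against a measured value."""
    return q * ((H_b(T_tr) - H_a(T_tr)) - dH_measured) ** 2

def system_error(phase_errors, coupling_terms):
    """Sum of the individual phase errors and of all phase-coupling terms."""
    return sum(phase_errors) + sum(coupling_terms)

# Hypothetical usage with toy enthalpy functions for two phases:
H_bcc = lambda T: 0.035 * T ** 1.2
H_fcc = lambda T: 0.035 * T ** 1.2 + 900.0
E = system_error(
    phase_errors=[0.12, 0.08],   # errors of the individual phase networks against their own data
    coupling_terms=[transition_enthalpy_error(H_bcc, H_fcc, 1184.0, 900.0)],
)
print(E)
```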

3.2 Approximation of the thermodynamic functions of pure iron

The thermodynamic functions of Fe were considered to evaluate the performance of the proposed neural network model. The thermodynamic functions of each phase of Fe are approximated for the temperature range \(0\,\text {K}<T<6000\,\text {K}\) in its stable and metastable regime. Between 0 and 1184 K Fe has a BCC crystal structure and is denoted \(\alpha \)-Fe in the literature. At 1184 K there is a phase transition \({\text {Fe}}_{\mathrm{BCC}}\rightarrow {\text {Fe}}_{\mathrm{FCC}}\). The FCC phase is denoted \(\gamma \)-Fe and is stable between 1184 and 1665 K. At 1665 K a second phase transition occurs in the solid state, \({\text {Fe}}_{\mathrm{FCC}}\rightarrow {\text {Fe}}_{\mathrm{BCC}}\). The second BCC phase is referred to as \(\delta \)-Fe, but since the crystal structure is the same as for \(\alpha \)-Fe, the two BCC phases are modelled by a single ANN. Fe has its melting point at 1809 K and the transition to the gaseous phase at 3134 K. The approximation of the thermodynamic functions of Fe is challenging due to the strong magnetic peak in the \(C_{{p}}\) curve of BCC iron at 1042 K and the four phase transitions between 0 and 6000 K. There exist reviews of the available measurements for Fe, among which Desai (1986) and Chen and Sundman (2001) are used in this work to gather the training data for the neural network model. Unless explicitly mentioned otherwise, the training data consist solely of raw measurement data. Published values from other, already optimized models are not taken as training data but are used for comparison with the obtained results. The calculation of phase transitions is based on a bisection method which is implemented in Python, as sketched below.
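A transition temperature can be located as the temperature at which the Gibbs energies of two phases are equal; a minimal bisection sketch with hypothetical Gibbs-energy functions is:

```python
# Sketch: bisection for a transition temperature T_tr where G_a(T_tr) = G_b(T_tr).
# The Gibbs-energy functions below are hypothetical stand-ins for two phase networks.

def transition_temperature(G_a, G_b, T_low, T_high, tol=1e-6):
    """Find T in [T_low, T_high] where G_a(T) - G_b(T) changes sign."""
    f = lambda T: G_a(T) - G_b(T)
    if f(T_low) * f(T_high) > 0:
        raise ValueError("no sign change of G_a - G_b in the given interval")
    while T_high - T_low > tol:
        T_mid = 0.5 * (T_low + T_high)
        if f(T_low) * f(T_mid) <= 0:
            T_high = T_mid
        else:
            T_low = T_mid
    return 0.5 * (T_low + T_high)

# Hypothetical Gibbs-energy curves crossing near 1184 K:
G_bcc = lambda T: -50.0 * T
G_fcc = lambda T: -50.2 * T + 236.8
print(transition_temperature(G_bcc, G_fcc, 800.0, 1500.0))   # approx. 1184 K
```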

3.2.1 BCC

Figure 4a shows the ANN representation of the specific heat \(C_\mathrm{N}\) of \({\text {Fe}}_{\mathrm{BCC}}\) iron together with the training data used for the approximation and the results from the FactSage 7.0 FactPS database (Bale et al. 2016). Figure 4b shows the respective relative differences between \(C_\mathrm{N}\) and the training data. The calculated thermodynamic function is in good agreement with the measurement data over wide ranges. The overall standard deviation of the neural network approximation from the training data is \(\sigma _{\mathrm{BCC}}=2.9\,\text {J}/\left( \text {mol K}\right) \). Between 500 and \(700\,\text {K}\) the calculated curve tends to predict slightly lower heat capacity values than the pure training data would suggest. This behaviour can be explained by the nature of the neural network regression itself: the error expression combines the errors of different quantities of the different phases, and its minimum can be seen as the best compromise between the individual errors the system error consists of.

Table 1 Comparison between calculated and experimental temperatures of phase transformations for iron
Table 2 Comparison between calculated and experimental phase transition enthalpies for iron

The calculated heat capacity function from this work is in good agreement with the curve from the FactSage 7.0 FactPS database (Bale et al. 2016). Both approximations show deviations from the measurement data between 500 and \(700\,\text {K}\). The values near the magnetic peak at \(1042\,\text {K}\) are better represented by the neural network representation than by the FactSage 7.0 FactPS database (Bale et al. 2016). Furthermore, the FactSage 7.0 FactPS database (Bale et al. 2016) representation has a jump at the transition between \({\text {Fe}}_{\mathrm{BCC}}\) and \({\text {Fe}}_{\mathrm{LIQ}}\) at \(1809\,\text {K}\). This jump is avoided in the results obtained in this work, and the transition into the metastable regime is smooth and continuous. In addition, Fig. 5a shows the results for the ANN representation of the enthalpy \(H_\mathrm{N}\) together with the training data and the representation from the FactSage 7.0 FactPS database (Bale et al. 2016). Figure 5b additionally shows the relative difference between \(H_\mathrm{N}\) and the training data. The calculated values of \(H_\mathrm{N}\) are in good agreement with the measurement data and the results from Bale et al. (2016). Another interesting aspect is the model's ability to approximate the different thermodynamic quantities, especially \(C_\mathrm{N}\), between 0 and 298.15 K, which is a problem for the polynomial-based models, as reported in Roslyakova et al. (2016).

Fig. 9 Overview of the thermodynamic functions from this work for Fe\(_{\mathrm{BCC}}\), Fe\(_{\mathrm{FCC}}\), Fe\(_{\mathrm{LIQ}}\) and Fe\(_{\mathrm{GAS}}\) between \(0\,\text {K}<T<6000\,\text {K}\): a G(T), b H(T), c S(T), d C(T). The solid lines stand for the stable, the dashed lines for the metastable regime (color figure online)

3.2.2 FCC

Figure 6a shows the ANN representation of the specific heat \(C_\mathrm{N}\) of \({\text {Fe}}_{\mathrm{FCC}}\) iron together with the underlying training data and the results obtained from the FactSage 7.0 FactPS database (Bale et al. 2016). The relative differences between \(C_\mathrm{N}\) and the training data for \({\text {Fe}}_{\mathrm{FCC}}\) iron are shown in Fig. 6b. The available measurement data are strongly scattered; for example, the difference between \(C_{{p}}(1600\,\text {K})\) from Bendick and Pepperhoff (1982) and \(C_{{p}}(1600\,\text {K})\) from Rogez and Le Coze (1980) is \(\Delta C_{{p}}=11.44\,\text {J}/\left( \text {mol K}\right) \). The neural network approximation is in good agreement with the measurement data in view of their strong scatter. The overall standard deviation of the neural network approximation from the training data is \(\sigma _{\mathrm{FCC}}=2.1\,\text {J}/\left( \text {mol K}\right) \). The slope of the neural network approximation of the heat capacity in its stable regime between 1184 and \(1665\,\text {K}\) is lower than that of the representation from the FactSage 7.0 FactPS database (Bale et al. 2016). As for the \({\text {Fe}}_{\mathrm{BCC}}\) phase, the FactPS database approximation has a jump at the melting temperature. This jump does not occur in the neural network approximation of this work.

Figure 5a shows the enthalpy of \({\text {Fe}}_{\mathrm{BCC}}\) and \({\text {Fe}}_{\mathrm{FCC}}\) near the transition \({\text {Fe}}_{\mathrm{BCC}}\rightarrow {\text {Fe}}_{\mathrm{FCC}}\) at \(1184.08\,\text {K}\) together with the underlying training data and with the SGTE approximation as a comparison. The relative differences between the training data and \(H_\mathrm{N}\) are shown in Fig. 5b. The neural network approximations obtained in this work are in good agreement with the measurement data. For the \({\text {Fe}}_{\mathrm{BCC}}\) phase the neural network approximation and the SGTE approximation are almost identical. For the \({\text {Fe}}_{\mathrm{FCC}}\) phase, the results from this work predict lower values in the metastable regime below \(1184.08\,\text {K}\) than the SGTE approximation from Bale et al. (2016), and the difference increases with decreasing temperature.

3.2.3 Liquid

The neural network approximation for liquid iron \({\text {Fe}}_{\mathrm{LIQ}}\) is based on the work of Desai (1986), who suggests a value for the isobaric heat capacity of liquid iron of \(46.632\,\text {J}/\left( \text {mol K}\right) \), and additionally on measurement values for the enthalpy. The optimization procedure for liquid iron is slightly different from that of the solid phases. The isobaric heat capacity of the liquid is assumed to be constant. For the learning algorithm the stable temperature regime of liquid iron between \(1809\,\text {K}\) and \(3134\,\text {K}\) is represented by 100 points in the interval [1809, 3134]. The target value for each of the 100 points is the \(46.632\,\text {J}/\left( \text {mol K}\right) \) from Desai (1986). Calculations have shown that this condition alone is not sufficient for a slope of \(0\,\text {J}/\left( \text {mol}\,\text {K}^2\right) \) of C in the liquid phase. The error expression for liquid iron is therefore extended by the additional condition given in Eq. (3.1) by

$$\begin{aligned} \frac{\mathrm{d}C}{\mathrm{d}T}=0. \end{aligned}$$
(3.1)
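The condition of Eq. (3.1) can be enforced as a penalty on the slope of C over the stable liquid range; a minimal sketch with a hypothetical heat-capacity callable and a finite-difference slope is:

```python
# Sketch: penalizing a non-zero slope of C(T) for the liquid phase, Eq. (3.1).
# C_N is a hypothetical callable returning the network's heat capacity; a finite-difference
# slope is used here instead of the analytic derivative for brevity.
import numpy as np

def slope_penalty(C_N, T_points, q_slope=1.0, dT=1.0):
    dCdT = (C_N(T_points + dT) - C_N(T_points - dT)) / (2.0 * dT)
    return q_slope * np.mean(dCdT ** 2)

# Hypothetical usage over the 100 support points of the stable liquid range:
T = np.linspace(1809.0, 3134.0, 100)
C_N = lambda T: 46.632 + 1e-4 * (T - 2000.0)   # toy heat-capacity curve with a small residual slope
print(slope_penalty(C_N, T))
```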

Figure 7a shows the results for the enthalpy of \({\text {Fe}}_{\mathrm{BCC}}\) and \({\text {Fe}}_{\mathrm{LIQ}}\) near the transition \({\text {Fe}}_{\mathrm{BCC}}\rightarrow {\text {Fe}}_{\mathrm{LIQ}}\) at \(1808.9\,\text {K}\) together with the results from the FactSage 7.0 FactPS database (Bale et al. 2016). The relative differences between \(H_\mathrm{N}\) and the measurement data for \({\text {Fe}}_{\mathrm{BCC}}\) and \({\text {Fe}}_{\mathrm{LIQ}}\) are shown in Fig. 7b. The approximation for \({\text {Fe}}_{\mathrm{LIQ}}\) above the melting temperature is in good agreement with the available measurement values and is almost identical to the approximation from the FactPS database (Bale et al. 2016). In the metastable temperature regimes, for \({\text {Fe}}_{\mathrm{LIQ}}\) below and for \({\text {Fe}}_{\mathrm{BCC}}\) above the melting temperature, the neural network approximation predicts slightly lower values for C than the approximation from Bale et al. (2016).

3.2.4 Gaseous phase

The approximation of the gaseous phase is based on the NIST JANAF thermochemical tables (Chase 1998). This means the basis for the approximation of the thermodynamic functions does not consist of measurement values, as for the other phases of iron, but of already optimized values. Nevertheless, and for the sake of completeness, the gaseous phase is still incorporated in the presented results. An important aspect of the approximation of the gaseous phase is that the NIST JANAF thermochemical tables provide values for G, H, S and C. Figure 8a–d shows the approximated thermodynamic functions of gaseous iron. The results for the gaseous phase show that all available thermodynamic quantities can be used for the approximation of the thermodynamic functions.

3.2.5 Further remarks

For the optimization of the neural network model, values for H, C and S were used. Values for G were used indirectly to approximate transition temperatures. This demonstrates the model's ability to use any of these quantities for the approximation of thermodynamic functions, which is clearly an advantage over the polynomial-based models. Figure 9a–d shows the curves of G, H, S and C for the whole considered temperature range and for all of the different phases of iron. The dashed lines indicate the metastable regimes of the respective phase. Table 1 lists the calculated transition temperatures of iron for the different phase transitions together with available literature data. The calculated values from this work are in good agreement with the literature data. Additionally, Table 2 lists the enthalpies of transformation for the different phase transformations of iron. The calculated values from this work are in good agreement with the data from the literature. It is worth mentioning that these values are also part of the training data and are learned by the network during the optimization phase. Nevertheless, these values show clearly that the proposed method can be used for the approximation of the thermodynamic functions of unary systems and that the obtained results are self-consistent.

4 Summary and outlook

This work presents a new model for the approximation of thermodynamic functions of unary systems based on artificial neural networks. For iron as a complex example, the different thermodynamic functions were successfully approximated. The comparison with literature data and already existing approximations of the thermodynamic functions of pure iron shows the suitability of the proposed method. The proposed method solves the underlying optimization problem with minimal user input and almost automatically. One major disadvantage of the proposed method is the black-box character of ANNs. As a consequence, the ability of the network to deliver correct values can only be verified through trial and error and not by investigating the optimized network itself. The extension of the proposed method to material systems with more than one constituent and the investigation of how far the beneficial properties of the proposed method can be extended are the main questions for further investigations (Fig. 9).