1 Introduction

For scientific and, in particular, industrial applications, it is valuable to provide equations of state (EOS) that accurately describe the thermodynamic behavior of the fluids of interest within a short time frame. However, accurate thermodynamic models usually rest on reliable experimental data, and their development can take a long time because accurate measurements over wide temperature and pressure ranges are time-consuming. Here, we demonstrate the advantages of applying optimal experimental design (OED) when planning measurement series for thermodynamic properties, with the goal of reducing the experimental effort and thus the overall time for model development. To this end, the shortcomings of typical experimental design and the approach of OED are explained using our density data of ethylene glycol published by Yang et al. [1] and the density data of propylene glycol, measured with the same instrument, published by Sampson et al. [2].

The \((p, \rho , T)\) behavior of ethylene glycol was investigated over the temperature range from \(T =\) (283.3 to 393.1) K at pressures from \(p =\) (4.8 to 100.1) MPa utilizing a high-pressure vibrating-tube densimeter; as shown in Fig. 1 , a total of 89 \((p, \rho , T)\) data points were studied with a combined expanded uncertainty (\(k=2\)) of 1.57 kg·m−3 (equivalent to a max. relative uncertainty of 0.151 %). To model the experimental data, Yang et al. [1] fitted two empirical Schilling-type correlation equations [3]: one of the same form (same number of terms, same exponents) as for propylene glycol in the work of Sampson et al. [2], and one using the “artificial intelligence powered” software tool Eureqa [4, 5]. Eureqa is one of many tools that perform machine learning-based symbolic regression. It simultaneously finds the functional form of a model and identifies the model parameters. The first approach was motivated by the assumed thermodynamic similarity of both substances, and the second approach served to investigate the applicability of symbolic regression and machine learning for the optimization of correlation equations. Section 2 of the present paper provides a small overview of different types of correlation models for liquid-phase densities to support the understanding of their characteristics.

Fig. 1

Experimental densities (symbols) and interpolated densities (dashed lines) measured by Yang et al. [1]. Isotherms: \(T\approx \) 283.32 K; \(T\approx \) 293.14 K; \(T\approx \) 293.14 K; \(T\approx \) 298.18 K; \(T\approx \) 313.13 K; \(T\approx \) 333.09 K; \(T\approx \) 353.13 K; \(T\approx \) 373.22 K; \(T\approx \) 393.09 K

In the present study, we illustrate how OED could have been used to plan the measurement series conducted by Yang et al. [1] in order to reduce the number of density measurements of ethylene glycol needed to reliably adjust the parameters of a Schilling-type equation. To confirm our results regarding the benefits of OED, the calculations were also applied to measured densities of propylene glycol (combined expanded uncertainty (\(k=2\)): 1.78 kg·m−3) [2]. Despite its ability to select the measurements with the highest information content, OED is seldom applied in the field of thermodynamic property research. We conjecture that the primary reasons for this are:

  • a lack of knowledge about OED technologies,

  • the lack of software tools for easy application of OED,

  • the requirement to specify the underlying model beforehand.

For a detailed overview of the use of OED in different areas of chemical and thermal engineering research, we refer the reader to Franceschini and Macchietto [6]. With respect to thermodynamic property research, we point out the work of Bardow and colleagues [7, 8], which deals with the experimental characterization of liquid–liquid equilibria utilizing OED. In contrast to those investigations, the present work focuses on selecting the temperature and pressure values for density measurements, with the aim of minimizing the parameter variance of the correlation model. The method described here can be used equivalently for measurements of other thermodynamic and transport properties (e.g., speed of sound, specific heat capacity, viscosity).

When changing the isotherm in a usual measurement series, establishing thermal equilibrium with a typical \((p, \rho , T)\) apparatus is rather time-consuming compared to setting new pressures along an isotherm. For this reason, we decided to use OED to select a subset of the most informative isotherms. OED can be regarded as an in situ tool for linear or nonlinear modeling. In the latter case, the following sequential workflow is employed:

  (a) Choose an initial set of \((p, T)\) state points to be studied (e.g., two isotherms with five pressure values each).

  (b) Conduct the measurements (e.g., density).

  (c) Fit the model (e.g., the Schilling-type model).

  (d) Answer the following questions:

    • Can the model reproduce the measured data with sufficient accuracy (e.g., within the experimental uncertainty)?

    • Can a significant amount of information be gained from additional measurements?

  (e) If the latter is not the case, the experiment is terminated. Otherwise, select the next most relevant isotherm using OED (Sect. 3) and proceed with step (b).
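The control flow of steps (a) to (e) can be sketched as a short loop. This is only an illustrative skeleton under our own assumptions: the function `sequential_oed` and the callables `measure`, `fit`, `reproduces_data` and `select_next` are hypothetical placeholders for the apparatus, the regression and the OED step, and they are not taken from any existing software.

```python
def sequential_oed(initial_points, measure, fit, reproduces_data,
                   select_next, max_rounds=20):
    """Skeleton of the sequential workflow (a)-(e).

    All callables are hypothetical placeholders supplied by the user:
      measure(point)                      -> measured value at a (T, p) point
      fit(points, data)                   -> fitted model
      reproduces_data(model, points, data)-> True if accuracy is sufficient
      select_next(model, points)          -> list of new (T, p) points on the
                                             next isotherm, or [] if no
                                             significant information gain
    """
    points = list(initial_points)                 # (a) initial state points
    data = [measure(pt) for pt in points]         # (b) first measurements
    model, accurate = None, False
    for _ in range(max_rounds):                   # safety limit (assumption)
        model = fit(points, data)                 # (c) fit the model
        accurate = reproduces_data(model, points, data)   # (d) first question
        new_points = select_next(model, points)           # (d) second question
        if not new_points:                        # (e) no gain: terminate
            break
        points.extend(new_points)                 # (e) next isotherm chosen
        data.extend(measure(pt) for pt in new_points)     # back to (b)
    return model, points, accurate
```

The safety limit `max_rounds` is an addition of this sketch; the workflow described above terminates solely on the information-gain question.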

In this paper, as a first step, we use linear OED to select a subset of isothermal experiments from two given sets of measurements that are already available. We use the selected measurements to create new models for ethylene glycol and propylene glycol based on the same functional forms as used before. The results are compared to the existing models (Sect. 4).

2 Modeling Approach

For simulations in process engineering, accurate and easily applicable models that describe the real fluid behavior are needed. Considering the lack of knowledge about more complex compounds at the molecular level, empirical equations of state have become an established way of modeling thermodynamic behavior. In industrial applications, the empirical Tait equation (1888) is still widely used to model densities of liquids [9]. A more contemporary approach to modeling liquid-phase densities was developed by Schilling et al. [3] (Table 1). This equation consists of polynomial terms; terms of this type are also used within fundamental equations of state.

For the modeling of fundamental equations of state, which provide the possibility to derive all thermodynamic properties by differentiation of these equations, functional forms such as polynomial and exponential terms are involved [10, 11]. In 1989, within the scope of developing fundamental equations, Setzmann and Wagner [10, 12] established the modeling tool OPTIM, which combines a modified step-wise regression analysis based on a “bank of terms” with elements of evolutionary optimization methods. Nowadays, this tool is mostly unused. However, it was applied to model the density of liquid n-heptane, n-nonane, 2,4-dichlorotoluene and bromobenzene in the temperature range from 233.15 to 473.15 K at pressures up to 30 MPa by Schilling et al. [3]. Due to its good representation of liquid-phase experimental data, further works, e.g., of Sommer et al. [13], Sampson et al. [2] and Yang et al. [1], also relate to this equation. Table 4 (M1 and M1.1) contains the characteristics and Table 5 (M1 and M1.1) provides the parameter values of the published models for ethylene glycol and propylene glycol, which are used in this work. The so-called Schilling-type equation has the following functional form:

$$\begin{aligned}&\frac{\rho }{\rho _0}=\sum _{j=1}^{I_{\text{Pol}} } n_j \, \sigma ^{t_j} \pi ^{p_j} \end{aligned}$$
(2.1)
$$\begin{aligned}&\sigma =(T/T_0-1) \text { and } \pi =(p/p_0+1) \end{aligned}$$
(2.2)

The variables \(\sigma \) and \(\pi \) represent the reduced temperature and the reduced pressure, respectively, while \(T_0\), \(p_0\) and \(\rho _0\) are chosen based on the measured values and \(I_\text{{Pol}} \) defines the number of terms. In Sampson et al. [2], the reference values were set to \(T_0 =\) 150 K, \(p_0=\) 100 MPa and \(\rho _0=\) 1000 kg·m−3, and \(n_j\), \(t_j\) and \(p_j\) were fitted to the experimental data of propylene glycol with the number of terms set to \(I_\text{{Pol}} = 8\). With this model, the measured densities for propylene glycol can be reproduced within a maximum error of \(\varepsilon _\text{R} =0.015\,\%\). Due to the similarity between both substances, the same setup was used to fit the parameters \(n_j\) for ethylene glycol in Yang et al. [1], keeping the exponents \(t_j\) and \(p_j\) fixed (see Tables 4 (M1) and 5). The experimental data of ethylene glycol are represented within a maximum error of \(\varepsilon _\text{R} =0.025\,\%\).
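To make the structure of Eqs. 2.1 and 2.2 concrete, the following minimal Python sketch evaluates a Schilling-type equation with NumPy. The reference values \(T_0\), \(p_0\) and \(\rho _0\) are those quoted above; the function names are ours, and the coefficients and exponents used in the example call are arbitrary placeholders, not the values of Table 5.

```python
import numpy as np

# Reference values as quoted in the text (Sampson et al. [2])
T0, P0, RHO0 = 150.0, 100.0, 1000.0   # K, MPa, kg/m^3

def reduced_variables(T, p):
    """Eq. 2.2: sigma = T/T0 - 1 and pi = p/p0 + 1."""
    sigma = np.asarray(T, dtype=float) / T0 - 1.0
    pi = np.asarray(p, dtype=float) / P0 + 1.0
    return sigma, pi

def schilling_density(T, p, n, t, p_exp):
    """Eq. 2.1: rho = rho0 * sum_j n_j * sigma^t_j * pi^p_j (T in K, p in MPa)."""
    n, t, p_exp = (np.asarray(a, dtype=float) for a in (n, t, p_exp))
    sigma, pi = reduced_variables(T, p)
    terms = n * sigma[..., None] ** t * pi[..., None] ** p_exp
    return RHO0 * terms.sum(axis=-1)

# Example call with placeholder (hypothetical) coefficients and exponents
rho = schilling_density([300.0, 350.0], [10.0, 50.0],
                        n=[0.5, 0.6], t=[1.0, 2.0], p_exp=[0.5, 1.0])
```

Non-integer exponents \(t_j\) require \(\sigma > 0\), i.e., \(T > T_0\), which holds for all temperatures considered here.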

Most current modeling approaches, especially for nonlinear models, share a high development effort and the need for thorough background knowledge. In the work of Yang et al. [1], a new approach utilizing Eureqa (Table 1) was also employed to investigate the suitability of machine learning-based symbolic regression for nonlinear EOS modeling [4, 5]. The terms were chosen as polynomials to create a Schilling-type equation (Eqs. 2.1 and 2.2) equivalent to the existing model, while the aim was to reduce the number of terms, to fit the exponents and to stay within an adequate uncertainty (for model characteristics see Table 4 (M1.1) and for model parameters see Table 5 (M1.1)). This model is included in the evaluation, on the one hand to better interpret the deviations between modeled and measured densities and on the other hand to better illustrate the influence of the functional form on the extrapolation behavior. In addition to the less complex functional form, a lower maximum error of \(\varepsilon _\text{R} =0.020\,\%\) and a better extrapolation behavior are achieved.

Table 1 Features of different EOS modeling approaches for liquid-phase densities (Tait equation [9], Schilling-type equation [3], Eureqa modeling [4, 5])

3 Background on Optimal Experimental Design

Optimal experimental design is a technique to select experiments which are most informative about the unknown parameters of a given model. We refer the interested reader to Pázman, Uciński and Atkinson et al. [14,15,16] for a comprehensive background. We wish to emphasize that OED differs from the statistical design of experiments, where one seeks, e.g., space-filling experiments by Latin hypercube sampling. Here, we use OED to reduce the experimental effort required to identify the parameters \(n_j\), \(j = 1, \ldots , I_{\text{Pol}} \), in Eqs. 2.1 and 2.2 with exponents \(t_j\) and \(p_j\) fixed. In this case, the dependent variable \(\rho /\rho _0\) is a linear function of the parameters, which simplifies the subsequent description. To be specific, we deduce from Eqs. 2.1 and 2.2 that each triple of measured data \((p_i, \rho _i, T_i)\), after conversion to reduced quantities \((\pi _i, \rho _i/\rho _0, \sigma _i)\), contributes a prediction of the form \(\rho _i \approx \rho _0 \, j(\pi _i,\sigma _i) \, n\), where \(n = (n_1, \ldots , n_8)^\text{{T}} \in \mathbb {R}^8\) denotes the parameter vector and

$$\begin{aligned} j(\pi _i,\sigma _i) = \begin{bmatrix} \sigma _i^{t_1} \pi _i^{p_1}&\cdots&\sigma _i^{t_8} \pi _i^{p_8} \end{bmatrix} \end{aligned}$$
(3.1)

is the so-called elementary Jacobian associated with the ith measurement. Following the theory of OED, the information content of a single measurement is expressed through the elementary Fisher information matrix (FIM) \(I_i = j(\pi _i, \sigma _i)^\text{{T}} j(\pi _i, \sigma _i) \in \mathbb {R}^{8 \times 8}\). Each elementary FIM is a symmetric, positive semi-definite rank–1 matrix. Moreover, since we can assume individual measurements to be statistically independent, the FIM associated with a series of experiments is obtained as \(I = \sum _i I_i\).
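The construction of the elementary Jacobian (Eq. 3.1) and of the FIM for a series of measurements can be sketched as follows. The function names are ours for illustration, and the exponent arrays are supplied by the caller; the sketch works for any number of terms.

```python
import numpy as np

def elementary_jacobian(pi, sigma, t, p_exp):
    """Row vector of Eq. 3.1 for one measurement at reduced (pi, sigma)."""
    t = np.asarray(t, dtype=float)
    p_exp = np.asarray(p_exp, dtype=float)
    return sigma ** t * pi ** p_exp

def fisher_information(pis, sigmas, t, p_exp):
    """FIM of a series of independent measurements: I = sum_i j_i^T j_i."""
    # Stack the elementary Jacobians row by row; J^T J then equals the
    # sum of the rank-1 elementary FIMs.
    J = np.array([elementary_jacobian(pi, sigma, t, p_exp)
                  for pi, sigma in zip(pis, sigmas)])
    return J.T @ J
```

Writing the sum of elementary FIMs as \(J^\text{T} J\), with the elementary Jacobians stacked as rows of \(J\), avoids an explicit loop over rank-1 matrices.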

It is common to apply a scalar objective function, which converts the FIM associated with any collection of experiments into a single number and, thus, allows a comparison with any other collection of experiments. We utilize here the A-criterion

$$\begin{aligned} \Psi _{A} (I) ={{\,\text{trace}\,}}(I^{-1}) = \sum _{j=1}^8 \frac{1}{\lambda _j}, \end{aligned}$$
(3.2)

where \(\lambda _j\) is the jth eigenvalue of I. Clearly, at least eight individual measurements are required to render the FIM positive definite and the criterion \(\Psi _{A} (I)\) finite. In this case, the value of \(\Psi _{A} (I)\) is proportional to the sum of the squared semi-axes of confidence ellipsoids in the 8-dimensional parameter space. Consequently, we seek to minimize \(\Psi _{A} (I)\), possibly subject to constraints on the experimental budget, in order to maximize the information content of the collection of experiments selected and simultaneously minimize parameter variation in the face of measurement errors.
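A minimal sketch of the A-criterion (Eq. 3.2), computed from the eigenvalues of the FIM, is given below. Returning infinity for a numerically singular FIM and the relative tolerance `rcond` are choices of this sketch, not part of the method description.

```python
import numpy as np

def a_criterion(fim, rcond=1e-12):
    """Psi_A(I) = trace(I^{-1}) = sum of reciprocal eigenvalues (Eq. 3.2).

    Returns np.inf if the FIM is numerically singular, i.e., if the
    measurements do not determine all parameters.
    """
    eigvals = np.linalg.eigvalsh(fim)          # ascending, FIM is symmetric
    if eigvals[0] <= rcond * eigvals[-1]:      # rank-deficient (assumed tol.)
        return np.inf
    return float(np.sum(1.0 / eigvals))
```

For a rank-1 FIM, as produced by a single measurement, the function returns infinity, which reflects the statement above that too few measurements leave the criterion unbounded.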

As we argued in the introduction, it is advantageous from a practical point of view to take measurements along an isotherm. In the following section, we will therefore optimize over experiments each of which comprises several measurements obtained by varying the pressure along an isotherm.

4 Numerical Results

The approach described in Sect. 3 is used to optimize the measurement series by maximizing the information contained in a selection from eight isothermal measurement series at approximate temperatures \(T = \) (283, 293, 298, 313, 333, 353, 373, 393) K drawn from the experiments conducted by Yang et al. [1]. The measurement plan for each isotherm consisted of nine pressure values, approximately \(p = \) (5, 10, 15, 20, 30, 50, 70, 90, 100) MPa. Table 2 and Fig. 2a show the most informative selection of isotherms (black dots) to fit the parameters \(n_1, \ldots , n_8\) using a varying number of isotherms. As expected, the value of the objective (Eq. 3.2) decreases as we allow more isotherms to be included. However, when transitioning from five to six isotherms, the further decrease in the objective is small compared to the previous steps. For propylene glycol, measured at approximate temperatures \(T = \) (273, 283, 293, 298, 313, 333, 353, 373, 393) K, Fig. 2b and Table 3 show a similar selection of best isotherms and a similar decay of the objective function values. In this case, the measurement plan for each isotherm consisted of eight pressure values, approximately \(p = \) (5, 10, 15, 20, 30.5, 50.5, 71, 91) MPa.
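The selection of the best \(k\) isotherms can be illustrated with a small Python sketch. The text does not state which optimization algorithm was used; with only eight or nine candidate isotherms, exhaustive enumeration of all subsets is feasible and is used here purely for illustration. The function names are ours, and the four-term exponent set in the example is a hypothetical placeholder chosen to keep the FIM well conditioned; it is not the eight-term model of Table 5.

```python
from itertools import combinations
import numpy as np

T0, P0 = 150.0, 100.0   # K, MPa, reference values quoted in Sect. 2

def isotherm_fim(T, pressures, t, p_exp):
    """FIM contributed by all pressure points measured along one isotherm."""
    sigma = T / T0 - 1.0
    pi = np.asarray(pressures, dtype=float) / P0 + 1.0
    J = sigma ** t[None, :] * pi[:, None] ** p_exp[None, :]
    return J.T @ J

def best_isotherms(temps, pressures, t, p_exp, k):
    """Exhaustively search the k isotherms minimizing trace(FIM^-1)."""
    t = np.asarray(t, dtype=float)
    p_exp = np.asarray(p_exp, dtype=float)
    fims = [isotherm_fim(T, pressures, t, p_exp) for T in temps]
    best_value, best_subset = np.inf, ()
    for subset in combinations(range(len(temps)), k):
        fim = sum(fims[i] for i in subset)
        if np.linalg.matrix_rank(fim) < len(t):
            continue                      # parameters not identifiable
        value = np.trace(np.linalg.inv(fim))
        if value < best_value:
            best_value, best_subset = value, subset
    return [temps[i] for i in best_subset], best_value

# Candidate isotherms and pressure plan for ethylene glycol (from the text)
temps = [283.0, 293.0, 298.0, 313.0, 333.0, 353.0, 373.0, 393.0]
pressures = [5.0, 10.0, 15.0, 20.0, 30.0, 50.0, 70.0, 90.0, 100.0]
# Hypothetical placeholder exponents (four terms)
t_demo = [0.0, 1.0, 0.0, 1.0]
p_demo = [0.0, 0.0, 1.0, 1.0]
selection, objective = best_isotherms(temps, pressures, t_demo, p_demo, k=2)
```

Because adding an isotherm adds a positive semi-definite matrix to the FIM, the optimal objective value cannot increase with \(k\), in line with the decay described above.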

Fig. 2

Selection of the best isotherms \(\bullet \) from the measured isotherms and from a free choice a) for ethylene glycol between \(T = \) (283 to 393) K and b) for propylene glycol between \(T = \) (273 to 393) K, within a raster of 2 K using the A-criterion (Eq. 3.2) to minimize the parameter uncertainty

Table 2 Selection of the best isotherms for different numbers of isotherms and the objective value (when fitted to that number of isotherms) for ethylene glycol
Table 3 Selection of the best isotherms for different numbers of isotherms and the objective value (when fitted to that number of isotherms) for propylene glycol
Table 4 Characteristics of the different Schilling-type equations (Eqs. 2.1 and 2.2; models M1, M1.1, M2 and M3) for ethylene glycol and propylene glycol used in this work

We aim to compare the accuracy of the existing models, see Table 4 (M1), with new models of the same functional form, where the parameters \(n_1, \ldots , n_8\) of Eqs. 2.1 and 2.2 are fitted using only the five best isotherms, see Table 4 (M2). To this end, we show in Fig. 3a and c the relative deviations of calculated densities from experimental values as a function of pressure along all eight isotherms. Using only five out of eight isotherms for the fit increases the greatest deviation by only approximately \({0.003}{\,\%}\). The same approach is used to fit a new model for propylene glycol to the best five isotherms, see the lower part of Table 5 (M2). With an absolute difference in the greatest deviation of \({0.0015}{\,\%}\) between the models M1 and M2 (Fig. 4a and b), the result of OED for propylene glycol is even more promising than for ethylene glycol.
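The fit-on-a-subset, evaluate-on-all comparison can be sketched as follows. Several points are assumptions of this sketch: ordinary (unweighted) linear least squares is used because the fitting procedure is not specified here, the sign convention of the deviation is (calculated − experimental)/experimental, the exponents are hypothetical placeholders, and the densities are synthetic values generated from made-up coefficients rather than measured data.

```python
import numpy as np

T0, P0, RHO0 = 150.0, 100.0, 1000.0   # K, MPa, kg/m^3 (Sect. 2)

def design_matrix(T, p, t, p_exp):
    """Rows are the elementary Jacobians (Eq. 3.1) of the state points."""
    sigma = np.asarray(T, dtype=float) / T0 - 1.0
    pi = np.asarray(p, dtype=float) / P0 + 1.0
    return sigma[:, None] ** t[None, :] * pi[:, None] ** p_exp[None, :]

def fit_coefficients(T, p, rho, t, p_exp):
    """Least-squares estimate of n_j for fixed exponents (assumed unweighted)."""
    J = design_matrix(T, p, t, p_exp)
    n, *_ = np.linalg.lstsq(J, np.asarray(rho, dtype=float) / RHO0, rcond=None)
    return n

def relative_deviation_percent(T, p, rho, n, t, p_exp):
    """100 * (rho_calc - rho_exp) / rho_exp (sign convention assumed)."""
    rho = np.asarray(rho, dtype=float)
    rho_calc = RHO0 * design_matrix(T, p, t, p_exp) @ n
    return 100.0 * (rho_calc - rho) / rho

# Synthetic demonstration data (NOT measured values)
t_demo = np.array([0.0, 1.0, 0.0, 1.0])       # hypothetical exponents
p_demo = np.array([0.0, 0.0, 1.0, 1.0])
n_true = np.array([1.2, -0.1, 0.05, 0.01])    # made-up coefficients
Tg, pg = np.meshgrid([283.0, 313.0, 353.0, 393.0], [5.0, 30.0, 100.0],
                     indexing="ij")
T_all, p_all = Tg.ravel(), pg.ravel()
rho_all = RHO0 * design_matrix(T_all, p_all, t_demo, p_demo) @ n_true

# Fit on two "selected" isotherms, then evaluate on all isotherms
selected = np.isin(T_all, [283.0, 393.0])
n_fit = fit_coefficients(T_all[selected], p_all[selected], rho_all[selected],
                         t_demo, p_demo)
deviations = relative_deviation_percent(T_all, p_all, rho_all, n_fit,
                                        t_demo, p_demo)
```

With noise-free synthetic data the deviations vanish up to rounding; with real measurements, the maximum of the absolute deviations over all isotherms is the quantity compared between M1 and M2.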

Next, we investigate the information gain obtained by allowing the temperatures of the isothermal experiments to be chosen freely within the interval from \(T = {283}\,{\rm K}\) to \(T = 393\,{\rm K}\) for ethylene glycol and \(T = {273}\,{\rm K}\) to \(T = {393}\,{\rm K}\) for propylene glycol. To this end, we introduce an equidistant grid of possible temperature values with a spacing of 2 K; the pressure values for each group of isothermal experiments are the same as measured. The optimal selections of up to five isotherms can be found in Fig. 2a and the bottom half of Table 2 for ethylene glycol as well as in Fig. 2b and the bottom half of Table 3 for propylene glycol.

By comparing both selection approaches, the similarity of temperatures for the best two and three isotherms as well as the difference between those for the four and five best isotherms are noticeable. This leads us to expect an improvement of the model with a fit to the free best five isotherms. However, due to the missing measured values, no comparable model can be developed for this case.

Instead, we investigate the importance of the functional form by fitting a new model to the best five measured isotherms, where the exponents are also optimized, see Fig. 3d, Tables 4 (M3) and 5 (M3). Here, even with five isotherms, the maximum deviations can be reduced significantly. However, for propylene glycol, the model M3 shows large deviations for the isotherms not selected (greatest deviation \({0.0431}{\,\%}\)) compared to the deviation of the selected isotherms (greatest deviation \({0.0046}{\,\%}\)) (Fig. 4c) and compared to the model published by Sampson et al. [2] (Tables 4 and 5, propylene glycol - M1). This behavior is consistent with the conclusion of Yang et al. [1] concerning the Eureqa model (Fig. 3b), which demonstrates the importance of a comprehensive model optimization including the functional form. When the exponents are fitted, the model is no longer linear in the parameters, which affects the selection of the best isotherms. To further exploit the potential of free exponent modeling, we plan to employ the sequential OED approach, described at the end of Sect. 1, for calculating the necessary isotherms based on nonlinear models in future work.

Fig. 3

Relative deviations of ethylene glycol densities calculated with the different models in Table 4 from experimental values, (a) M1, (b) M1.1, (c) M2, best five measured isotherms (bold marker), (d) M3, best five measured isotherms (bold marker). Isotherms: \(T\approx \) 283.32 K; \(T\approx \) 293.142 K; \(T\approx \) 293.140 K; \(T\approx \) 298.18 K; \(T\approx \) 313.13 K; \(T\approx \) 333.09 K; \(T\approx \) 353.13 K; \(T\approx \) 373.22 K; \(T\approx \) 393.09 K

Fig. 4

Relative deviations of propylene glycol densities calculated with the different models in Table 4 from experimental values, (a) M1, (b) M2, best five measured isotherms (bold marker), (c) M3, best five measured isotherms (bold marker). Isotherms: \(T\approx \) 272.73 K; \(T\approx \) 283.18 K; \(T\approx \) 293.18 K; \(T\approx \) 298.12 K; \(T\approx \) 313.12 K; \(T\approx \) 333.05 K; \(T\approx \) 352.99 K; \(T\approx \) 373.34 K; \(T\approx \) 392.95 K

Table 5 Parameters of Eqs. 2.1 and 2.2 for the models M1 to M3 in Table 4 for ethylene glycol and propylene glycol

For a more comprehensive comparison of the modeling approaches, the extrapolation behavior was investigated using ethylene glycol as an example; this was done first with an optimization of the functional form utilizing Eureqa, as shown in Table 4 with model M1.1, and second with parameter identification for a given functional form, as shown in Table 4 with models M1, M2 and M3. To this end, the models were compared to an unpublished fundamental equation of state in the form of the Helmholtz energy as implemented in REFPROP version 10.0 [17]. This model is valid within the following limits:

  • temperature: \(T = \) 260.6 to 750.0 K,

  • pressure: up to \(p = \) 100 MPa,

  • density: up to \(\rho = \) 1136.5 kg·m−3,

  • vapor-liquid saturation line.

The vapor-liquid saturation line defines the limit for extrapolation toward the maximum temperature and the minimum pressure. To calculate the liquid densities with the different models at the saturation line, the corresponding temperatures and pressures from REFPROP are used (Fig. 5). As already shown by Yang et al. [1], model M1.1 (Fig. 5) shows the best extrapolation behavior, staying within \({\pm 2}{\,\%}\) up to \(p=\) 4.5 MPa and \(T=\) 655 K with \(\rho \approx \) 740 kg·m−3. The graphs for models M1 and M2 in Fig. 5 are similar to each other, while the model M3 shows the largest deviations from the densities calculated with the fundamental equation in REFPROP. It is important to note that none of the models reproduces the curvature of densities calculated from the fundamental equation in REFPROP, which means that they are all strictly limited to the calculation of liquid-phase densities.

Fig. 5

Saturated liquid densities calculated with the models M1, M1.1, M2 and M3 (see Table 4) for (a) the saturation pressure and (b) the saturation temperature as both computed with the fundamental equation of state as implemented in REFPROP [17]. Saturated liquid densities and the critical point were also calculated with the same fundamental equation of state

Many technical applications (e.g., heating and cooling processes, polymer production and solvation in process technology) work at pressures below 5 MPa. For this reason, we have extrapolated the measured isotherms to ambient pressure. All models reproduce the values obtained by the fundamental equation as available in the current version of REFPROP within relative deviations of −0.28 % to −0.1 %. Each model shows the largest deviation at the 393 K isotherm, and again, the Eureqa model yields the best result with a maximum deviation of −0.20 %. We aim to consider boundary conditions, such as the extrapolation behavior, in further investigations.

5 Conclusions

To reduce the time and cost of accurate thermodynamic property modeling, it is necessary to gather experimental data that are suitable with respect to accuracy and information content. Therefore, we investigated the application of OED to liquid-phase densities of ethylene glycol by calculating the information content of existing experiments along different isotherms. With our resulting model M2 fitted to the most informative five isotherms, we reproduce the measured data with a maximum relative deviation of 0.02542 %. Compared to the maximum relative deviation of 0.02243 % for model M1, which involves all eight isotherms, this is a very promising result. These results are confirmed using the measured data of propylene glycol (0.01571 % compared to 0.01434 %). The smaller deviations for propylene glycol can be explained by the fact that the model was originally developed on these data. By fitting two new models (M2 for ethylene glycol and for propylene glycol) using the best calculated selection of isotherms, we demonstrate that, with OED, sufficiently accurate models can be developed with fewer experiments than traditionally used.

Due to the limitation to already measured data, we are not able to compare these results with a model fitted to the best five freely selected isotherms. At this point, new measurements become necessary to compare the models with the best five measured and the best five freely selected isotherms, where we are confident of achieving even better results. Finally, the model M3, where the exponents were also fitted, shows a significant improvement for ethylene glycol compared to the model M1. However, this is not the case for propylene glycol, where the model M3 shows large deviations for the isotherms not selected. This result highlights the importance of the sequential optimal experimental design described at the end of Sect. 1 for selecting experiments for nonlinear models, and of the fundamental choice of the functional form. The investigation of the extrapolation behavior underlines this even more: the use of thermodynamic criteria beyond the actual measurement data is crucial.

Considering the typical temperature range of thermodynamic measurement devices such as the high-pressure vibrating tube densimeter used in our previous studies, i.e., (273 to 473) K, numerical studies have shown that isotherms near the ambient temperature are often selected last. From an experimental point of view this is advantageous because these are measurements with larger uncertainty since a thermostating task at ambient temperature is often tricky (e.g., a circulating thermostat has to switch between heating and cooling against the ambient temperature).

Our next steps are:

  • Measuring the freely selected isotherms, fitting a new model to these data and comparing the results to the existing models.

  • Defining different objective functions in addition to those describing the parameter uncertainty.

  • Developing methods to include thermodynamic criteria as boundaries for the model (e.g., extrapolation behavior).

  • Applying OED based on the nonlinear model (with free exponents) via the sequential process described in Sect. 1.

  • Investigating the influence of the number of terms to deal with the problem of over-fitting.

With these next steps, we aim to create the basis for an OED setup specialized for the measurement of thermodynamic properties. The data for this paper and the calculations performed are provided in the supporting information.