Planning of Measurement Series for Thermodynamic Properties Based on Optimal Experimental Design

Decreasing the time required for accurate thermodynamic property measurements is extremely desirable for model development that can respond to the needs of science and industry within a short time frame. Here, we demonstrate the application of optimal experimental design to measurements of thermodynamic properties. The technique is explored using the fitting of a Schilling-type equation from published (p,ρ,T)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p, \rho , T)$$\end{document}-measurements of ethylene glycol and propylene glycol. The analysis shows that a fixed-exponent fit using (p, T)-measurements along the five most informative isotherms produces models of relative density errors comparable to those obtained using the data along all investigated isotherms, i.e., eight or nine. It is also argued that a calculation of optimal isotherms prior to the measurement series can further increase the precision at no additional experimental effort.


Introduction
For scientific, but particularly for industrial applications, it is absolutely valuable to provide equations of state (EOS) that accurately describe the thermodynamic behavior of the concerned fluid substances within a narrow time frame. However, accurate 1 Experimental densities (symbols) and interpolated densities (dashed lines) measured by Yang et al. [1]. , T ≈ 283.32 K; , T ≈ 293.14K; , T ≈ 293.14 K; , T ≈ 298.18 K; , T ≈ 313.13 K; , T ≈ 333.09 K; , T ≈ 353.13 K; , T ≈ 373.22 K; , T ≈ 393.09 K types of correlation models for liquid-phase densities to support the understanding of their characteristics.
In the present study, we illustrate how OED could have been used for the planning of the measurement series conducted by Yang et al. [1] in order to reduce the amount of density measurements of ethylene glycol needed to reliably adjust the parameters of a Schilling-type equation. To confirm our results regarding the benefits of OED, the calculations were also applied to measured densities of propylene glycol (combined expanded uncertainty ( k = 2 ): 1.78 kg·m −3 ) [2]. In spite of its capabilities to select measurements with the highest information content, OED is seldom applied in the field of thermodynamic property research. We conjecture that the primary reasons for this are: -a lack of knowledge about OED technologies, -the lack of software tools for easy application of OED, -the requirement to specify the underlying model beforehand.
For a detailed overview of the use of OED in different areas of chemical and thermal engineering research, we refer the reader to Francheschini and Macchietto [6]. With respect to thermodynamic property research, we point out the work of Bardow and colleagues [7,8], which deals with the experimental characterization of liquid-liquid equilibria utilizing OED. In contrast to those investigations, the present work focuses on the temperature and pressure values for density measurements, with the aim to minimize the parameter variance of the correlation model. The method described here can be used equivalently for measurements of other thermodynamic and transport properties (e.g., speed of sound, specific heat capacity, viscosity, etc.).
When changing the isotherm in a usual measurement series, establishing thermal equilibrium with a typical (p, , T) apparatus is rather time-consuming compared to setting new pressures along an isotherm. For this reason, we decided to use OED to select a subset of the most informative isotherms. OED can be imagined as an in situ tool for linear or nonlinear modeling. In the latter, case, the following sequential workflow is employed: (a) choose an initial set of (p, T) state points to be studied (e.g., two isotherms with five pressure values each), (b) conduct the measurements (e.g., density), (c) fit the model (e.g., the Schilling-type model), (d) answer the following questions: -Can the model reproduce the measured data with sufficient accuracy (e.g., within the experimental uncertainty)? -Can a significant amount of information be gained from additional measurements?
(e) If the latter is not the case, the experiment is terminated. Otherwise, select the next most relevant isotherm using OED (3) and proceed with 1. In this paper, we utilize in a first step, linear OED to select a subset of isothermal experiments from two given sets of measurements, which are already available. We use the measurements selected to create new models for ethylene glycol and propylene glycol based on the same functional forms as used before. The results are compared to the existing models (4).

Modeling Approach
For simulations in process engineering, accurate and easily applicable models that describe the real fluid behavior are needed. Considering the lack of knowledge about more complex compounds at the molecular level, empirical equations of state have become an established way of modeling thermodynamic behavior. In industrial applications, the empirical Tait equation (1888) is still widely used to model densities of liquids [9]. However, a contemporary approach to model liquid-phase densities was developed by Schilling et al. [3] (Table 1). This is an equation in form of polynomial terms; this type of terms is also used within fundamental equations of state. For the modeling of fundamental equations of state, which provide the possibility to derive all thermodynamic properties by differentiation of these equations, functional forms such as polynomial and exponential terms are involved [10,11]. In 1989, within the scope of developing fundamental equations, Setzmann and Wagner [10,12] established the modeling tool OPTIM, which combines a modified step-wise regression analysis based on a "bank of terms" with elements of evolutionary optimization methods. Nowadays, this tool is mostly unused. However, it was applied to model the density of liquid n-heptane, n-nonane, 2,4-dichlorotoluene and bromobenzene in the temperature range from 233.15 to 473.15 K at pressures up to 30 MPa by Schilling et al. [3]. Due to its good representation of liquid-phase experimental data, further works, e.g., of Sommer et al. [13], Sampson et al. [2] and Yang et al. [1], also relate to this equation. Table 4 (M1 and M1.1) contains the characteristics and Table 5 (M1 and M1.1) provides the parameter values of the published models for ethylene glycol and propylene glycol, which are used in this work. The so-called Schilling-type equation has the following functional form: The parameters and represent the reduced temperature and the reduced pressure, respectively, while T 0 , p 0 and 0 are chosen by the measured values and I Pol defines the number of terms. In Sampson et al. [2], the parameters were set to T 0 = 150 K, p 0 = 100 MPa and 0 = 1000 kg·m −3 , and n j , t j and p j were fitted to the experimental data of propylene glycol setting the number of terms I Pol 8. With this model, the measured densities for propylene glycol can be reproduced within a maximum error of R = 0.015 % . Due to the similarity between both substances, the same setup was used to fit the parameters n j for ethylene glycol in Yang et al. [1], keeping the exponents t j and p j fixed (see Tables 4 (M1) and 5). The experimental data of ethylene glycol is represented within a maximum error of R = 0.025 %.
Most current modeling approaches, especially for nonlinear models, have a high development effort and the requirement of thorough background knowledge in common. In the work of Yang et al., also a new approach utilizing Eureqa (Table 1) was employed to investigate the suitability of machine learning-based symbolic regression for nonlinear EOS modeling [4,5]. The form of terms was chosen as polynomials to create a Schilling-type equation 2.1 and 2.2 equivalent to the existing model, while the aim was to reduce the number of terms, to fit the exponents and to stay within an adequate uncertainty (for model characteristics see Table 4 (M1.1) and for model parameters see Table 5 (M1.1)). This model is included in the evaluation, on the one hand to better interpret the deviations between modeled and measured densities and on the other hand to better illustrate the influence of the functional form on the extrapolation behavior. In addition to the less complex functional form, a lower maximum error of R = 0.020 % and a better extrapolation behavior are achieved.

Background on Optimal Experimental Design
Optimal experimental design is a technique to select experiments which are most informative about the unknown parameters of a given model. We refer the interested reader to Pázman, Uciński and Atkinson et al. [14][15][16] for a comprehensive background. We wish to emphasize that OED differs from the statistical design of experiments, where one seeks, e.g., space-filling experiments by latin hypercube sampling. Here, we use OED to reduce the experimental effort required to identify the parameters n j , j = 1, … , I Pol , in Eqs. 2.1 and 2.2 with exponents t j and p j fixed. In this case, the dependent variable ∕ 0 is a linear function of the parameters, which simplifies the subsequent description. To be specific, we deduce from Eqs. 2.1 and 2.2 that each triple of measured data (p i , i , T i ) , after conversion to reduced  is the so-called elementary Jacobian associated with the ith measurement. Following the theory of OED, the information content of a single measurement is expressed through the elementary Fisher information matrix (FIM) Each elementary FIM is a symmetric, positive semidefinite rank-1 matrix. Moreover, since we can assume individual measurements to be statistically independent, the FIM associated with a series of experiments is obtained as I = ∑ i I i . It is common to apply a scalar objective function, which converts the FIM associated with any collection of experiments into a single number and, thus, allows a comparison with any other collection of experiments. We utilize here the A-criterion where j is the jth eigenvalue of I. Clearly, at least eight individual measurements are required to render the FIM positive definite and the criterion Ψ A (I) finite. In this case, the value of Ψ A (I) is proportional to the sum of the squared semi-axes of confidence ellipsoids in the 8-dimensional parameter space. Consequently, we seek to minimize Ψ A (I) , possibly subject to constraints on the experimental budget, in order to maximize the information content of the collection of experiments selected and simultaneously minimize parameter variation in the face of measurement errors.
As we argued in the introduction, it is advantageous from a practical point of view to take measurements along an isotherm. In the following section, we will therefore optimize over experiments each of which comprises several measurements obtained by varying the pressure along an isotherm.

Numerical Results
The approach described in Sect. 3 is used to optimize the measurement series by maximizing the information contained in a set of eight isothermal measurements at approximate temperatures T = (283, 293, 298, 313, 333, 353, 373, 393) K drawn from the experiments conducted by Yang et al. [1]. The measurement plan for each isotherm consisted of nine pressure values, approximately p = (5, 10, 15, 20, 30, 50, 70, 90, 100) MPa. Table 2 and Fig. 2a show the most informative selection of isotherms (black dots) to fit the parameters n 1 , … , n 8 using a varying number of isotherms. As expected, the value of the objective (Eq. 3.2) decreases as we allow more isotherms to be included. However, when transitioning from five to six isotherms, the further decrease in the objective is small compared to the previous steps. For propylene glycol, measured at approximate temperatures T = (273, 283, 293, 298, 313, 333, 353, 373, 393) K, Fig. 2b and Table 3 show a similar selection of best isotherms and for the decay of the objective function values. In this case, the measurement plan for each isotherm consisted of eight pressure values, approximately at p = (5, 10, 15, 20, 30.5, 50.5, 71, 91) MPa.
We aim to compare the accuracies of the existing models, see Table 4 (M1), with new models of the same functional form, where the parameter n 1 , … , n 8 of Eqs. 2.1 and 2.2 are fitted using only the five best isotherms, see Table 4 (M2). To this end, we show in Fig. 3a and c the relative deviations of calculated densities from experimental values as a function of pressure along all eight isotherms. It can be seen that using only five out of eight isotherms for the fitting process yields a greatest deviation only larger by approximately 0.003 % . The same approach is used to fit a new model for propylene glycol to the best five isotherms, see the lower part of Table 5 (M2). With an absolute difference in the greatest deviation of 0.0015 % for propylene glycol between the models M1 and M2 ( Fig. 4a and b), the result for OED is even more promising than for ethylene glycol.
Next, we investigate the information gain obtained by allowing the temperatures of the isothermal experiments to be chosen freely within the interval from T = 283 K to T = 393 K for ethylene glycol and T = 273 K to T = 393 K for propylene glycol. To this end, we introduce an equidistant grid of possible temperature values with a spacing of 2 K; the pressure values for each group of isothermal experiments are the same as measured. The optimal selections of up to five isotherms can be found in Fig. 2a and the bottom half of Table 2 for ethylene glycol as well as in Fig. 2b and the bottom half of Table 3 for propylene glycol.
By comparing both selection approaches, the similarity of temperatures for the best two and three isotherms as well as the difference between those for the four and five best isotherms are noticeable. This leads us to expect an improvement of the model with a fit to the free best five isotherms. However, due to the missing measured values, no comparable model can be developed for this case. Instead, we investigate the importance of the functional form by fitting a new model to the best five measured isotherms, where also the exponents are optimized, see Fig. 3d, Tables 4(M3) and 5(M3). Here, even with five isotherms, the maximum deviations can be reduced significantly. However, for propylene glycol, the model M3 shows large deviations for the isotherms not selected (greatest deviation 0.0431 % ) compared to the deviation of the selected isotherms (greatest deviation 0.0046 % ) (Fig. 4c) and compared to the model published by Sampson et al. [2] (Tables 4 and 5, propylene glycol -M1). This behavior is in conjunction with the conclusion of Yang et al. [1] concerning the Eureqa model (Fig. 3b), which demonstrates the importance of a comprehensive model optimization including the functional form. When fitting the exponents, the model is not linear in the parameters anymore, which affects the selection of the best isotherms. To further exploit the potential of free exponent modeling, we plan to employ the sequential OED approach, described at the end of Sect. 1, for calculating the necessary isotherms based on nonlinear models in future work.
For a more comprehensive model examination between the modeling approaches, the extrapolation behavior was investigated taking the example of ethylene glycol; this was first done with an optimization of the functional form utilizing Eureqa, as shown in Table 4 with model M1.1, and second with parameter identification for a given functional form, as shown in Table 4 with models M1, M2 and M3. Therefore, the models were compared to an unpublished fundamental equation of state in form of the Helmholtz energy as implemented in REFPROP version 10.0 [17]. This model is valid within the following limits:  It is important to note that none of the models reproduces the curvature of densities calculated from the fundamental equation in REFPROP, which means that they are all strictly limited to the calculation of liquid-phase densities. Many technical applications (e.g., heating and cooling processes, polymer production and solvation in process technology)work at pressures below 5 MPa. For this reason, we have extrapolated the measured isotherms to ambient pressure. All models reproduce the values obtained by the fundamental equation as available in the current version of REFPROP within relative deviations of − 0.28 to − 0.1 %. Each model shows the largest deviation at the 393 K isotherm, and again, the Eureqa model yields the best result with a maximum deviation of − 0.20 %. We aim to consider boundary conditions, such as the extrapolation behavior, in further investigations.

Conclusions
To decrease the time and even financial expenditure for accurate thermodynamic property modeling, it is necessary to gather suitable experimental data with respect to accuracy and information content. Therefore, we investigated the application of OED to liquid-phase densities of ethylene glycol by calculating the information content of existing experiments along different isotherms. With our resulting model M3 fitted to the most informative five isotherms, we reproduce the measured data with a maximum relative deviation of 0.02542 %. Compared to the maximum relative deviation of the model involving all eight isotherms M1 of 0.02243 %, this is a very promising result. These results are confirmed using the measured data of propylene glycol 0.01571 % compared to 0.01434 %). The smaller deviations for propylene glycol can be explained by the model development originally based on these data. By fitting two new models (M2 for ethylene glycol and propylene glycol) using the best calculated selection of isotherms, we demonstrate that, with OED, sufficiently accurate models can be developed with fewer experiments than traditionally used. Due to the limitation to already measured data, we are not able to compare these results with a model fitted to the best five freely selected isotherms. At this point, new  Table 4 from experimental values, (a) M1, (b) M2, best five measured isotherms (bold marker), (c) M3, best five measured isotherms (bold marker). , T ≈ 272.73 K; , T ≈ 283.18 K; , T ≈ 293.18 K; , T ≈ 298.12 K; , T ≈ 313.12 K; , T ≈ 333.05 K; , T ≈ 352.99 K; , T ≈ 373.34 K; , T ≈ 392.95 K The investigation of the extrapolation behavior underlines this even more, where the use of different thermodynamic criteria beyond the actual measurement data are crucial.
Considering the typical temperature range of thermodynamic measurement devices such as the high-pressure vibrating tube densimeter used in our previous studies, i.e., (273 to 473) K, numerical studies have shown that isotherms near the ambient temperature are often selected last. From an experimental point of view this is advantageous because these are measurements with larger uncertainty since a thermostating task at ambient temperature is often tricky (e.g., a circulating thermostat has to switch between heating and cooling against the ambient temperature).
Our next steps are: -Measuring the free calculated isotherms, fit the data to the new model and compare the results to the existing models. -Defining different objective functions in addition to those describing the parameter uncertainty. -Developing methods to include thermodynamic criteria as boundaries for the model (e.g., extrapolation behavior). -Using OED based on the nonlinear model (with free exponents) using the sequential process described in Sect. 1. -Investigating the influence of the number of terms to deal with the problem of overfitting.
With these next steps, we aim to create the basis for an OED setup specialized for the measurement of thermodynamic properties. The data for this paper and the calculations performed are provided in the supporting information.  Table 4) for (a) the saturation pressure and (b) the saturation temperature as both computed with the fundamental equation of state as implemented in REFPROP [17]. Saturated liquid densities and the critical point were also calculated with the same fundamental equation of state