Calibration and Uncertainty Estimation for Water Content Measurement in Solids

In the field of water content measurement, the calibration of coulometric methods (e.g., coulometric Karl Fischer titration or evolved water vapor analysis) is often overlooked. However, as coulometric water content measurement methods are used to calibrate secondary methods, their results must be obtained with the highest degree of confidence. The utility of calibrating such instruments has recently been demonstrated, and both single and multiple point calibration methods have been suggested. This work compares these calibration methods for the evolved water vapor analysis technique. Two uncertainty estimation approaches (Kragten's spreadsheet and the M-CARE software tool), both based on the ISO GUM method, were compared as well.


Introduction
Moisture metrology has experienced rapid development in recent years, as more stakeholders have realised that improved moisture content control can provide significant gains in process efficiency and/or product quality [1]. However, moisture determination in solids poses more challenges than in gases or liquids. The analyte (most importantly, water) can be bound to the matrix with varying degrees of strength (due to different types of interactions), the material is often difficult to homogenize and the controlled release of moisture for the measurement can be complicated [2].
Thermal methods are used for a variety of purposes, from oxidation studies to measuring phase changes [3]. Differential scanning calorimetry (DSC), for example, has been used for purposes ranging from measuring the energetic effects of protein folding [4] to assessing polymer biodegradability [5]. Another thermal method, thermogravimetric analysis (TGA), has found uses in studies of the properties of polymers [6] and of pyrolysis processes [7]. The most common technique for moisture content analysis, loss on drying (LoD), is also a thermal method: the mass of the sample is measured before and after heating, with the difference being attributed to moisture [8]. Using heat to access the moisture bound within a material is very robust. Given sufficient time and temperature, moisture will be released from the solid sample, yielding results representative of the whole sample. In LoD, other volatiles evaporate upon heating in addition to water; thus, LoD measurements yield moisture content, a broader term than water content. In some instances, it may be necessary to measure water content specifically, which requires more selective methods [9]. This need has been met by hybrid approaches combining the heating step of LoD, for a robust release of water, with detection cells that are selective for water. One such technique is evolved water vapor analysis (EWV), which has been used in this study.
Primary measurement methods (PMMs) are methods for which all relevant measurement uncertainty sources are known and quantified, and whose measurement uncertainty can be estimated using the ISO GUM method [10]. A PMM's measurement model is sufficient for predicting the analytical signal [11]; it therefore does not require calibration with an analyte and lacks a calibration curve. In chemistry, PMMs are gravimetry, titrimetry, isotope dilution mass spectrometry and coulometry [12], an example of the latter being EWV. Although EWV and Karl Fischer titration (KFT) are PMMs, recent publications have demonstrated the usefulness of calibrating them when highly accurate results are desired [13][14][15]. As PMMs, EWV and KFT are used to calibrate secondary measurement methods, to ensure traceability to SI units [10,16]. Therefore, it is paramount that their measurement results and uncertainty estimates are as reliable and representative as possible.
The current work evaluates the usefulness of calibrating PMMs, in particular EWV, and the associated uncertainty of the measurements.

Experimental Setup
The experimental part of this work was performed with an EWV instrument "easyH2O" (Berghof Products + Instruments GmbH, Germany) controlled by Aqualys software. This instrument uses coulometry, i.e., measurement of current and time, for the direct measurement of water content. First, the sample is placed in an oven and heated according to a predefined temperature program. The oven is continuously flushed with a stream of dried gas that carries the evaporated components to the sensor, which constantly electrolyses the water molecules it comes in contact with. The sensor is coated with a layer of anhydrous P2O5, which is highly hygroscopic. It absorbs water molecules from the stream of gas, which are thereafter electrolysed into H2 and O2, and the amount of water is calculated via Faraday's law (Eq. 1) [17].
$$ m = \frac{M}{Z F} \int_{t_1}^{t_2} i(t)\,\mathrm{d}t \qquad (1) $$
where m is the mass of water (g), M is the molar mass of water (18.015 g·mol⁻¹), Z is the number of released electrons per molecule (2 for water), i(t) is the electrolytic current consumed between t_1 and t_2 and F is the Faraday constant (96 485.33 C·mol⁻¹).
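As an illustration of how Faraday's law converts a recorded current trace into a mass of water, the following minimal sketch integrates i(t) numerically with the trapezoidal rule. The function names and the current trace are hypothetical; the instrument's own integration scheme is not described here and may differ.

```python
# Sketch: water mass from an electrolysis current trace via Faraday's law.
# The constants are those quoted in the text; everything else is illustrative.
M_WATER = 18.015      # g/mol, molar mass of water
Z = 2                 # electrons released per water molecule
FARADAY = 96485.33    # C/mol, Faraday constant


def charge_trapezoid(times_s, currents_a):
    """Integrate i(t) dt with the trapezoidal rule; returns charge in coulombs."""
    if len(times_s) != len(currents_a) or len(times_s) < 2:
        raise ValueError("need at least two paired (time, current) points")
    total = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        total += 0.5 * (currents_a[k] + currents_a[k - 1]) * dt
    return total


def water_mass_g(times_s, currents_a):
    """Mass of electrolysed water in grams: m = M * Q / (Z * F)."""
    return M_WATER * charge_trapezoid(times_s, currents_a) / (Z * FARADAY)


# Hypothetical trace: a constant 10 mA for 1000 s, i.e. 10 C of charge.
times = [0.0, 500.0, 1000.0]
currents = [0.010, 0.010, 0.010]
mass_ug = water_mass_g(times, currents) * 1e6
print(f"{mass_ug:.1f} ug of water")
```

In practice the background current would have to be subtracted before integration; that step is omitted here.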
The instrument measures the change in current, which is also used to determine the measurement end-point by setting a threshold current. This value is determined during method development and also depends on the condition of the instrument (e.g. partially depleted molecular sieves used to dry the carrier gas result in a higher background current, necessitating a higher threshold value). The manufacturer recommends checking the condition of the system daily, due to possible background drift related to sensor fouling and molecular sieve saturation in the drying column. This is done by regularly analysing a set of reference or calibration standards. The EWV instrument was operated according to the manufacturer's recommendations: the calibration samples were used to calculate a correction factor, called the cell constant (Eq. 2), to compensate for possible bias of the system, which amounts to a single point calibration. If the measured water content differed by more than ± 5 % from the reference value, the sensor was regenerated and the sample oven cleaned. Work was resumed once calibration sample measurements were within ± 5 % of the reference value. To account for the residual water remaining on the sample boat or in the system, a "tare" (empty sample boat) was measured before each measurement sequence. The obtained value was subtracted from subsequent measurement results.
Kaolinite clay, wood pellet and cardboard samples were analysed as case studies to assess the performance of the EWV analysis and data treatment methods with real-life samples. All samples were analysed in replicates and the measurement series were repeated on several days. Initial results have been published earlier [15]. In this work, a more detailed study on the effects of the calculation methods was undertaken. An added benefit of including more complex samples was the possibility to reveal sources of error that occur during sample preparation. For example, wood pellet samples had to be homogenised before analysis, unlike the calibration samples, which were homogeneous powders.

Reference Materials
Two standard samples were used: water oven standard 1 % (product no. 1.88054.0005, hereinafter RM) and α-d-lactose monohydrate (hereinafter ADL). The RM is a reference material produced by Merck KGaA; its water content and the associated uncertainty estimate were provided in the certificate of analysis. The ADL was sourced from Acros Organics and its reference value was calculated from its molecular formula. The uncertainty of the ADL's water content was estimated from the METefnet project results [18], where it was derived from an interlaboratory comparison measurement series with said sample material. The water content and associated expanded uncertainty (U) estimates at the 95 % confidence level (coverage factor k = 2) for both standards are given in Table 1.
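The calculation of a reference water content from a molecular formula can be illustrated as follows. This is a sketch only: it assumes the formula C12H22O11·H2O for α-d-lactose monohydrate and rounded standard atomic weights, so the result may differ in the last digits from the value used in Table 1.

```python
# Sketch: stoichiometric water content of a hydrate from its molecular formula.
# Atomic weights are rounded standard values (an assumption of this example).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol


def molar_mass(element_counts):
    """Molar mass in g/mol from a mapping of element symbol to atom count."""
    return sum(ATOMIC_MASS[el] * n for el, n in element_counts.items())


# C12H22O11·H2O written as total atom counts (assumed formula).
lactose_monohydrate = {"C": 12, "H": 24, "O": 12}
water = {"H": 2, "O": 1}

water_content = 100.0 * molar_mass(water) / molar_mass(lactose_monohydrate)
print(f"stoichiometric water content: {water_content:.2f} g/100 g")
```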

Calibration Methods
For the results presented in this article, all calculations were performed with the amount (mass) of water measured in a sample. This applied to both calibration and uncertainty estimation calculations. The EWV instrument is a coulometer: it counts the number of electrons over time, which is linked to the number of water molecules through Eq. 1.
Single point calibration is rapid, but it has a significant drawback: lower reliability in comparison to multiple point calibration, because the results depend on a single standard sample measurement. As an alternative, it was decided to investigate the suitability of multiple point calibration. It is a more time-consuming method, but the uncertainty of a single reference measurement has a smaller impact on the calibration.
In both cases, the calibration parameters (either the cell constant or the regression parameters) were used for one day only. For multiple point calibration, the calibration measurements were spread out over the measurement day in a random order.

Single Point Calibration
In routine operation, the manufacturer recommends determining the cell constant (CC; Eq. 2) using a solid sample with a known water content. The CC should be determined daily and, according to the manufacturer, the acceptable values are between 0.8 and 1.3. This constant was used to compensate for changes in the device's performance and behaved as a slope derived from the single point calibration. This relied on the assumption that the efficiency of the method, i.e., the electrolysis of water molecules into H2 and O2, is constant across the measurement range. The initial experiments in our lab indicated that the CC was not completely stable within a day of normal operation (Fig. 1A); the average time between two consecutive reference measurements was 72 min. Thus, it was decided to determine the CC multiple times a day. The measurement series were structured as shown in Fig. 1B, where n stands for the number of measurements in a series. The CC was determined separately for each measurement series (Fig. 1B) and used until a new cell constant was determined. The same approach was applied to tare values, which were subtracted from sample results (Eq. 3). Each measurement series was handled independently of the others. While this approach was more straightforward to use than the multiple point calibration, the single point calibration was inherently less reliable, because the reported results depended solely on a single CC determination.
In Eq. 3, m_watersample.f is the amount of water in the sample after tare subtraction, m_watersample.i is the amount of water found in the sample before tare subtraction, m_watertare is the amount of water from the tare measurement and CC is the cell constant as defined by Eq. 2.
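The single point procedure can be sketched in a few lines. Eqs. 2 and 3 are not reproduced in this text, so the sketch rests on two explicit assumptions: that the cell constant is the ratio of the reference to the measured amount of water, and that it is applied multiplicatively after tare subtraction. The function names and the measured values are hypothetical.

```python
# Sketch of a single point calibration with tare subtraction.
# ASSUMPTION: CC = reference / measured amount of water (form of Eq. 2 assumed).
# ASSUMPTION: corrected = (raw - tare) * CC (form of Eq. 3 assumed).
CC_MIN, CC_MAX = 0.8, 1.3  # acceptable range quoted from the manufacturer


def cell_constant(reference_water_ug, measured_water_ug):
    """Cell constant from one calibration sample; raises if outside the range."""
    cc = reference_water_ug / measured_water_ug
    if not CC_MIN <= cc <= CC_MAX:
        raise ValueError(f"cell constant {cc:.3f} outside {CC_MIN} to {CC_MAX}")
    return cc


def corrected_water_ug(raw_water_ug, tare_water_ug, cc):
    """Amount of water in the sample after tare subtraction and CC correction."""
    return (raw_water_ug - tare_water_ug) * cc


# 980 ug is the reference amount for a 0.1000 g RM aliquot (from the text);
# the measured, raw and tare values below are hypothetical.
cc = cell_constant(reference_water_ug=980.0, measured_water_ug=1000.0)
result = corrected_water_ug(raw_water_ug=5000.0, tare_water_ug=20.0, cc=cc)
print(f"CC = {cc:.3f}, corrected amount of water = {result:.1f} ug")
```

Because every sample result is scaled by the same CC, any bias in that one calibration measurement propagates directly into all results of the series.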

Multiple Point Calibration
The calibration curve was constructed separately for each measurement day by pooling all CC and tare measurements from that day. The measurements with varying amounts of reference materials were spread randomly throughout the measurement sequence. As the number of electrons measured by the instrument is linked to the amount of water, the calibration curve was constructed as follows: the amount of water measured in a calibration sample on the X-axis and the reference amount of water in the sample (calculated from the sample amount and reference water content) on the Y-axis (Fig. 2). The acquired data were processed using a software tool (hereinafter M-CARE) published previously by Collège Français de Métrologie [19]. This tool enables the use of generalized Gauss-Markov regression, which allowed uncertainties to be assigned to quantities on both axes.
Different aliquots of standard samples were weighed in such a manner that a different amount of water was measured in each CC measurement. For example, a 0.1000 g sample of RM contains 980 μg of water and a 0.0500 g sample of RM contains 490 μg of water. This way, several calibration points could be prepared from a single standard sample material. The standard sample sizes were chosen to encompass the analytical range, to ensure that all sample measurements were calculated via interpolation as opposed to extrapolation, the latter being less accurate.
All tare measurements were included as (0, 0) points. As shown in Eq. 3, the amount of water found in tare measurements was subtracted from the amount of water in the samples before data processing. The tare measurements were included in the calibration curve because they were intrinsic to data processing and there was an uncertainty component associated with tare measurement (as seen in Table 2 and Fig. 3).
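The construction of the calibration points can be illustrated with a simplified sketch. It uses ordinary least squares as a stand-in for the generalized Gauss-Markov regression performed by M-CARE, so it ignores the uncertainties on both axes and is not equivalent to that tool. The 0.98 g/100 g water content follows from the RM example above; the measured values and function names are hypothetical.

```python
# Sketch: multiple point calibration curve (measured on X, reference on Y).
# Ordinary least squares is used here as a SIMPLIFIED stand-in for the
# generalized Gauss-Markov regression of M-CARE (no per-point uncertainties).


def reference_water_ug(sample_mass_g, water_content_g_per_100g):
    """Reference amount of water in an aliquot, in micrograms."""
    return sample_mass_g * water_content_g_per_100g / 100.0 * 1e6


def fit_line(x, y):
    """Least-squares line y = a0 + a1*x with parameter variances and covariance."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    s2 = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "var_intercept": s2 * (1.0 / n + mean_x ** 2 / sxx),
        "var_slope": s2 / sxx,
        "cov": -mean_x * s2 / sxx,
    }


# Two tare measurements entered as (0, 0) points plus three RM aliquots.
rm_masses_g = [0.0500, 0.1000, 0.1500]
reference = [0.0, 0.0] + [reference_water_ug(m, 0.98) for m in rm_masses_g]
measured = [0.0, 0.0, 505.0, 995.0, 1510.0]  # hypothetical instrument readings
fit = fit_line(measured, reference)
print(f"slope = {fit['slope']:.4f}, intercept = {fit['intercept']:.2f} ug")
```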
As mentioned above, for tare measurements both X- and Y-axis values were 0. For standard sample measurements, the X-axis value was the experimental result and the Y-axis value was the reference value (e.g. from a certificate of analysis). The M-CARE tool was used to calculate the linear calibration curve (slope and

Uncertainty Estimation Methods
Both uncertainty estimation methods were based on the ISO GUM approach, but they differed in how the uncertainty sources were assigned to uncertainty components. The Kragten spreadsheet approach relied on acquiring sufficient experimental data to quantify all relevant uncertainty sources one by one. The M-CARE tool used an initial set of calibration data to model a larger number of measurements, which were in turn used to estimate the resulting uncertainty.

Kragten Spreadsheet
The first method used for uncertainty estimation was the Kragten [20] spreadsheet, which was used to quantify the relevant uncertainty components. The measurement model contained the uncertainty components presented in Table 2.
The first component, u_sample, was calculated as the pooled standard deviation of at least three measurement series with the sample in question (e.g. cardboard). Each measurement series had three replicates with the sample and the series were performed on different days.
The systematic error of the EWV system, u_bias.TC, was estimated from a series (n = 10) of RM measurements on one day as the difference between the measured water content and the reference water content. The RM water content uncertainty (Table 1) was combined with the standard uncertainty estimate from the RM measurements, the latter assuming a rectangular distribution to avoid underestimating this uncertainty component.
The reproducibility of the EWV system (u_rep.) was calculated as the pooled standard deviation of calibration sample measurements spread over 2 months, each measurement day being one measurement series. The standard deviation was calculated from the difference between the measured and reference water content.
The uncertainty component from tare measurements, u_tare, was calculated as the pooled standard deviation of tare water content values, with each measurement day being one measurement series.
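Several of the components above are pooled standard deviations over measurement series. A minimal sketch of that calculation, with each series weighted by its degrees of freedom, is given below; the replicate values and the function name are hypothetical.

```python
# Sketch: pooled standard deviation over several measurement series.
import math
import statistics


def pooled_sd(series):
    """sqrt( sum((n_i - 1) * s_i^2) / sum(n_i - 1) ) over all series."""
    if any(len(s) < 2 for s in series):
        raise ValueError("each series needs at least two replicates")
    weighted = sum((len(s) - 1) * statistics.variance(s) for s in series)
    dof = sum(len(s) - 1 for s in series)
    return math.sqrt(weighted / dof)


# Hypothetical data: three measurement days with three replicates each (g/100 g).
days = [
    [10.12, 10.20, 10.05],
    [10.31, 10.22, 10.27],
    [10.08, 10.15, 10.19],
]
print(f"pooled standard deviation = {pooled_sd(days):.3f} g/100 g")
```

Pooling in this way captures the within-series scatter only; differences between the series means do not contribute.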
The uncertainty from weighing the sample (u_w) comprised the uncertainty originating from the resolution of the balance and from the repeatability of the balance. To avoid underestimating this uncertainty component, the tolerance of the balance was used for the latter instead of the experimentally determined repeatability (0.00015 g and 0.00004 g, respectively). Since the sample boat was weighed twice, this component was multiplied by 2.
The abovementioned uncertainty components were quantified separately using a pool of data gathered over an extended time period (2 months). The Kragten approach allows the user to compare the different components, as seen in Fig. 3. All of them were calculated in terms of the mass of analyte (amount of water). The individual uncertainty components were combined into a standard uncertainty estimate representative of the measured sample. For estimating the uncertainty components u_bias.TC, u_rep. and u_tare, "raw" measurement results from the instrument were used: u_bias.TC expressed the possible deviation from the expected value, u_rep. used the spread of reference measurements and u_tare used the spread of tare measurements. The uncertainty of mass measurement (u_w) was dependent on the calibration of the analytical balance.
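The numerical idea behind a Kragten spreadsheet can be sketched as follows: each input is shifted by its standard uncertainty in turn, the resulting change in the output is recorded as that input's contribution, and the contributions are combined in quadrature. The sketch assumes uncorrelated inputs. The measurement model and all values are hypothetical and are not the model of Table 2.

```python
# Sketch of the Kragten numerical approach, assuming uncorrelated inputs.
import math


def kragten(model, values, uncertainties):
    """Return (result, combined standard uncertainty, per-input contributions)."""
    y0 = model(**values)
    contributions = {}
    for name, u in uncertainties.items():
        shifted = dict(values)
        shifted[name] += u  # shift one input by its standard uncertainty
        contributions[name] = model(**shifted) - y0
    u_c = math.sqrt(sum(c ** 2 for c in contributions.values()))
    return y0, u_c, contributions


def water_content(raw_ug, tare_ug, cc, sample_mass_g):
    """Hypothetical model: water content in g/100 g from amounts in micrograms."""
    return (raw_ug - tare_ug) * cc / (sample_mass_g * 1e6) * 100.0


values = {"raw_ug": 5000.0, "tare_ug": 20.0, "cc": 0.98, "sample_mass_g": 0.0500}
uncertainties = {"raw_ug": 40.0, "tare_ug": 5.0, "cc": 0.01, "sample_mass_g": 0.0003}

y, u_c, parts = kragten(water_content, values, uncertainties)
print(f"w = {y:.3f} g/100 g, u_c = {u_c:.3f} g/100 g")
for name, c in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {name}: {c:+.4f}")
```

Listing the contributions in order of magnitude gives the same kind of component comparison that the spreadsheet approach makes possible.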

M-CARE Tool
When provided with the necessary inputs, i.e., both measured and reference values and the respective uncertainty estimates (Table 3), the M-CARE tool calculated the slope and intercept, their variances and their covariance. An advantage of this tool was that it accounted for variance-covariance effects between the calibration points and assigned an uncertainty range to the calibration curve. The data inputs for the M-CARE tool are given in Table 4.
In the case of tare measurements, the uncertainty sources were: (i) the intrinsic repeatability of the EWV instrument and (ii) the pooled standard deviation of tare measurements from multiple days. The uncertainty sources for standard samples were: (i) the uncertainty associated with the particular reference material (e.g. from the certificate of analysis) and with the mass of the sample and (ii) the standard deviation of the cell constants measured on that day. Different reference materials had different uncertainty estimates for their water content, and the relative impact of the sample mass measurement depended on the sample size (weight). Because the cell constant is dimensionless, the relative standard deviation of the cell constants was multiplied by the amount of water measured. The repeatability standard deviation of the study sample (the sample with unknown water content) was used as its uncertainty estimate; where available, the pooled standard deviation was used instead, to include within-lab reproducibility (intermediate precision). The uncertainty of the amount of water in the study sample was calculated using Eq. 4.
where σ(α_0) is the variance of the intercept, Y is the amount of water measured in the sample, Cov(α_0, α_1) is the covariance between the intercept and the slope, σ(α_1) is the variance of the slope, u_c(Y) is the uncertainty estimate of the measurement of water in the sample and u_c is the standard uncertainty of the measurement result.
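Eq. 4 is not reproduced in this text. The sketch below therefore assumes the standard first-order (GUM) propagation for a value read from a straight calibration line α_0 + α_1·Y, built from the quantities listed above; the exact form used by the authors may differ. The function name and the numerical inputs are hypothetical.

```python
# Sketch: uncertainty of a result read from a linear calibration curve.
# ASSUMPTION: first-order GUM propagation for x = a0 + a1 * Y, i.e.
#   u_c^2 = var(a0) + Y^2 * var(a1) + 2 * Y * cov(a0, a1) + a1^2 * u(Y)^2
import math


def calibrated_result(y_meas, u_y, a0, a1, var_a0, var_a1, cov_a0a1):
    """Return the calibrated amount of water and its standard uncertainty."""
    x = a0 + a1 * y_meas
    variance = (
        var_a0
        + y_meas ** 2 * var_a1
        + 2.0 * y_meas * cov_a0a1
        + a1 ** 2 * u_y ** 2
    )
    return x, math.sqrt(variance)


# Hypothetical regression output and sample measurement (amounts in ug).
x, u_c = calibrated_result(
    y_meas=4000.0, u_y=30.0,
    a0=2.0, a1=0.98,
    var_a0=16.0, var_a1=1.0e-5, cov_a0a1=-8.0e-3,
)
print(f"amount of water = {x:.0f} ug, u_c = {u_c:.0f} ug")
```

The negative covariance in the example reflects the usual anticorrelation between the slope and the intercept of a fitted line, which reduces the combined uncertainty.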

Comparison of the Calibration Methods
Both calibration methods yielded very similar results (Table 4). The differences became more pronounced only for samples with higher water content (wood pellets and cardboard). Even so, the relative difference between the obtained water content values was below 1 % of the measurement result and well within the uncertainty range calculated using the Kragten spreadsheet. This was further corroborated by the E_n scores, all of which were below 1. While the differences between the two calibration methods were very small, it is important to note that the multiple point calibration used interpolation between calibration points to calculate the water content of the samples. For one, this would inherently increase the accuracy of this approach. Furthermore, the inclusion of multiple calibration points (with varying water content) would ensure that the calibration measurements were more representative of the sample, since the analysis time is highly dependent on the quantity of water in the sample and there is potential for slight changes within the system (see Fig. 1A). The multiple point calibration was also less affected by the variability of the calibration measurements, as it used multiple determinations; if the only calibration measurement used for single point calibration was biased, there was no way to negate that effect.
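The E_n comparison can be illustrated with a short sketch. The formula is not given in this section, so the conventional definition is assumed: the absolute difference between two results divided by the root sum of squares of their expanded uncertainties, with |E_n| ≤ 1 indicating agreement. The numerical values and the function name are hypothetical.

```python
# Sketch: E_n score between two results with expanded uncertainties.
# ASSUMPTION: conventional definition E_n = |x1 - x2| / sqrt(U1^2 + U2^2).
import math


def en_score(x1, U1, x2, U2):
    """E_n score; values not exceeding 1 indicate agreement within uncertainty."""
    return abs(x1 - x2) / math.sqrt(U1 ** 2 + U2 ** 2)


# Hypothetical results in g/100 g for one sample from the two calibration methods.
score = en_score(x1=10.00, U1=0.30, x2=10.10, U2=0.40)
verdict = "agree" if score <= 1.0 else "disagree"
print(f"E_n = {score:.2f}: results {verdict}")
```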

Comparison of the Uncertainty Estimation Methods
The uncertainty estimates calculated with both methods (Kragten spreadsheet and M-CARE) for the measurements with multiple point calibration showed the same tendencies as the calibration results. Although the E_n scores were below 1 in all cases, samples with higher water content had a larger difference between their respective uncertainty estimates (Table 5). Cardboard, the sample with the largest water content, had the largest difference between the Kragten and M-CARE uncertainty estimates (0.270 g/100 g and 0.400 g/100 g, respectively; Table 5). Wood pellet samples had the second highest water content, just above 8 g/100 g, and the second largest difference between their uncertainty estimates (0.400 g/100 g and 0.410 g/100 g). In the case of kaolinite clay, the sample with the lowest water content, both uncertainty estimates were similar. For samples with low to medium water content, both uncertainty estimation methods yielded comparable results. However, the results differed for samples with a high water content (exceeding 10 g/100 g). This may be linked to differences in analysis time: the calibration sample analysis was usually comparatively short, while the analysis of a water-rich sample could take several times longer. Thus, the analyses of samples with low and high water content are not directly comparable unless more measurements with the standard sample are performed and care is taken to get a closer agreement between the study and standard samples. The increase in uncertainty with larger amounts of water could be linked to the instability of the drift (or background current) over longer time periods. The baseline signal is determined before the analysis and cannot be measured during the run. Furthermore, with more complex samples, sensor fouling (changing the cell constant) may play a more prominent role. In addition, analysing large quantities of water will "deplete" the sensor faster, requiring regeneration at shorter intervals.
Another benefit of the M-CARE tool is its relative ease of use: all the necessary experimental data points can be acquired within a single analysis batch (in the order of 10 to 20 measurements). This contrasts with the Kragten spreadsheet, which requires the user to determine the different uncertainty components separately (in this study, this required a further 50+ measurements).

Conclusions
The study focused on calibration and uncertainty estimation for moisture determination in solids. For calibration, single and multiple point calibration were compared. For estimating the uncertainties, two tools based on the ISO GUM approach were compared: the Kragten spreadsheet and M-CARE.
Both calibration methods showed good agreement with each other over the range of measured water contents (approximately 1 g/100 g to 10 g/100 g), with the largest differences remaining below 1 % of the results. The single point calibration required less work, but the results could be considered less reliable because the trueness of the calibration depends only on one experimental measurement. The proposed alternative, using a multiple point calibration, provided a more representative calibration as it was less dependent on any single measurement. Because the calibration curve was constructed as a function of the amount of water measured in a sample, it accounted for possible effects due to longer analysis time as well.
Both uncertainty estimation methods yielded results in the same order of magnitude and were thus considered comparable. However, the uncertainty estimation using M-CARE was a more streamlined procedure, requiring fewer uncertainty components to be quantified separately, therefore making it more accessible. On the other hand, the Kragten spreadsheet was more suitable for detailed analysis of the different sources of uncertainty as it quantified each source of uncertainty separately. This information can be used to find the areas most in need of improvement.
Regardless of the uncertainty estimation methods, the results showed the importance of the sampling method. A more complex sampling method (in this study wood pellets) can lead to higher uncertainty. This can be circumvented by a rigorous series of measurements to assess both repeatability and reproducibility.
When working with uncertainty estimation, the first objective of the analyst tends to be the lowest possible uncertainty estimate, as it is seen as an indicator of excellence for a given task. However, the reported values must always be realistic for the measurement situation; if they are not, the validity of the uncertainty estimation is undermined. Thus, the uncertainty estimate must first and foremost be realistic, and minimizing its value is secondary.