Expression for uncertainty intervals handling skewness when the relative standard uncertainty is independent of the measurand level

Uncertainty intervals for many measurement results are typically reported as symmetric intervals around the measured value. However, at large standard uncertainties (> approx. 15 %–20 %), it is necessary to consider asymmetry of the uncertainty intervals. Here, an expression for calculating uncertainty intervals handling asymmetry when the relative standard uncertainty is independent of the measurand level is presented. The expression is based on implementation of a power transformation ($x^{B}$) for transformation of measurement results in order to achieve results that have a symmetric and approximate normal distribution. Uncertainty intervals are then calculated in the transformed space and back-transformed to the original space. The transformation includes a parameter, B, that needs to be optimized, and this can be based on real results, modelling of results, or on judgement. Two important reference points are B equal to 1 that corresponds to an approximate normal distribution of the original measurement results, and B approaching 0 that corresponds to an approximate log-normal distribution of the original measurement results. Comparisons are made with uncertainty intervals calculated using other expressions where it is assumed that measurement results have a normal distribution or a log-normal distribution. Implementation of the approach is demonstrated with several examples from chemical analysis.


Introduction
It is typically assumed that the uncertainty of results from many measurements can be described by a normal distribution, i.e., a symmetric distribution. This assumption is stated in the GUM document (Evaluation of measurement data-Guide to the expression of uncertainty in measurement) [1,2] that is a fundamental reference document for evaluation of measurement uncertainty. The document is based on a linear combination of random variables giving a normal (symmetric) distribution of results according to the central limit theorem. However, in many instrumental techniques (for instance in chemical analysis), results are typically generated by multiplicative combination of random variables, i.e. distributions of results are driven towards a log-normal distribution (according to the multiplicative version of the central limit theorem). At small or modest relative standard uncertainties (< 15 %-20 %), normal and log-normal distributions are so similar that the normal distribution can serve as a suitable approximation. When the relative standard uncertainty becomes larger than 15 %-20 %, asymmetry, or skewness, needs to be handled when calculating uncertainty intervals [3-5]. This is outside the scope of GUM [1,2], and is commonly handled by transforming data using $\log_{10} x$ or $\ln x$ [3,5-8] prior to calculation of uncertainty intervals. Transformation is also widely used in microbiological enumeration methods [9,10]. Hence, it is typically assumed that the distribution of measurement results can be approximated with either a normal distribution or a log-normal distribution. Recently, an approach based on a power transformation for handling a broader spectrum of skewness has been proposed [4].
Here, the approach of power transformation is discussed further, and the purpose is twofold. First, to explain and clearly describe the approach for calculating uncertainty intervals, and second, to compare for different types of chemical analyses the uncertainty intervals obtained with other expressions for uncertainty intervals (assuming either normal or log-normal distribution of measurement results).

List of symbols
$k$: Coverage factor for a given probability
$n$: Number of data
$s$: Standard deviation*
$s_{\mathrm{rel}}$: Relative standard deviation in original space
$s_{\mathrm{trans}}$: Standard deviation in transformed space using $x^{B_{\mathrm{opt}}}$ transformation
$s_{\mathrm{rel,trans}}$: Relative standard deviation in transformed space using $x^{B_{\mathrm{opt}}}$ transformation
$s_{\log_{10}}$: Standard deviation in transformed space using $\log_{10}$ transformation
$x$: Measurement result in the original space
$x_{\mathrm{trans}}$: Measurement result in transformed space using $x^{B}$ transformation
$x_{\log_{10}}$: Measurement result in transformed space using $\log_{10}$ transformation
*In this paper, for calculating uncertainty intervals, assumed to be equal to the combined standard uncertainty

Transformation based on $x^{B}$ when evaluating measurement uncertainties

Transformation based on $x^{B}$
Many data sets with asymmetric distributions can be transformed to data sets with symmetric, approximately normal distributions using a power transformation according to
$$x_{\mathrm{trans}} = x^{B} \tag{1}$$
where $x_{\mathrm{trans}}$ and $x$ are the transformed and original data, respectively, and $B$ is a parameter that is optimized with the goal that the transformed data should have a skewness close to zero, i.e., become symmetric. Equation 1 can then be written as
$$x_{\mathrm{trans}} = x^{B_{\mathrm{opt}}} \tag{2}$$
where $B_{\mathrm{opt}}$ is the optimized $B$. Different values of $B$ will transform different distributions to symmetric distributions as shown in Fig. 1. This has been studied using Monte Carlo simulations [4].
Fig. 1 Illustration of $B$ values that will transform different distributions to a symmetric distribution using the transformation $x^{B}$
As illustrated in the figure, there are two values that can serve as reference points on a "B-scale", 0 and 1. With $B = 1$, no transformation will occur, i.e. original distributions that can be approximated with a normal distribution will have an optimized $B$ value equal to 1. Using $B$ close to 0 is analogous to taking the logarithm of the values, and original distributions that can be approximated with a log-normal distribution will be transformed to normal distribution approximations. Note that $B = 0$ will transform all values to 1. To avoid this, $B$ values between −0.0001 and 0.0001 are not used. With $B$ values somewhere between 0 and 1, distributions with other types of positive skewness can be transformed to approximately symmetric distributions. These distributions are here referred to as being "between" normal and log-normal distributions. Note that transformations using the square root, i.e. $B_{\mathrm{opt}}$ equal to 0.5, will transform distributions that are "between" normal and log-normal distributions to approximated normal distributions. For original distributions with negative skewness, $B$ values larger than 1 will transform the distributions to approximately normal distributions.
Finally, for distributions far from zero with a positive skewness, B values less than 0 will transform distributions to approximately normal distributions. Such distributions can be obtained for instance by adding a constant number to values with positive skewness, or by summing values with different distributions.
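The effect of the power transformation on skewness can be illustrated with a small simulation. The following is a minimal sketch, not the procedure used in this work: the log-normal test data, the sample size, and the helper names `skewness` and `power_transform` are assumptions made for illustration only.

```python
import random


def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def power_transform(values, b):
    """Apply x_trans = x**b to strictly positive data (Eq. 1)."""
    return [v ** b for v in values]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Assumed test data: log-normal results with a CV of roughly 30 %.
data = [rng.lognormvariate(0.0, 0.3) for _ in range(20000)]

# Skewness after transformation for the two reference points and B = 0.5.
skew_by_b = {b: skewness(power_transform(data, b)) for b in (1.0, 0.5, 0.0001)}
for b, g in skew_by_b.items():
    print(f"B = {b:<6}  skewness = {g:+.3f}")
```

For log-normal input, the skewness should shrink towards zero as B moves from 1 towards 0.0001, which is the behaviour described for the lower reference point of the "B-scale".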
After finding an appropriate $B_{\mathrm{opt}}$, a confidence interval can be calculated in the transformed space and then back-transformed according to
$$x = x_{\mathrm{trans}}^{1/B_{\mathrm{opt}}} \tag{3}$$
Note that for $B_{\mathrm{opt}} < 0$, the order of data in the transformed space will be opposite to the order of data in the original space. Hence, when calculating a confidence interval, the lower limit of the interval in the transformed space corresponds to the upper limit in the original space.
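The reversal of order for negative B values can be made concrete with a short sketch. The function name `back_transform_interval` and the ±10 % interval in the transformed space are hypothetical choices for illustration; only the value −0.31 is taken from Example 4.

```python
def back_transform_interval(lower_trans, upper_trans, b_opt):
    """Back-transform interval limits with x = x_trans**(1/b_opt).

    For b_opt < 0 the transformation reverses the ordering, so the
    limits are sorted after back-transformation.
    """
    if abs(b_opt) < 0.0001:
        raise ValueError("B values between -0.0001 and 0.0001 are not used")
    limits = [lower_trans ** (1.0 / b_opt), upper_trans ** (1.0 / b_opt)]
    return min(limits), max(limits)


b_opt = -0.31            # negative B, as found in Example 4
x = 300.0                # measurement result in the original space
x_trans = x ** b_opt     # the same result in the transformed space

# Hypothetical interval of +/- 10 % around x_trans in the transformed space.
lower_trans, upper_trans = 0.9 * x_trans, 1.1 * x_trans
lower, upper = back_transform_interval(lower_trans, upper_trans, b_opt)

print(f"transformed space: {lower_trans:.4f} to {upper_trans:.4f}")
print(f"original space:    {lower:.1f} to {upper:.1f}")
```

Sorting the back-transformed limits is one simple way to avoid reporting an interval with its limits swapped.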

Calculation of uncertainty intervals
In the following text, the standard deviation, s, is assumed to be equal to the combined standard uncertainty. Different expressions for calculating uncertainty intervals are given in Fig. 2 for a measurement result, $x$, when the standard deviation increases in proportion to the measurand level, i.e. the coefficient of variation (CV), or equivalently the relative standard deviation ($s_{\mathrm{rel}}$), is independent of the measurand level, which is often the case, for instance, in instrumental analysis [11].
Expressions for uncertainty intervals are given in Fig. 2 when the distribution of the measurement results can be (1) approximated with a normal distribution, (2) have various distributions, or (3) can be approximated with a lognormal distribution. Derivations of the different equations (Eqs. 4-10) are given below.
If the distribution of the measurement results can be approximated with a normal distribution and the relative standard deviation, $s_{\mathrm{rel}}$, is independent of the measurand, an uncertainty interval for a measurement result, $x$, will be asymmetric since the standard uncertainty, $s$, will be different at the lower and the upper limit. For small $s_{\mathrm{rel}}$, the difference in $s$ at the lower and upper limit can be neglected, and the interval can be obtained as
$$x \times (1 - k \times s_{\mathrm{rel}}) \ \text{ to } \ x \times (1 + k \times s_{\mathrm{rel}}) \tag{4}$$
where $k$ is the coverage factor. However, if the difference in $s$ at the lower and upper limit is taken into account, the uncertainty interval will be asymmetric and can be calculated as [3,4]
$$\frac{x}{1 + k \times s_{\mathrm{rel}}} \ \text{ to } \ \frac{x}{1 - k \times s_{\mathrm{rel}}} \tag{5}$$
This equation is valid for $k \times s_{\mathrm{rel}} < 1$.
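The symmetric interval and the asymmetric interval that accounts for a level-dependent standard deviation can be compared numerically. This is a minimal sketch; the function names and the example values (x = 10, relative standard deviation 15 %, k = 2) are chosen for illustration.

```python
def interval_symmetric(x, s_rel, k=2.0):
    """Symmetric interval: s is treated as equal at both limits."""
    return x * (1.0 - k * s_rel), x * (1.0 + k * s_rel)


def interval_asymmetric(x, s_rel, k=2.0):
    """Asymmetric interval: s is proportional to the level at each limit.

    The lower limit L solves L + k*s_rel*L = x, and the upper limit U
    solves U - k*s_rel*U = x, which requires k*s_rel < 1.
    """
    if k * s_rel >= 1.0:
        raise ValueError("requires k * s_rel < 1")
    return x / (1.0 + k * s_rel), x / (1.0 - k * s_rel)


x, s_rel = 10.0, 0.15
sym = interval_symmetric(x, s_rel)
asym = interval_asymmetric(x, s_rel)
print(f"symmetric:  {sym[0]:.2f} to {sym[1]:.2f}")
print(f"asymmetric: {asym[0]:.2f} to {asym[1]:.2f}")
```

The asymmetric limits follow from requiring that the measured value lies exactly k standard deviations from each limit, with the standard deviation evaluated at that limit.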
If the distribution of the measurement results can be approximated with a log-normal distribution, the standard deviation in the transformed space after transformation using $\log_{10} x$ or $\ln x$ will be independent of the measurand level. An uncertainty interval in the transformed space for a measurement result, $x_{\log_{10}}$, will then be given by
$$x_{\log_{10}} - k \times s_{\log_{10}} \ \text{ to } \ x_{\log_{10}} + k \times s_{\log_{10}} \tag{6}$$
where $s_{\log_{10}}$ is the standard deviation in the transformed space. This will give an uncertainty interval in the original space that is
$$\frac{x}{10^{k \times s_{\log_{10}}}} \ \text{ to } \ x \times 10^{k \times s_{\log_{10}}} \tag{7}$$
Equation 7 can also be written as
$$\frac{x}{F_{U}} \ \text{ to } \ x \times F_{U} \tag{8}$$
where $F_{U}$ is called the expanded uncertainty factor, which here is calculated as $10^{k \times s_{\log_{10}}}$ [5].
For many distributions including log-normal, transformation using $x^{B_{\mathrm{opt}}}$ will result in symmetric distributions that can be approximated with normal distributions. In the transformed space, the relative standard deviation of the transformed data, $s_{\mathrm{rel,trans}}$, will be independent of the measurand level [4]. Hence, an uncertainty interval in the transformed space for a measurement result, $x_{\mathrm{trans}}$, can be obtained as
$$\frac{x_{\mathrm{trans}}}{1 + k \times s_{\mathrm{rel,trans}}} \ \text{ to } \ \frac{x_{\mathrm{trans}}}{1 - k \times s_{\mathrm{rel,trans}}} \tag{9}$$
where $s_{\mathrm{rel,trans}}$ is the relative standard deviation of transformed data. An uncertainty interval in the original space will then be given by
$$\frac{x}{(1 + k \times s_{\mathrm{rel,trans}})^{1/B_{\mathrm{opt}}}} \ \text{ to } \ \frac{x}{(1 - k \times s_{\mathrm{rel,trans}})^{1/B_{\mathrm{opt}}}} \tag{10}$$
Equation 10 is valid for $k \times s_{\mathrm{rel,trans}} < 1$, i.e. for large $s_{\mathrm{rel,trans}}$, Eq. 10 will not be valid. However, distributions in the original space having large standard deviations will typically be asymmetric. In the transformed space, these distributions will ideally be symmetric having small standard deviations not restricting Eq. 10. This is demonstrated in Example 4.
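The intervals based on the logarithmic and the power transformation can be sketched in code as follows. The function names and the numerical inputs are illustrative assumptions; sorting the limits for negative B is one way of handling the reversed ordering noted for the back-transformation.

```python
def interval_log10(x, s_log10, k=2.0):
    """Interval from the log10 transformation via the factor F_U."""
    f_u = 10.0 ** (k * s_log10)  # expanded uncertainty factor
    return x / f_u, x * f_u


def interval_power(x, s_rel_trans, b_opt, k=2.0):
    """Interval from the x**b_opt transformation, back-transformed."""
    if abs(b_opt) < 0.0001:
        raise ValueError("B values between -0.0001 and 0.0001 are not used")
    if k * s_rel_trans >= 1.0:
        raise ValueError("requires k * s_rel_trans < 1")
    a = x / (1.0 + k * s_rel_trans) ** (1.0 / b_opt)
    b = x / (1.0 - k * s_rel_trans) ** (1.0 / b_opt)
    # For b_opt < 0 the two expressions swap roles, so sort the limits.
    return min(a, b), max(a, b)


# Illustrative inputs (not taken from the tables of this paper).
log_iv = interval_log10(10.0, 0.10)
pow_iv_b1 = interval_power(10.0, 0.15, 1.0)
pow_iv_neg = interval_power(300.0, 0.10, -0.31)

print(f"log10 transformation:   {log_iv[0]:.2f} to {log_iv[1]:.2f}")
print(f"power, B_opt = 1:       {pow_iv_b1[0]:.2f} to {pow_iv_b1[1]:.2f}")
print(f"power, B_opt = -0.31:   {pow_iv_neg[0]:.0f} to {pow_iv_neg[1]:.0f}")
```

Raising an error when the validity condition is violated keeps the function from silently returning a negative or undefined upper limit.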
As shown in Fig. 2, intervals calculated using Eqs. 5 and 10 will be identical when $B_{\mathrm{opt}}$ approaches 1. In addition, though less obvious, intervals calculated using Eqs. 7 and 10 will be identical when $B_{\mathrm{opt}}$ approaches 0 (for instance, using $B_{\mathrm{opt}}$ equal to 0.0001) [4]. This shows that Eq. 10, when $B_{\mathrm{opt}}$ is known, provides a way of expressing uncertainty intervals handling a broad spectrum of asymmetry when the relative standard deviation ($s_{\mathrm{rel}}$) is independent of the measurand level. Hence, the uncertainty intervals calculated according to Eq. 10 can be considered to be more correct than using Eqs. 4, 5 or 7 if an appropriate value of $B_{\mathrm{opt}}$ is used, although the improvements are generally small. As discussed below, however, estimation of a proper value needs a measurement model or extensive raw data.
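Both limiting cases can be checked numerically. The sketch below assumes, for the second case, an exactly log-normal distribution with a standard deviation `sigma_ln` of the natural logarithm; the relations used to derive the standard deviations in the two transformed spaces from `sigma_ln` are properties of that assumed distribution, not results from this paper.

```python
import math


def eq5(x, s_rel, k):
    return x / (1.0 + k * s_rel), x / (1.0 - k * s_rel)


def eq7(x, s_log10, k):
    f_u = 10.0 ** (k * s_log10)
    return x / f_u, x * f_u


def eq10(x, s_rel_trans, b_opt, k):
    a = x / (1.0 + k * s_rel_trans) ** (1.0 / b_opt)
    b = x / (1.0 - k * s_rel_trans) ** (1.0 / b_opt)
    return min(a, b), max(a, b)


x, k = 10.0, 2.0

# Case 1: B_opt = 1, the power-transformation interval equals Eq. 5.
at_b1 = eq10(x, 0.15, 1.0, k)
normal_iv = eq5(x, 0.15, k)

# Case 2: B_opt = 0.0001 for an assumed log-normal distribution.
sigma_ln = 0.30
b_small = 0.0001
# x**b of a log-normal variable is log-normal with ln-sd b*sigma_ln,
# so its relative standard deviation is sqrt(exp((b*sigma_ln)**2) - 1).
s_rel_trans = math.sqrt(math.expm1((b_small * sigma_ln) ** 2))
near_log = eq10(x, s_rel_trans, b_small, k)
log_iv = eq7(x, sigma_ln / math.log(10.0), k)

print("B_opt = 1:     ", at_b1, "vs Eq. 5:", normal_iv)
print("B_opt = 0.0001:", near_log, "vs Eq. 7:", log_iv)
```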
Note that transformation based on log 10 x , log e x and x B opt will result in symmetry around the median. Hence, an interval (in the original space) calculated using Eqs. 7 or 10 will result in an interval that will cover the median with a given probability (95 % when using k equal to 2). In many applications, it is sensible to use the median as what is intended to be measured when skewness originates from the measurement process.

Estimation of B opt
A value of $B_{\mathrm{opt}}$ in Eq. 10 for a given data set that will result in a skewness close to zero in the transformed space can be obtained using mathematical tools available in many calculation software packages (including the widely used Microsoft Excel [4]). Although skewness is utilized here to optimize $B$, other distribution characteristics can also be considered. A detailed discussion of this is beyond the scope of this work. For the purpose of this study, skewness is considered to be appropriate for optimization.
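As an illustration of the optimization, the following sketch searches for the B value that gives zero skewness by bisection. It is not the Excel Solver procedure used in this work: the bisection strategy, the search limit `b_max`, the function names, and the simulated test data (squares of normally distributed values, for which the square root, B = 0.5, restores symmetry) are all assumptions.

```python
import random


def skewness(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def find_b_opt(data, b_min=0.0001, b_max=10.0, tol=1e-4):
    """Return a B with skewness(x**B) close to zero, or None if not found.

    Positive B values are searched first and negative values second;
    the range between -b_min and b_min is excluded.
    """
    scale = sum(data) / len(data)
    scaled = [v / scale for v in data]  # scaling does not change skewness

    def skew_at(b):
        return skewness([v ** b for v in scaled])

    for lo, hi in ((b_min, b_max), (-b_max, -b_min)):
        g_lo, g_hi = skew_at(lo), skew_at(hi)
        if g_lo * g_hi > 0.0:
            continue  # no sign change of the skewness in this range
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            g_mid = skew_at(mid)
            if g_lo * g_mid <= 0.0:
                hi = mid
            else:
                lo, g_lo = mid, g_mid
        return 0.5 * (lo + hi)
    return None


rng = random.Random(11)
# Assumed test data: squares of normal values, symmetric after a square root.
data = [rng.gauss(1.0, 0.2) ** 2 for _ in range(20000)]
b_opt = find_b_opt(data)
print(f"estimated B_opt = {b_opt:.3f}")
```

Even with 20 000 simulated values the estimate scatters around 0.5, which gives a feeling for why large data sets are needed.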
However, in reality, it is difficult to find an optimized value of $B$ from a set of experimental data [4]. The accuracy of $B_{\mathrm{opt}}$ improves with an increasing number of data in the data set and with an increasing coefficient of variation (CV) of the data set [4]. However, the extremely large numbers of experimental data (typically > $10^{3}$ to $10^{4}$) that are needed are rarely available. This is not surprising, since it is well known that departure from normality has to be quite large in order to demonstrate non-normality [12]. It has been suggested [4] that, without any other information on a proper value for $B_{\mathrm{opt}}$, it is sensible to assume $B_{\mathrm{opt}}$ equal to 1 when CV is less than approx. 15 % (i.e. no transformation of the data is performed). For these low CVs, the value of $B_{\mathrm{opt}}$ is not critical, and different values of $B_{\mathrm{opt}}$ will result in similar measurement uncertainty intervals. For CV > approx. 15 % to 20 %, it is often sensible to assume $B_{\mathrm{opt}}$ close to zero (for instance 0.0001), i.e. to assume a log-normal distribution. A proper value of $B_{\mathrm{opt}}$ can be obtained from Monte Carlo simulations if a relevant model equation is available. Alternatively, a generally agreed value of $B_{\mathrm{opt}}$ could be used for a specific measurement procedure. For instance, in microbiology, transformations using the square root, i.e. $B_{\mathrm{opt}}$ equal to 0.5, are sometimes used [13].
Control samples and control charts, which are important tools in quality work [14] and describe within-laboratory reproducibility, can contain data in the order of $10^{2}$ to $10^{3}$. However, this is typically not enough, or barely enough, to obtain a relevant estimate of $B_{\mathrm{opt}}$. Some examples of within-laboratory reproducibility data from different types of chemical analysis are given in Table 2.

Combination of uncertainty components with different asymmetry
Above it has been assumed that the standard deviation, s, is equal to the standard uncertainty. However, it is of interest to be able to combine uncertainty components with different distributions having different asymmetry, for instance when adding a component related to bias or sampling to a component describing analytical precision. This can be performed as described previously [4] by:
1. finding $B_{\mathrm{opt}}$ for each uncertainty component
2. determining the mean and standard deviation in the transformed space for each uncertainty component
3. generating a large, normally distributed random data set (using the determined mean and standard deviation in the transformed space) describing each of the two uncertainty components in its transformed space
4. back-transforming the two large data sets to the original space
5. combining the two data sets in the original space (by multiplication or addition depending on judgement)
6. finding a new $B_{\mathrm{opt}}$ for the combined data set.
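Steps 3 to 5 of the procedure above can be sketched as follows. The B values, means and standard deviations of the two components are hypothetical stand-ins for the results of steps 1 and 2, the combination by multiplication is one of the two options mentioned, and step 6 (a new skewness-based optimization on the combined data) is not shown.

```python
import math
import random


def simulate_component(b_opt, mean_trans, sd_trans, n, rng):
    """Steps 3-4: draw normal data in transformed space and back-transform."""
    out = []
    while len(out) < n:
        t = rng.gauss(mean_trans, sd_trans)
        if t > 0.0:  # x = t**(1/B) is only defined for positive t
            out.append(t ** (1.0 / b_opt))
    return out


def cv(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(var) / mean


def skewness(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)
n = 50000
# Hypothetical component 1: analytical precision, close to normal (B = 1).
precision = simulate_component(1.0, 1.0, 0.05, n, rng)
# Hypothetical component 2: sampling, close to log-normal (B = 0.0001).
sampling = simulate_component(0.0001, 1.0, 0.00003, n, rng)

# Step 5: combine in the original space, here by multiplication.
combined = [p * s for p, s in zip(precision, sampling)]
print(f"combined CV = {cv(combined):.3f}, skewness = {skewness(combined):+.2f}")
```

The combined data set can then be passed to whatever routine is used for finding B in step 6.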

Calculations
Calculations were performed using Excel software (Office 365, Microsoft). Finding B opt was performed using Solver (an Excel add-in program) with settings given in Table 1.
The constraint B ≥ 0.0001 was used to prevent B from reaching 0 in the optimization. If optimization resulted in B = 0.0001, a second optimization step was performed with the constraint B ≤ −0.0001. A start value of 0.5 was used (−0.5 if a second optimization step was performed) but the value is not critical. Random data with a normal probability distribution were generated using NORM.INV(RAND();mean;standard deviation) and with a rectangular distribution using RAND().
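For readers working outside Excel, the random number generation can be mirrored with the Python standard library. This is a sketch of an equivalent approach, not the spreadsheet used here; the function names are hypothetical, and the scaling of the rectangular distribution by a centre value and a halfwidth is an assumption about how RAND() was applied.

```python
import random
from statistics import NormalDist, mean, stdev


def norm_inv_rand(mu, sd, rng):
    """Analogue of NORM.INV(RAND();mean;standard deviation)."""
    p = rng.random()
    while p == 0.0:  # inv_cdf requires 0 < p < 1; random() may return 0.0
        p = rng.random()
    return NormalDist(mu, sd).inv_cdf(p)


def rectangular(centre, halfwidth, rng):
    """Rectangular (uniform) value in centre +/- halfwidth (assumed scaling)."""
    return centre + halfwidth * (2.0 * rng.random() - 1.0)


rng = random.Random(2)
normals = [norm_inv_rand(1.0, 0.1, rng) for _ in range(20000)]
rects = [rectangular(1.0, 0.1, rng) for _ in range(20000)]

print(f"normal:      mean = {mean(normals):.4f}, sd = {stdev(normals):.4f}")
print(f"rectangular: mean = {mean(rects):.4f}, sd = {stdev(rects):.4f}")
```

The standard deviation of the rectangular values should be close to the halfwidth divided by the square root of 3.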
Analysis of variance (ANOVA) was performed using RANOVA2 (a stand-alone program running in Microsoft Excel) available from Royal Society of Chemistry (RSC) website [15].

Results and discussions
Implementation of the transformation using $x^{B_{\mathrm{opt}}}$ when calculating expanded uncertainty intervals (i.e. using Eq. 10) on experimental data from different types of chemical analysis is given in Table 2. Also included are expanded uncertainty intervals (95 %) calculated without transformation of data (using Eqs. 4 and 5), and using transformation based on $\log_{10} x$ (i.e. using Eq. 7).
The examples 1-5 in Table 2 are discussed. Example 1 Determination of sulphur in gas samples using gas chromatography and chemiluminescence detection: The within-laboratory reproducibility is 15 % at two different concentration levels, which is on the border when asymmetry needs to be considered. New control samples were prepared when the previous control sample was finished, and the measured concentrations have been corrected to account for the difference in nominal concentrations between the control samples. The data indicate that a "true" B opt is around 0.5, i.e. the example illustrates a case when the distribution of measurement results is "between" a normal distribution and a log-normal distribution. The calculated expanded uncertainty intervals (95 %) around a nominal measured value of 10 and 20 based on within-laboratory reproducibility are compared in Fig. 3(a) and (b).
The intervals are fairly similar, and this example illustrates a case that is on the border when asymmetry needs to be considered (the within-laboratory reproducibility is 15 %). The shape of interval A is "between" interval C (that corresponds to using B opt equal to 1) and interval D (that corresponds to using B opt equal to 0) which is reasonable since B opt for the data sets is found to be around 0.5.
Example 2 Determination of nitrogen (N) using an elemental analyser for C, H and N: The within-laboratory reproducibility is 6 %, which is in the range when asymmetry is typically considered negligible. However, B opt is around 5 to 6, indicating a negative skewness, which in this case has an impact on the uncertainty interval even at a within-laboratory reproducibility as low as 6 %. The calculated expanded uncertainty intervals (95 %) around a nominal measured mass fraction of 0.10 % based on within-laboratory reproducibility data are compared in Fig. 4.
Interval A has a somewhat different shape compared to the other intervals, and has a shape that is "outside" the shape of interval C (that corresponds to using B opt equal to 1) and interval D (that corresponds to using B opt equal to 0). Note that although the measurement results will have a negative skewness, the fact that the relative standard deviation is independent of the measurand level (i.e. the standard deviation increases with the measurand level) will cause the uncertainty interval to have a positive skewness.
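That an interval can be positively skewed although B is larger than 1 follows directly from the expression for the interval in the original space, lower limit x / (1 + k s)^(1/B) and upper limit x / (1 − k s)^(1/B), where s is the relative standard deviation in the transformed space. The sketch below uses B = 5 and a within-laboratory reproducibility of 6 % from Example 2; the relative standard deviation in the transformed space is not taken from Table 2 but approximated to first order as B times the relative standard deviation in the original space, which is an assumption.

```python
x = 0.10        # nominal measured mass fraction in %
s_rel = 0.06    # within-laboratory reproducibility (relative)
b_opt = 5.0     # B value indicating negative skewness of the results
k = 2.0         # coverage factor

# Assumed first-order approximation: s_rel_trans ~ B * s_rel.
s_rel_trans = b_opt * s_rel

lower = x / (1.0 + k * s_rel_trans) ** (1.0 / b_opt)
upper = x / (1.0 - k * s_rel_trans) ** (1.0 / b_opt)

print(f"interval: {lower:.4f} to {upper:.4f}")
print(f"distance below x: {x - lower:.4f}, distance above x: {upper - x:.4f}")
```

The upper limit lies further from the measured value than the lower limit, i.e. the interval itself is positively skewed.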
Fig. 3 Expanded uncertainty intervals (95 %) for determination of sulphur in gas samples calculated around a nominal measured value of (a) 10 and (b) 20. (A) Using transformation based on $x^{B_{\mathrm{opt}}}$ (Eq. 10), (B) without transformation neglecting that s will be different at lower and upper limit (Eq. 4), (C) without transformation taking into account that s will be different at lower and upper limit (Eq. 5), and (D) using transformation based on $\log_{10} x$ (Eq. 7)
Fig. 4 Expanded uncertainty intervals (95 %) for determination of nitrogen (N) using an elemental analyser for C, H and N calculated around a nominal measured mass fraction of 0.10 %. a Using transformation based on $x^{B_{\mathrm{opt}}}$ (Eq. 10), b without transformation neglecting that s will be different at lower and upper limit (Eq. 4), c without transformation taking into account that s will be different at lower and upper limit (Eq. 5), and d using transformation based on $\log_{10} x$ (Eq. 7)
Example 3 Determination of biochemical oxygen demand (BOD) using electrochemical detection of oxygen: The within-laboratory reproducibility is 5 %. This is another example where $B_{\mathrm{opt}}$ is above 1, i.e. the measurement results have a negative skewness. The results indicate that a "true" $B_{\mathrm{opt}}$ is around 4. BOD is determined as the dissolved oxygen concentration before incubation minus the dissolved oxygen concentration after incubation. Hence, if results for measurement of the oxygen concentration after incubation have positive skewness, results for measurement of BOD can have a negative skewness since it is calculated as a difference. Similar to example 2, this has an impact on the uncertainty intervals. The calculated expanded uncertainty intervals (95 %) around a nominal measured value of 200 mg/l based on within-laboratory reproducibility data are compared in Fig. 5.
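The argument that a difference can turn positive skewness into negative skewness can be illustrated with a small simulation. All numbers below (oxygen levels, spreads, and the log-normal shape of the second measurement) are hypothetical and not taken from the BOD data of this example.

```python
import random


def skewness(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(5)
n = 20000
# Hypothetical oxygen concentration before incubation: nearly symmetric.
before = [rng.gauss(8.0, 0.1) for _ in range(n)]
# Hypothetical oxygen concentration after incubation: positively skewed.
after = [3.0 * rng.lognormvariate(0.0, 0.15) for _ in range(n)]
# The reported quantity is the difference of the two measurements.
difference = [b - a for b, a in zip(before, after)]

skew_after = skewness(after)
skew_difference = skewness(difference)
print(f"skewness after incubation: {skew_after:+.2f}")
print(f"skewness of difference:    {skew_difference:+.2f}")
```

Subtracting a positively skewed quantity mirrors its long tail to the low side of the result, which is why the difference shows negative skewness.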
As in the previous example, the shape of interval A is not "between" the shapes of interval C (that corresponds to using B opt equal to 1) and interval D (that corresponds to using B opt equal to zero). The difference in shape between interval A and C is smaller than in example 2 reflecting that the within-laboratory reproducibility and B opt are both somewhat smaller than that in example 2. As in example 2, the uncertainty interval will have a positive skewness even if the measurement results will have a negative skewness.
Example 4 Determination of lead (Pb) in contaminated soil: Here, the measurement uncertainty is based on repeatability data that is obtained from measurements of duplicate samples. Repeatability is calculated for the sampling step and the analysis step using ANOVA. This is often referred to as the "duplicate method" and is described in the Eurachem/CITAC Guide Measurement uncertainty arising from sampling-A guide to methods and approaches [6,7]. Using results for determination of lead in contaminated top soil in the Eurachem/CITAC Guide (Example A2) describing between target variability, a B opt value of − 0.31 was obtained. This B opt value was then used to transform results for duplicate samples (assuming that sampling variability has the same distribution as the between target variability) followed by calculation of repeatability for the sampling step and the analysis step using ANOVA. In the original literature, it was instead assumed that the between target variability had a close to log-normal distribution and that the sampling variability had the same distribution as the between target variability. A more detailed description of the calculations is available in the literature [4,6]. The calculated expanded uncertainty intervals (95 %) around a nominal measured value of 300 mg/kg based on repeatability data are compared in Fig. 6 for the sampling step (Fig. 6a), the analysis step (Fig. 6b), and the whole measurement (Fig. 6c).
Fig. 5 Expanded uncertainty intervals (95 %) for determination of biochemical oxygen demand (BOD) calculated around a nominal measured value of 200 mg/l. a Using transformation based on $x^{B_{\mathrm{opt}}}$ (Eq. 10), b without transformation neglecting that s will be different at lower and upper limit (Eq. 4), c without transformation taking into account that s will be different at lower and upper limit (Eq. 5), and d using transformation based on $\log_{10} x$ (Eq. 7)
Fig. 6 Expanded uncertainty intervals (95 %) for determination of the mass fraction lead (Pb) in contaminated soil calculated around a nominal measured value of 300 mg/kg for a sampling step, b analysis step, and c total measurement. (A) Using transformation based on $x^{B_{\mathrm{opt}}}$ (Eq. 10), (B) without transformation neglecting that s will be different at lower and upper limit (Eq. 4), (C) without transformation taking into account that s will be different at lower and upper limit (Eq. 5), and (D) using transformation based on $\log_{10} x$ (Eq. 7)
The shape of interval A is "outside" the shapes of interval C (that corresponds to using $B_{\mathrm{opt}}$ equal to 1) and D (that corresponds to using $B_{\mathrm{opt}}$ equal to 0), but very similar in shape to interval D since $B_{\mathrm{opt}}$ for the data set is found to be just below zero (−0.31). It is also apparent that for the sampling step, intervals B and C, where it is assumed that measurement results have a normal distribution, do not work well. In particular, the upper limit of the interval for the sampling step for interval C is unreasonably high. This is due to the wrong assumption that the distribution of measurement results has a normal distribution when using Eq. 5. Transforming data to obtain an approximate normal distribution prior to calculation of the interval will solve this issue (see interval A). For the analysis step where CV is small, the differences between the intervals are small. The uncertainty for the sampling step is dominating the whole analysis, and the uncertainty for the analysis step is almost negligible.
Example 5 Determination of organophosphorus pesticides in bread: In this example, Monte Carlo simulations were used to generate data sets (with $10^{6}$ data) for a model equation describing the combined standard uncertainty for a calculated concentration $C$. A suitable model equation is taken from the literature [11]; its input quantities are defined in Table 3. Four different data sets were generated, denoted (a), (b), (c), and (d).
The probability distribution of the input quantities and the parameters describing the distribution (mean set to 1, and standard deviation or halfwidth) are also given in Table 3. The Monte Carlo simulations for (a) were aimed at generating a data set with a CV similar to the relative standard uncertainty reported in the original literature (34 %).
The B opt value was found to be in the range of 0.26 to 0.32 (see Table 2). The calculated expanded uncertainty intervals (95 %) around a nominal measured value of 1 based on combined standard uncertainty for the model equation are compared in Fig. 7 for the four data sets.
In Fig. 7a, the shape of interval A is "between" interval C (that corresponds to using B opt equal to 1) and interval D (that corresponds to using B opt equal to 0) which is reasonable since B opt for the data sets is found to be around 0.32. As in example 4, intervals B and C, where it is assumed that measurement results have a normal distribution, do not work well. The data sets denoted (b), (c), and (d) were generated to have CV equal to 20 %, 11 %, and 2.1 %, respectively, by scaling down the standard deviation and halfwidth of the input quantities (see Table 3). Clearly, the difference between intervals calculated in different ways (A, B, C, and D) vanishes with decreasing CV of the data set. This illustrates that when CV is smaller than 15 % to 20 %, positive skewness in the measurement results can be neglected when calculating measurement uncertainty intervals. Furthermore, when CV is smaller than 10 % to 15 %, neglecting that s will be different at lower and upper limit will work fine.
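How quickly the different expressions converge with decreasing CV can be checked with a few lines of code. The sketch compares only the upper limits of the symmetric interval, the asymmetric interval without transformation, and the interval based on the log10 transformation, for the four CV values of the data sets. The standard deviation in log10 space is derived from the CV under an assumed log-normal distribution, which is a simplification and not how the values in Table 2 were obtained.

```python
import math


def upper_limits(cv, x=1.0, k=2.0):
    """Upper interval limits from three expressions for a given CV."""
    sigma_ln = math.sqrt(math.log(1.0 + cv ** 2))  # assumed log-normal link
    s_log10 = sigma_ln / math.log(10.0)
    return {
        "symmetric": x * (1.0 + k * cv),
        "asymmetric": x / (1.0 - k * cv),
        "log10": x * 10.0 ** (k * s_log10),
    }


spreads = []
for cv in (0.34, 0.20, 0.11, 0.021):
    limits = upper_limits(cv)
    spread = max(limits.values()) - min(limits.values())
    spreads.append(spread)
    shown = ", ".join(f"{name} = {value:.3f}" for name, value in limits.items())
    print(f"CV = {cv:5.3f}: {shown}  (spread {spread:.3f})")
```

The spread between the upper limits shrinks steadily as the CV decreases, in line with the observation that the choice of expression matters little at low CV.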
Fig. 7 Expanded uncertainty intervals (95 %) for determination of organophosphorus pesticides in bread calculated around a nominal measured value of 1 for data sets (a), (b), (c), and (d). (A) Using transformation based on $x^{B_{\mathrm{opt}}}$ (Eq. 10), (B) without transformation neglecting that s will be different at lower and upper limit (Eq. 4), (C) without transformation taking into account that s will be different at lower and upper limit (Eq. 5), and (D) using transformation based on $\log_{10} x$ (Eq. 7)
It is also possible to calculate a 95 % coverage interval for the output quantity of the Monte Carlo generated data sets using the 0.025- and 0.975-quantiles as endpoints [16]. For data set (a) (CV of 33 %), the interval will be 0.47 to 1.79. An identical interval will be obtained by calculating an interval in the $x^{B_{\mathrm{opt}}}$ transformed space as $x_{\mathrm{trans}} - 1.96 \times s_{\mathrm{trans}}$ to $x_{\mathrm{trans}} + 1.96 \times s_{\mathrm{trans}}$, where $x_{\mathrm{trans}}$ is the mean in the transformed space, followed by back-transformation to the original space. This demonstrates how well the transformation using $x^{B_{\mathrm{opt}}}$ works.
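The comparison between a quantile-based coverage interval and an interval calculated in the transformed space can be reproduced in principle with simulated data. The sketch below does not use the model equation of this example: it substitutes a log-normal data set with a CV of 33 % as a stand-in and therefore uses B = 0.0001 rather than the B values in Table 2, so the numerical limits differ from those quoted above.

```python
import math
import random

rng = random.Random(3)
# Assumed stand-in data: log-normal with CV = 33 % (not the paper's model).
sigma = math.sqrt(math.log(1.0 + 0.33 ** 2))
data = sorted(rng.lognormvariate(0.0, sigma) for _ in range(100000))


def quantile(sorted_values, p):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_values) - 1)
    i = int(pos)
    j = min(i + 1, len(sorted_values) - 1)
    return sorted_values[i] + (pos - i) * (sorted_values[j] - sorted_values[i])


# Coverage interval from the 0.025- and 0.975-quantiles.
q_low, q_high = quantile(data, 0.025), quantile(data, 0.975)

# Interval from mean +/- 1.96 s in the transformed space, back-transformed.
b_opt = 0.0001  # assumed appropriate for the log-normal stand-in
trans = [v ** b_opt for v in data]
mean_t = sum(trans) / len(trans)
sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in trans) / (len(trans) - 1))
t_low = (mean_t - 1.96 * sd_t) ** (1.0 / b_opt)
t_high = (mean_t + 1.96 * sd_t) ** (1.0 / b_opt)

print(f"quantile interval:          {q_low:.3f} to {q_high:.3f}")
print(f"transformed-space interval: {t_low:.3f} to {t_high:.3f}")
```

The two intervals should agree to within sampling noise when the transformed data are close to normally distributed.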

Conclusions
Several conclusions can be drawn from the above:
1. Uncertainty intervals calculated using
$$\frac{x}{(1 + k \times s_{\mathrm{rel,trans}})^{1/B_{\mathrm{opt}}}} \ \text{ to } \ \frac{x}{(1 - k \times s_{\mathrm{rel,trans}})^{1/B_{\mathrm{opt}}}} \tag{10}$$
can handle many types of asymmetry in the measurement results when the coefficient of variation is independent of the measurand level. Here, $k$ is the coverage factor, and $s_{\mathrm{rel,trans}}$ is the relative standard deviation of measurement results in the transformed space. The parameter $B_{\mathrm{opt}}$ is optimized to get a symmetric distribution. Equation 10 includes cases where the distribution of the measurement results can be approximated with a normal distribution (using $B$ equal to 1) and a log-normal distribution (using $B$ close to 0, for instance $B$ equal to 0.0001).
2. Several of the examples indicate that $B_{\mathrm{opt}}$ for many types of chemical analysis is neither close to 1, which corresponds to normally distributed measurement results, nor close to 0, which corresponds to log-normally distributed measurement results.
3. A value for $B_{\mathrm{opt}}$ can be estimated based on experimental results, modelling of results, or on judgement.
4. When the coefficient of variation of measurement results is less than 15 % to 20 %, approximation with a normal distribution for the measurement results is often "good enough" for most applications.