Introduction

Mineral oil hydrocarbons are among the pollutants most often found on contaminated sites. Mineral oils are composed of a vast mixture of hydrocarbons displaying a boiling range dependent on the origin and processing of the base mineral oil. The mineral oil content is therefore reasonably quantified as a sum parameter that covers a defined boiling range and is often referred to as total petroleum hydrocarbons (TPH). Extraction procedures [1–3] and the choice of detection method [4, 5] for the quantification of TPH in various environmental matrices such as soil, water and waste have recently become a matter of ongoing concern in the literature and are the subject of international standardisation activities.

Due to its variable nature, TPH is a method-defined parameter, and the defining method must be followed strictly. The comparability of TPH quantification results in soil and similar matrices from different laboratories is ensured by following the international standard ISO 16703:2004, which prescribes the extraction procedure, the clean-up of the extract and the boiling range to be quantified. It offers guidance on the use of an appropriate calibrant and stipulates flame ionisation detection (FID), which (compared to e.g. mass-selective detection) displays linearity of the signal response over six orders of magnitude. At the same time, the FID response of CH-equivalents is independent of the distribution of alkane chain isomers [6, 7]. The method is therefore suitable for the overall quantification of hydrocarbon mixtures regardless of their composition of congeners.

Routine laboratories conduct the vast majority of environmental analyses and participate in external quality assurance measures such as proficiency testing [8]. In addition, there is a need for reference materials to be applied as a tool for internal quality control. Here, we describe the certification of such a matrix reference material produced from a real-world sediment and its traceability to a stated reference.

Materials and methods

Chemicals and reagents

n-Heptane, n-hexane and acetone for residual analysis were purchased from Promochem (Wesel, Germany). Retention time markers n-decane (C10) and n-tetracontane (C40) were purchased from Fluka (Basel, Switzerland) and used as a solution in n-heptane (about 50 mg L−1).

Calibration standard BAM-K010, a certified mixture of additive-free diesel and lubricating oil as specified in ISO 16703:2004, was used to cover the range 1,000–5,000 mg kg−1 with five equidistant calibration solutions prepared on a weight/weight basis.

Two control solutions of BAM-K010 in n-hexane were prepared (0.713 and 1.619 mg mL−1) and filled into 4.5 Certan containers (Promochem, Wesel, Germany). Magnesium silicate (Florisil, 60–100 mesh) from Merck (Darmstadt, Germany) was heated for 16 h at 140 °C and allowed to cool in a stream of dry nitrogen.

Sample extraction and gas chromatography

Following ISO 16703:2004, 5 g of soil was shaken with 20 mL of acetone and, after addition of 10 mL of n-heptane containing C10 and C40, the mixture was sonicated for 1 h at 60 °C. A 45 mL aliquot of water was added, followed by shaking (10 min) and centrifugation (10 min, 3,500 rpm). The organic phase was collected and centrifuged (10 min, 3,500 rpm) after addition of 60 mL of water. A 5 mL sample of the organic phase was submitted to clean-up (Florisil/Na2SO4 column).
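
The relation between a TPH mass fraction in the soil and the concentration expected in the extract can be sketched as follows. This is a minimal illustration, assuming that all TPH is transferred into the n-heptane phase and that the phase keeps its nominal volume of 10 mL; both assumptions, and the function names, are introduced here for illustration only.

```python
# Sketch: convert between TPH mass fraction in soil and nominal extract
# concentration. Assumes complete transfer of TPH into the n-heptane phase
# and an unchanged phase volume (simplifying assumptions, not from the text).

SAMPLE_MASS_G = 5.0       # soil intake stated in the text
HEPTANE_VOLUME_ML = 10.0  # n-heptane added for extraction, stated in the text


def extract_concentration(mass_fraction_mg_per_kg: float) -> float:
    """Return the nominal extract concentration in mg/mL."""
    tph_mass_mg = mass_fraction_mg_per_kg * SAMPLE_MASS_G / 1000.0
    return tph_mass_mg / HEPTANE_VOLUME_ML


def mass_fraction(concentration_mg_per_ml: float) -> float:
    """Return the mass fraction in mg/kg that gives this extract concentration."""
    tph_mass_mg = concentration_mg_per_ml * HEPTANE_VOLUME_ML
    return tph_mass_mg / SAMPLE_MASS_G * 1000.0


if __name__ == "__main__":
    # Five equidistant calibration levels covering 1,000-5,000 mg/kg
    for level in (1000, 2000, 3000, 4000, 5000):
        print(level, "mg/kg ->", extract_concentration(level), "mg/mL")
    # Control solution concentrations stated in the text
    for control in (0.713, 1.619):
        print(control, "mg/mL ->", round(mass_fraction(control)), "mg/kg")
```

Under these assumptions the calibration range corresponds to 0.5–2.5 mg mL−1 in the extract, and the two control solutions correspond to roughly 1,426 and 3,238 mg kg−1, i.e. the lower and upper regions of that range.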

TPH was quantified using an Agilent 6890 gas chromatograph (GC) equipped with an Agilent 7683 autosampler (Agilent, Waldbronn, Germany), a FID (detector temperature 370 °C) and an HT 8 capillary column (15 m × 0.25 mm × 0.25 μm); the injection volume was 3 μL. N2 (5.0) was used as carrier gas (3 mL min−1). The GC oven program started at 50 °C, was held for 10 min, increased to 360 °C at 30 °C min−1 and then held for 10 min; this enabled a complete run within 20 min (see Fig. 1).

Fig. 1 Gas chromatogram of an extract of ERM-CC015a

Preparation of the candidate material

A representative freshwater sediment contaminated with mineral oil from industrial and municipal input over decades was sampled from the River Weisse Elster near Leipzig, Germany. The material was air-dried to constant weight and, after crushing of agglomerates, fractionated by sieving. A total of 29.47 kg of sample was collected after passing through a 125-μm mesh and was then homogenised using a drum hoop mixer for 6 h. Bottling was performed by means of a spinning riffler using a well-proven procedure of partitioning and back-mixing to yield 360 units containing 81.5 ± 0.5 g of sediment in brown glass bottles, which were stored at −20 °C in the dark. Table 1 lists the characterisation data of the matrix, and Fig. 1 depicts the TPH pattern. It should be noted that the boiling range above C40 is not included in the mass fraction of TPH.

Table 1 Characterisation of the sediment matrix

Characterisation

Homogeneity

The variance of the analytical procedure \(s^{2}_{{method}}\) and the variance of the TPH content within a bottle \(s^{2}_{{wb}}\) are initially unknown and cannot be investigated independently, because only their sum, the repeatability variance \(s^{2}_{r}\), is observable. One of the two contributions therefore has to be made negligible.

This is achieved here by reducing the heterogeneity as far as technically feasible and by choosing a sample intake for one determination such that its further increase does not significantly decrease the repeatability standard deviation of TPH quantification. At this point \(s^{2}_{{wb}}\), the variability due to within-bottle heterogeneity, is considered negligible and the observed variability of the TPH content \(s^{2}_{r}\) is dominated by \(s^{2}_{{method}}\).

$$s^{2}_{r} = s^{2}_{{wb}} + s^{2}_{{method}} $$
(1)

Every 24th unit was taken from the bottling process and the TPH content was determined five times in each of these 15 units under repeatability conditions. Processed extracts were analysed by GC/FID using the method outlined above under repeatability conditions such that all 75 extracts were quantified against one calibration after randomisation.

No evidence suggesting a rejection of the hypothesis that the material is homogeneous was observed, and the uncertainty of the TPH content between the bottles \(u_{bb}\) was estimated as 26.97 mg kg−1 for a 5 g sample intake according to Eq. (2) [9] (see also Table 2).

$$u_{{bb}} = {\sqrt {\frac{{MS_{{between}} - MS_{{within}} }} {5}} }$$
(2)
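
The evaluation behind Eq. (2) can be sketched as a one-way analysis of variance over the replicate results of each bottle. The sketch below is illustrative only: the function name is hypothetical, it assumes a balanced design (the same number of replicates per bottle, five in this study), and clamping a negative difference to zero is an added convention that Eq. (2) itself does not state.

```python
from math import sqrt
from statistics import mean


def between_bottle_uncertainty(bottles):
    """Estimate u_bb from replicate results grouped by bottle.

    bottles: list of lists, one inner list of replicate results per bottle.
    Returns (ms_between, ms_within, u_bb).
    """
    n_rep = len(bottles[0])
    if any(len(b) != n_rep for b in bottles):
        raise ValueError("balanced design expected")
    n_bottles = len(bottles)

    grand_mean = mean(x for b in bottles for x in b)
    bottle_means = [mean(b) for b in bottles]

    # Sums of squares of the one-way ANOVA
    ss_between = n_rep * sum((m - grand_mean) ** 2 for m in bottle_means)
    ss_within = sum(
        (x - m) ** 2 for b, m in zip(bottles, bottle_means) for x in b
    )

    ms_between = ss_between / (n_bottles - 1)
    ms_within = ss_within / (n_bottles * (n_rep - 1))

    # Eq. (2) with n_rep replicates per bottle; a negative difference is
    # clamped to zero (assumption added for this sketch).
    u_bb = sqrt(max(ms_between - ms_within, 0.0) / n_rep)
    return ms_between, ms_within, u_bb


if __name__ == "__main__":
    # Hypothetical results in mg/kg for three bottles, NOT data from the study
    demo = [[1810.0, 1795.0, 1830.0], [1850.0, 1842.0, 1861.0], [1788.0, 1801.0, 1779.0]]
    print(between_bottle_uncertainty(demo))
```
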
Table 2 Results of the statistical evaluation of the homogeneity study and interlaboratory exercise

Stability

From experience, a temperature-driven deterioration of the TPH content was to be expected for this material. Therefore, a total of 11 units of the candidate material were submitted to a so-called isochronous study [10] with accelerated ageing at constant temperatures between 4 and 60 °C over periods of 3 weeks to 12 months, as depicted in Fig. 2a. After the respective period, each unit was returned to storage at the reference temperature of −20 °C. All units were analysed for TPH using the method described above together with three reference samples which had been kept at −20 °C since bottling. Each extract was measured three times together with procedural blanks. The whole set of measurements included 64 runs conducted under repeatability conditions.

Fig. 2 a Accelerated ageing, temperatures and periods. The isochronous study is done after 12 months. The first post-certification monitoring is planned after 24 months. b Arrhenius plot for ERM-CC015a. Regression of TPH degradation rate \(k(T)\) over the inverse temperature and 95% one-sided confidence interval. Estimated shelf lives are 2.3, 0.9, 0.5 and 0.3 years at −20 °C, +4 °C, +20 °C and +40 °C, respectively

A significant downward trend in the TPH content is observed for higher temperatures. The dependence of the thermal degradation of the TPH content on time is modelled using the basic assumption of proportionality between the amount of analyte deteriorated (dm) during a time interval dt and the total amount of the analyte available in the sample. The proportionality coefficient is the temperature-dependent reaction rate \(k_{\text{eff}}(T)\). Thus, from

$${\text{d}}m{\text{ = }} - mk_{{\text{eff}}} \left( T \right){\text{d}}t$$

where m is the mass of the analyte (total amount) in the sample, one obtains an exponential dependence on time of the remaining amount of analyte (see also e.g. [11]). Although this assumption is not universal, it describes a large number of degradation processes. At a given temperature, the effective pseudo-first-order reaction rate \(k_{\text{eff}}(T)\) may comprise more than one degradation process, possibly running concurrently and each with its individual dependence on temperature.
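
Written out, the integration step that leads to the exponential dependence is as follows, with \(m_{0}\) denoting the amount of analyte at \(t = 0\) (a symbol introduced here for illustration) and the temperature held constant:

$$\frac{{\text{d}}m}{m} = - k_{{\text{eff}}} \left( T \right){\text{d}}t \quad \Rightarrow \quad \ln \frac{m\left( t \right)}{m_{0}} = - k_{{\text{eff}}} \left( T \right)t \quad \Rightarrow \quad m\left( t \right) = m_{0} \,e^{- k_{{\text{eff}}} \left( T \right)t}$$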

Modelling of the dependence of \(k_{\text{eff}}(T)\) on temperature is therefore quite complicated since degradation may be fed by different sources, especially in natural matrices where all accompanying substances and their influence on the degradation process are not normally known. Thus, a simple Arrhenius model was assumed and tested against experimental data. The reasoning behind this approach is outlined in detail elsewhere [12]. A plot of the reaction rates \(k_{\text{eff}}(T)\) over the inverse temperature is depicted in Fig. 2b.

The temperature dependence can indeed be approximated by a straight line in the range from +4 °C to +60 °C. The estimated \(k_{\text{eff}}(T)\) at −20 °C does not fit the model well, which may be attributed to one of two reasons: either a second degradation process, which fully dominates the high-temperature region, is activated at temperatures above −20 °C, or the discrepancy between prediction and experiment is due to the influence of the innate method variability. Although the absolute value of the latter is approximately constant for all sampling points (in temperature and time) involved, its relative influence is largest at the lowest temperature, where no or only a very small (statistically insignificant) degradation is both expected and observable in the experiment.

Within-laboratory method variability can substantially be reduced by partially taking out drift effects. For all temperatures above reference, this can simply be achieved by referring the absolute value measured for a stressed sample to the value measured for the reference sample which is closest in the measurement sequence (normalisation). The raw data from the exercise described here were normalised, then subjected to the procedure as described above.
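
The normalisation step can be sketched as below. This is an illustrative reading of the procedure, not the evaluation code used in the study: the data layout, the function name and the tie-breaking rule (the earlier reference wins when two references are equally close) are assumptions.

```python
def normalise_to_nearest_reference(sequence):
    """Divide each stressed-sample result by the nearest reference result.

    sequence: list of (kind, value) tuples in measurement order, where kind
    is "ref" for a reference sample and "stressed" for an aged sample.
    Returns a list of (position, ratio) tuples for the stressed samples.
    """
    ref_positions = [i for i, (kind, _) in enumerate(sequence) if kind == "ref"]
    if not ref_positions:
        raise ValueError("at least one reference measurement is required")

    ratios = []
    for i, (kind, value) in enumerate(sequence):
        if kind != "stressed":
            continue
        # Closest reference in the run order; ties go to the earlier one
        nearest = min(ref_positions, key=lambda j: (abs(j - i), j))
        ratios.append((i, value / sequence[nearest][1]))
    return ratios


if __name__ == "__main__":
    # Hypothetical sequence with an upward drift, NOT data from the study
    run = [("ref", 100.0), ("stressed", 90.0), ("stressed", 99.0), ("ref", 110.0)]
    print(normalise_to_nearest_reference(run))
```

In the hypothetical sequence the two stressed results differ by 10% in absolute terms, yet both give the same ratio to their nearest reference, which is the sense in which normalisation takes out part of the drift.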

Data for the samples stored at reference temperature cannot be normalised. Formal evaluation of the raw reference sample values obtained in this exercise revealed a positive (increasing) trend mainly due to drift in the measurement sequences. These drift effects were investigated using an iterative fit procedure on all data measured in one sequence, appropriate corrections were derived also for the references, and a reasonable reaction rate estimate obtained.

This observed effective reaction rate for samples stored at −20 °C is below the rate predicted from the Arrhenius model and was not included in the regression of \(k_{\text{eff}}(T)\) versus 1/T (Fig. 2b). By using the upper one-sided confidence bound of the predicted value for \(k_{\text{eff}}(T)\), one overestimates the degradation process and thus underestimates the real period of validity of the material. In this sense, the shelf life estimates are worst-case estimates that err on the safe side.

The activation energy \(\Delta E_{a}\) estimated for the high-temperature region is 24.4 ± 0.4 kJ mol−1. Using these data and the assumed model, one can estimate, for any required temperature, when degradation will presumably force the mineral oil content to fall outside the certified expanded uncertainty bounds (normally the lower bound). This allows the assessment of both the shelf life at storage temperature and any limitations during transportation to the end user. Note that, for this assessment, the upper one-sided confidence bound of the predictions for \(k_{\text{eff}}(T)\) is used; thus the uncertainty of its determination is included in the evaluation.
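
A simplified version of this evaluation can be sketched as follows: a straight-line fit of ln k against 1/T gives the activation energy, and the shelf life is taken as the time at which first-order decay brings the content down to the lower uncertainty bound. The function names and the demonstration rates are hypothetical, and the sketch uses the point estimate of the rate rather than the upper one-sided confidence bound applied in the study, so it would give longer shelf lives than the worst-case approach.

```python
from math import exp, log

R = 8.314462618  # molar gas constant in J mol^-1 K^-1


def fit_arrhenius(temps_celsius, rates):
    """Least-squares fit of ln k = ln A - Ea / (R T).

    Returns (ln_A, Ea) with Ea in J/mol.
    """
    x = [1.0 / (t + 273.15) for t in temps_celsius]
    y = [log(k) for k in rates]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, -slope * R


def rate_at(temp_celsius, ln_a, ea):
    """Predicted reaction rate at a given temperature (same unit as the input rates)."""
    return exp(ln_a - ea / (R * (temp_celsius + 273.15)))


def shelf_life(k, w_cert, expanded_u):
    """Time until w_cert * exp(-k t) reaches the lower bound w_cert - U."""
    return log(w_cert / (w_cert - expanded_u)) / k


if __name__ == "__main__":
    # Hypothetical rates in 1/year, NOT the values measured in the study
    temps = [4.0, 20.0, 40.0, 60.0]
    rates = [0.08, 0.14, 0.27, 0.47]
    ln_a, ea = fit_arrhenius(temps, rates)
    print("Ea / kJ mol^-1:", ea / 1000.0)
    for t in (4.0, 20.0, 40.0):
        print(t, "degC:", shelf_life(rate_at(t, ln_a, ea), 1820.0, 130.0), "years")
```
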

Although the preliminarily estimated shelf life at a storage temperature of −20 °C is satisfactory and the stability over short periods at ambient temperatures is sufficient for dispatch and use, any prolonged exposure to room temperature or above will drastically reduce the period of validity of the certified values. Therefore, their validity is set to one year after delivery, provided that the transport conditions meet the restrictions set by the estimated shelf life and the material is stored at −20 °C by the user upon receipt. This first rough estimate of stability will be updated over the period of availability of the material by post-certification monitoring of units stored at 4 °C and 20 °C. Measurement intervals will be about half of the shelf life estimated on the basis of all earlier measurements.

Certification study

Study design

Nine laboratories selected on grounds of their performance in two recent proficiency testing rounds on TPH analysis in soil operated by BAM were invited to participate in the certification exercise along with three laboratories from BAM. Selection criteria included the consistency of documentation of extraction, clean-up, calibration and instrumental analysis according to ISO 16703:2004 and the declaration of commitment to comply with these requirements during the certification analyses.

Two units of the candidate material were to be analysed by each laboratory in triplicate. A unit of the certified calibration standard BAM-K010 was provided for each participant along with the information that the level of content was expected to be between 1,000 and 4,000 mg kg−1.

In addition, each participant received the two control solutions whose concentrations of TPH in n-heptane were unknown (to the participants). These were prepared such that they corresponded to the lower and upper region of the appropriate range of calibration. The standard procedure according to ISO 16703:2004 had to be followed strictly and was to be documented by means of a detailed questionnaire.

Evaluation of certified value and its uncertainty

Results for the mineral oil content were to be reported on the basis of total mass intake, and no dry mass determinations were asked for in order to exclude another source of error and since it had previously been established that the water content is not altered significantly if the material is handled as specified. Results of the TPH determination were scrutinised for consistency and a few obvious transcription errors were corrected after clarification with the respective laboratories.

As can be seen from Fig. 3, most laboratories show positive or slightly negative correlation between the sample and the control value. Only laboratory number 5 is distant from the main group, and additionally shows a considerable anti-correlation between sediment sample and control solutions.

Fig. 3 Results of the certification study. a Laboratory means and standard deviations (sediment): certified value \(w_{cert}\) and expanded uncertainty U. b Relative deviation of laboratory means from mass fraction of BAM-K010 (control solutions)

Although no further investigations into the reasons for this discrepancy have been conducted, it seems justifiable to exclude, for technical reasons, the values of this particular laboratory from further processing. This is also supported by the comparison of the slopes of the calibration curves of the participants and their measurements of the control solutions (Table 3). Since the control solutions are made from BAM-K010, they should fit the calibration curves established with BAM-K010. This is reasonably true for the majority of participants, whereas laboratory 5 is among the three exceptions with a high relative deviation \(\Delta_{b}\) of their calibration slopes from those obtained from the two control solutions.

Table 3 Comparison of calibrations

\(\Delta_{TPH}\) is the relative difference between the values for the TPH mass fraction in the sediment extract calculated either from the reported concentrations for the control solutions or from the respective reference values. As expected, laboratory 5 deviates significantly from the other participants. These discrepancies are most likely due to the manual integration of the TPH range, which is more prone to operator-caused variation than the integration of an ordinary Gaussian peak.

Due to the lack of variance homogeneity (Snedecor F-test, Scheffé test and Bartlett test), laboratory means were used for further evaluation rather than the whole set of single values after pooling. No indication of outlying observations (Cochran, Grubbs, Nalimov) or of a non-normal distribution was found.

Therefore, the mean of laboratory means was taken as the best estimate for the TPH mass fraction \(w_{char}\) and the standard uncertainty of the mean of laboratory means as the uncertainty contribution \(u_{char}\) from the intercomparison.

The estimate for the certified TPH mass fraction \(w_{cert}\) is derived by correction for the purity \(f_{pur}\) of the calibration standard used for determination according to Eq. (3).

$$w_{cert} = w_{char} \times f_{pur}$$
(3)

The corresponding combined uncertainty \(u_{c}\) must be composed of the uncertainty contributions from the certification intercomparison \(u_{char}\), a possibly undetected inhomogeneity \(u_{bb}\), and the uncertainty of the purity correction \(u_{pur}\). Eq. (4) combines them as relative uncertainties (subscript r).

$$u^{2}_{{c,r}} = u^{2}_{{char,r}} + u^{2}_{{bb,r}} + u^{2}_{{pur,r}} $$
(4)

The expanded uncertainty is \(U = k u_{c}\) and the expansion factor is k = 2. The certified value and the expanded uncertainty are rounded according to the published recommendations [13] to be 1,820 ± 130 mg kg−1.
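
The chain from Eq. (3) and Eq. (4) to the expanded uncertainty can be sketched as follows. The function name and all input values in the demonstration are hypothetical; only the structure of the calculation and the expansion factor k = 2 are taken from the text.

```python
from math import sqrt


def certified_value(w_char, f_pur, u_char_rel, u_bb_rel, u_pur_rel, k=2.0):
    """Apply Eq. (3) and Eq. (4); relative uncertainties are given as fractions.

    Returns (w_cert, u_c_rel, expanded_u) with expanded_u in the unit of w_char.
    """
    w_cert = w_char * f_pur                                      # Eq. (3)
    u_c_rel = sqrt(u_char_rel**2 + u_bb_rel**2 + u_pur_rel**2)   # Eq. (4)
    expanded_u = k * u_c_rel * w_cert                            # U = k * u_c
    return w_cert, u_c_rel, expanded_u


if __name__ == "__main__":
    # Hypothetical inputs for illustration, NOT the values of the study
    w_cert, u_c_rel, expanded_u = certified_value(
        w_char=1850.0,    # mean of laboratory means in mg/kg
        f_pur=0.98,       # purity factor of the calibration standard
        u_char_rel=0.03,  # relative standard uncertainty of the mean
        u_bb_rel=0.015,   # relative between-bottle uncertainty
        u_pur_rel=0.005,  # relative uncertainty of the purity correction
    )
    print(w_cert, u_c_rel, expanded_u)
```
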

The standard uncertainty of the intercomparison \(u_{char}\) is composed of a contribution due to the variability of calibration among the laboratories \(u_{calibr}\), and other components \(u_{rest}\) including those from sample preparation (Eq. 5).

$$u^{2}_{{char}} = u^{2}_{{calibr}} + u^{2}_{{rest}} $$
(5)

An estimate for \(u_{calibr}\) may be obtained from the combination of the relative deviations of reference value and measurement results (Table 3) according to Eq. (6).

$$u_{calibr}^2 = \frac{{\sum {\Delta _{TPH}^2 } }}{n}$$
(6)

where n is the number of laboratories. The average calibration-caused deviation \(u_{calibr}\) amounts to 4.45%. The average lab deviation from the mean of laboratory means for the samples under certification is 9.69% (not to be confused with the standard uncertainty of the mean \(u_{char,r}\)). Thus, approximately 21% of the overall variance of the certification exercise is explained by calibration variability (\(4.45^{2}/9.69^{2} \approx 0.21\)). Although other sources of uncertainty, namely sample preparation, contribute the major part of the study variability, calibration remains a significant factor and should be controlled. This finding contradicts the commonly stressed argument that all “equipment-related” influences are negligible. The provision of reliable, gravimetry-based synthetic control samples is a convenient way of keeping calibration influences under control.
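
The arithmetic behind these figures can be retraced with a short sketch. The function names are hypothetical; the two percentages are the ones quoted in the text, and the individual \(\Delta_{TPH}\) values of Table 3 are not reproduced here.

```python
from math import sqrt


def calibration_uncertainty(delta_tph):
    """Eq. (6): root mean square of the relative deviations Delta_TPH."""
    return sqrt(sum(d * d for d in delta_tph) / len(delta_tph))


def variance_share(u_calibr, total_deviation):
    """Fraction of the overall variance attributable to calibration."""
    return (u_calibr / total_deviation) ** 2


# Percentages quoted in the text
share = variance_share(4.45, 9.69)

if __name__ == "__main__":
    print(f"calibration share of variance: {share:.2f}")
```

Because variances rather than standard deviations are compared, a calibration contribution that is almost half of the overall deviation accounts for only about one fifth of the variance.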

Conclusions

The development of a reference material with a unique matrix/analyte combination and the traceability of the certified value to an internationally accepted stated reference are presented. The significance of any variability of the TPH content was derived from a comprehensive homogeneity study in accordance with ISO Guide 35 on the certification of reference materials. The long-term stability was estimated on the grounds of a sound model whose validity has been successfully tested against the data. Worst-case estimation of the expiry date and regular post-certification monitoring of the material ensure that the user obtains a CRM with a reliable certified value and corresponding uncertainty. Somewhat unexpectedly, the calibration contributed significantly to the interlaboratory variability. As a result, it may be concluded that even the relatively tedious sample preparation in question has reached a conclusive state of the art among the selected laboratories.