Introduction

Homogeneity is one of the defining characteristics of reference materials (RMs). Consequently, ISO 17034 [1] requires the assessment of homogeneity. While this does not necessarily require actual measurements (exceptions could be, for example, for RM production by validated processes that are under good statistical control), in most cases, and certainly for every new material, measurements are performed that allow confirmation of “sufficient homogeneity”. A dedicated homogeneity study requires using several samples from the production batch (reduction in the sellable samples), time and investment to perform all the necessary analysis. Therefore, optimisation of the set-up of the homogeneity study may be seen as a potential way to lower the final RM price. ISO Guide 35 [2] gives several examples on how the requirements of ISO 17034 can be fulfilled. They all rely on an appropriate selection of a sufficient number of samples of the whole batch and subsequent analysis in one or several runs by one laboratory. In the first step, the data set should be checked for analytical problems like analytical drift leading to trends in the measurement sequence and outliers. After investigation and resolution of all analytical problems, the analytically valid data set should be checked for trends in the processing sequence before the quantification of the between-unit variation is performed. ISO Guide 35 recommends at least nine degrees of freedom for the final standard deviation between units (sbb). Ideally, all measurements are performed under repeatability conditions, as this ensures the lowest possible analytical variation and therefore the best possibility to quantify the between-unit variation. ISO Guide 35 also foresees the possibility that not all measurements can be performed in one analytical run: in this case, the use of randomised block designs or balanced nested designs is recommended.

One of the possible characterisation approaches for certified reference materials is to use “two or more methods of demonstrable accuracy in one or more competent laboratories” (clause 7.12.3 in ISO 17034). This is often implemented as an interlaboratory comparison among laboratories of demonstrated competence. As each of the laboratories measures different samples, it should be possible to use the data from such a characterisation exercise also for the assessment of homogeneity. Such an approach could have two advantages:

  • Fewer samples are consumed, as combining homogeneity and characterisation study eliminates the need for reserving a separate set of samples. This can be relevant for small production batches.

  • This combination can lead to cost savings. These, however, will usually be minor, as the large measurement series of a homogeneity study result in low analytical costs per unit. At the JRC Geel, the costs for the homogeneity studies are usually below 10 % of the total project cost.

Combining the homogeneity with the characterisation study will not necessarily lead to shorter time to the market, as a separate homogeneity study can also be run parallel to the characterisation study.

In this manuscript, we describe the approach taken to use the characterisation data for the homogeneity assessment of ERM-EB090a and ERM-EB090b, two titanium RMs to be certified for their trace metal mass fractions.

Material

Titanium ingots were produced by conventional alloying of titanium with 25 elements (Al, B, Ce, Co, Cr, Cu, Fe, Hf, La, Mn, Mo, N, Nb, Ni, O, Pd, Ru, Si, Sn, Ta, V, W, Y and Zr) at mass fractions of 0.1 g/kg – 3 g/kg. The ingots were subjected to vacuum arc remelting to improve homogeneity and forged into smaller diameter round bars. These forged bars themselves were machined into rods of a final diameter of 63 mm and used as electrodes that were then melted in the plasma rotating electrode powder process to obtain a powder.

The powder was sieved to remove particles larger than 0.8 mm, and the final mass of powder produced was 114 kg. The powder was mixed and filled into cans made of low-carbon steel with inner diameters of 63 mm. The cans were then subjected to hot isostatic pressing for 240 min at 1750 °C and 101 MPa. This resulted in eight dense bars in a mantle of low-carbon steel. The diameter of the titanium core was 54 mm, the outer diameter 63 mm. The length of the bars varied between 1147 mm and 1352 mm.

The top and bottom 20 mm of each bar were discarded. From each bar, 42–50 discs of 20 mm height were obtained by water jet cutting. From each of the discs, an inner core with a diameter of 40 mm was cut by water jet cutting. Each of the discs was engraved with the CRM code and a running number representing the position in the bar. In total, only 400 discs were produced. This rather low number made the combination of characterisation and homogeneity study interesting.

The remaining parts of the bars were cut into slices of about 2 mm thickness. Chips of 6 mm diameter, each weighing about 250 mg, were punched out of these slices. All chips were mixed, and 7 g of chips was filled into each amber glass bottle. In total, 765 bottles were produced.

Study set-up

Laboratory selection

Eleven laboratories that had demonstrated their competence in the analysis of metals in titanium or titanium alloys were invited to participate in the study. Ten of the laboratories hold formal accreditation to ISO/IEC 17025, but only three of them are actually accredited for the analysis of elements in titanium. Adherence to the provisions of ISO/IEC 17025 for the eleventh laboratory was checked using a questionnaire in which the laboratory confirmed that it runs a quality system and also confirmed that procedures for the technical requirements of ISO/IEC 17025 are in place.

Sample selection and measurement set-up

Twenty-four discs were selected using a random-stratified sampling scheme covering the complete batch. The samples were sent to six different laboratories that provided in total nine independent data sets, each data set consisting of two measurement results per element for each of three samples (six results per data set). As not every data set contained all elements, laboratories were asked to return the samples after analysis and six samples were sent to one of the laboratories for analysis for the omitted elements to achieve at least nine degrees of freedom for the between-unit variation (sbb). This laboratory performed duplicate analysis for Hf, La, Nb and Ta under repeatability conditions. The set-up is shown schematically in Table 1.

Table 1 Overview of samples and measurements performed

Thirty-nine bottles of chips were selected using a random-stratified sampling scheme covering the complete batch. The samples were distributed to 11 laboratories that provided in total 14 independent data sets. The laboratories were the same as for the discs plus two additional laboratories. Again, each data set consisted of two results performed on chips of three bottles each (six results per data set) and all laboratories were requested to return their samples after analysis. To resolve inconsistencies in laboratory data, one laboratory performed quadruplicate analysis under repeatability conditions for all elements on six samples that had been returned by the laboratories. The set-up is shown schematically in Table 1.

The six measurements per data set were spread over at least 2 days to ensure conditions of intermediate precision. Laboratories were free to use their method of choice. Inductively coupled plasma mass spectrometry (ICP-MS) and optical emission spectrometry (ICP-OES), k0-neutron activation analysis (k0NAA) and glow discharge mass spectrometry (GD-MS) were used by the participants.

Statistical analysis

Independent data sets from different laboratories or different methods are usually affected by an unknown bias due to calibration and other systematic variations in the procedure. In addition, data from different laboratories or methods often show different precisions, which further complicates the evaluation by analysis of variance (ANOVA). Different statistical approaches were therefore applied for the investigation of analytical trends, outliers and position in the bars on the one hand and for the quantification of sbb on the other. The data analysis was performed independently for discs and chips.

For the investigation of trends and outliers, each individual datapoint was normalised to the average of the respective data set. The normalised data were tested for trends in the production sequence by testing the slope of the linear regression result versus sample number for significance on a 95 % confidence level. The data set was tested for individual outliers and outlying sample means using the Grubbs test on a 99 % confidence level. All data sets were tested for normality using normal probability plots.
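The normalisation and the two test statistics described above can be sketched as follows. This is a minimal illustration only, not the software used in the study: the function names are hypothetical, and the resulting t and Grubbs statistics still have to be compared with tabulated critical values at the chosen confidence level (95 % for the slope, 99 % for the Grubbs test).

```python
import math
from statistics import mean, stdev


def normalise(data_set):
    """Divide each result by the mean of its own data set.

    This removes the unknown laboratory or method bias so that
    results from different data sets can be pooled.
    """
    m = mean(data_set)
    return [x / m for x in data_set]


def slope_t_statistic(positions, values):
    """Return (slope, t) for the least-squares line of values vs positions.

    t is the slope divided by its standard error; it has n - 2 degrees
    of freedom. The fit must not be perfect (residuals must be non-zero).
    """
    n = len(positions)
    x_bar, y_bar = mean(positions), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in positions)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(positions, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(positions, values))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope


def grubbs_statistic(values):
    """Largest absolute deviation from the mean, in units of the
    sample standard deviation (single-outlier Grubbs statistic)."""
    m = mean(values)
    return max(abs(v - m) for v in values) / stdev(values)
```

A trend would be flagged when the absolute value of t exceeds the two-sided critical value of the t-distribution for n - 2 degrees of freedom.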

Data sets without trends or outliers were used in the second step of the evaluation. The non-normalised data for each element were tested for outlying variances using the Cochran procedure on a 99 % confidence level. This test was applied to the standard deviation of each data set. Data sets with outlying variances were removed, as ANOVA requires equal variances in each cell. This standard deviation also includes a contribution of the between-unit variation, but this is expected to be the same for all laboratories. In addition, this contribution was expected to be small compared to the repeatability standard deviation, so there was little risk that this test eliminated the between-unit variation that should be assessed.
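As an illustration of the Cochran procedure, the test statistic is the largest variance divided by the sum of all variances. The sketch below assumes data sets of equal size (six results each in this study); the function name and the example values are hypothetical, and the critical value for the 99 % confidence level has to be taken from tables.

```python
from statistics import variance


def cochran_c(data_sets):
    """Return (C, index) for a list of equally sized data sets.

    C is the largest sample variance divided by the sum of all sample
    variances; index identifies the data set with the largest variance.
    """
    variances = [variance(d) for d in data_sets]
    largest = max(range(len(variances)), key=variances.__getitem__)
    return variances[largest] / sum(variances), largest


# Hypothetical example with three data sets of three results each.
c, suspect = cochran_c([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 3.0, 6.0]])
```

Because a variance is not changed by a constant offset, the statistic can be computed on the non-normalised data even when the data sets carry different laboratory biases.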

Quantification of between-unit heterogeneity was undertaken by two-way ANOVA without interaction of the original, non-normalised data using Statistica 13 (Dell Software, Round Rock, USA). This separates sbb from the within-unit variation (swb). The latter is equivalent to the method intermediate precision. The heterogeneity hidden by method precision, u*bb, was calculated as shown in the equation below [3].

$$u_{\text{bb}}^{*} = \sqrt {\frac{MS_\text{within}}{n}} \sqrt[4]{{\frac{2}{{\nu_{\text{within}} }}}}$$

In this equation, MSwithin is the within-unit mean square from the ANOVA, n the number of replicates per unit (2) and νwithin the degrees of freedom of MSwithin. The larger value of sbb and u*bb was used as the estimate of the between-unit variation.
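The selection between sbb and u*bb can be sketched as below. The expression for u*bb follows the equation above; the expression for sbb, the square root of (MSbetween - MSwithin)/n, is the usual ANOVA relation for a balanced design and is an assumption here, as the text does not state it explicitly. Function names and numerical values are hypothetical.

```python
import math


def hidden_heterogeneity(ms_within, n, nu_within):
    """u*bb: heterogeneity that can be hidden by method precision."""
    return math.sqrt(ms_within / n) * (2.0 / nu_within) ** 0.25


def s_bb(ms_between, ms_within, n):
    """Between-unit standard deviation, or None when MSbetween does not
    exceed MSwithin and s_bb therefore cannot be calculated.

    Assumes the usual balanced-design relation
    s_bb = sqrt((MS_between - MS_within) / n).
    """
    if ms_between <= ms_within:
        return None
    return math.sqrt((ms_between - ms_within) / n)


def between_unit_estimate(ms_between, ms_within, n, nu_within):
    """Return the larger of s_bb and u*bb, as done in this study."""
    u_star = hidden_heterogeneity(ms_within, n, nu_within)
    s = s_bb(ms_between, ms_within, n)
    return u_star if s is None else max(s, u_star)


# Hypothetical mean squares: here MS_between < MS_within, so u*bb is used.
estimate = between_unit_estimate(3.0, 4.0, n=2, nu_within=32)
```

Using the larger of the two values gives a conservative estimate in the frequent case where the method precision is too poor to resolve the between-unit variation.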

Results and discussion

Homogeneity assessment of Ti-discs

None of the elements show a trend in the position in the bars. Individual outliers were detected for Hf and La. These individual outliers also resulted in the respective average being an outlier for Hf, but not for La. Normal probability plots of the individual results were in line with the assumption of normally distributed data for all elements except Hf, La and Sb. While the deviation for La was small, the one for Hf was substantial enough to preclude evaluation by ANOVA. While the data for Sb did not follow a normal distribution, they did at least follow a unimodal distribution, thus making evaluation by ANOVA possible. The normal probability plots of the unit means were for all elements except Hf in agreement with a normal distribution. Based on this evaluation, the data of Hf were excluded from the further evaluations.

The Cochran test flagged several data sets as outliers. One data set each for B, Mo, Si and Ta and two data sets for Sb were excluded from the further evaluation. These data sets came from various laboratories and reflect the suitability of method principles or laboratory internal procedures for the determination of certain elements.

The results of the two-way ANOVA are shown in Table 2. sbb of Ce, Pd, Sb and Y still have fewer than nine degrees of freedom. Disagreement between results precluded certification of Ce and Sb anyway, and insufficient data sets were available for a certified value for Pd and Y. As certification was impossible for these elements, it was decided not to include them in the re-testing.

Table 2 Results of the homogeneity assessment of titanium discs

Repeatability standard deviations were, with the exception of Cu and Si, between two and four per cent, showing that the methods are sufficiently precise to allow a meaningful assessment of homogeneity. Despite the good precision, MSbetween was in most cases smaller than MSwithin and sbb could therefore not be calculated. As expected from the careful manufacturing process, homogeneity between discs was very good, with the conservative estimates for between-disc variation between one and two per cent for most elements. Standard deviations between laboratory means ranged from 2.2 % to 9.0 % (data not shown). The estimates for the between-unit heterogeneity are therefore significant but, as shown in column Fract of Table 2, not a dominating factor for the uncertainty of most assigned values. This shows that the approach chosen delivered suitable homogeneity assessments for the production of certified reference materials.

Homogeneity assessment of Ti-chips

None of the elements show a trend in the production sequence. Individual outliers were detected for Cr and Si. These did not lead to outlying unit means, but one outlying unit mean for Co was found, and Co was therefore excluded from further evaluation. For almost all data sets, the assumption of normally distributed individual data and bottle means was valid, and all data sets followed at least unimodal distributions.

The Cochran test flagged several data sets as outliers. One data set each was excluded for Co, Cr, Cu, Mn, Ni, Pd, Sb, Si, Sn, Ta and V, and two data sets each were removed for Mo, Ru and W. As for the discs, these outlying data sets came from several different laboratories and reflect inherent variation of precision between laboratories.

The results of the two-way ANOVA are shown in Table 3. Repeatability standard deviations are comparable to the ones obtained for the discs. This is to be expected, as the laboratories providing the data are largely identical. Contrary to the discs, sbb could be quantified for the majority of elements (16 of 22). The estimates for the between-bottle variation are similar to the ones obtained for the discs, with sbb or u*bb ranging from 0.9 % to 2.9 %.

Table 3 Results of the homogeneity assessment of titanium chips

Comparison of multi-laboratory study versus single-laboratory study

The data from the re-testing of the characterisation samples allow the comparison of a multi-laboratory study evaluated by two-way ANOVA and a dedicated homogeneity study evaluated in a single laboratory: the 24 results (four results from six bottles each) were evaluated by one-way ANOVA. This data set results in only five degrees of freedom for the between-bottle standard deviation, fewer than in the interlaboratory study, which limits the reliability of the individual estimates from one-way ANOVA. Figure 1 shows the ratios of the estimates for within-unit standard deviation and ubb for evaluation by two-way ANOVA from the interlaboratory data (numerator) and one-way ANOVA from the re-testing in one laboratory (denominator). By and large, within-unit standard deviations were lower in the re-testing. Several factors may contribute to this effect. On the one hand, a laboratory with a better than average repeatability was chosen for the re-testing. On the other hand, even if ANOVA aims to separate the between-sample variation from the variations due to the application of different methods, use of different instruments and analysis by different analysts, this separation is not perfect and the estimate for the between-bottle standard deviation can be affected by the variation due to the other factors. An effect of less relevance in the case of these very stable materials is potential degradation during transport: as the transport conditions (time and temperature) may differ for the different laboratories, an additional variation can be introduced for less stable materials. This variation should in theory also be part of the between-laboratory variation. Despite all these factors, the overall estimates for ubb are roughly comparable. On average, ubb estimated from the interlaboratory data is 54 % higher than the one estimated from the single-laboratory data. This is partly caused by high estimates for La, Pd and Sb: without these elements, ubb from the interlaboratory data is 30 % above the one from the single-laboratory data. In the case of these two materials, the higher uncertainty was still acceptable, as the contribution of ubb to the final uncertainty was minor. This was caused by the larger uncertainty of characterisation. This is not necessarily true for other CRMs. Especially in cases where the uncertainty of characterisation is low, the higher uncertainty of homogeneity may have a significant influence on the final uncertainty.
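For the single-laboratory re-testing, the one-way ANOVA and its degrees of freedom can be sketched as follows. This is a generic textbook decomposition under the assumption of one group of results per bottle, not the routine used in the study; the function name and the data are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return mean squares and degrees of freedom for a one-way ANOVA.

    groups is a list of lists, one inner list of results per bottle.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return {
        "ms_between": ss_between / df_between,
        "ms_within": ss_within / df_within,
        "df_between": df_between,
        "df_within": df_within,
    }


# Hypothetical data: six bottles with four results each (24 results).
bottles = [[float(i + j) for j in range(4)] for i in range(6)]
result = one_way_anova(bottles)
# result["df_between"] is 5 and result["df_within"] is 18
```

With six bottles, the between-bottle mean square rests on only five degrees of freedom, which is the limitation discussed above.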

Fig. 1 Ratio of swb and ubb estimated from two-way ANOVA from interlaboratory data and from single-laboratory data

This shows that the evaluation of homogeneity from characterisation data can yield estimates for the between-unit variation that are realistic and fit for purpose. The data might also indicate a trend to higher estimates of the uncertainty of homogeneity when homogeneity is estimated from characterisation data rather than from a single-laboratory homogeneity study. However, this assessment has to be taken with caution as, due to the lower number of degrees of freedom of the single-laboratory data, the reliability of the individual estimates was lower in this case.

Evaluation of costs and samples saved

While the assessment of homogeneity using interlaboratory data was successful, the evaluation of the envisaged savings of samples and cost is more ambiguous. Standard procedure at the JRC for characterisation studies is to supply each laboratory with two samples which are each analysed in several replicates (often three per sample). In this case, three samples were sent to the laboratories in order to have sufficient degrees of freedom for the estimate of ubb.

For the Ti-discs, this means that in total 24 samples were distributed. This is very close to the number of samples that would have been used in a classical characterisation study for nine data sets (18 samples) and a classical homogeneity study for such a small batch (ten samples). For the Ti-chips, 39 bottles were distributed. This is slightly more than would have been used in a classical characterisation study for 14 data sets (28 samples) and homogeneity study (ten samples). This means that the savings in terms of samples did not materialise. In this case, as the samples are exceptionally stable, samples could have been reused for the characterisation. In this way, only 18 samples would have been used for the discs and 28 samples for the chips. (The ten samples from the homogeneity study would have been reused for the characterisation.)

According to offers received from private laboratories, performing classical homogeneity studies on the two materials with 25 elements quantified in duplicate on ten samples each would have cost roughly between EUR 10 000 and 20 000. While these sums are significant, this is only 5 % – 10 % of the total costs incurred for external services and naturally even less if the staff costs for planning, project management and evaluation are taken into consideration. Therefore, the cost saving of avoiding the homogeneity study may decrease the overall cost of the RMs by a few per cent, while the final uncertainty might increase.

Conclusion

The data obtained in the characterisation study of two Ti-materials were successfully used for the assessment of homogeneity. The estimates for the between-sample variation are low enough to make the two materials suitable as certified reference materials. This shows that it is in principle possible to combine homogeneity assessment and interlaboratory characterisation into one study.

However, the set-up with several laboratories required a more complex statistical evaluation, with testing normalised data for trends and outlying individual data and sample means and subsequent testing of the non-normalised data for outliers of variances and performing the two-way ANOVA. The envisaged gains in terms of consumption of fewer samples were not achieved and the cost savings by skipping a dedicated homogeneity study were minor.

Based on the experience with interlaboratory characterisation studies of a wide variety of matrices and analytes, it is expected that these problems are generic rather than limited to the determination of trace metals in Ti. Careful evaluation of the benefits and costs of combining a homogeneity and an interlaboratory characterisation study is therefore advisable.