1 Introduction

The total mass of particulate matter (PM) suspended in the exhaust gas has been the legislated metric for automotive particulate emissions in Europe since 1988 via European Economic Community Directive 88/436/EEC [1]. In the past, smoke number was also used for legislation. The legislation was subsequently amended to include a non-volatile particle number (PN)-based emission limit for low-particulate emission vehicles [2]. This PN limit value was first introduced in 2011 for diesel vehicles via the Euro 5b emission standard and extended to direct-injection gasoline vehicles with Euro 6b in 2014. PN was also introduced in the heavy-duty engine emission regulation with Euro VI in 2013 [3].

A standardized PN measurement protocol for the automotive industry was developed and defined by the Particle Measurement Programme (PMP), a work-group of the UN-ECE GRPE [4]. A PN measurement system compliant with this protocol (called “PMP system”) consists of a volatile particle remover (VPR) and a particle number counter (PNC). The PMP system is designed to ensure that only solid particles in the engine exhaust gas are counted, so volatile liquid particles have to be removed. For this purpose, the VPR includes a first-stage hot dilution (at least 1:10), followed by an evaporation tube with a fixed temperature of 350°C and a second-stage “cold” dilution (typically 1:10 or 1:15) to bring the temperature and the concentration to levels appropriate for the PNC. Further, to exclude the interference of any remaining nucleation mode particles from the measurement, PNCs are required to have a well-defined lower size cut-off curve. At a particle size of 23nm ±1nm, the counting efficiency must be 50% ±12% (D50-point), and at 41nm ±1nm, it must be larger than 90% (D90-point). At the moment, only condensation particle counters (CPC) are used as PNC in PMP systems since they fulfil the requirements defined for the slope of the counting efficiency curve, readability, and coincidence correction. As a consequence, this investigation is focussed exclusively on CPCs as PNCs.

PMP systems are routinely employed across the EU and China as well as Japan and India (introduction planned) to determine automotive PN emissions for legislative purposes. Despite the importance of these measurements, there are significant deficits in the quantification of the measurement uncertainty of these instruments and how large the instrument or calibration variability across the continent might be [5]. The CPC at the core of the PMP system accounts for a major part of the total uncertainty, the other significant part being caused by the loss correction of the VPR. As a first step towards the quantification of automotive PN counting variability across the EU, the CPC and its calibration are targeted in this exercise. It has been set up to quantify and analyse the CPC calibration uncertainty in an industrial context.

While the general technical CPC specifications are well described in the legislation, there is still a lack of a well-defined, reproducible, and validated CPC calibration procedure. In general, the CPC calibration process is laid down in ISO 27891, but many automotive-specific requirements are not addressed by this standard. This has led to pragmatic solutions and simplifications, required for industrial field implementation, that deviate from the ISO standard and whose effects on the total system uncertainty have not yet been investigated and quantified. Important open points are the CPC correction factor (KF), which is used to adjust the CPC in relation to the laboratory reference [4], the linearity specification to deal with concentration-dependent non-linearity, and material effects caused by the calibration aerosol source.

The fact that the calibration aerosol is not defined accounts for a large part of the CPC calibration uncertainty. Research has shown that the calibration aerosol has a significant impact on the measured counting efficiency of the CPCs used for engine exhaust [6]. A CPC is based on the evaporation and re-condensation of a supersaturated working fluid (typically butanol) on the aerosol particle. This leads to a rapid size increase of the suspended nanoparticles, which act as condensation nuclei, and allows subsequent particle detection by light scattering. It is known that the onset of condensation is highly dependent on the surface wettability and morphology of the aerosol material used for calibration. This process has been investigated in depth and has been described as a function of the surface contact angle [7,8,9,10,11].

In 2017, emissions regulation in Europe was amended by so-called real driving emissions (RDE) testing with Euro 6d-TEMP. The application of novel, highly mobile, portable emissions measurement systems (PEMS) for PN has further increased the need for an unambiguous definition of a CPC calibration procedure to ensure comparability between PN-PEMS measurements and laboratory PMP systems. Compared to the PMP system, a PN-PEMS can be regarded as a simplified, compact, low-power particle counting system. Nevertheless, PN-PEMS refer to the same legislated limit value [12]. The counting efficiency curve of a PN-PEMS is specified for the complete system, i.e., including the thermal preconditioning system and the particle counter [13]. For this reason, a thermally stable, soot-like aerosol is necessary for calibration and required by legislation. Besides soot-like aerosol, emery oil droplets from electrospray (emery oil-particle generator, EO-PG) are a popular choice for CPC calibration. While emery oil droplets are the most common aerosol for engine exhaust CPC calibration [14, 15], they are not thermally stable and cannot be used for PN-PEMS. Thus, they are not relevant for our investigation, which focusses on a universally applicable aerosol.

Soot-like aerosol is typically produced either by electric spark discharge (electric discharge-soot particle generator, ED-SPG) or by propane diffusion flame (diffusion-flame soot particle generator, DF-SPG). Soot-like aerosols, produced by, e.g. DF-SPG [16,17,18] or by, e.g. ED-SPG [19,20,21], are frequently used for research (e.g. [22]) as well as for routine calibration [14, 23]. These generators have reached maturity and have been commercially available for more than a decade. Soot-like aerosol, in terms of material and morphology, better represents the non-volatile fraction of the engine exhaust PN than liquid droplets, which are much more similar to the volatile exhaust PN fraction. Giechaskiel, Wang et al. [6], for example, showed that the DF-SPG aerosol was quite similar to diesel engine soot regarding the counting efficiency of a CPC.

This study investigates the inter-lab comparability and the calibration uncertainty for automotive PN measurements using CPCs. For this purpose, seven calibration or automotive laboratories were evaluated in terms of comparability, repeatability and reproducibility of aerosol generators, reference CPCs, and calibration procedures, a first for the automotive PN community. The presented comparison will evaluate the comparability of soot-like aerosol from DF-SPG and ED-SPG as a candidate for an engine exhaust PN standard calibration aerosol. A circulating reference DF-SPG (crDF-SPG) generator was used as preparative aerosol standard in all participating labs and compared to the “local”, in-house aerosol generators in the participating labs (5 DF-SPGs, 4 ED-SPGs). As a common analytical standard, a circulating reference CPC (crCPC) was sent to be calibrated in all participating laboratories.

2 Experimental

The experiments were carried out by seven participating automotive and calibration laboratories across Europe, which are listed with their specific equipment in Table 1. One set of calibrations each was done by the participants except for JRC. At JRC, two different experiments were carried out independently and blindly (i.e. not knowing either the setup or results of the other test) by different operators.

Table 1 Participating laboratories of the round robin listed by location and category

The available in-house reference PNC instrument and soot aerosol generators are shown by model. In-house CAST generators feature different burner types and dilution systems: PD/SD denotes primary and secondary dilution; “Th” means a thermal treatment system was used. PALAS generators were used without thermal treatment at all laboratories

2.1 Concept of the Round Robin

Each individual comparison exercise was a “blind” calibration of the CPC counting efficiency (CE), meaning that the participants had no knowledge of the calibration values of any other participant. The CPC calibration focussed on the legislated “key” particle diameters: 23nm (setting the cut-off size = D50-diameter) and 41nm (D90, setting the start of what is here defined as the CE plateau region). Additional points were taken at 10nm and 15nm to capture the lower end of the CE curve, as well as at 70nm and 100nm within the plateau region of the CE. To detect and correct for doubly charged particles, additional “sizes” were selected at the diameter corresponding to twice the voltage, i.e. 33nm (for 23nm), 59nm (for 41nm), 103nm (for 70nm) and 150nm (for 100nm).
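The pairing of each calibration size with a larger "doubly charged" size can be illustrated with a short calculation. The sketch below assumes that electrical mobility scales as n·Cc(d)/d, with a Cunningham slip correction Cc using commonly quoted constants and a mean free path of 67.3nm for air near room conditions; these constants and the function names are assumptions for illustration and are not taken from the participants' procedures.

```python
import math

MEAN_FREE_PATH_NM = 67.3  # assumed mean free path of air near room conditions


def slip_correction(d_nm):
    """Cunningham slip correction with commonly quoted (assumed) constants."""
    kn = 2.0 * MEAN_FREE_PATH_NM / d_nm  # Knudsen number
    return 1.0 + kn * (1.257 + 0.4 * math.exp(-1.1 / kn))


def relative_mobility(d_nm, charges=1):
    """Electrical mobility up to a constant factor: n * Cc(d) / d."""
    return charges * slip_correction(d_nm) / d_nm


def doubly_charged_diameter(d_nm):
    """Diameter whose doubly charged particles have the same mobility as
    singly charged particles of diameter d_nm (solved by bisection)."""
    target = relative_mobility(d_nm, charges=1)
    lo, hi = d_nm, 10.0 * d_nm
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if relative_mobility(mid, charges=2) > target:
            lo = mid  # mobility still too high, so the particle must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


for d in (23, 41, 70, 100):
    print(d, "nm ->", round(doubly_charged_diameter(d), 1), "nm")
```

Under these assumptions the results should land within a few percent of the sizes listed above; small differences are expected because the gas properties and slip-correction constants used by the laboratories are not stated.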

The crCPC was serviced and calibrated at the start of the exercise by the manufacturer with its standard EO-PG, to set the baseline performance and ensure its proper operation. The calibration was repeated once more at the end of the comparison exercise.

In a similar way, the crDF-SPG was first characterized at JRC by the first author. A set of five operating points was worked out and given to the participants for further use. Each operating point corresponded to one target calibration particle diameter. It was chosen so that the mode diameter (the number maximum) was always smaller than the desired particle size in order to minimize multiply charged particles. After this preparation, the generator was circulated among the participants in the order listed in Table 1.

The first comparison tested the comparability of each participant’s implementation of the local calibration procedure. For this, each participant used the common crDF-SPG to calibrate the common crCPC against each participant’s local reference counter on the local setup. The local setup of the participant included a differential mobility analyser (DMA) for size classification, tubing, and optional devices for dilution. The local setup was supposed to be used “as-is” according to the individual lab guidelines. This comparison was designed to determine the influence of the local reference counter and the local setup on the inter-laboratory comparability. This test was carried out in all laboratories except JRC and TSI. If possible, several iterations were to be done to also test the intra-lab repeatability. At the time of the comparison exercise, only one of the laboratories (Ricardo Energy and Environment) was ISO 17025 accredited for CPC calibration according to ISO 27891, while two others (TSI, AVL) were in the process of accreditation.

For the second comparison, the participants calibrated the crCPC but using alternative local aerosol generators instead. In-house DF-SPG and ED-SPG (for specifications see Table 1) were employed to calibrate the crCPC against the local reference counter on the local setup. Five different DF-SPGs and four ED-SPGs were used in total. This test was carried out to determine the uncertainty introduced by other varying aerosol sources with regard to inter-laboratory comparability. Again, tests were repeated several times, if possible, to assess the intra-lab repeatability.

2.2 Reference Equipment Description

The crCPC was a TSI 3791 engine exhaust CPC with a D50 of 23nm, designed for the measurement of automotive exhaust and fully compliant to PMP legislation. It was originally calibrated to meet the legislative requirements [2] using an EO-PG, the CPC manufacturer’s (TSI) routine CPC calibration aerosol.

The crDF-SPG was manufactured by AVL List GmbH and is based on a miniCAST (“combustion aerosol standard”) 6203C propane single diffusion flame burner by Jing Ltd. It was combined with a VPR, which consists of an evaporation tube at 350°C and two ejector diluters, one each before and after the evaporation tube. The calibration aerosol was always sampled after the second diluter, and fixed internal dilution settings were given. The operating points of the crDF-SPG can be found in Table S1 of the Supplemental Information.

Different types of in-house reference counters were used by the participants. All of them featured a D50 significantly below the 23nm of the crCPC as recommended by Marshall and Sandbach [23]. The most widely used reference instrument type was the “full-flow CPC” with a D50 of 10nm, all of which were manufactured by TSI Inc (model 3772 or similar). JRC, in contrast, used a “partial-flow CPC” with D50 below 10nm (TSI model 3025A), which introduced an increased uncertainty caused by the flow split taking place inside the device: the sample flow measured at the inlet of the CPC did not correspond to the flow through its measurement chamber. Nevertheless, it was compared with an electrometer before the tests. Aerosol electrometers (AEM) as primary reference PN counters were used at AVL and PTB (labs #2 and #3).

All laboratories relied on a differential mobility analyser (DMA) for size selection of the particles. The most common model was a 3080 Long-DMA by TSI Inc. used by all except PTB, who used a Hauke-Type DMA (according to a design from TROPOS/Leipzig) which employs a reversed electrical polarity (positive centre electrode). With a positive electrode, negatively charged particles are selected, which have a slightly higher charge probability than positively charged ones. While this improves particle throughput at very small particle sizes, it also leads to slightly higher probabilities of multiple charges for particle sizes above 23nm. Additional information on the laboratories’ reference instruments can be found in Table S2 of the Supplemental Information.

2.3 Evaluation Process

Measurements of aerosol sample flow, AEM zero current, and ambient conditions were taken for later correction of the data. A standardized data processing procedure was agreed on to ensure comparability. After the end of the comparison, all raw data were directly given to the first author, to ensure identical data processing. The data processing was done in five steps:

  1. All data were checked for plausibility and consistency. If a malfunction of any of the instruments was found, the corresponding data points were excluded from the evaluation. An outlier analysis using Z-scores was applied, but did not reveal additional outliers.

  2. The zero offsets of the AEMs were corrected. Alternating zero offset measurements are recommended but were not done consistently by the participants due to different lab preferences. The average zero current of the AEM was subtracted from the measured value.

  3. Correction factors for in-house reference counters were applied if available, based on their calibration certificate.

  4. In-house reference counters were also corrected for deviations of the sample flow from its nominal value according to the laboratory guidelines. For participants using mass flow meters, the results were converted to volumetric PN concentrations (particles/cm3).

  5. Double charge corrections were employed for labs that used an AEM as the reference. For the correction, the concentration at the corresponding diameter for doubly charged particles was measured. A bipolar charge distribution as described by [24] was assumed. Triply charged particles were not corrected for because their influence was found to be less than 1% in any test.
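The five steps above can be sketched in simplified form. This is a minimal illustration rather than the evaluation code used in the study: all function and parameter names are hypothetical, the electrometer relation I = e·Q·N·(mean charge per particle) is the standard textbook form, and the doubly charged number fraction is taken as a given input instead of being derived from the bipolar charge distribution and the measurement at the larger diameter.

```python
import statistics

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def z_scores(values):
    """Step 1: Z-scores using the sample standard deviation (assumed convention)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def aem_concentration(current_a, zero_current_a, flow_cm3_per_s,
                      doubly_charged_fraction=0.0):
    """Steps 2 and 5: particles/cm3 from an electrometer current.

    The average zero current is subtracted first. A number fraction f of
    doubly charged particles raises the mean charge per particle to 1 + f.
    """
    net_current = current_a - zero_current_a
    mean_charge = 1.0 + doubly_charged_fraction
    return net_current / (ELEMENTARY_CHARGE_C * flow_cm3_per_s * mean_charge)


def corrected_reference(indicated_conc, certificate_factor=1.0,
                        nominal_flow=1.0, measured_flow=1.0):
    """Steps 3 and 4: apply a certificate correction factor and rescale a
    counter reading that assumed the nominal flow to the measured flow."""
    return indicated_conc * certificate_factor * nominal_flow / measured_flow


def counting_efficiency(test_conc, reference_conc):
    """Counting efficiency of the test CPC relative to the reference."""
    return test_conc / reference_conc


# Hypothetical example values, not data from the exercise.
flow = 1000.0 / 60.0  # 1 l/min expressed in cm3/s
reference = aem_concentration(2.8e-14, 1.0e-15, flow, doubly_charged_fraction=0.05)
print(round(counting_efficiency(8500.0, reference), 3))
```

Taking the doubly charged fraction as an input keeps the sketch short; a full correction would also account for the different counting efficiency of the test CPC at the larger diameter.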

2.4 Mathematical Variability Measures

Two fundamental parameters were derived, the comparability between different labs (also called inter-lab-variability) and the repeatability of an individual lab (also called intra-lab-variability).

The main value of interest in this comparison exercise was the comparability of the results of the in-house calibration procedure between laboratories. This “inter-lab-variability” was expressed by the coefficient of variation (CoV), which is the standard deviation over all participant lab results divided by the arithmetic mean of all labs for each comparison parameter (type of aerosol generator, particle size).

The inter-lab variability also needs to be put into perspective with the repeatability of the calibrations, i.e. the intra-lab variability, which was investigated for those laboratories that repeated in-house calibrations with the same settings. It was expressed as the CoV of one lab’s repetitions.
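Both variability measures reduce to the same calculation applied to different sets of values. The following minimal sketch uses the sample standard deviation (n − 1 in the denominator), which is an assumption since the text does not state the convention; the input values are hypothetical and are not results from the exercise.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the arithmetic mean, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical counting efficiencies (%) at one particle size.
one_result_per_lab = [88.0, 92.0, 90.0, 95.0, 86.0]   # inter-lab comparability
repeats_in_one_lab = [90.5, 91.0, 89.8]               # intra-lab repeatability

print("inter-lab CoV (%):", round(coefficient_of_variation(one_result_per_lab), 2))
print("intra-lab CoV (%):", round(coefficient_of_variation(repeats_in_one_lab), 2))
```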

An in-depth analysis of the repeatability and comparability (ANOVA) according to ISO 5725-2 was carried out but did not reveal additional information due to a lack of data, especially a lack of repetitions, in most laboratories.

3 Results

3.1 Inter-lab Comparability

For the evaluation, the crCPC calibration measurements are shown in three groups. The particle diameters of 70nm, 41nm, and 23nm are given separately, since the particle diameter influences the comparability of the calibration.

Beginning with the emery oil tests, the crCPC fulfilled the regulatory requirements after being serviced and calibrated at the start of the campaign. The counting efficiencies were 95% at 55 nm, 92.5% at 41 nm, and 51% at 23 nm. At the end of the campaign, they were 6–8% lower, partly due to a 4% lower measured sample flow rate of the crCPC.

At 70nm (Fig. 1), the crCPC was close to the CE plateau. Comparability of the calibration was expected to be best, since the sensitivity of the crCPC to particle size and material effects is lowest. The CoV of the crDF-SPG was 5.2%. That of the laboratories’ local generators was 4.5% for DF-SPGs and 2.8% for ED-SPGs. The two labs that used an electrometer (#2, #3) measured similar or slightly lower counting efficiencies than the other labs, indicating that the corrections for multiply charged particles brought the results within experimental uncertainty. No clear chronological trend (drift) was visible over the duration of the exercise when looking at labs #1–#6.

Fig. 1

Calibrated crCPC counting efficiency at 70nm determined at each participants’ lab using the various local aerosol generators, shown in chronological order. Error bars show min/max, where multiple measurements were available. The “reference CE” was determined before and after the comparison at the CPC manufacturer lab (Mfr.) using an EO-PG as reference aerosol at a diameter of 55nm, which represents the CE plateau when using an EO-PG

The calibration at 41nm (Fig. 2) took place at the beginning of the CE plateau of the crCPC. The average CE with soot at 41nm was 14% lower than at 70nm. Again there was no visible chronological trend with soot aerosol. The comparability of the crDF-SPG gave a CoV of 5.8% across 5 labs. The CoV of the various DF-SPGs was 2.5%; the CoV of the various ED-SPGs was 4.5%. These numbers were very similar to the ones at 70nm, indicating that there was no significant difference in terms of calibration uncertainty. For 41nm and 70nm, comparability appeared to be more influenced by the individual laboratory procedure and setup than the aerosol generator. This observation will be addressed in the section on intra-lab repeatability.

Fig. 2

Calibrated crCPC counting efficiency at 41nm in chronological order

At 23nm (Fig. 3), the crCPC is known to be quite sensitive to changes in calibration material and aerosol size. This also became visible as a much lower observed comparability: the CoV of the calibration with the reference generator (crDF-SPG) was as much as 29.8%, with the DF-SPG, it was 10.7%, and with the ED-SPG, it was 30.6%. Therefore at 23nm, the laboratories’ own DF-SPGs produced a significantly better comparability than the other two types of soot generator.

Fig. 3

Calibrated crCPC counting efficiency at 23nm in chronological order

The measurements have shown that the calibration of the plateau regime (≥41nm) and the cut-off size regime (23nm) need to be regarded separately. In the plateau regime, the type of soot generator did not affect calibration. The most unfavourable test series at 41nm and 70nm was produced by the crDF-SPG, with CoVs of 5.8% and 5.2%, respectively. This range of 5.2–5.8% marked the comparability for a calibration with a soot generator that was not yet familiar to, and thus not fine-tuned by, the lab personnel. The local ED-SPG and DF-SPG calibrations showed a CoV of up to 4.5% (ED-SPG at 41nm, DF-SPG at 70nm). This result provides, for the first time, an industry-wide quantification of the comparability of professional CPC calibrations in the plateau regime under a representative scenario with a well-tuned and personally maintained setup.

In the CE cut-off regime, the picture was quite different. Here the CoV of the comparability was more than twice as high. Both crDF-SPG and ED-SPG showed a CoV around 30%, while the laboratory-owned DF-SPG was at a significantly lower value of 10.7%. The CoV of 10.7% represents the achievable comparability of well-tuned laboratory equipment in the cut-off regime. However, the large spread between aerosol generator types calls for a closer look at the application conditions of the aerosol generators.

3.2 A Closer Look at the Soot Generators

The comparability with the crDF-SPG was slightly worse than that using the local DF-SPGs in the plateau regime and much lower in the cut-off regime, despite the fact that different burner models and thermal treatment systems were combined and used in each laboratory (see Table 1).

The participating laboratories reported having optimized their in-house DF-SPGs in terms of long-term stability, thermal stability, and correction of multiple charges (when used in conjunction with an AEM). As the participants spent relatively little time with the crDF-SPG, they were less familiar with it and its subtleties, which could partly explain the increased variance. Additionally, a shift in the particle size distribution of the crDF-SPG, caused by contamination from prolonged use, was noticed during the evaluation of the exercise. The particle size distribution returned to its initial values after thorough cleaning at lab #6. Comparing the two states, it was found that the mode of the size distribution was shifted by +2nm at 23nm and by +5nm at 41nm. This shift doubled the amount of multiple charges, which were, however, corrected when using an AEM. Uncertainties of unknown magnitude could have been caused by a drift-induced change in soot properties, e.g. a change of particle morphology or surface wettability that influences the condensation process within the CPC. In addition, we removed the results obtained with the crDF-SPG at one lab (lab #5) from the evaluation because the generator was experiencing flame cut-offs and stable operation could not be achieved. Another reason for the low comparability at 23nm was the low aerosol concentration at the operating point for 23nm when sampling after the dual internal ejector dilution. At one participant (lab #4), only concentrations below 1000/cm3 could be achieved at 23nm. The measurements were still usable since a CPC was used as the laboratory reference, whereas an AEM would not be applicable for the concentration range below 2000/cm3. Also, the setup at lab #4 included the highest dilution of the participants.

Of the five local DF-SPGs, four were of the smallest “miniCAST 6200” type, and one was the larger “miniCAST 5200”. The default propane flow in the burner of the former was 20ml/min, while it was 60ml/min for the latter. Recent research has shown that the 5200-CAST variants yield more thermally stable particles rich in elemental carbon at small particle sizes like 23nm and below [25]. As all CAST generators thermally treated the generated particles, this could be due either to the much higher flame temperatures in the 5200 model for set points of comparable size distributions, or to the longer residence time until quenching [26]. In this exercise, the 5200-CAST of lab #3 showed the steepest counting efficiency curve of all DF-SPGs. It produced the highest CE of all in-house DF-SPGs at 23nm and exceeded the crDF-SPG by 15% in the direct comparison at the same lab. Lab #4 was using the more recent version 6204C of the miniCAST as opposed to the 6203 models used in all other labs. Between these, no trend in CE was visible from the results. According to the manufacturer, changes between these models were targeted at reducing contamination and improving long-term stability, while trying to ensure that flame properties remain largely unchanged.

All laboratories applied locally optimized generator operating points to reduce the share of multiply charged particles. This was achieved by classifying on the “right-hand side” (falling edge) of the particle number size distribution. The double charge corrections at 70nm were in the range of 2–6% for in-house DF-SPGs and 7–10% for the crDF-SPG. This illustrates that the “local” treatment of the in-house generators minimized double charges better than was possible with the crDF-SPG (for which less time was available during the comparison to find optimum burner settings).

The CE calibration with the local ED-SPGs stood out in the cut-off regime with a large variance. A possible reason is the fact that it was quite challenging for the ED-SPG to produce particle number size distributions with a mode diameter of 23nm or smaller. A mode diameter smaller than the classified diameter is used to minimize the amount of multiply charged particles. For comparison, the smallest mode diameter reached at lab #4 was 30nm, which would result in a multiple charge correction of 2% in case an AEM was used at this lab (compared to 0–1% with DF-SPG). In the plateau regime, the ED-SPG showed its strength of reliably producing soot particles free of volatile organic compounds (hydrocarbons) with a well-defined chemical composition [21] so that no thermal treatment system for the particles was needed. ED-SPG setups were identical among the labs with the exception of the newer “3000 digital” model used in lab #4, which provided greater flexibility of the operation parameters but featured a similar lower size limit as its predecessor.

3.3 Intra-lab Repeatability

The short-term repeatability (in the same lab, on the same setup and generator) showed a very good consistency of the calibration procedure and the crCPC: At lab #4, four tests with DF-SPG on different days were within ± 1% at 41nm and 70nm. Likewise, tests at lab #1 with ED-SPG at 70nm were within ±1 % on 2 days with two sets each. A third lab (#2) did multiple tests with the crDF-SPG which were within ±2% for 41nm and 70nm.

The tests done by the manufacturer at the start of the comparison and at its end 11 months later allowed a look at the long-term repeatability of a single lab. An EO-PG was used exclusively here. The re-test at the end of the exercise showed a lower CE than the initial calibration for all three particle sizes. A decline in CE of between 6 (absolute) percentage points (70nm) and 9 points (41nm) was observed, of which 4 points could be attributed to a reduction in the crCPC sample flow. A chronological decline in crCPC performance was not visible in the other laboratories’ results measured with soot aerosol. Therefore, the part of the CE decline that did not originate from sample flow variation (around 4%) was attributed to the experimental uncertainty of the long-term observation.

When comparing the calibration values of the participants (esp. Fig. 1 and Fig. 2), it appeared that the lab-specific setup and procedure had a larger impact on variability than the aerosol generator used. To verify this hypothesis, the intra-lab repeatability across all soot generators in each lab was calculated.

At 70nm, this repeatability across all soot generators had an average CoV of 1.8% (labs ranged from 0.7 to 3.1%). Similarly, the average CoV was 1.9% at 41nm (range 0.8 to 2.7%). This means for the calibration in the plateau regime: exchanging the soot generator within a single laboratory produced a variability with a CoV of about 2%, which was significantly lower than the 4.5% CoV observed between laboratories. Thus, the soot generator contributed less than 2% (a sub-set of the intra-laboratory variation) to the inter-laboratory variability, and the majority of the uncertainty was caused by other factors. Here, the in-house reference counter and DMA are expected to be the major contributors (since flow correction is excluded, and evaluation was centralized).

However, in the cut-off regime, a much poorer intra-lab repeatability with an average CoV of 16% (range 7 to 24%) was observed. This fell in between the comparability achieved with the DF-SPG (10.7% CoV) and the other two soot generator types (30% CoV). Thus, in this size regime, the soot aerosol generator and potentially the resulting soot morphology change were a major influence factor for the observed variability. Uncertainties in the size classification of the DMA and the higher particle losses at 23nm compared to 41–100nm are also likely to have contributed. The size uncertainty of the DMA has been calculated in the literature to be ±3% at 100nm [27]. Via a mathematical transformation, the calibration at 100nm is then applied to smaller particle sizes, so at least the same uncertainty as at the calibration diameter (of 100nm) is expected at 23nm. Another source of uncertainty is the irregular alignment of non-spherical particles within the DMA depending on the strength of the electric field [28]. To quantify the influence of the classification on CE at 23nm, the DMA was exchanged for another instrument of the same model at two laboratories. This resulted in a CE difference of 5% (lab #4) to 9% (lab #2), which is a CoV of about 4% or a quarter of the variability caused by the soot generator.

3.4 General Comparison of the Soot Generators

To give an overview across laboratories, Fig. 4 shows the average crCPC CE, grouped by soot aerosol type and particle size. Well on the CE plateau, at a mobility diameter of 100nm (soot particles) and 55nm (EO-PG), results were similar across all types of aerosol. The average CE was 92.4% (soot) versus 95.4% (EO-PG). At 70nm, calibrations with soot-like aerosol gave an average CE of 90.9% (range of results: 85–97%), which was 2% below the maximum efficiency, hence suggesting that the plateau was not reached at 70nm.

Fig. 4

Average counting efficiency across all laboratories for the crCPC calibration, grouped by soot aerosol generator and particle size. The number of laboratories is given in brackets. Error bars show the min/max value from each set. The calibration by the instrument manufacturer, measured with the EO-PG at the start (higher value) and end (lower value) of the exercise, is given for reference

At the beginning of the plateau regime at 41nm, there was a 15 percentage point gap between the EO-PG and the soot aerosol generators. The average CE of all soot generators was 77.4% (range 73 to 85%) compared to 92.7% with the EO-PG from the calibration certificate. At 23nm, the “soot average” was 28.8% (17 to 42%) compared to 51.5% from the EO-PG.

Hence, with emery oil-based calibrations, the crCPC fulfilled the legislation limit of 50% ±12% at 23 nm and of >90% at 41nm. At the same time, legislative limits were not met using soot aerosols for a CPC that was initially adjusted to meet the legislative targets with emery oil. The general influence of the calibration aerosol material (soot vs. emery oil) on counting efficiency was found to be in line with previous publications considering the experimental uncertainty [8, 14, 29, 30].
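The comparison against the legislated limits can be written as a simple check. The sketch below applies the two efficiency criteria (50% ±12% at 23nm, above 90% at 41nm) to the average values quoted above; the function name is hypothetical, and the ±1nm size tolerance of the regulation is not modelled.

```python
def meets_pmp_efficiency_limits(ce_23nm_percent, ce_41nm_percent):
    """Check the D50 and D90 counting efficiency criteria (size tolerance ignored)."""
    d50_ok = 38.0 <= ce_23nm_percent <= 62.0  # 50% +/- 12% at 23nm
    d90_ok = ce_41nm_percent > 90.0           # larger than 90% at 41nm
    return d50_ok and d90_ok


# Average values quoted in the text for the crCPC.
emery_oil_ok = meets_pmp_efficiency_limits(51.5, 92.7)
soot_average_ok = meets_pmp_efficiency_limits(28.8, 77.4)
print("EO-PG:", emery_oil_ok, "| soot average:", soot_average_ok)
```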

Measurements at 10nm and 15nm revealed that the CE drops sharply below 23nm. At both 10nm and 15nm, all laboratories reported 0% CE with DF-SPG and crDF-SPG. With ED-SPG, the CE results were between 0 and 4%. The higher numbers for ED-SPG were most probably caused by a larger fraction of multiply charged particles resulting from an unsuitable size distribution, whose mode diameter of 30nm or above was too large.

When comparing the three types of soot generators (crDF-SPG, local DF-SPG, and local ED-SPG), they showed very similar average CEs. The differences lay well within the observed comparability. Average calibration results were within 3% in the plateau regime, while a spread of 14% was visible in the cut-off regime. Even though the ED-SPG operates on a technically different principle from the DF-SPG, these two could be considered equivalent for CE calibrations. In terms of variability, the DF-SPG was better suited for calibration at the cut-off diameter (23nm), however. This confirmed, but with a much larger lab count and database, the results of Kiwull, Wolf et al. [30], who had concluded that two types of ED-SPG tested were similar to one DF-SPG within one standard deviation in terms of average CE.

3.5 Influence of DF-SPG Operating Point on Calibration

The influence of the operating point of the diffusion flame burner (models listed in Table 2) on the calibration uncertainty was investigated. For the comparison, the particle size of 23nm was chosen where the variation was largest, using data from three labs with various DF-SPG (model 6200 miniCAST) as well as the crDF-SPG. Figure 5 shows that the derived CE varied between 18 and 31% for the first experiment (cr-ET, L-CS1), where two burner models were compared (in-house DF-SPG and crDF-SPG). These were combined with either an evaporation tube or a catalytic stripper for thermal treatment. The analysis showed that changing the burner had the greatest influence and led to CE differences of 6–11 absolute percentage points (L-CS1+2 vs. cr-CS1+2). These tests at the same time covered a wide concentration range, but no systematic influence of the concentration level was found (L-CS1-2, cr-CS1-2).

Table 2 Details of DF-SPG operating points at 23nm shown in Fig. 5
Fig. 5 Variation of the crCPC counting efficiency at 23nm introduced by various DF-SPG operating points. Test 1 (Lab #2) (square markers) used two burner models (L=lab’s own generator, cr=crDF-SPG) with either evaporation tube (ET, open points) or catalytic stripper (CS, filled points). Test 2 (Lab #1) (Op1-2) is a variation of burner operating points. In a third test (Lab #4) (Sd1-3), three different particle number size distributions were tested

The thermal treatment of the aerosol had a smaller impact and resulted in CE variations of 2–7 points for the same burner model (L-ET vs. L-CS, cr-ET vs. cr-CS). Another test (Op1-2) from a second lab also compared two operating points on the crDF-SPG. Here the observed impact was only a change in CE of 1 percentage point.

A third lab (Sd1-3) compared three DF-SPG operating points with very similar propane-to-air ratios, and therefore similar flame composition, but with different mode diameters of the size distribution (20nm, 35nm, and 55nm). In this case, no significant influence of the generator’s particle mode diameter on the crCPC CE was observed (a change of one point). For reference, the fuel-to-air flow ratio in the flame was calculated according to Durdina, Lobo et al. [26]. It is given as the C/O ratio, in which a value of 0.3 indicates a stoichiometric fuel-to-air mixture.
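To illustrate why a C/O ratio of 0.3 corresponds to a stoichiometric propane flame, the sketch below counts carbon and oxygen atoms from the two gas flows. It is a simplified illustration and not necessarily the exact formulation of [26]; the function name, the oxygen mole fraction of 0.2095 for air, and the restriction to propane and oxidation air as the only carbon and oxygen sources are assumptions.

```python
O2_FRACTION_AIR = 0.2095  # assumed mole fraction of O2 in dry air


def c_to_o_ratio(propane_flow: float, oxidation_air_flow: float) -> float:
    """Atomic C/O ratio for a propane (C3H8) flame.

    Both flows must be given in the same volumetric (molar) unit.
    """
    carbon_atoms = 3.0 * propane_flow                         # 3 C per C3H8
    oxygen_atoms = 2.0 * O2_FRACTION_AIR * oxidation_air_flow  # 2 O per O2
    return carbon_atoms / oxygen_atoms


# Stoichiometric combustion: C3H8 + 5 O2 -> 3 CO2 + 4 H2O
stoich_air_per_propane = 5.0 / O2_FRACTION_AIR
stoich_ratio = c_to_o_ratio(1.0, stoich_air_per_propane)
print(f"stoichiometric C/O = {stoich_ratio:.2f}")
```

Under these assumptions, values above 0.3 indicate a fuel-rich flame and values below 0.3 a lean one.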

These investigations have shown that the calibration of the CPC counting efficiency at 23nm can be influenced by the design and setup of the aerosol generator, as reported in the literature [8, 11, 15]. By comparing a large range of generators, the source of the variation could be analysed further. The largest influence was identified to be the variation among burner models and their implementation. The change to another DF-SPG of the same type but with a different aftertreatment implementation (yet both with CS) had the biggest impact, while shifting the output to larger particles for the same model with similar flame composition had only a minor influence (Sd1-3). The thermal treatment had a considerable effect on crCPC CE as well. The possible error introduced by multiply charged particles at 23nm was limited: the probability of a 33nm particle carrying two charges is only 1.1% of the probability of a 23nm particle obtaining one charge.
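The 33nm figure follows from the fact that a DMA selects by electrical mobility, so a doubly charged particle passes at the 23nm setting if it has the same mobility as a singly charged 23nm particle. The sketch below estimates that diameter by bisection. The Cunningham slip correction coefficients (1.257, 0.4, 1.1) and the mean free path of 67.3nm are assumed textbook values for air near ambient conditions, and all function names are hypothetical.

```python
import math

MEAN_FREE_PATH_NM = 67.3  # assumed mean free path of air near ambient conditions


def slip_correction(d_nm: float) -> float:
    """Cunningham slip correction with commonly used coefficients (assumed)."""
    kn = 2.0 * MEAN_FREE_PATH_NM / d_nm
    return 1.0 + kn * (1.257 + 0.4 * math.exp(-1.1 / kn))


def mobility_per_charge(d_nm: float) -> float:
    """Quantity proportional to electrical mobility per elementary charge.

    The constant factor e / (3 * pi * viscosity) cancels in the comparison.
    """
    return slip_correction(d_nm) / d_nm


def same_mobility_diameter(d_single_nm: float, charges: int = 2) -> float:
    """Diameter of an n-fold charged particle with the mobility of a singly charged one."""
    target = mobility_per_charge(d_single_nm) / charges
    lo, hi = d_single_nm, 10.0 * d_single_nm
    for _ in range(100):  # mobility_per_charge decreases monotonically with size
        mid = 0.5 * (lo + hi)
        if mobility_per_charge(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


d_double = same_mobility_diameter(23.0)
print(f"doubly charged diameter at the 23 nm setting: {d_double:.1f} nm")
```

With these assumed constants the result is close to 33nm, consistent with the diameter used in the text. The charging probabilities themselves are not computed here.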

In Table 2, the test abbreviations are used for reference. ET is an evaporation tube, and CS is a catalytic stripper for thermal treatment. Also, burner gas flows (propane, N2, oxidation air) and the resulting C/O ratio are given. The size distribution used is described by its mode diameter “Mode Dp”. For the first set of tests (L-ET to cr-CS2), in-house and cr burner models with ET and CS aftertreatment were compared (lab #2). For the second set, different operating points (Op1-3) were compared (lab #1). In the third set (Sd1-3), the generator size distribution was varied (lab #4).

3.6 Flow Correction of the Reference Instrument “crCPC”

The crCPC was calibrated according to the PMP protocol, which means that the CPC sample flow is not corrected for a deviation from the nominal value during calibration. In this way, a “flow offset” is incorporated in the CE results and in the resulting correction factor KF. The sample flow is controlled by a critical orifice that is assumed to deliver a constant volumetric flow, and an initial offset from the nominal flow would be corrected via the calibration factor KF from the annual calibration. The comparison of the sample flow measurements carried out by the participants revealed a significant difference of up to 7%. This triggered an investigation of the sample flow measurement as a possible source of systematic error. An important question was whether the variability originated from a real deviation of the crCPC sample flow or from a measurement error caused by the various flow meters.

Hence, for comparison and to quantify the effect, the tests were recalculated by adding a sample flow correction to all crCPC calibrations. Depending on the participant lab, the actual flow values were measured before or after the calibration. The flow correction had two effects: Firstly, the CE became slightly higher because the actual flow was slightly below the nominal flow for the crCPC across the board. The average correction was +1.7% (range −1 to +4.9%) for the soot-like aerosols. Secondly, the inter-laboratory variance was affected. No trend was visible at 23nm, but at 41nm, a relative reduction of the variance by about 18% (range 7–24%) was visible across all aerosols; see Table 3. At 70nm, a relative improvement of the variance of 26% (range 18–38%) could be observed.
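The direction of the first effect can be made explicit with a short sketch. It assumes that the CPC computes concentration from its nominal sample flow, so that a corrected CE is obtained by scaling with the ratio of nominal to measured flow. The function name and the numerical flow value are hypothetical and are not taken from the participants' data.

```python
def flow_corrected_ce(ce_uncorrected: float,
                      nominal_flow: float,
                      measured_flow: float) -> float:
    """Scale a counting efficiency to the actually measured sample flow.

    A CPC that assumes its nominal flow under-reports concentration when the
    real flow is lower, so the corrected CE is higher in that case.
    Both flows must use the same unit.
    """
    return ce_uncorrected * nominal_flow / measured_flow


# Hypothetical example: measured flow 1.7% below the nominal value
ce_raw = 77.4            # %, soot average at 41 nm quoted in Sect. 3.4
ce_corr = flow_corrected_ce(ce_raw, nominal_flow=1.0, measured_flow=0.983)
print(f"uncorrected CE: {ce_raw:.1f}%  corrected CE: {ce_corr:.1f}%")
```

If the reference counter is corrected in the same way with the same flow meter, a common error of that meter cancels in the CE ratio, which is the mechanism described as Option 1 below.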

Table 3 Comparison of the inter-laboratory coefficient of variation (CoV, given in %) of the crCPC calibration at 23nm, 41nm, and 70nm, using individually flow-corrected data and non-corrected data as required by the PMP protocol

Except for 23nm, we saw a considerable reduction of inter-laboratory variance by applying sample flow correction to the crCPC. There are two possible explanations for this observation: Option 1, the flow correction that was applied only to the laboratories’ own reference instrument included an additional variability (from the flow measurement device), which was cancelled out when both the reference and crCPC were corrected. Option 2, the crCPC was affected by a variation of the sample flow between tests in different laboratories, which was subsequently corrected by the flow measurements. The altitude of the laboratories, which ranged from 60 to 520m and led to different ambient pressures, was eliminated as an influence factor by converting mass flow measurements to volumetric flow.

The flow measurements did not show a chronological trend that would have hinted at a contamination of the crCPC’s critical orifice. Still, the manufacturer’s measurements 11 months apart, at the beginning and at the end of the campaign, suggested a 4% flow reduction. At the same time, the flow meters used by the participants have a nominal measurement uncertainty that would explain the observed variability. Therefore, Option 1, an additional error introduced by the flow measurement, presumably has a much larger impact than Option 2. Some laboratories were using a volumetric flow meter (e.g. Gilibrator 2, Sensidyne) with a nominal uncertainty of ±1% of the reading. Mass flow meters were also used by the participants, which have an uncertainty of, e.g. ±2% of the reading (Model 4140, TSI Inc.) or ±2% of the end of scale of typically 2–3 l/min (compact meter, Voegtlin). These uncertainties, assuming a rectangular probability distribution over the measurement range, translated into a standard deviation of 0.6% (Gilibrator 2) to 3.5% (compact meter at 3l/min) of the flow measurements alone. These figures would explain the observed contribution of the flow measurement to the overall variability. While the flow measurement was expected to be the main contributor, the observed variability still was a combination of flow measurement and flow drift.
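The conversion from a nominal ± tolerance to a standard deviation uses the standard result for a rectangular distribution, namely the half-width divided by the square root of three. The sketch below reproduces the two figures quoted above. The sample flow reading of 1 l/min used for the full-scale specification is an assumption made here because it reproduces the 3.5% value; it is not stated in this section.

```python
import math


def rectangular_std(half_width: float) -> float:
    """Standard deviation of a rectangular distribution with limits +/- half_width."""
    return half_width / math.sqrt(3.0)


# Volumetric meter: +/-1% of the reading
std_volumetric = rectangular_std(1.0)

# Mass flow meter: +/-2% of a 3 l/min full scale, expressed relative to the reading
full_scale_lpm = 3.0
assumed_reading_lpm = 1.0  # ASSUMPTION: sample flow at which the meter is read
half_width_pct = 2.0 * full_scale_lpm / assumed_reading_lpm
std_full_scale = rectangular_std(half_width_pct)

print(f"volumetric meter: {std_volumetric:.1f}% of reading")
print(f"full-scale spec:  {std_full_scale:.1f}% of reading")
```

The example shows why a tolerance specified relative to full scale penalizes readings taken well below the end of the range.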

As a bottom line, investing in an accurate, calibrated flow measurement device and regularly measuring sample flow as well as ambient temperature and pressure is a very effective and valuable exercise to significantly reduce calibration uncertainty.

3.7 Inter-laboratory Variability Put in Context

While this comparison is unique in directly comparing calibration and automotive laboratories, there has been some effort in the metrological community to understand the variability of their reference PN counters. Their numbers can be used to put the observed variability into context. The comparison exercise presented here found that the inter-laboratory variability of the CE calibration of a single CPC had a CoV of 2.5–4.5% at 41nm and 70nm. This deviation was similar for the tested in-house DF-SPG and ED-SPG soot aerosols. At these particle sizes, the range of the results was 4 to 10 (absolute) percentage points between the smallest and largest measured value. It has been shown that a sample flow correction of the crCPC can reduce the CoV by about 20% and the range between the smallest and largest value within a group (size, generator) to 3 to 7 percentage points.

The first metrological key comparison of reference aerosol electrometers (AEMs) at the National Metrology Institutes (NMIs) was carried out on a single aerosol source with singly charged dioctyl sebacate (DOS) particles [31] and showed an inter-laboratory CoV of 1.3 to 1.5% at 50, 80, and 100nm (CoV calculated from the published results). The NMIs’ reference AEMs were brought to the same place to be directly and simultaneously compared with each other on a single test bench. These numbers represent the best currently achievable agreement between metrological primary PN reference instruments under controlled metrological conditions. The influence of different setups for aerosol generation cannot be deduced from this metrological comparison.

Furthermore, the first metrological comparison of reference CPCs (various models, all with D50 ≤ 10nm) serving as secondary PN standards at NMIs and research institutions was realized at a common aerosol source at TUT Tampere [32]. This exercise showed that the results of all laboratories for DF-SPG soot and particle diameter ≥ 41nm were within ±7% from the group’s mean value with a CoV of 4–5%. This is similar to the range of this study, where the calibrations with local DF-SPG of the crCPC against individual lab reference counters showed a CoV of 4.5% and were within ±4% from the mean at 41nm and within ±6% from the mean at 70nm, even though our study took place in many locations. The comparison of these two exercises suggests that the reference CPC accounted for the majority of the observed variability for calibrations in the plateau regime of the CE curve in our exercise.

4 Conclusions

CPCs are one of the two main components of PMP systems. A circulating reference CPC (crCPC) designed for vehicle PN emission measurements was calibrated by seven laboratories using ten different soot generators. This comparison allowed, for the first time, the quantification of the CPC calibration uncertainty of multiple industrial laboratories using soot aerosols according to the UN-ECE PMP protocol.

The goal of this exercise was to provide a quantification of the calibration uncertainty of CPCs that are employed for exhaust emission measurements in the automotive field. At the same time, it aimed to give an insight into the influence of different laboratory equipment and the state of calibration standardization in the field. The exercise investigated two types of soot aerosol generators, which may be used not only for the calibration of stationary CPCs but also for the calibration of PN-PEMS and solid particle losses in a volatile particle remover.

The calibration uncertainty was assessed by multiple labs calibrating the counting efficiency (CE) of a common crCPC at a wide range of particle sizes, esp. 23nm, 41nm, and 70nm. The crCPC was calibrated against the local reference of each laboratory, which either was an aerosol electrometer or a CPC with a D50 of ≤10nm. For aerosol generators, diffusion flame soot particle generators (DF-SPG) and electric discharge soot particle generators (ED-SPG) were used. A circulating common generator (crDF-SPG) was set up prior to the exercise and sent to the participants, too.

The main focus of the evaluation was the comparability of the laboratories. It was calculated using the coefficient of variation (CoV) (1 sigma) of the CE calibrations, grouped by particle diameter and aerosol generator. For comparison, the intra-lab repeatability was also investigated using the within-lab CoV. It was calculated as the standard deviation of the CE across all soot generator calibrations in one lab, grouped by diameter and divided by the respective mean.
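The CoV definition given here is straightforward to compute. The following is a minimal sketch in which the function name and the CE values are hypothetical, chosen only to show the calculation. The sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def cov_percent(values: list[float]) -> float:
    """Coefficient of variation in percent: sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical CE results (%) of one crCPC at one particle size, one value per lab
ce_by_lab = [74.0, 76.5, 79.0, 81.5, 73.0, 78.0, 80.0]
inter_lab_cov = cov_percent(ce_by_lab)
print(f"inter-laboratory CoV: {inter_lab_cov:.1f}%")

# Simple sanity check with values whose result is easy to verify by hand
check_cov = cov_percent([90.0, 100.0, 110.0])
```

The same function applies to the within-lab case by passing the CE values obtained with the different soot generators in one laboratory at one diameter.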

The evaluation showed that the calibration uncertainty in general was different for the two operation regimes of the CPC. These regimes were defined as the plateau regime, where the CPC is at or near its maximum counting efficiency at ≥41nm, and the cut-off regime at the D50 diameter of 23nm.

In the plateau regime, analysed at 41nm and 70nm, the inter-lab calibration uncertainty was described by a CoV of 4.5%. This number is representative for the uncertainty with a well-tuned lab setup and was achieved by both the lab-owned DF-SPG and ED-SPG. These two showed a very similar performance despite a different generating principle. For the circulating crDF-SPG, the CoV was up to 5.8%. The laboratories were not familiar with its behaviour, and there was limited time for setup and tuning; thus, a larger uncertainty could be observed.

Another observation was the fact that lab-specific setup and routine dominated the comparability between laboratories, while exchanging the soot generator had a much smaller effect. The repeatability across soot generators within-lab had a mean CoV of 2%, which is low compared to the 4.5% CoV between laboratories. In the plateau regime, this leads to the laboratory reference counter and the flow measurement device as further significant sources of uncertainty. A reference CPC comparison among NMIs suggests that the reference instrument is the main contributor of the remaining uncertainty [32].

In the size cut-off regime, the uncertainty was found to be much higher, with a CoV of 10.7% in the best case when using the labs’ DF-SPGs. Owing to this comparatively low uncertainty, these generators were found to be the most suitable for a calibration at the cut-off size. With ED-SPG and crDF-SPG, the CoV rose drastically to about 30%. Both generators seemed less suitable for this size range. The ED-SPG in this study could not produce suitable particle size distributions with a mode diameter below 30nm, while the crDF-SPG suffered from a low (<1000 1/cm3) particle concentration at 23nm in some of the laboratories. Reducing the operating range of DF-SPG to only stoichiometric and lean fuel/air mixtures is expected to further reduce the variability of the aerosol [17, 26].

In this size regime at 23 nm, the intra-lab repeatability was much worse compared to the plateau regime with an average CoV of 16%. Viewing this number in comparison to the achieved comparability, it shows that the soot generator is the main source of variability for the calibration in the cut-off regime. This is very different to the situation in the plateau regime. The DMA size was another influencing factor, but the variability introduced by an exchange of the DMA was much less at a CoV of about 4%.

This exercise is difficult to put into context, since it is unique in its field so far. A key comparison of primary PN reference counters among metrological institutes achieved a CoV of 1.3–1.5%. Because the calibration of the labs’ reference counters in our comparison is derived (through several intermediate calibration steps) from a national metrological standard (except for PTB which has its own primary standard as a metrological institute), their variability must be larger.

The sample flow correction of the PN counters (crCPC and reference) has been investigated as a source of variability and was found to have a noticeable impact. In a supplemental evaluation, the crCPC was normalized to its nominal flow rate in the same way as the reference counter. At particle sizes of ≥41nm, a relative reduction in the inter-laboratory variability of 18–26% could be achieved by including this correction. In part this could be explained by a probable drift in the flow rate of the crCPC. The variation of the flow meters used by the participants was identified as the likely main cause for this error. When sample flow correction was applied to both the crCPC and the laboratory reference counter, the error introduced by the flow measurement cancelled itself out in the calculation of the counting efficiency. Flow correction of the crCPC would bring down the variability from 4.5 to 2.5% in some cases (on average around 20% improvement). In practical terms, this means that the flow correction of the reference device should be better controlled or corrected. It is recommended to use a very high accuracy flow meter and to regularly obtain an accredited calibration as required for ISO 17025 certification.

The calibration with soot aerosols needed a correction for multiply charged particles, which could increase the uncertainty compared to particle generators that produce very narrow particle size distributions, e.g. electrospray generators for liquid droplets. At 70nm, multiple charge corrections were between 2 and 6% for the labs’ DF-SPG and ED-SPG and up to 10% for the crDF-SPG.

The calibration uncertainty found in this inter-laboratory exercise gives a typical value for the expected measurement uncertainty of a CPC used in a well-controlled industrial environment. As such, it should serve as an orientation point for users and legislators alike.

Although soot particle generators might not reach the accuracy of spherical particles without multiple charges (e.g. from electrospray), they can serve as a universal reference standard for both CPC and PN-PEMS. As such, the soot-based calibration establishes a link between these otherwise very different measurement approaches. A single aerosol type and a common PN calibration procedure for all automotive exhaust PN measurement devices, which are both currently missing, may lower the overall process uncertainty and thus the uncertainty when testing vehicles for legal emission limits.