Background

Serum creatinine (SCr)-based prediction equations are frequently used in screening and clinical settings to estimate the glomerular filtration rate (GFR). The current variability in SCr measurements affects all estimating equations for GFR, including the MDRD equation. Many automated routine methods for SCr measurement exceed the desirable imprecision criterion of ≤ 2.2%; therefore, reduction of analytical bias to ≤ 3.4% in creatinine assays by standardization of calibration is needed [1]. It is important to note that standardization of calibration does not correct for analytical interferences (nonspecificity bias). Both the bias and the nonspecificity problems associated with some of the routine methods must be addressed.

Chronic kidney disease (CKD) staging relies directly on these estimated GFR values. Accurate SCr measurements are essential, since systematic errors cause unreliable renal function estimates, leading to incorrect drug dose adjustments, misclassification in CKD staging and incomparability of patient data [2–5].

Since significant interlaboratory variation was observed worldwide, it was internationally confirmed that calibration traceability to higher-order reference methods was needed to realize comparable biochemical measurement results [2, 6, 7].

Therefore, the European in vitro diagnostics (IVD) directive 98/79/EC and the laboratory working group of the National Kidney Disease Education Program recommend that, in order to improve standardization, clinical laboratory measurements should be traceable to internationally recognized and certified reference materials [8–10]. Since the development in 2006 of NIST SRM 967, a matrix-based IDMS-targeted creatinine standard, all essential elements (i.e. reference methods, reference laboratories, and materials) needed to complete the creatinine reference system are in place, according to ISO 17511 [11]. Because the complete traceability chain is agreed upon, in vitro diagnostic manufacturers of creatinine assays in Europe are legally obliged to make their products metrologically traceable, regardless of the method applied.

In this study we examine the degree of variability and interchangeability of SCr measurements across all clinical chemistry laboratories in the Netherlands in 2009, in order to evaluate the situation after global restandardization, using data from the annual national external quality control program of the Dutch external quality assessment (EQA) organization for clinical chemistry laboratories (Stichting Kwaliteitsbewaking Medische Laboratoriumdiagnostiek, SKML). Subsequently, we investigate in a theoretical model the impact of the variability in SCr measurements between laboratories on estimates of GFR using the 4-variable IDMS-traceable MDRD formula, and its consequences for CKD staging of patients, when the data from the SKML are extrapolated to a large cohort of patients.

Methods

In this cross-sectional study, we evaluate the effect of different SCr assays on SCr levels and CKD classification. EQA data are derived from the 2009 EQA program of the SKML. Annually, the SKML creates 11 pairs of frozen, commutable, value-assigned serum samples spiked with crystalline creatinine and aliquoted in 1 ml vials. A commutable material reflects the characteristics and properties of native patient samples [2, 12]. Value assignment was performed by a Joint Committee on Traceability in Laboratory Medicine (JCTLM)-endorsed reference laboratory. Each of the 144 laboratories participating in the EQA program in the Netherlands annually receives a set containing 11 pairs of these commutable samples from the SKML and stores them at −80°C until analysis. The 11 pairs of EQA samples cover SCr values over the entire measuring range (52, 73, 94, 115, 136, 157, 178, 199, 220, 241 and 262 μmol/L) and form a linear sequence; thus each laboratory received and analyzed this range in duplicate over the year. The target values for the SCr levels are established by a JCTLM-listed reference laboratory (Bonn, Germany) using an isotope dilution gas chromatography/mass spectrometry (ID-GC/MS) method [13, 14].

Every other week all routine laboratories thawed one of the EQA samples and measured the SCr concentration with their routine SCr method according to the manufacturer’s instructions. Of the laboratories, 91 (63%) used a Jaffe method and 48 (33%) an enzymatic method to measure SCr. Of the laboratories using a Jaffe technique, 62 (68%) applied a modified kinetic Jaffe method and 29 (32%) a compensated Jaffe method. Few laboratories used dry chemistry to measure SCr; since this group was too small to draw conclusions from (n = 5), it was excluded from further analyses. Companies and instruments included Abbott (Abbott Park, IL, USA; Aeroset, Architect), Beckman (Brea, CA, USA; Synchron, Unicel, LX20, Lxi725), Siemens Healthcare Diagnostics (The Netherlands; ADVIA 1650, 1800, 2400), Roche Diagnostics (Mannheim, Germany; Integra, Hitachi, Modular, Cobas, Cobas Integra) and Olympus (Tokyo, Japan; AU 400, 600). In total, 39 different instrument-method combinations were used to measure SCr.

The SCr results measured by the participating laboratories were reported centrally to the SKML and collected in a completely anonymous database.

Variability SCr extrapolated in cohort

To investigate the impact of the variability in SCr measurements found in the national EQA database, and its eventual clinical consequences in a real patient population, we used an unselected cohort of 82424 patients whose SCr had been measured in 2009 in the Isala Clinics Zwolle, the Netherlands; the details of this population have been described before [15]. In short, 45.3% of the population was male, age ranged from 19 to 106 years and 38.7% were older than 65 years. SCr in this cohort was measured using an enzymatic technique (Modular P analyzer, Creatinine Plus assay; Roche Diagnostics, Mannheim, Germany). In order to obtain, for each patient, SCr values that are traceable to the reference data from the EQA program, we requested the results of the 2009 EQA program from the clinical chemistry department of the Isala Clinics Zwolle. Based on these results we created a regression equation for the Zwolle laboratory using inverse regression (the exact procedure is described in the statistical analysis section). Subsequently, the SCr values measured in the Zwolle population were entered into this regression equation to calculate SCr values traceable to the results of the EQA program for each of the 82424 patients. The GFR was then estimated from these IDMS-traceable SCr values using the 4-variable IDMS-traceable MDRD formula [16, 17].
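The eGFR calculation referred to above can be sketched as follows. This is a minimal illustration, assuming the commonly published form of the 4-variable IDMS-traceable MDRD equation (factor 175, SCr in mg/dL, with multipliers for female sex and black ethnicity) and a conversion factor of 88.4 from μmol/L to mg/dL; the function name `mdrd_egfr` and its parameters are hypothetical and not taken from the study.

```python
def mdrd_egfr(scr_umol_l, age_years, female, black=False):
    """Estimate GFR (ml/min/1.73 m2) with the 4-variable IDMS-traceable MDRD equation.

    Assumed form: 175 x SCr(mg/dL)^-1.154 x age^-0.203 x 0.742 (if female) x 1.212 (if black).
    """
    scr_mg_dl = scr_umol_l / 88.4  # convert creatinine from umol/L to mg/dL
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


# Illustrative input only (not study data): a 50-year-old man with SCr 88.4 umol/L
example = mdrd_egfr(88.4, 50, female=False)
```

Because the exponent on SCr is negative, any positive measurement bias in SCr translates directly into a lower eGFR, which is why small calibration differences matter near staging thresholds.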

Statistical analysis

We used SPSS version 16.0 (SPSS Inc., Chicago, IL, USA) and Stata version 11 (StataCorp, College Station, TX, USA) for statistical analyses. All SCr measurements of the laboratories participating in the EQA program were inspected for outliers (cut-off ± 3 standard deviations (SD)); 3 laboratories were removed from the dataset because more than 50% of their measurements deviated more than 3 SD from those of the other laboratories. The target reference values from the linearity sequence of the EQA program served as the reference against which the routine SCr methods of the participating laboratories were compared, by means of relative and absolute bias. The results were displayed in box and whisker plots for each method group. Relative bias is defined as the mean percentage difference, [(measured SCr − target SCr)/target SCr] × 100; absolute bias is defined as the mean difference between the SCr values measured by individual laboratories and the SCr target values; precision is defined as the SD of the absolute bias.
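The three definitions above can be illustrated with a short sketch for a single EQA sample. It assumes that "SD of the absolute bias" means the sample SD of the per-laboratory differences from the target value; the function name `bias_summary` and the example values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bias_summary(measured, target):
    """Summarize bias for one EQA sample.

    measured: SCr results (umol/L) reported by the laboratories for this sample
    target:   reference (target) SCr value (umol/L) for this sample
    """
    abs_diffs = [m - target for m in measured]           # per-laboratory difference
    rel_diffs = [100.0 * d / target for d in abs_diffs]  # per-laboratory % difference
    return {
        "absolute_bias": mean(abs_diffs),      # mean difference, umol/L
        "relative_bias_pct": mean(rel_diffs),  # mean percentage difference
        "precision_sd": stdev(abs_diffs),      # SD of the differences (assumed reading)
    }


# Hypothetical results from three laboratories for the 52 umol/L sample
summary = bias_summary([55.0, 57.0, 59.0], target=52.0)
```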

We extrapolated the impact of the non-equivalence in SCr measurements (as derived from the laboratories participating in the EQA program) to our cohort of 82424 patients. To do so, the SCr values measured by each laboratory participating in the quality assessment program were regressed on the target values of the samples sent by the SKML (so-called inverse regression), for each laboratory separately. Regression equations were created for each participating laboratory that had not changed its SCr method in 2009 (n = 47 for Jaffe and n = 39 for enzymatic). For each of these regression equations (thus for each laboratory fulfilling the criteria mentioned above), we calculated an area under the curve (AUC) in the range 73–115 μmol/L. This range was chosen because these SCr values in particular yield eGFRs around the threshold of 60 ml/min/1.73 m2, below which patients are classified as having CKD stage 3. Subsequently, the AUCs of all participating laboratories were ranked in ascending order, for the Jaffe and the enzymatic technique separately, in order to establish a 10th and a 90th percentile regression equation for each technique. The SCr values from our cohort of 82424 patients were then entered into the 10th and 90th percentile regression equations (for the Jaffe and the enzymatic technique, as appropriate). These newly calculated SCr values were entered into the appropriate MDRD equations, thus providing 10th and 90th percentile eGFR values. To get an impression of the clinical implications of the variation in SCr values for CKD staging, we classified the patients according to the K/DOQI guidelines and evaluated the differences in CKD staging when SCr values were measured by the Jaffe or the enzymatic methods [18].
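The ranking procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure as implemented in SPSS or Stata: each laboratory is assumed to be described by an ordinary least squares line (measured SCr = intercept + slope × target SCr), the AUC is taken as the exact integral of that line over 73–115 μmol/L, and the percentile laboratory is taken as the one closest to that position in the ranked list. All function names are hypothetical.

```python
def fit_line(target, measured):
    """Ordinary least squares fit: measured = intercept + slope * target."""
    n = len(target)
    mean_x = sum(target) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in target)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(target, measured))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def auc_linear(intercept, slope, lo=73.0, hi=115.0):
    """Area under the line intercept + slope * x between lo and hi (umol/L)."""
    return intercept * (hi - lo) + slope * (hi ** 2 - lo ** 2) / 2.0


def percentile_lab(fits, pct):
    """Return the (intercept, slope) pair closest to the pct-th position by AUC."""
    ranked = sorted(fits, key=lambda fit: auc_linear(*fit))
    index = round(pct / 100.0 * (len(ranked) - 1))
    return ranked[index]


def apply_lab_equation(fit, traceable_scr):
    """SCr value the given laboratory would be expected to report for a traceable SCr."""
    intercept, slope = fit
    return intercept + slope * traceable_scr
```

In use, `fit_line` would be run once per laboratory on its EQA results, `percentile_lab(fits, 10)` and `percentile_lab(fits, 90)` would select the two laboratories per technique, and `apply_lab_equation` would map each cohort SCr value through the selected equations before the eGFR is computed.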

Ethical statement

No permission was required from the Medical Ethics Committee as our data only included lab result information, which had been obtained from a laboratory database. No personal patient information was included. Permission to use the national 2009 EQA-data was obtained from the SKML. The laboratories in the dataset were anonymous.

Results

The relative and absolute bias for the Jaffe and enzymatic techniques are shown in Figures 1, 2, 3 and 4. The enzymatic method produced the least biased results, which were not significantly different from the target values, whereas the Jaffe technique produced the most biased and imprecise results, which differed significantly from the reference values. The Jaffe technique especially tended to overestimate SCr at low concentrations: by 21%, 12% and 10% for the SCr target values 52, 73 and 94 μmol/L, respectively. The enzymatic method had a small bias that was constant over the entire range of SCr values. The precision for Jaffe/enzymatic per reference value was: 10/3 (52 μmol/L), 10/3 (73 μmol/L), 7/3 (94 μmol/L), 13/4 (115 μmol/L), 7/5 (136 μmol/L), 8/4 (157 μmol/L), 8/5 (178 μmol/L), 9/5 (199 μmol/L), 10/6 (220 μmol/L), 11/5 (241 μmol/L) and 5/2 (262 μmol/L).

Figure 1

Box and whisker plot showing the percentage bias of the Jaffe technique. Interpretation of the vertical axis: a value of 1.1, for example, means a percentage bias of 10%. The box represents the 25th, 50th and 75th percentile; the whiskers represent the 5th and 95th percentile. The extremes, defined as values more than three times the interquartile range, are shown as symbols above and below the whiskers. The grey line represents the 0% bias line.

Figure 2

Box and whisker plot showing the percentage bias of the enzymatic technique. Interpretation of the vertical axis: a value of 1.1, for example, means a percentage bias of 10%. The box represents the 25th, 50th and 75th percentile; the whiskers represent the 5th and 95th percentile. The extremes, defined as values more than three times the interquartile range, are shown as symbols above and below the whiskers. The grey line represents the 0% bias line.

Figure 3

Box and whisker plot showing the absolute bias (μmol/L) for the Jaffe technique. The box represents the 25th, 50th and 75th percentile; the whiskers represent the 5th and 95th percentile. The extremes, defined as values more than three times the interquartile range, are shown as symbols above and below the whiskers. The grey line represents the 0 μmol/L bias line.

Figure 4

Box and whisker plot showing the absolute bias (μmol/L) for the enzymatic technique. The box represents the 25th, 50th and 75th percentile; the whiskers represent the 5th and 95th percentile. The extremes, defined as values more than three times the interquartile range, are shown as symbols above and below the whiskers. The grey line represents the 0 μmol/L bias line.

The impact of the variability in SCr measurements on CKD staging is illustrated in Tables 1 and 2. The tables show that the differences between the 10th and 90th percentile laboratory are large when a Jaffe technique is used. Downgrading to a lower CKD class was observed with the Jaffe assay for the eGFR classes 45–60 ml/min/1.73 m2 (1.1%, 41.9%), 60–90 ml/min/1.73 m2 (1.8%, 36.7%) and >90 ml/min/1.73 m2 (12.3%, 78.9%), for the 10th and 90th percentile values respectively. When an enzymatic technique was used, the variability resulted in both upgrading and downgrading of CKD stage: downgrading occurred in 2.1–4.1% of patients, whereas upgrading occurred in 15.6–30.1% of patients.
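The kind of comparison summarized in these percentages can be sketched as follows. The eGFR cut-offs (90, 60, 45, 30 and 15 ml/min/1.73 m2) are an assumption based on the classes named in the text and the usual staging thresholds; the functions and example values are hypothetical and do not reproduce Tables 1 and 2. The sketch counts patients who move to a lower or a higher eGFR category, without adopting the terms upgrading and downgrading.

```python
CATEGORIES = ["<15", "15-29", "30-44", "45-59", "60-89", ">=90"]  # ascending eGFR


def egfr_category(egfr):
    """Assign an eGFR value (ml/min/1.73 m2) to an assumed staging category."""
    for lower_bound, label in [(90, ">=90"), (60, "60-89"), (45, "45-59"),
                               (30, "30-44"), (15, "15-29")]:
        if egfr >= lower_bound:
            return label
    return "<15"


def reclassification_rates(reference_egfr, lab_egfr):
    """Return (% moved to a lower eGFR category, % moved to a higher one)."""
    lower = higher = 0
    for ref, lab in zip(reference_egfr, lab_egfr):
        shift = CATEGORIES.index(egfr_category(lab)) - CATEGORIES.index(egfr_category(ref))
        if shift < 0:
            lower += 1
        elif shift > 0:
            higher += 1
    n = len(reference_egfr)
    return 100.0 * lower / n, 100.0 * higher / n


# Hypothetical eGFR values for four patients: reference vs. percentile laboratory
rates = reclassification_rates([62, 95, 50, 40], [57, 88, 50, 47])
```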

Table 1 Implications for CKD staging when a Jaffe or a standardized serum creatinine is used
Table 2 Implications for CKD staging when an enzymatic or a standardized serum creatinine is used

Discussion

The present study shows that interlaboratory non-equivalence in SCr assays in the Netherlands was still substantial in 2009, notwithstanding the recent international creatinine restandardization effort. The high variability was largely explained by the ongoing use of Jaffe assays for measuring SCr. Compared to the enzymatic assays, the Jaffe assays had a significantly larger bias, especially for SCr levels in the lower range (reference value range 52–115 μmol/L). Although relative bias decreased at higher SCr reference values, imprecision remained high. It was of course to be expected that Jaffe methods would show a positive bias compared to the ID-GC/MS method, and that adjustment for this bias would occur when using the appropriate MDRD equation. This caused patients to be downgraded to a lower CKD category considerably more often when a Jaffe technique rather than an enzymatic technique was used, especially in the categories >45 ml/min/1.73 m2 (up to 79%). In contrast, the use of an enzymatic technique more often resulted in upgrading of the CKD stage, which may be explained by the differences in bias: the Jaffe technique provided higher values of SCr, whereas the enzymatic technique provided slightly lower values.

Ever since SCr has been assessed in clinical practice, its accuracy has been debated [1, 7, 19]. Although SCr measurement is performed routinely, it is one of the most variable laboratory tests [20]. The increasing use of eGFR in clinical practice has renewed interest in the shortcomings of SCr methodology [1, 21–24]. Since SCr is the most important variable in the renal function estimation equations, calibration of the creatinine assays is necessary to reduce bias in these formulas. This even led to a modification of the factor used in the MDRD equation (from 186 to 175 for IDMS-traceable creatinine). However, the way this calibration was obtained has been regularly criticized in the literature, because the formulas were modified after the Jaffe creatinine had been recalibrated to an IDMS-traceable enzymatic method and the intercepts had been deleted, since these were not statistically significant.

The substantial bias and between-laboratory variance found in our study have been shown in various other studies in which data from proficiency testing (PT) and EQA programs were evaluated [7, 25, 26]. A European trueness verification study of SCr also showed large interlaboratory variability before the matrix-based SRM 967 standard was available [7]. In our study we would have expected a significantly reduced interlaboratory CV due to global restandardization on SRM 967. However, despite the European IVD directive with its stricter regulations, no improvement had been reached compared to earlier studies, which reported a method group SD of 2.6–11 μmol/L and a median CV of 5% at an SCr concentration of 74 μmol/L [27]. The failure to reduce interlaboratory non-equivalence is explained by the fact that standardization does not correct for analytical non-specificity problems, as is the case with the Jaffe method [28, 29]. These non-specificity problems concern the measurement of many endogenous and exogenous interfering substances, such as protein, glucose and ketones, when SCr is measured using a Jaffe method [28–31]. Despite many attempts to improve the performance characteristics of the Jaffe reaction, non-specificity remained [7]. This leads to overestimation of the true SCr concentration. Calibration traceability can neither solve this problem nor substitute for improvement of suboptimal routine methods.

Although the enzymatic assay to measure SCr is not free of interference from various substances, it has a better specificity than the Jaffe technique [28]. This was recently confirmed in a multicentre study evaluating IDMS-traceable enzymatic creatinine assays. It showed that the majority of enzymatic methods reached the acceptable total analytical error of 8% for SCr values as low as 36 μmol/L, when adequately calibrated against IDMS, improving the traceability and standardization of creatinine [32].

Moreover, the upgrading in CKD stage that we observed when enzymatic assays were used in the MDRD formula may be less relevant in routine clinical practice than the downgrading to a lower CKD stage that occurs when a Jaffe assay is used to measure SCr. For example, a patient whose eGFR is 57 ml/min/1.73 m2 (CKD stage 3a) when a Jaffe assay is used and 62 ml/min/1.73 m2 (CKD stage 2) when an enzymatic assay is used probably has similar risks of end-stage renal disease, all-cause mortality and cardiovascular mortality in either case. From studies comparing the prognosis associated with the two most commonly used equations to estimate GFR (the MDRD and the Chronic Kidney Disease Epidemiology Collaboration equation, CKD-EPI) during a long follow-up (≥7.5 years), we know that individuals reclassified from CKD stage 3a (eGFR 45–60 ml/min/1.73 m2) to no CKD had a lower mortality risk than those not reclassified. Moreover, these participants had a risk equal to that of those classified as no CKD by both formulas [33, 34].

Based on the large body of evidence in the literature showing that alkaline picrate methods are inferior for measuring creatinine, it is time for laboratories to replace the alkaline picrate method with enzymatic methods. Moreover, if an increasing number of laboratories apply enzymatic techniques, the number of vendors of commercial enzymatic assays will increase, leading to more competition, which will ultimately reduce the costs of these assays. In a broader perspective, more accurate and precise measurements of SCr will reduce a source of error in GFR estimates and thus errors in the staging of renal failure. Considering the number of patients that are misclassified in this study when an alkaline picrate technique is used, clinical laboratories should also consider the implications for overall health costs, since patients are referred on the basis of creatinine-based estimates of GFR [35].

Although this study is a theoretical analysis, it is one of the few illustrating the consequences of variations in SCr measurements for GFR estimation and CKD staging in the individual patient. Since the majority of Dutch laboratories are included and we tested our model in a large heterogeneous cohort of patients, we are able to give a good reflection of the consequences it might have for daily clinical practice. Moreover, we studied creatinine values at 11 different levels against a strong reference method, and the samples used were all recent rather than remote samples, which are frequently used in other studies. Selection bias may have occurred, since laboratories with too few analyses in the external quality control program were excluded from further analysis in our patient cohort. Moreover, we applied the MDRD formula in a patient cohort with an age range of 19–106 years. This may have introduced bias, since the MDRD has only been validated for patients aged 19–70 years and underestimates the GFR in patients >70 years. However, in clinical practice, laboratories automatically report eGFRs each time a creatinine is measured, also in patients older than 70 years, and clinical decision making is often based on these estimates.

Conclusions

In conclusion, accurate and precise measurements of SCr are required for a more reliable estimation of GFR in support of clinical decision making. Enzymatic techniques measure SCr with substantially less variability than Jaffe techniques when compared with IDMS reference values. This leads to more reliable estimation of GFR and CKD staging. To improve the reliability of eGFR, specific enzymatic techniques to measure SCr are preferable to nonspecific Jaffe techniques.