Background

The International Classification of Diseases (ICD) coding system is widely used for administrative, clinical, and epidemiological purposes. ICD codes play a vital role in informing the medical community as key policy and reimbursement decisions are made [1]. On October 1, 2015, the 10th revision of the ICD coding system was implemented under mandate of the United States Department of Health and Human Services [2]. Previous research has examined the accuracy of ICD-10 coding with regard to Chronic Kidney Disease (CKD), but limited longitudinal data precluded examining ICD-10 coding accuracy in the context of disease progression [3, 4]. This study uses ICD-10 data from a large claims database covering 2016 to 2021 to assess ICD-10 coding accuracy among CKD patients.

The previous ICD-9 system was revised to ICD-10 with the aim of increasing the specificity of the codes. This increased specificity allows for rapid incorporation of emerging diseases and for more detailed, more precise diagnostic codes. Consequently, ICD-10 contains 69,823 codes compared to only 14,025 for ICD-9 [2]. However, CKD diagnostic codes have not benefitted from the improvements from ICD-9 to ICD-10. Indeed, the primary diagnostic codes indicating CKD staging simply change the prefix from 585 to N18, yet continue to identify only the primary stages, with no distinction between stage 3a and stage 3b. The number of codes indicating an underlying cause of CKD has increased, allowing for more detailed diagnosis and better tracking of the disease’s etiology, though whether this translates to improved diagnostic accuracy has not been established.

Studies of agreement between ICD-9 coding and gold-standard clinical markers have demonstrated disease-dependent accuracy rates. Cardiovascular diseases, stroke, and pneumococcal pneumonia, for example, have all been shown to have accurate ICD codes [5, 6, 7]. Similar studies with ICD-10 data have drawn conclusions consistent with previous ICD-9-based research [8, 9, 10]. That these conditions generally present with clear symptoms may partially explain the accuracy of their related codes.

CKD coding accuracy, however, is notably deficient, with many ICD-9 studies reporting low sensitivity alongside high specificity [3, 11, 12]. Meta-analyses and systematic reviews of the surrounding literature report widely varying sensitivity and specificity rates, suggesting inconsistent coding practices and accuracy [13, 14]. Research utilizing ICD-10 codes has not shown substantial improvement [3]. However, a recent study demonstrated that utilizing multiple CKD codes in conjunction may yield acceptable diagnostic accuracy [4]. These latest results notwithstanding, the subtle nature of CKD and its common presentation alongside other comorbid conditions may partly explain the poor diagnostic utility of ICD codes in identifying clinical CKD.

The identification of rapid progressors, defined as those with a yearly estimated glomerular filtration rate (eGFR) loss greater than 4 ml/min/1.73 m², would allow for expedient care for those suffering from advanced CKD. Our previous work showed that ICD-9 CKD staging codes were insufficient to identify patients with rapidly progressing CKD [3]. However, only two years of ICD-10 data were available at the time of that prior study, and therefore progression analysis was not possible.

This manuscript expands our prior research and leverages five years of outpatient ICD-10 codes to evaluate coding accuracy along three objectives:

  • Rapid Progression Accuracy: Rapidly progressing patients identified clinically using longitudinal eGFR were compared against patients with multiple ICD-10 CKD staging codes indicating increasing disease severity to determine accuracy of ICD-10 codes.

  • Overall and Stage-Stratified Accuracy: CKD patients identified clinically using multiple eGFR measures were compared against those with any ICD-10 code indicating CKD to determine overall accuracy. Further, CKD patients were assigned a CKD stage based on eGFR measures and compared against those with ICD-10 CKD staging codes to assess accuracy of ICD-10 staging codes.

  • Demographic/Comorbidity Varying Accuracy: Agreement of the two diagnostic paradigms (eGFR-based and coding-based) was modeled against demographic and comorbidity data in a multivariate logistic regression to assess if diagnostic accuracy improves with varying patient demographic and comorbid profiles.

Methods

This study utilized claims data from a large third-party insurer serving over 1.3 million patients across the Western New York and Albany areas of New York State. The database spans ten years, from 2011 to 2021, and has been explored in prior research [3, 15]. Focusing on the five-year period from 2016 to 2021, this study examines ICD-10 coding accuracy in the context of CKD. Patients with stage-3 CKD were identified using measured serum creatinine values and estimated glomerular filtration rate (eGFR), calculated with a modified eGFR formula that excludes race [16]. Using unique patient identifiers and observation dates, these eGFR values were linked to diagnostic ICD codes.
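As an illustration of a race-free eGFR calculation, the sketch below implements the 2021 CKD-EPI creatinine equation. This is an assumption: the text states only that a modified formula excluding race was used [16], so the choice of equation, the function name `egfr_ckd_epi_2021`, and its parameters are illustrative and may differ from the study's actual computation.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl: float,
                      age_years: float,
                      is_female: bool) -> float:
    """Estimate eGFR (ml/min/1.73 m^2) without a race coefficient.

    Assumption: uses the 2021 CKD-EPI creatinine equation; the source text
    does not name the exact formula applied in the study.
    """
    if serum_creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("serum creatinine and age must be positive")

    # Sex-specific constants of the 2021 CKD-EPI creatinine equation.
    kappa = 0.7 if is_female else 0.9
    alpha = -0.241 if is_female else -0.302

    ratio = serum_creatinine_mg_dl / kappa
    egfr = (142.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age_years)
    if is_female:
        egfr *= 1.012
    return egfr


# Example: a 70-year-old man with serum creatinine of 1.4 mg/dL.
print(round(egfr_ckd_epi_2021(1.4, 70, is_female=False), 1))
```

Serum creatinine is taken in mg/dL here; values recorded in µmol/L would need conversion before being passed in.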

Based on clinician interpretation of Kidney Disease Outcomes Quality Initiative (KDOQI) guidelines, eGFR values were calculated for patients with available serum creatinine, age, and gender data. Those with two eGFR measures less than 60 ml/min/1.73 m² at least ninety days apart, with no intervening measurement greater than 60 ml/min/1.73 m², were identified by their eGFR as stage-3, stage-4, or stage-5 CKD cases. Limited presence of lab values precluded albuminuria-based stage 1 and stage 2 CKD diagnosis. Individuals with laboratory-confirmed CKD are referred to as eGFR-CKD.
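The two-measurement rule described above can be sketched as follows. The function name `is_egfr_ckd`, the constants, and the `(date, eGFR)` input layout are hypothetical choices for illustration, and the handling of a value exactly equal to 60 (treated as neither qualifying nor disqualifying) is an assumption, since the text does not address it.

```python
from datetime import date

THRESHOLD = 60.0    # ml/min/1.73 m^2
MIN_GAP_DAYS = 90   # minimum separation between the two qualifying measures


def is_egfr_ckd(measurements):
    """Return True if a patient's (date, eGFR) pairs meet the rule.

    Rule: two eGFR values below 60 at least 90 days apart, with no
    measurement above 60 between them.
    """
    ordered = sorted(measurements, key=lambda m: m[0])
    for i, (first_date, first_egfr) in enumerate(ordered):
        if first_egfr >= THRESHOLD:
            continue  # not a valid first qualifying measure
        for later_date, later_egfr in ordered[i + 1:]:
            if later_egfr > THRESHOLD:
                break  # a value above 60 rules out this starting point
            gap_days = (later_date - first_date).days
            if later_egfr < THRESHOLD and gap_days >= MIN_GAP_DAYS:
                return True
    return False


# Example: two low values about four months apart, nothing above 60 between.
history = [(date(2016, 1, 10), 55.0), (date(2016, 5, 12), 58.0)]
print(is_egfr_ckd(history))
```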

CKD patients were alternatively identified using ICD-10-CM codes. The following code groups were considered: chronic kidney disease (N18.1, N18.2, N18.3, N18.4, N18.5, N18.6, N18.9), hypertensive CKD and hypertensive heart and CKD (I12.0, I12.9, I13.0, I13.1, I13.10, I13.11, I13.2), and diabetes mellitus with CKD (E08.21, E08.22, E08.29, E09.21, E09.22, E09.29, E10.21, E10.22, E10.29, E11.21, E11.22, E11.29, E13.21, E13.22, E13.29). Patients with at least one occurrence of any code were classified as ICD-CKD.
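A minimal sketch of the ICD-CKD classification, using the code groups listed above, might look like this. The set and function names are hypothetical, and exact string matching on dotted codes is an assumption about how the claims data are stored.

```python
# Code groups as listed in the text.
CKD_CODES = {"N18.1", "N18.2", "N18.3", "N18.4", "N18.5", "N18.6", "N18.9"}
HYPERTENSIVE_CKD_CODES = {"I12.0", "I12.9", "I13.0", "I13.1",
                          "I13.10", "I13.11", "I13.2"}
DIABETIC_CKD_CODES = {
    f"{prefix}.{suffix}"
    for prefix in ("E08", "E09", "E10", "E11", "E13")
    for suffix in ("21", "22", "29")
}
ALL_CKD_CODES = CKD_CODES | HYPERTENSIVE_CKD_CODES | DIABETIC_CKD_CODES


def is_icd_ckd(patient_codes):
    """Return True if any of the patient's ICD-10-CM codes indicates CKD."""
    return any(code.strip().upper() in ALL_CKD_CODES for code in patient_codes)


# Example: diabetic CKD code alongside an unrelated hypertension code.
print(is_icd_ckd(["I10", "E11.22"]))
```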

A longitudinal mixed model analysis was used to estimate the rate of eGFR progression over time using the eGFR-CKD patients [17]. Patients were followed from initial entry into CKD stage 3 until they reached CKD stage 5 or end-stage kidney disease (ESKD) treatment was initiated. Only patients with at least three years of follow-up data and five observations were included. eGFR was modeled against fixed and random effects of time (measured in quarter-year increments), and a random intercept was also included in the model. Those patients who experienced a yearly loss of eGFR greater than 4 ml/min/1.73 m² were considered to be rapid progressors [18, 19].
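One way to write down a model of this form is given below. The notation is an assumption introduced for illustration: $i$ indexes patients, $j$ indexes observations, $t_{ij}$ is time in quarter-years, and the normality of the random effects and residuals is the conventional choice, not something stated in the text.

```latex
\begin{aligned}
\mathrm{eGFR}_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij}, \\
(b_{0i},\, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \mathbf{G}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}), \\
\Delta_i &= 4\,\bigl(\hat{\beta}_1 + \hat{b}_{1i}\bigr), \qquad
\text{patient } i \text{ is a rapid progressor if } \Delta_i < -4 .
\end{aligned}
```

Here $\Delta_i$ is the patient-specific annual change in eGFR. The factor of 4 converts the per-quarter slope to a per-year slope, so a yearly loss greater than 4 ml/min/1.73 m² corresponds to $\Delta_i < -4$.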

Based on the mixed model, Estimated Best Linear Unbiased Predictors (EBLUPs) were calculated for each patient [20]. Based on the slope derived from the EBLUPs, each patient was categorized as a rapid progressor (RP) or not. For the ICD-CKD patients who also met inclusion criteria for the progression analysis, ICD-10 staging codes (N18.3, N18.4, N18.5) were used to identify RP. Those with at least two codes of increasing stage were considered ICD-RP. Thus, each patient in the analysis was classified as eGFR-RP or not, and as ICD-RP or not.
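The code-based rule for rapid progression can be sketched as below. The name `is_icd_rp` and the `(date, code)` input layout are hypothetical, and reading "increasing stage" as a higher-stage code recorded on a strictly later date than a lower-stage code is an assumption about how the rule was applied.

```python
from datetime import date

# Staging codes used for the progression analysis, mapped to CKD stage.
STAGE_BY_CODE = {"N18.3": 3, "N18.4": 4, "N18.5": 5}


def is_icd_rp(dated_codes):
    """Return True if staging codes show an increase in stage over time.

    dated_codes: iterable of (date, ICD-10 code) pairs for one patient.
    Codes other than N18.3, N18.4, and N18.5 are ignored.
    """
    staged = [(d, STAGE_BY_CODE[c]) for d, c in dated_codes if c in STAGE_BY_CODE]
    return any(
        later > earlier and later_stage > earlier_stage
        for earlier, earlier_stage in staged
        for later, later_stage in staged
    )


# Example: stage 3 code in 2017 followed by a stage 4 code in 2018.
claims = [(date(2017, 3, 1), "N18.3"), (date(2018, 9, 15), "N18.4")]
print(is_icd_rp(claims))
```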

To assess the accuracy of ICD-CKD and ICD-RP to indicate eGFR-CKD and eGFR-RP, epidemiological quantities for sensitivity (#true positives/[#true positives + #false negatives]), specificity (#true negatives/[#true negatives + #false positives]), positive predictive value (PPV; #true positives/[#true positives + #false positives]) and negative predictive value (NPV; #true negatives/[#true negatives + #false negatives]) were estimated with 95% confidence intervals. These four quantities are referred to as “performance measures” in this paper.
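The four performance measures defined above can be computed from a 2×2 contingency table as in the following sketch. The function names are hypothetical, the counts in the example are invented for illustration and are not study data, and the Wilson score interval is an assumption, since the text does not state which confidence interval method was used.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def performance_measures(tp, fp, fn, tn):
    """Return {measure: (estimate, (ci_low, ci_high))} from 2x2 counts."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in pairs.items()}


# Illustrative counts only (not taken from the study).
for name, (estimate, (low, high)) in performance_measures(40, 20, 10, 930).items():
    print(f"{name}: {estimate:.2%} ({low:.2%}, {high:.2%})")
```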

Agreement of ICD- and eGFR-CKD diagnoses was modeled against gender, age > 65, and comorbid conditions (proteinuria, diabetes, congestive heart failure, other heart diseases, and hypertension) in a multivariate logistic regression. Receiver operating characteristic (ROC) curves were generated using the Mann-Whitney association to estimate the area under the curve (AUC). A non-informative curve with AUC of 0.5 was held as reference, and every other curve was compared using a non-parametric approach [21].
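The Mann-Whitney estimate of the AUC is the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below shows that calculation; the function name and the example scores are hypothetical, and in this setting the scores would be the fitted probabilities from the logistic regression.

```python
def auc_mann_whitney(scores_positive, scores_negative):
    """Estimate the AUC as the normalized Mann-Whitney U statistic."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0   # positive case ranked above negative case
            elif pos == neg:
                wins += 0.5   # ties contribute half a win
    return wins / (len(scores_positive) * len(scores_negative))


# Illustrative scores only (not study data).
agree = [0.9, 0.8, 0.4]      # predicted probabilities where diagnoses agreed
disagree = [0.7, 0.3, 0.2]   # predicted probabilities where they did not
print(auc_mann_whitney(agree, disagree))
```

A value of 0.5 corresponds to the non-informative reference curve, and values closer to 1 indicate better discrimination.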

Results

Of the approximately 1.3 million patients in the claims database, 336,752 had sufficient serum creatinine measurements to determine eGFR-CKD status. Of these, 21,328 patients were identified as eGFR-CKD and 48,322 were ICD-CKD. Table 1 summarizes the sample demographics and selected comorbidities. Results of McNemar’s test showed differences in proportions across all groups (p < 0.0001).

Table 1 Demographic summary

Of the 5,618 patients qualifying for the progression analysis, 72 were identified as eGFR-RP, while 718 had multiple codes to qualify as ICD-RP patients. However, only 4 of these patients were among the eGFR-RP. Sensitivity was 5.56% (1.53, 13.62), with PPV 5.6% (1.5, 14.2), and specificity 87.13% (86.22, 88.00), with NPV 98.61% (98.24, 98.92). Table 2 summarizes the progression analysis sample.

Table 2 Contingency table of eGFR-based identification against ICD identification of rapid progressors (RP)

When considering all CKD codes as well as diabetic, hypertensive, and heart disease codes that also indicate CKD against eGFR-CKD status, ICD codes performed well, with a sensitivity of 77.12% (76.56, 77.68). Sensitivity for staging codes varied, ranging from a low of 50.41% (49.89, 50.94) among clinically identified stage-3 patients to a high of 67.82% (66.39, 69.25) among stage-4 patients, with 60.62% (57.68, 63.56) among stage-5 patients. Full results can be seen in Table 3 below.

Table 3 Performance measures

In the progression sample, ROC analysis showed little improvement in detection of rapid progressors when controlling for comorbid history, with heart issues offering the greatest advantage in predictive value over an arbitrary decision (AUC = 0.5769, 95% CI: 0.5596, 0.5942). In the overall sample, there was minor to moderate improvement in coding accuracy over an arbitrary decision when controlling for comorbidities. Elderly age (AUC = 0.7199, 95% CI: 0.7163, 0.7235) added the most predictive value. AUCs are plotted in Fig. 1 below.

Fig. 1 ROC curves for comorbidities in progression (left) and overall (right) samples

Discussion

Detection of individuals who are experiencing rapidly progressing CKD is a critical step in treatment. Utilization of ICD codes to programmatically identify potential rapid progressors would allow for expeditious care for those at the highest risk. This study is the first to explore the viability of ICD-10 codes and practices in detecting rapid progressors and CKD patients in general. As shown previously, ICD codes remain ineffective at either of these tasks [3].

While the CKD-staging codes identify the major stages of the disease, the ICD-10 revision has done little to mark the more subtle changes that may indicate a patient at risk for rapid progression. Compared to our previous work with ICD-9 data, diagnostic accuracy for RP patients was worse on most measures [3]. Sensitivity was 5.56% in the current ICD-10 study vs. 25.7% in the previous ICD-9 study, PPV was 5.6% vs. 14.2%, and specificity was 87.13% vs. 94.94%, with only NPV showing slight improvement at 98.61% vs. 97.73%.

An additional code to separate CKD stage 3 into the commonly used stage 3a and stage 3b subtypes would perhaps improve detection rates for patients at this critical juncture in their CKD course. However, this problem has been addressed in the upcoming ICD-11 revision, which includes distinct codes for stage 3a and stage 3b [22].

Table 4 below summarizes selected research studies into coding accuracy.

Table 4 Characteristics of studies on diagnostic accuracy of chronic kidney disease

Compared to our previous study on ICD-9 data, the ICD-10 codes utilized in this study have shown improvement in sensitivity for stage-3 (50.34% vs. 24.68%), and PPV in stage-3 (58.71% vs. 40.08%), stage-4 (42.43% vs. 18.52%), and stage-5 (35.85% vs. 4.51%). However, sensitivity in stage-5 compares poorly (59.02% vs. 91.05%) [3]. Other ICD-10 studies have shown similar performance [23]. Novel approaches that combine multiple codes may yield improvement [4].

Comparing diagnostic accuracy using any qualifying code showed improved sensitivity (77.12% vs. 32.16%) and NPV (98.31% vs. 90.33%), but worse PPV (34.04% vs. 63.10%) and specificity (89.89% vs. 97.12%) [3]. These mixed results may reflect the increased number of secondary codes indicating underlying CKD causes.

Generally speaking, ICD-10 coding appears to have some accuracy improvement over ICD-9. Given the similarity between ICD-9 and ICD-10 coding, it is likely that this improvement is derived from clinical practices. Reliance on electronic health records (EHR) has increased, and physicians are becoming more facile with current technologies as hospital administrators and staff implement policies to comply with EHR mandates. EHR implementation has been criticized for disrupting workflow and increasing workload, although positive effects of increased data collection have been seen over time [31]. Improved diagnostic accuracy of ICD codes may be a result of this changing paradigm.

This study has limitations, largely related to the nature of claims data. Chief among them is the lack of racial data. While this demographic variable is not present in the formulation of eGFR used here, racial disparities are commonplace in medicine, and these results may be subject to this phenomenon [16]. Additionally, these data are derived from privately insured patients in the United States and may not be reflective of patient experiences or caregiver practices with respect to ICD coding in other countries.

Conclusion

The study presented here utilized claims data from patients followed from 2016 to 2021, and it demonstrates that, in the context of CKD, coding accuracy has not improved substantially since adoption of the ICD-10 coding standards. There remains a gulf between clinically derived diagnostic procedures and attempts at ICD-based diagnosis. Consequently, clinical markers remain the only viable tool for identifying CKD patients, rapidly progressing or otherwise. Future work may include attempts to utilize multiple codes in concert to increase diagnostic accuracy.