Background

The Charlson comorbidity index (CCI) is a scoring system to classify or assign weights to comorbid conditions. The index was initially developed in a small cohort of patients for predicting one-year mortality and tested in another cohort during a 10-year follow-up period [1]. After years of clinical practice, CCI not only facilitated prediction of short- and long-term mortality but could also be utilized to measure disease burden in multiple clinical settings [2, 3].

CCI involves 19 comorbidities, which can be extracted from clinical diagnoses or the corresponding International Classification of Diseases (ICD) codes. Compared with the considerable work involved in one-to-one calculations based on diagnosis, CCI can be calculated automatically and quickly from ICD codes [4]. Accordingly, ICD-based CCI is widely used. The most extensively applied version is ICD-10, published in 1993. However, CCI assessed using ICD-10 codes does not completely match that derived from clinical diagnosis. Accurate reclassification of clinical diagnoses that do not match an ICD-10 code requires the professional clinical knowledge of coders and occasionally clinicians [5]. Because of these disagreements between diagnosis-based and ICD-10 code-based methods, general adoption of ICD-10 takes a long time. National administrative departments in developed countries, such as the Department of Health and Human Services in the United States, are in charge of adapting ICD modifications and updates to ensure concordance with diagnosis [6] (https://www.cdc.gov/nchs/data/icd/10cmguidelines_2017_final.pdf). ICD has also been widely applied in developing countries [7,8,9], but its use in these settings is not standardized. China officially started to use ICD-10 in 2002 and attempted to promote a 6-digit extension code of ICD-10 in 2012. As the world’s largest developing country, China should provide valuable information for the effective implementation of ICD. Previous studies in China disclosed coding error rates ranging from 4.79 to 73.08% [10]. Considering the overall heterogeneity and relatively poor coding quality in China, the feasibility of coding-based CCI should be investigated.

The main objective of the present study was to ascertain the utility of coding-based CCI through comparison with diagnosis-based CCI.

Methods

Study design and data sources

A multicenter, population-based, retrospective observational study was conducted using the phase 1 dataset of the China Collaborative Study on Acute Kidney Injury, which contains all the literal discharge diagnoses with their corresponding ICD-10 codes and in-hospital death records. That multicenter retrospective observational study was designed to identify novel risk factors of acute kidney injury. The registration number in clinicaltrials.gov is NCT03061786. The study protocol complied with the Declaration of Helsinki and was approved by the Ethics Research Committees of Guangdong General Hospital (GDREC2016327H).

The phase 1 dataset included 3,616,478 adult (18 years or older) admissions in 15 hospitals from January 2012 to December 2016 across 9 provinces in China (Guangdong, Sichuan, Zhejiang, Anhui, Jilin, Shanghai, Chongqing, Inner Mongolia and Xinjiang). Twelve of these were tertiary hospitals and the remaining three were secondary hospitals (Supplementary Table 1). The hospital names were anonymized in reports owing to privacy considerations. The exclusion criteria were as follows: 1) missing or abnormal data (including data on age, hospitalization stay or medical cost); 2) age younger than 18 years; 3) repeated hospitalization (Fig. 1).

Fig. 1 Flow chart of the selected study population

CCI calculation

CCI was calculated using both ICD-10-based and diagnosis-based methods. ICD-10-based CCI was assessed according to the transformation rule reported in previous studies (Supplementary Table 2) [11,12,13] while diagnosis-based CCI was calculated based on the literal description from discharge diagnosis, regarded as the “gold standard”. Calculations were independently performed by two trained physicians. In cases where the calculations were inconsistent, final classification was made by the research group.
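The ICD-10-based calculation described above can be sketched as a prefix lookup followed by a weighted sum. This is a minimal illustration, not the study's procedure: the prefix map below is a small illustrative subset and does not reproduce Supplementary Table 2, the weights are those of the originally published Charlson index (the text does not state which weighting the study applied), and the function names are hypothetical.

```python
# Weights from the originally published Charlson index (assumed here;
# the study text does not state which weights were applied).
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatologic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_without_chronic_complication": 1,
    "diabetes_with_chronic_complication": 2,
    "hemiplegia": 2,
    "renal_disease": 2,
    "tumor": 2,
    "leukemia": 2,
    "lymphoma": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}

# Illustrative ICD-10 prefixes for four comorbidities only.
# This is NOT the transformation rule of Supplementary Table 2.
ICD10_PREFIXES = {
    "myocardial_infarction": ("I21", "I22"),
    "congestive_heart_failure": ("I50",),
    "metastatic_solid_tumor": ("C77", "C78", "C79", "C80"),
    "aids": ("B20", "B21", "B22", "B24"),
}


def comorbidities_from_icd10(codes):
    """Return the set of comorbidities whose prefixes match any code."""
    found = set()
    for code in codes:
        # Normalize "i50.9" -> "I509" so prefix matching is uniform.
        normalized = code.strip().replace(".", "").upper()
        for name, prefixes in ICD10_PREFIXES.items():
            if normalized.startswith(prefixes):
                found.add(name)
    return found


def charlson_score(comorbidities):
    """Sum the weights; each comorbidity is counted once per admission."""
    return sum(CHARLSON_WEIGHTS[name] for name in comorbidities)


# Hypothetical admission: J18.9 (pneumonia) maps to no Charlson category.
example_codes = ["I21.0", "i50.9", "C78.7", "J18.9"]
example_score = charlson_score(comorbidities_from_icd10(example_codes))  # 1 + 1 + 6 = 8
```

The same `charlson_score` function could be applied to comorbidities identified from the literal discharge diagnoses, so that the two methods differ only in how the comorbidity set is obtained.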

Statistical analysis

Normally distributed data are presented as means ± SD and non-normally distributed data as medians (25th, 75th percentiles). Differences between two groups were determined using the independent-samples t-test or Mann–Whitney U test, as appropriate. Numerical data were evaluated as proportions. Percentage agreement and the κ statistic were calculated to evaluate the degree of agreement between ICD-based and diagnosis-based CCI. The coefficient of variation of κ (SD/mean × 100%) was applied as a measure of variation in agreement among hospitals, with a κ coefficient < 0.75 defined as poor agreement. Discrimination abilities of the methods were compared based on the area under the receiver operating characteristic curve (AUC of ROC) using R software (Version 1.0.153). Other statistical analyses were undertaken using SPSS version 24.0 (IBM, Armonk, NY, USA). Two-tailed P < 0.05 was considered statistically significant.
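The agreement measures named above can be sketched for a single comorbidity coded as present (1) or absent (0) by each method. The sketch assumes Cohen's κ for two raters and the sample standard deviation for the coefficient of variation; the text does not specify either choice, and the function names and data are hypothetical.

```python
from statistics import mean, stdev


def percent_agreement(a, b):
    """Share of admissions on which the two methods give the same flag."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two binary (0/1) classifications of the same cases."""
    n = len(a)
    observed = percent_agreement(a, b)
    p_a = sum(a) / n  # prevalence according to method A
    p_b = sum(b) / n  # prevalence according to method B
    # Agreement expected by chance if the two methods were independent.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def coefficient_of_variation(values):
    """SD / mean x 100%, using the sample SD (an assumption)."""
    return stdev(values) / mean(values) * 100


# Hypothetical flags for ten admissions (not study data).
diagnosis_based = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
icd_based = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

agreement = percent_agreement(diagnosis_based, icd_based)  # 8 of 10 agree
kappa = cohen_kappa(diagnosis_based, icd_based)

# Hypothetical per-hospital kappa values for one index.
kappa_cv = coefficient_of_variation([0.5, 0.7, 0.9])
```

In this toy example the percentage agreement is 80% while κ is about 0.58, which illustrates why a high percentage agreement for a rare comorbidity can coexist with a κ below the 0.75 threshold.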

Results

Clinical characteristics of subjects

A total of 2,464,395 subjects were included. According to diagnosis-based CCI, the median number of comorbidities was 1 (range 0 to 10). The characteristics of the subjects are presented in Table 1.

Table 1 Demographic and clinical characteristics

Comorbidity distributions

According to discharge diagnoses, the CCI comorbidities in descending order of frequency were as follows: cerebrovascular disease, tumor, mild liver disease, diabetes without chronic complication, congestive heart failure, chronic pulmonary disease, peripheral vascular disease, renal disease, metastatic solid tumor, diabetes with chronic complication, myocardial infarction, rheumatologic disease and peptic ulcer disease. The other six comorbidities were rare, with < 1% incidence: lymphoma, moderate or severe liver disease, leukemia, dementia, hemiplegia, and acquired immune deficiency syndrome (AIDS) (Supplementary Table 3).

Disagreement between ICD-based and diagnosis-based CCI

Total agreement between ICD-based and diagnosis-based CCI for each index ranged from 86.1% (κ = 0.210, 95% CI 0.208–0.212) to 100% (κ = 0.932, 95% CI 0.924–0.940) (Table 2). No single index among the 19 had a κ coefficient > 0.75 in every hospital examined (Fig. 2). In particular, the κ coefficient was < 0.75 for peripheral vascular disease in all 15 hospitals, for moderate or severe liver disease in 13 hospitals and for mild liver disease in 9 hospitals (Fig. 2).

Table 2 Correlation coefficient and κ statistic between ICD-based and diagnosis-based CCI

Fig. 2 Agreement between the ICD-based and diagnosis-based CCI for each index. The Y-axis values denote the κ coefficient, which is used as a measure of agreement variation. The red horizontal line denotes a κ coefficient of 0.75

Discrimination ability of ICD-based and diagnosis-based CCI for in-hospital death

We further compared the ability of ICD-based and diagnosis-based CCI to discriminate in-hospital mortality by calculating the AUC of ROC. AUCs of ICD-based CCI ranged from 0.556 (95% CI 0.516, 0.596) to 0.844 (95% CI 0.819, 0.868) and those of diagnosis-based CCI from 0.585 (95% CI 0.562, 0.608) to 0.849 (95% CI 0.817, 0.865). The total AUC across all 15 hospitals was significantly lower for ICD-based CCI than for diagnosis-based CCI [0.735 (0.732, 0.739) vs 0.760 (0.757, 0.764), P < 0.001] (Fig. 3), as were the AUC values from 10 individual hospitals (Supplementary Table 4). In two hospitals, AUC values for ICD-based CCI were similar to those for diagnosis-based CCI [0.843 (0.819, 0.868) vs 0.849 (0.817, 0.865), P = 0.625; 0.713 (0.700, 0.725) vs 0.718 (0.705, 0.730), P = 0.234]. The AUC in one of these two hospitals was also the highest for CCI based on both methods. In three other hospitals, AUCs for ICD-based CCI were higher than those for diagnosis-based CCI [0.739 (0.716, 0.761) vs 0.717 (0.694, 0.740), P = 0.011; 0.603 (0.582, 0.625) vs 0.585 (0.562, 0.608), P = 0.013; 0.670 (0.652, 0.689) vs 0.657 (0.638, 0.675), P < 0.001]. The relatively low AUC values in these three hospitals are indicative of the limited value of any type of CCI (Supplementary Table 4).
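To make the AUC values above concrete: the AUC of ROC equals the probability that a randomly chosen admission ending in death has a higher CCI than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration under stated assumptions: the function name and data are hypothetical, the study's own computation was done in R, and the statistical test used to compare two AUCs is not reproduced here.

```python
def roc_auc(scores, died):
    """AUC as the share of (death, survivor) pairs ranked correctly.

    scores: CCI value per admission; died: 1 for in-hospital death, else 0.
    Pairwise comparison is quadratic in sample size, so this is suitable
    only for small illustrative inputs.
    """
    deaths = [s for s, d in zip(scores, died) if d == 1]
    survivors = [s for s, d in zip(scores, died) if d == 0]
    if not deaths or not survivors:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for score_death in deaths:
        for score_survivor in survivors:
            if score_death > score_survivor:
                concordant += 1.0
            elif score_death == score_survivor:
                concordant += 0.5  # tied scores count as half
    return concordant / (len(deaths) * len(survivors))


# Hypothetical CCI scores and outcomes for six admissions (not study data).
cci = [0, 1, 2, 2, 5, 6]
outcome = [0, 0, 0, 1, 1, 1]
auc = roc_auc(cci, outcome)  # 8.5 concordant pairs out of 9
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why values near 0.6 in some hospitals suggest limited usefulness of either scoring method.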

Fig. 3 Discriminatory ability of ICD-based and diagnosis-based CCI for in-hospital mortality

Discussion

This hospitalized population-based study revealed significant differences in intra-hospital comorbidity distributions [14]. ICD-based CCI did not match the corresponding diagnosis-based CCI, particularly for peripheral vascular and liver diseases. None of the 19 indices showed satisfactory agreement (κ coefficient > 0.75) across all 15 hospitals examined, reflecting frequent discrepancies, although the κ coefficients were generally higher than those Januel et al. reported in 2003 [15]. Furthermore, ICD-based CCI was associated with a lower AUC of ROC for in-hospital mortality than diagnosis-based CCI, indicative of diminished discrimination performance and consistent with earlier studies [16, 17].

Several factors may contribute to the poor performance of ICD-based CCI. The first and most important is that coding quality varies between hospitals. Unlike American hospitals, which adopt a national standard of ICD-Clinical Modification, Chinese hospitals modify ICD coding at the individual hospital level. Experienced coding personnel are particularly scarce in China and most are not fully trained [10]. Second, diagnoses entered in Chinese do not always map one-to-one onto ICD codes, leading to inaccurate classification or even a missing ICD code [18]. Third, the quality of ICD coding and recording is not comprehensively evaluated. Thus, in hospitals without a qualified coding system, direct application of ICD-based CCI should be avoided.

Although ICD-based CCI showed lower discrimination performance, its convenience merits consideration. Notably, in a few hospitals (for example, hospital No. 15), ICD-based CCI displayed discriminative value for in-hospital mortality comparable to that of diagnosis-based CCI. Based on our results, we recommend that, in hospitals with or without a qualified coding system, physicians and researchers be aware of the limitations of the individual CCI indices and acknowledge the potential errors of directly adopting ICD-based CCI. Further validation of these indices is advocated, and standardization of ICD-10 coding remains an urgent task. In the future, national standards, specialized training and transformation software should be implemented to improve the reliability of ICD-based CCI along with the progress of hospital information management.

The large sample size, drawn from a dataset of more than 3 million admissions, is a major strength of this study. Data were derived from hospital populations and both tertiary and secondary hospitals were included, thus minimizing selection bias. In addition, the hospitals included in the study were distributed across various geographical and economic regions in China. Our experience may therefore be applicable to other developing countries.

Conclusion

In conclusion, ICD-10 coding-based CCI does not concur with diagnosis-based CCI and is therefore not a promising technique for CCI scoring in China under the present circumstances.