Numerous studies have shown that the quality of healthcare is variable and often inadequate [1–3]. Initiatives to measure healthcare quality are an important focus for policymakers who believe that such measurements can drive quality-improvement programs [4]. The measurement of healthcare quality includes process and outcome measurements [5]. Outcome evaluation, including in-hospital mortality, requires adequate risk-adjustment for different patient mixes to make appropriate evaluations of healthcare performance [6]. Because of the clear definition of outcome and influential patient conditions, disease-specific risk adjustment models have been developed to a certain extent in several specialties (e.g. cardiovascular diseases) and have been available for various quality improvement activities [7–10].

Although disease-specific risk adjustment may be useful for quality improvement of a specific type of care, more generic case-mix risk-standardized outcomes are required for generalized quality evaluation across specialties [11]. In the United States, several generic case-mix measures are available in commercial as well as non-commercial sources (e.g. APACHE, MedisGroup, Adjusted Clinical Group, Diagnostic Cost Groups, and the RxRisk model) [12–14], and have been applied to categorizing patients according to resource needs. However, many of these systems require detailed clinical and/or administrative data that involve extensive data collection. Furthermore, most of these case-mix measures target healthcare costs rather than clinical outcomes.

To alleviate the burden of data collection, risk prediction models for in-hospital mortality using administrative data have been proposed [15, 27–29]. One study used a modified version of the Charlson Index [16] as a summary score of co-existing diagnoses. A recent international comparative study [17] demonstrated that the estimated comorbidity index could predict the chance of in-hospital death with relatively high precision (c-index of approximately 0.80), although the accuracy was suboptimal when Japanese data were analyzed. In this study, we developed a new prediction model for in-hospital mortality by using the same electronic dataset with national standardized format used in the aforementioned study. We successfully exceeded previously demonstrated predictive precision by including patient demographics and multiple administrative variables. Our study demonstrates a potential use of the developed prediction model for benchmarking the quality of healthcare across various performance units with the national database.


Data source

We used a dataset provided by the Ministry of Health, Labor, and Welfare that was originally used to evaluate a patient classification system that has been used for reimbursement at 80 university-affiliated hospitals and 2 national center hospitals since 2003. The new classification system, called Diagnosis Procedure Combination (DPC), includes information regarding up to two major diagnoses and up to six co-existing diagnoses. The 2003 version of the DPC patient classification system includes 16 major diagnosis categories (MDC) and 575 disease subcategories, which are coded in ICD-10 format. The dataset also included additional information on patient demographics, use and types of surgical procedures, emergency/elective hospitalization, length of stay (LOS), and discharge status including in-hospital death [18–20]. The dataset was derived from administrative and clinical information that participating hospitals provided to the Ministry research group; it was then anonymized and fed back to the hospitals for benchmarking purposes. Records for 282,064 patients who were discharged from 82 hospitals between July 1, 2002 and October 31, 2002 were distributed and made available for public use as of June 2008. Following the inclusion criteria of previous studies on Hospital Standardized Mortality Ratio (HSMR) [21, 22], we excluded MDC categories with mortality rates of less than 0.5% from our analysis. The remaining records (n = 224,207) were then randomly split 80/20 into two subsets, one for model development and the other for validation tests. The development dataset included 179,156 records and the validation dataset included 45,051 records. The datasets were anonymized and prepared by the government sector for public use. Thus, data use was officially approved and protection of confidential information was ensured.
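The two data preparation steps described above (dropping low-mortality MDCs, then a random 80/20 split) can be sketched as follows. This is a minimal illustration, not the procedure actually used: the record layout (dictionaries with hypothetical `mdc` and `died` keys), the function names, and the fixed seed are all assumptions, and the source does not describe how the random assignment was implemented.

```python
import random
from collections import defaultdict


def exclude_low_mortality(records, threshold=0.005):
    """Keep only records whose MDC has a mortality rate of at least `threshold`.

    Each record is assumed to be a dict with hypothetical keys
    'mdc' (category label) and 'died' (1 for in-hospital death, else 0).
    """
    totals = defaultdict(int)
    deaths = defaultdict(int)
    for rec in records:
        totals[rec["mdc"]] += 1
        deaths[rec["mdc"]] += rec["died"]
    keep = {mdc for mdc in totals if deaths[mdc] / totals[mdc] >= threshold}
    return [rec for rec in records if rec["mdc"] in keep]


def split_development_validation(records, dev_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it into development and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]
```

An exact cut like this yields subsets of exactly 80% and 20%; other assignment schemes would give slightly different sizes.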

Model building

We started with a prediction model based on the Canadian HSMR model mentioned earlier [21, 22]. The model includes age as an ordinal variable (under 60, 60–69, 70–79, 80–89, and 90 and over), gender, use of an ambulance at admission, admission status (emergency/elective), LOS, MDC, and comorbidities (model 1). We also tested another prediction model that omitted LOS (model 2). The rationale is that the model without LOS should be a "pure" prediction model, since LOS can be regarded as an outcome affected by patient characteristics and hospital care quality. Several diagnosis-specific models also consider the duration of hospitalization as a part of outcome and do not include it as a predictor variable [23, 24]. Based on Quan's methodology [15], the ICD-10 code of each co-existing diagnosis was converted into a score, and the scores were summed for each patient case to calculate a Charlson Comorbidity Index score. Scores were then classified into five categories: 0, 1–2, 3–6, 7–12, and 13 and over.
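The categorical coding of age and comorbidity described above can be sketched as below. The category boundaries come from the text; everything else is an assumption for illustration. In particular, `charlson_score` takes a caller-supplied lookup table and matches on the first three characters of each code, which is a simplification and not the published coding algorithm [15].

```python
def age_category(age):
    """Map age at admission to the five ordinal groups used in the model."""
    if age < 60:
        return "under 60"
    if age < 70:
        return "60-69"
    if age < 80:
        return "70-79"
    if age < 90:
        return "80-89"
    return "90 and over"


def charlson_score(icd10_codes, weights):
    """Sum the weights of a patient's co-existing diagnoses.

    `weights` is a hypothetical dict from a 3-character ICD-10 prefix to a
    score; codes without an entry contribute 0.
    """
    return sum(weights.get(code[:3], 0) for code in icd10_codes)


def charlson_category(score):
    """Map a summed score to the five categories: 0, 1-2, 3-6, 7-12, 13 and over."""
    if score == 0:
        return "0"
    if score <= 2:
        return "1-2"
    if score <= 6:
        return "3-6"
    if score <= 12:
        return "7-12"
    return "13 and over"
```

For example, with a toy table `{"I21": 1, "C78": 6}` (illustrative entries only), a patient coded `["I219", "C780"]` would receive a score of 7 and fall into the 7–12 category.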

We did not include surgical treatment status as a risk parameter because the decision of whether or not to operate on a patient with a certain medical condition would vary and depend on the clinical judgment of each hospital team. Also, surgery is not a treatment option in certain areas of medicine.

Analytical Methods

A multivariate logistic regression analysis was performed on the development dataset to predict in-hospital mortality. Model performance and model fit were then tested using the test dataset. The prediction accuracy of the logistic models was determined using the c-index [25], and the c-indexes of the full models (models 1 and 2) and the partial models were compared. A c-index value of 0.5 indicates that the model is no better than random chance in predicting death, and a value of 1.0 indicates perfect discrimination. Calibration was assessed by plotting observed versus predicted deaths across risk groups. All analyses were conducted with SPSS version 15.0J (SPSS Japan, Inc).
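For a binary outcome, the c-index is the proportion of (death, survivor) pairs in which the patient who died was assigned the higher predicted risk, with ties counted as one half. The sketch below computes it by direct pair counting; the function name is an assumption, and the analyses in this study were run in SPSS, not with this code.

```python
def c_index(predicted, died):
    """Concordance index for a binary outcome by exhaustive pair comparison.

    `predicted` holds predicted probabilities of death and `died` the
    observed outcomes (1 = died in hospital, 0 = discharged alive).
    """
    risks_died = [p for p, d in zip(predicted, died) if d == 1]
    risks_alive = [p for p, d in zip(predicted, died) if d == 0]
    if not risks_died or not risks_alive:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p in risks_died:
        for q in risks_alive:
            if p > q:
                concordant += 1.0   # the patient who died was ranked higher
            elif p == q:
                concordant += 0.5   # tied predictions count as half
    return concordant / (len(risks_died) * len(risks_alive))
```

Pair counting is quadratic in the number of records, so a rank-based formula would be preferable for a dataset of the size used here; the version above is kept for readability.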


Patient Demographics in the Models

Table 1 shows in-hospital mortality by MDCs in the original full dataset. We excluded 6 out of 15 diagnostic categories due to low mortality rates (< 0.5%). The 9 remaining diagnostic categories (n = 224,207) accounted for almost 99% of in-hospital mortality in total acute hospitalization cases. We further grouped 4 MDCs with lowest mortality into one, resulting in 6 MDCs for the following analysis.

Table 1 Discharge mortality rate in each Major Diagnostic Category (n = 282,064)

Of the 179,156 patients included in the development dataset, 53.2% were male, 35.9% had emergency status at admission, and 8.9% used an ambulance (Table 2). Nearly half (46.6%) of the patients were under 60 years of age at admission, and 9.2% were 80 years or over. The digestive system, hepatobiliary system, and pancreas made up the largest share (22%) of MDCs, followed by the respiratory system (13.5%), circulatory system (13.1%), and nervous system (7.2%). The majority of patients (68.6%) had a total score of 0 for the Charlson Comorbidity Index, and only 2.5% of patients had a score higher than 6.

Table 2 Characteristics of patients in learning dataset and test dataset (n = 224207)

Prediction Models (development dataset; n = 179,156)

Table 3 shows the in-hospital mortality prediction model with LOS as a predictor (Model 1). Using those with a LOS under 10 days as a reference, the odds ratio of in-hospital death for patients with longer LOS increased linearly; the odds ratio for patients with LOS ≥ 30 days reached 4.35 (4.01–4.72). Using the neurological MDC as a reference, MDCs for respiratory, digestive, hepatology, and hematology diseases showed a significantly higher odds ratio for in-hospital death, whereas the cardiology MDC showed a significantly lower odds ratio. Older age, gender, use of an ambulance at admission, and emergency admission status also showed significantly higher odds ratios. Finally, scores for Charlson Index categories exhibited an increasing linear trend in odds ratio as scores increased.
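Odds ratios such as those above are obtained by exponentiating logistic regression coefficients, and their confidence limits by exponentiating the coefficient plus or minus a multiple of its standard error. The sketch below shows both directions of that conversion. It assumes the interval in parentheses is a 95% confidence interval (z = 1.96), which the text does not state explicitly; the function names are illustrative.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(odds_ratio, lower, upper, z=1.96):
    """Recover the coefficient and its standard error from a reported interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Round trip for the reported LOS >= 30 days estimate, 4.35 (4.01-4.72).
beta, se = coefficient_from_ci(4.35, 4.01, 4.72)
or_, lo, hi = odds_ratio_with_ci(beta, se)
```

Because the interval is symmetric on the log scale and not on the odds ratio scale, the recovered limits match the reported ones only up to rounding.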

Table 3 Model 1: hospital mortality prediction model with length of stay (n = 179,156)

Table 4 shows the prediction model without LOS (model 2). The pattern of statistical significance of the odds ratios was identical to that of model 1, although the magnitude was somewhat smaller for MDCs and larger for Charlson Index categories.

Table 4 Model 2: hospital mortality prediction model without length of stay (n = 179,156)

Model Performance (test dataset; n = 45,051)

Table 2 compares patient characteristics in the test dataset (n = 45,051 patients) to those of the development dataset. The two datasets were almost identical in the distribution of patient characteristics and case mix. In-hospital mortality rates were 2.68% and 2.76% for the development and test datasets, respectively.

Table 5 shows the c-indexes for models 1 and 2, and those using a partial set of predictors. C-index values were fairly high in both models (0.841 and 0.869 for models 1 and 2, respectively). A partial model which only included patient characteristics had a c-index of 0.727, and the addition of MDC increased the c-index to 0.786. Further including the comorbidity index resulted in only a marginal increase to 0.841. The model that included more information on comorbidities showed a higher c-index. Figures 1 and 2 demonstrate the goodness of fit regarding the models (i.e., how well the predicted mortality rates match the observed mortality rates among patient subgroups of risk). Close agreement between the predicted and observed mortality rates with our models was seen across various patient risk subgroups analyzed.
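The calibration check described above compares predicted and observed deaths within subgroups of patients ordered by risk. A minimal sketch follows, assuming equal-sized groups formed by sorting on predicted risk; the number of groups and the grouping rule used for Figures 1 and 2 are not given in the text, so `n_groups` and the function name are assumptions.

```python
def calibration_table(predicted, died, n_groups=10):
    """Group patients by predicted risk and compare expected with observed deaths.

    Returns one (group size, expected deaths, observed deaths) tuple per
    non-empty group, ordered from lowest to highest predicted risk.
    """
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # sum of predicted probabilities
        observed = sum(d for _, d in chunk)   # count of actual deaths
        rows.append((len(chunk), expected, observed))
    return rows
```

Plotting the expected against the observed column, group by group, gives a calibration plot in which a well-calibrated model lies close to the diagonal.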

Table 5 Hospital mortality prediction model performance metrics
Figure 1 Model 1 hospital mortality prediction model calibration (n = 45,051). The figure shows the result of the goodness-of-fit test for model 1 based on the test dataset (n = 45,051).

Figure 2 Model 2 hospital mortality prediction model calibration (n = 45,051). The figure shows the result of the goodness-of-fit test for model 2 based on the test dataset (n = 45,051).


The mortality predicted by the model developed in this study is fairly consistent with observed mortality. Results also suggest that the inclusion of both comorbidity and other demographic/clinical characteristics of patients accounts for the better performance of our model compared to a previously described model [17]. When administrative data are used in clinical outcomes research, coding algorithms are essential for defining comorbidities. Charlson comorbidity measurement tools [16] are widely used with administrative data to determine the burden of disease or case-mix. Past studies suggest that the original Charlson Index by chart review and its adaptations for use with administrative databases discriminate mortality similarly [15, 17]. The database used in this study assigns to each patient one to six diagnostic codes. Counting multiple comorbidities markedly enhanced accuracy compared to counting comorbidity based on a single ICD-10 code. In addition to comorbidities based on ICD-10 codes, MDCs were also incorporated into our models. By including MDCs, our model could better reflect the characteristics of major patient conditions among all co-existing diagnoses. This may also help to explain the improved performance of our model compared to former prediction models (c-index: 0.69–0.71) which incorporated only the Charlson Index in the analysis of Japanese data [17].

Recent studies in the U.S. introduced a new risk prediction model that includes extended administrative data with lab test results [30, 31]. Although the inclusion of detailed clinical data may further improve prediction performance, it requires a sophisticated standardized information system on a nationwide scale. Our prediction model exhibited a comparable level of precision, using variables easily accessible in conventional administrative electronic record systems. As we demonstrated, inclusion of patient demographics, conditions at admission, and the category of major diagnosis with a summary score of comorbidities may be useful and efficient in improving model performance.

In the present study, we developed two models, one that includes LOS and one that excludes it. It is possible that a hospital may promote premature discharge in order to lower in-hospital mortality, in which case adjusting for LOS would allow for a fairer comparison of hospital performance. However, the duration of hospitalization is a parameter reflecting various factors other than in-hospital mortality risk, such as the quality of hospital management and socio-economic conditions that facilitate earlier discharge (e.g. availability of informal care at home). Since no major difference in accuracy was observed between the two models, we believe that the use of model 2, which excludes LOS, would be more suitable to adjust for the likelihood of in-hospital death purely due to patient conditions.

In contrast to the risk factor of age, gender did not have a pronounced impact on mortality in our study. Previous studies on cardiovascular surgery in Japan have also shown that the impact of gender on in-hospital mortality is negligible even in risk prediction models with detailed clinical variables [9]. The odds ratio of the circulatory system category was unexpectedly low and may require some explanation. The average risk of cardiovascular hospitalization may have been relatively low in this study because many patients are hospitalized for cardiac catheterization as a post-intervention evaluation in Japan. Thus, an alternative model that categorizes hospitalization for evaluation separately may increase performance in Japanese cases and deserves further consideration in future studies.

A number of limitations of this study are worth noting. Exclusion of 6 low-mortality MDCs might bias the performance of our models. However, given that the c-index for model 2 on the full dataset (n = 282,064) was 0.854, we believe that our model can be useful for hospital mortality analysis in all types of disease. In addition, it would be necessary to update the hospital prediction model periodically, given that the relative importance of factors contributing to mortality may change due to future medical innovations in diagnosis and therapy.


This study is one of the few Japanese studies that verify and demonstrate the accuracy of in-hospital mortality prediction models that take into account all diseases. As standardized hospital mortality rates could be used as indicators of quality of care and in setting national standards, risk adjustment in relation to in-hospital mortality is thought to be useful in implementing hospital-based efforts aimed at improving the quality of medical treatment [26]. The risk model described in this study demonstrates a good degree of discrimination and calibration. In addition to its statistical evaluation, it is important that the model can be readily used for risk prediction by clinicians in the field. A major task for the future is to consider how to improve this model in order to make it more detailed, its analytical qualities even more convincing, and its use more compelling.
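Using a model like this one for benchmarking typically means comparing the deaths observed in a hospital with the deaths the model expects given that hospital's patients. The sketch below assumes the common convention of expressing the ratio of observed to expected deaths multiplied by 100; this formula is not stated in the text, and the function name is illustrative.

```python
def standardized_mortality_ratio(predicted, died):
    """Observed deaths divided by model-expected deaths, scaled to 100.

    `predicted` holds the model's predicted death probabilities for one
    hospital's patients and `died` the observed outcomes (1 or 0).
    """
    expected = sum(predicted)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * sum(died) / expected
```

Under this convention, a value above 100 would indicate more deaths than the case mix predicts and a value below 100 fewer, subject to the sampling uncertainty that a real analysis would need to quantify.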