Introduction

Diabetes mellitus, a common metabolic disease characterized by chronic hyperglycemia, endangers human health [1, 2]. The 10th edition of the IDF Diabetes Atlas reported that approximately 537 million adults worldwide were living with diabetes in 2021. This number is expected to rise to 643 million by 2030 and to 783 million by 2045. China currently has the largest number of people with diabetes, and the number is climbing year by year [3]. Among all types of diabetes, type 2 diabetes mellitus (T2DM) has the highest prevalence, accounting for more than 90% of all cases [4, 5].

T2DM not only poses a serious threat to people's health but also leads to a sharp increase in medical expenses, placing a heavy economic burden on patients and society [6, 7]. According to a report released by the International Diabetes Federation, global medical expenses for treating patients with diabetes reached US $850 billion in 2017, accounting for 12.5% of global health expenditure. Of this, medical expenses for diabetes in China reached US $110 billion, ranking second in the world after the United States (US $348 billion) [3]. Prolonged length of stay (LOS) in hospital has been identified as one of the main reasons for the increased expenses [8,9,10]. Previous studies have examined the factors associated with prolonged LOS for several common diseases [11,12,13,14], and risk factors such as age, nutritional status, body mass index, underlying diseases (e.g. hypertension, arrhythmias, and chronic pulmonary disease), and common laboratory measures (e.g. prothrombin time, neutrophils, and ejection fraction) have been reported. However, few studies have addressed prolonged LOS in inpatients with T2DM, and those that have included only a small number of participants and did not evaluate a wide variety of clinical factors.

Therefore, in this study, we recruited a large, multicenter population of patients with T2DM. We aimed to: (i) identify novel risk predictors of prolonged LOS in patients with T2DM, with the hope of remediating and preventing unwanted prolonged LOS, and (ii) construct an online predictive service that gives clinicians a simple and precise personalized prediction of LOS for inpatients.

Methods

Study design and patients

In this study, clinical data on 120,073 patients with T2DM were collected from six hospitals in China, of whom 83,776 passed quality control and were included in the final analysis. The study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guideline [15]. A total of 2224 patients were recruited from Yongchuan Hospital of Chongqing Medical University, 34,888 from the Second Affiliated Hospital of Chongqing Medical University, 14,767 from the University Town Hospital of Chongqing Medical University, and 20,100 from the Third Affiliated Hospital of Chongqing Medical University. Patients from these four centers were divided randomly (7:3) into training (n = 50,385) and internal validation (n = 21,594) sets. Patients recruited from Chongqing Southeast Hospital (n = 7018) and People's Hospital of Tongjiang District (n = 4779) were used as external validation set 1 and external validation set 2, respectively. Additional file 1: Table S1 displays the data from the six hospitals.
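The 7:3 random split described above can be illustrated with a minimal sketch. The function name `split_indices`, the seed, and the use of integer floor division are assumptions made for illustration; the study's own splitting procedure is not described in more detail than "divided randomly (7:3)".

```python
import random


def split_indices(n_patients, train_parts=7, total_parts=10, seed=0):
    """Randomly partition patient indices into training and validation sets.

    Hypothetical helper: the seed and rounding rule are illustrative choices.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding of the 70% cut point.
    n_train = n_patients * train_parts // total_parts
    return indices[:n_train], indices[n_train:]


# Pooled size of the four development centers reported in the text.
n_development = 2224 + 34888 + 14767 + 20100
train_idx, valid_idx = split_indices(n_development)
print(n_development, len(train_idx), len(valid_idx))
```

With floor rounding, the pooled cohort of the four centers yields set sizes that match the reported training and internal validation counts.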

This study was approved by the Ethics Committee of the Affiliated Banan Hospital of Chongqing Medical University and was conducted in accordance with the Declaration of Helsinki and Good Clinical Practice guidelines. Informed consent for participation was not required for this study owing to its retrospective design, and the study was conducted in accordance with the national legislation and institutional requirements.

Inclusion and exclusion criteria

The inclusion criteria were: (i) data obtained from 2010 to 2022, and (ii) hospitalization(s) for T2DM. The exclusion criteria were: (i) age < 18 years, (ii) death during hospitalization, (iii) discharge against medical advice, and (iv) > 30% missing data. The selection process is illustrated in Additional file 1: Fig. S1.

Data collection

A total of 31 candidate variables were selected as potential predictors of prolonged LOS. Specifically, we explored age, sex, insurance type, age-adjusted Charlson comorbidity index score [16], hypertension, coronary heart disease, cerebral infarction, hyperlipidemia, antihypertensive drug use, statin use, antiplatelet and anticoagulant use, past surgical history (PSH), past medical history (PMH), smoking, drinking, systolic and diastolic blood pressure, pulse, aspartate aminotransferase, alanine aminotransferase, triglycerides, neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, lymphocyte-to-monocyte ratio, neutrophil percentage-to-albumin ratio (NPAR), creatinine, uric acid, low- and high-density lipoprotein cholesterol, fasting glucose, and glomerular filtration rate. For all biomarkers, only the first value measured after admission was retained.
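As an illustration of how the four inflammation-related ratios in this list are derived from routine blood tests, the following is a minimal sketch. The function name, argument names, and units (cell counts in 10^9/L, neutrophil percentage in %, albumin in g/dL) are assumptions for illustration, and the input values are hypothetical rather than study data.

```python
def inflammation_ratios(neutrophils, lymphocytes, monocytes, platelets,
                        neutrophil_pct, albumin):
    """Compute NLR, PLR, LMR and NPAR from first-after-admission lab values.

    Assumed units: cell counts in 10^9/L, neutrophil_pct in %, albumin in g/dL.
    """
    return {
        "NLR": neutrophils / lymphocytes,   # neutrophil-to-lymphocyte ratio
        "PLR": platelets / lymphocytes,     # platelet-to-lymphocyte ratio
        "LMR": lymphocytes / monocytes,     # lymphocyte-to-monocyte ratio
        "NPAR": neutrophil_pct / albumin,   # neutrophil percentage-to-albumin ratio
    }


# Hypothetical patient values, chosen only to make the arithmetic easy to follow.
ratios = inflammation_ratios(neutrophils=4.5, lymphocytes=1.5, monocytes=0.5,
                             platelets=225, neutrophil_pct=66, albumin=4.0)
print(ratios)
```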

Definition

LOS was calculated from the date of admission to the date of discharge. A LOS exceeding the third quartile value of the study population was defined as prolonged LOS [17,18,19,20,21,22]. Specifically, hospitalization ≤ 13 days was defined as normal LOS, whereas hospitalization ≥ 14 days was defined as prolonged LOS.
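The outcome definition can be sketched as follows. This is a minimal illustration that assumes LOS is the difference in calendar days between the two dates (the text does not say whether both days are counted); the function names and example dates are hypothetical.

```python
from datetime import date

# A stay of 14 days or more is flagged as prolonged, as stated in the definition.
PROLONGED_THRESHOLD_DAYS = 14


def length_of_stay(admission, discharge):
    """LOS in days, assumed here to be the calendar-day difference."""
    return (discharge - admission).days


def is_prolonged(los_days, threshold=PROLONGED_THRESHOLD_DAYS):
    """Binary outcome label used for modelling."""
    return los_days >= threshold


# Hypothetical admission and discharge dates.
los = length_of_stay(date(2021, 3, 1), date(2021, 3, 15))
print(los, is_prolonged(los))
```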

The type of insurance chosen by individuals reflects the economic burden borne by individuals and their families. In this study, insurance type was grouped into four categories: urban employee medical insurance, urban resident medical insurance, other insurance, and full self-payment [23].

Statistical analyses

Statistical analyses were performed using SPSS 22.0 and R (version 4.0.2, Vienna, Austria). Least absolute shrinkage and selection operator (LASSO) regression and multivariable logistic regression were used to identify independent factors [24]. A nomogram was constructed from these independent factors, and the area under the receiver operating characteristic curve was used to evaluate its discriminative ability [25]. Calibration curves were used to evaluate the calibration of the nomogram [26]. Furthermore, decision curve analysis and the clinical impact curve were used to demonstrate the clinical applicability of the nomogram [27,28,29]. Multiple imputation was used to fill in missing values of continuous variables [30, 31]. In each simulated dataset, missing data were filled in by the Monte Carlo method. Standard statistical methods were then applied to each simulated dataset, and the outputs were combined to give estimates and confidence intervals that account for the missing values. Categorical data were expressed as rates and percentages, and the chi-square test was used for comparisons between groups. Quantitative variables were not normally distributed and were expressed as median and interquartile range [M(Q25–Q75)]; comparisons between groups were performed using the Mann–Whitney U test. All statistical tests were two-sided, and statistical significance was set at P < 0.05.
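The step of combining results across imputed datasets can be sketched with Rubin's rules, the usual pooling method for multiple imputation. The text does not name the pooling rule that was used, so this choice is an assumption, and the function name and input values below are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(estimates)                        # sample variance, divides by m - 1
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log-odds estimates of one predictor from five imputed datasets.
estimate, se = pool_rubin([0.10, 0.12, 0.11, 0.09, 0.13], [0.02] * 5)
print(round(estimate, 4), round(se, 4))
```

The between-imputation term widens the standard error to reflect uncertainty about the missing values, which is why pooled intervals are wider than those from any single completed dataset.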

Results

Subject characteristics

The clinical characteristics of the subjects in the training and internal validation sets are summarized in Table 1. No significant difference was found between the training and internal validation sets in terms of age, sex, insurance type, hypertension, coronary heart disease, cerebral infarction, hyperlipidemia, statin use, antiplatelet and anticoagulant use, PSH, PMH, smoking, drinking, or any laboratory variable (P > 0.05).

Table 1 Demographic and clinical characteristics of the training and internal validation sets

Selection of predictors

Patients in the training set were stratified into two groups according to LOS: normal LOS (n = 38,026) and prolonged LOS (n = 12,359). Univariate analysis showed that the following variables were associated with prolonged LOS: age, sex, insurance type, age-adjusted Charlson comorbidity index score, hypertension, coronary heart disease, cerebral infarction, antihypertensive drug use, statin use, antiplatelet and anticoagulant use, PSH, PMH, smoking, drinking, systolic and diastolic blood pressure, aspartate aminotransferase, triglycerides, neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, lymphocyte-to-monocyte ratio, NPAR, creatinine, uric acid, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, fasting glucose, and glomerular filtration rate (Table 2).

Table 2 Demographic and clinical characteristics associated with a prolonged LOS as assessed in the training set

First, the 27 variables that differed significantly in the univariate analysis were entered into the least absolute shrinkage and selection operator regression. Then, 17 variables with non-zero coefficients were selected at minMSE + 1SE (Fig. 1). After multivariable logistic regression, age (OR [odds ratio] = 1.010, 95% CI [confidence interval] 1.008–1.012), cerebral infarction (OR = 1.632, 95% CI 1.530–1.741), antihypertensive drug use (OR = 2.340, 95% CI 2.224–2.462), antiplatelet and anticoagulant use (OR = 2.100, 95% CI 1.998–2.208), PSH (OR = 1.569, 95% CI 1.492–1.651), PMH (OR = 4.021, 95% CI 3.635–4.456), smoking (OR = 1.392, 95% CI 1.308–1.481), drinking (OR = 1.663, 95% CI 1.559–1.774), and NPAR (OR = 1.110, 95% CI 1.104–1.116) were identified as the final predictors of prolonged LOS (Fig. 2).
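For the two continuous predictors, the reported odds ratios are small because they describe a change of a single unit. The sketch below rescales them to larger changes. It assumes that age was modelled per year and NPAR per unit and that both effects are linear on the log-odds scale; these are standard conventions in logistic regression but are not stated explicitly in the text, and the function name is hypothetical.

```python
import math


def odds_ratio_for_change(odds_ratio_per_unit, delta):
    """Odds ratio implied by a change of `delta` units in a continuous predictor."""
    beta = math.log(odds_ratio_per_unit)  # logistic regression coefficient
    return math.exp(beta * delta)


# Age: OR 1.010 per unit, rescaled to a 10-unit (assumed 10-year) difference.
or_age_10 = odds_ratio_for_change(1.010, 10)
# NPAR: OR 1.110 per unit, rescaled to a 5-unit difference.
or_npar_5 = odds_ratio_for_change(1.110, 5)
print(round(or_age_10, 3), round(or_npar_5, 3))
```

Under these assumptions, a 10-unit difference in age corresponds to an odds ratio of about 1.105, and a 5-unit difference in NPAR to about 1.685.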

Fig. 1

Feature selection by LASSO. A LASSO coefficient profiles (y-axis) of the 27 features. The upper x-axis is the average number of predictors and the lower x-axis is log(λ). B Tenfold cross-validation for tuning parameter selection in the LASSO model

Fig. 2

Forest plot showing the results of multivariable analysis for prolonged LOS

Nomogram construction and performance

To visualize the predictive model, a nomogram was constructed, providing a convenient and personalized tool to predict the probability of prolonged LOS (Fig. 3). In the training set, the area under the curve of the nomogram was 0.803 (95% CI 0.799–0.808), indicating good discrimination (Fig. 4), and the calibration curve (bootstraps = 1000) suggested that the predicted probabilities were in good agreement with the actual probabilities (Fig. 5). The areas under the curve for the internal validation set, external validation set 1, and external validation set 2 were 0.794 (95% CI 0.788–0.800), 0.754 (95% CI 0.739–0.770), and 0.743 (95% CI 0.722–0.763), respectively. In addition, the calibration curves for these sets indicated good agreement (Additional file 1: Fig. S2–S4). More detailed performance metrics for the four models are listed in Table 3.
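The area under the curve reported here can be read as the probability that a randomly chosen patient with prolonged LOS receives a higher predicted risk than a randomly chosen patient with normal LOS. The sketch below computes it directly from that definition on a toy example; the function name and the data are hypothetical, and the pairwise loop is meant for illustration rather than for cohorts of this size.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = prolonged LOS, 0 = normal LOS; scores are predicted probabilities.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In the toy data, 11 of the 12 positive-negative pairs are ordered correctly, so the AUC is 11/12.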

Fig. 3

Nomogram predicting prolonged LOS in patients with T2DM. First, the points for each variable of a patient with T2DM are read from the uppermost scale; then all points are added together to obtain the total points. Finally, the corresponding predicted probability of prolonged LOS is read from the lowest scale

Fig. 4

Receiver operating characteristics curves of the nomogram

Fig. 5

Calibration curves of the nomogram. The dotted line represents the performance of the nomogram, whereas the solid green line corrects for any bias in the nomogram. The dashed line represents the reference line where an ideal nomogram would lie

Table 3 Detailed performance metrics of the four models

Clinical utility of the nomogram

Decision curve analysis was performed to assess clinical applicability. The decision curve showed that, at threshold probabilities of 17–55% (Fig. 6), using the nomogram to predict prolonged LOS in patients with T2DM would add significantly more benefit than either the treat-all scheme or the treat-none scheme. Moreover, a clinical impact curve was drawn from the decision curve analysis to evaluate the clinical impact of the nomogram and to make its practical value easier to appreciate (Fig. 7). The clinical impact curve depicts the number of patients predicted to have prolonged LOS at each risk threshold and the number of those patients who actually experienced prolonged LOS. When the risk threshold exceeded 30%, the estimated number of patients was close to the actual number of patients.
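The quantity plotted on a decision curve is the net benefit, which at threshold probability pt equals TP/n − FP/n × pt/(1 − pt). The sketch below evaluates it for a model and for the treat-all scheme on hypothetical data; the function names and values are illustrative and are not taken from the study. The treat-none scheme has a net benefit of zero by definition.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    n = len(labels)
    flagged = [y for y, p in zip(labels, probabilities) if p >= threshold]
    true_pos = sum(flagged)
    false_pos = len(flagged) - true_pos
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of flagging every patient, regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = prolonged LOS) and predicted probabilities.
toy_labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
toy_probs = [0.8, 0.6, 0.4, 0.2, 0.1, 0.3, 0.5, 0.05, 0.15, 0.25]
nb_model = net_benefit(toy_labels, toy_probs, threshold=0.3)
nb_all = net_benefit_treat_all(toy_labels, threshold=0.3)
print(round(nb_model, 3), round(nb_all, 3))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the comparison the decision curve displays across thresholds.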

Fig. 6

Decision curve analysis of the nomogram. Solid black line (None curve): net benefit of not investigating any people, assuming that no people would have a prolonged LOS. Solid gray line (All curve): net benefit of investigating all people, assuming that all people would have a prolonged LOS

Fig. 7

Clinical impact curve of the nomogram. The red curve shows the number of people classified as having prolonged LOS by our model at each threshold probability; the green curve shows the number of people who actually had prolonged LOS at each threshold probability

Construction of online interface to easily access the nomogram

To facilitate the use of the nomogram by clinicians, we built an online interface (https://cytjt007.shinyapps.io/prolonged_los/) to calculate the probability of prolonged LOS. For example, when a patient has cerebral infarction, antihypertensive drug use, antiplatelet and anticoagulant use, PSH, PMH, smoking, drinking, an age of 60 years, and NPAR level of 26.00 ml/g, the probability of prolonged LOS would be 0.864 (95% CI 0.853–0.874) (Fig. 8).

Fig. 8

An example of using the online nomogram to predict prolonged LOS in patients with T2DM

Discussion

In this study, we found that cerebral infarction, antihypertensive drug use, antiplatelet and anticoagulant use, PSH, PMH, smoking, drinking, older age, and higher NPAR levels were closely related to prolonged LOS in patients with T2DM. Combining these variables in a prediction model produced a useful clinical tool for identifying patients likely to have a prolonged clinical course during hospitalization. Early identification of patients with T2DM at risk of prolonged LOS gives clinicians the opportunity to formulate personalized treatment measures, thereby shortening the hospitalization period and reducing the disease burden of these patients.

The results showed that age (OR = 1.010, 95% CI 1.008–1.012) was an independent risk factor for prolonged hospitalization in patients with T2DM, consistent with studies of prolonged LOS in other populations [32,33,34]. To a large extent, age reflects the process of aging and degenerative changes in organ function, and older patients with T2DM tend to have more comorbidities [35, 36], which was also confirmed in this study. The results also showed that cerebral infarction (OR = 1.632, 95% CI 1.530–1.741) was an independent risk factor for prolonged LOS in patients with T2DM. Therefore, the intensity of the medical interventions received by older patients, and their response to them, may to some extent explain the prolonged LOS.

NPAR is a recently described biomarker of systemic infection and inflammation [37]. It has been shown to be a prognostic factor for pancreatic cancer, sepsis, acute ST-segment elevation myocardial infarction, and restenosis after internal carotid artery stenting [38,39,40,41]. Neutrophils are classical inflammatory cells and play an important role in mediating the inflammatory response [42], whereas serum albumin can exert anti-inflammatory effects by specifically inhibiting the expression of adhesion molecules [43]. Because it is simple and inexpensive to calculate, NPAR can be used even in resource-limited settings [44]. More importantly, as a combination of two classical laboratory indicators, NPAR can reflect, to a certain extent, the balance between anti-inflammatory and pro-inflammatory effects in vivo [45]. NPAR amplifies the predictive value of neutrophil percentage and serum albumin, which are often overlooked by clinicians, particularly when they do not deviate significantly from the normal range. In this study, higher NPAR levels were associated with higher odds of prolonged LOS in patients with T2DM (OR = 1.110, 95% CI 1.104–1.116). Therefore, attention should be paid to hospitalized patients with T2DM and high NPAR levels, so that prolonged LOS can be avoided and prognosis improved through early, active, and effective treatment.

In addition, PSH, PMH, smoking, and drinking have previously been reported as risk factors for prolonged LOS, which is consistent with the results of this study [46,47,48,49]. Interestingly, the results also showed that antihypertensive drug use (irbesartan, furosemide, nifedipine, and others) and antiplatelet and anticoagulant use (enteric-coated aspirin tablets, dabigatran etexilate capsules, and others) correlated with prolonged LOS in patients with T2DM. One interpretation is that the underlying conditions for which these drugs are prescribed, rather than the drugs themselves, are associated with prolonged LOS. Further research is needed to determine whether the effects of certain drugs contribute to prolonged LOS.

Studies of prediction models for prolonged LOS in patients with T2DM are limited. The advantage of the nomogram constructed in this study is that all predictors can be easily obtained from patients. Moreover, to simplify the clinical application of this model, we built a web-based interface, which will facilitate informed decision making regarding prophylactic treatments for prolonged LOS. However, this study has several limitations. First, this was a retrospective study, and sample selection bias was inevitable; nevertheless, a multicenter and relatively large training set was used to build the models, which were further subjected to external validation. Second, some potential influencing factors were not considered because of their high percentage of missing data, and adding these factors could improve the predictive performance of the model. Therefore, prospective studies with more detailed data and larger sample sizes are needed to verify or update the findings.

Conclusion

T2DM is one of the main contributors to cardiovascular morbidity and mortality, leading to an increased disease burden in many countries. This study provides a useful clinical tool for identifying patients with T2DM who are likely to have prolonged LOS. The proposed model could help clinicians formulate personalized interventions and shorten the hospitalization period, thus reducing the disease burden for patients and society.