Introduction

With the development of the global economy, many countries, including China, are facing serious population aging [1]. Physical disability becomes increasingly common with age among middle-aged and older people. A survey found that, among middle-aged and elderly people in China, the overall disability rate for activities of daily living (ADLs) was 23.8% and that for instrumental ADLs was 35.4% [2]. Depression is also very common in this population and is its most important psychological problem [3]; in China, the prevalence of depression among adults aged 45 years and over exceeds 30% [4]. Depression has been projected to rank first in the global burden of disease by 2023 [5], imposing a serious economic burden on patients, families and society. Large population-based studies have further shown that middle-aged and older adults with disabilities, especially women, have higher rates of depression than their non-disabled peers [6, 7], and the Peruvian National Health Survey showed that middle-aged and older persons with disabilities tend to have more severe depression [8]. Together, these findings suggest that depression is both prevalent and severe among middle-aged and older adults with physical disabilities. However, most studies of depression in people with disabilities include too few correlates or fail to quantify the risk each correlate confers [9, 10]. As a result, they cannot be used to screen for people at risk of depression and do not support its prevention in this population. We therefore screened national survey data from the China Health and Retirement Longitudinal Study (CHARLS) for correlates, identified in previous studies, that may affect depression among people with disabilities.
We then constructed a prediction model of depression among middle-aged and elderly Chinese people with physical disabilities, combining LASSO regression and binary logistic regression to select predictor variables strongly associated with depression. After evaluating the model's discrimination, calibration, stability and generalizability, we visualized it as a nomogram. The nomogram can help middle-aged and elderly people with disabilities check whether they carry high-risk factors for depression and act on them, and it allows clinical staff to quickly identify those at high risk of depression, enabling early identification, early intervention and early treatment.

Method

Data sources and model design

CHARLS is a large-scale interdisciplinary survey project hosted by the National Development Research Institute of Peking University and jointly implemented by the China Social Science Survey Center of Peking University and the Peking University Youth League Committee. It collects nationally representative longitudinal data on households and individuals aged 45 years and above in China, which can be used to analyze population aging in China, promote interdisciplinary research on aging, and provide a more scientific basis for formulating and improving related policies [11]. In this study, we used data collected from 2015 to 2018, from which we extracted data on the 10-item Center for Epidemiologic Studies Depression Scale (CES-D10), health behaviors, demographic factors, physical functioning and social interactions of middle-aged and older adults with physical disabilities. The data screening process is shown in Fig. 1.

Fig. 1

Flowchart

Depression assessment

The CES-D10 contains 10 items, each scored 0 (rarely or not at all), 1 (sometimes), 2 (most of the time) or 3 (all of the time). The total score ranges from 0 to 30, with higher scores indicating more severe depressive symptoms. A cut-off of 10 has been shown to have reasonable sensitivity and specificity in Chinese older adults [12]; we therefore defined depression as a CES-D10 score ≥ 10 [13].
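The scoring rule above can be sketched as a small Python function. The description does not mention reverse scoring, but in the standard CES-D10 the two positively worded items ("hopeful about the future" and "happy") are reverse-coded before summing. The item positions used below are an assumption about how responses are stored, not taken from the CHARLS files.

```python
from typing import Sequence

# ASSUMPTION: responses are stored in the standard CES-D10 item order, in which
# items 5 ("hopeful about the future") and 8 ("happy") are positively worded.
# These 0-based positions are illustrative; verify them against the codebook.
REVERSE_SCORED_ITEMS = (4, 7)
DEPRESSION_CUTOFF = 10  # CES-D10 total >= 10 is classified as depression


def cesd10_total(responses: Sequence[int], reverse_items=REVERSE_SCORED_ITEMS) -> int:
    """Sum ten item scores (each 0-3), reverse-coding the positively worded items."""
    if len(responses) != 10:
        raise ValueError("CES-D10 has exactly 10 items")
    total = 0
    for i, score in enumerate(responses):
        if score not in (0, 1, 2, 3):
            raise ValueError(f"item {i + 1} must be scored 0-3, got {score}")
        # A positively worded item contributes 3 - score, so 'all of the time' adds 0.
        total += 3 - score if i in reverse_items else score
    return total


def is_depressed(responses: Sequence[int], cutoff: int = DEPRESSION_CUTOFF) -> bool:
    """Binary outcome used for modelling: total score at or above the cut-off."""
    return cesd10_total(responses) >= cutoff


print(cesd10_total([1] * 10), is_depressed([1] * 10))  # prints: 12 True
```

Answering "sometimes" to every item gives 8 points from the eight negatively worded items plus 2 × 2 from the reverse-coded ones, which already reaches the cut-off.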

Definition of middle-aged and older persons and definition of physical disability

Middle age was defined as 45 to 65 years, and older age as over 65 years [14,15,16,17,18]. Physical disability was defined as loss of function or dysfunction, to varying degrees, of the locomotor system resulting from limb disability or from paralysis or deformity of the trunk or limbs [19].

Correlation between age and depression

After examining the prevalence of depression in the middle-aged and older age groups in our dataset, we assessed the association between age and depression using a restricted cubic spline (RCS) regression model. The data were fitted with a logistic regression model with 3 knots placed at the 10th, 50th and 90th percentiles of age, and the resulting odds ratios (ORs) for depression were plotted against age.
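The spline model described above can be sketched in Python. This is a minimal illustration of a restricted cubic spline basis with three knots at the 10th, 50th and 90th percentiles feeding a logistic linear predictor (in R this is typically written with rcs() inside an rms model). The normalising constant is believed to mirror the common default but is an assumption, and the ages and coefficients are hypothetical, not the fitted values behind Fig. 2.

```python
import math
import statistics


def percentile_knots(values, percentiles=(10, 50, 90)):
    """Knots at the given percentiles, using linear interpolation between order statistics."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return [cuts[p - 1] for p in percentiles]  # cuts[p - 1] is the p-th percentile


def rcs_basis(x, knots):
    """Restricted cubic spline basis [x, s_1(x), ..., s_{k-2}(x)].

    Cubic between knots, constrained to be linear below the first and above the
    last knot. Each nonlinear term is divided by (t_k - t_1)**2 (ASSUMED normalisation)."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_pos(u):
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2
    basis = [x]
    for j in range(k - 2):
        term = (cube_pos(x - t[j])
                - cube_pos(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                + cube_pos(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        basis.append(term / scale)
    return basis


def log_odds(coefs, x, knots):
    """Linear predictor of a logistic model: intercept + sum(beta * basis term)."""
    intercept, *betas = coefs
    return intercept + sum(b * z for b, z in zip(betas, rcs_basis(x, knots)))


def odds_ratio(coefs, x, ref, knots):
    """OR of depression at age x relative to a reference age."""
    return math.exp(log_odds(coefs, x, knots) - log_odds(coefs, ref, knots))


# Hypothetical ages and coefficients, for illustration only.
ages = list(range(45, 91))
knots = percentile_knots(ages)
coefs = [-0.5, 0.01, 0.05]  # intercept, linear term, one spline term (3 knots)
print(knots, odds_ratio(coefs, 70, 50, knots))
```

With three knots the spline adds a single nonlinear term, so the age effect costs only two degrees of freedom while still allowing the curved OR pattern shown in Fig. 2.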

Inclusion of predictor variables and data set creation

Predictor variables were selected on the basis of previous studies, and only variables with no more than 30% missing values were considered. For demographic factors, we chose gender, childhood health and residential location. Gender differences in depression have been demonstrated in many studies, and a global meta-analysis showed that these differences become evident in adolescence [20]; we therefore also included the childhood health status of the sample. A multi-country systematic review showed that the association between urban or rural residence and depression remained significant after adjusting for covariates [21]. For health behaviors, we chose sleep duration, nap time, smoking, alcohol consumption and weekly exercise. Depression is associated with dysregulation of normal sleep-wake mechanisms [22]; systematic reviews have shown that smoking and alcohol consumption increase the risk of depression [23, 24], whereas exercise reduces it [25]. For physical functioning, we chose vision, hearing, inability to work due to disability, and difficulty with dressing, bathing, eating, getting up, using the toilet, controlling urination and defecation, preparing food, shopping and completing household chores. The Spanish National Health Survey showed that visual and hearing impairment are associated with more severe depression [26], and reduced ability to perform ADLs and dependence on others for ADLs have also been associated with depression [27]. For social interaction, we chose whether respondents had someone who could help them in the future, because good social relationships can help prevent depression, especially in older people [28]. After all predictor variables had been collected, missing values were handled by multiple imputation.
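The 30% missing-value rule above amounts to a simple column filter applied before imputation. A minimal sketch, assuming missing responses are coded as None; the column names and values are hypothetical, and the subsequent multiple-imputation step is not shown.

```python
from typing import Dict, List, Optional

MAX_MISSING = 0.30  # predictors with more than 30% missing values are excluded


def missing_fraction(column: List[Optional[float]]) -> float:
    """Share of entries recorded as missing (None)."""
    return sum(v is None for v in column) / len(column)


def eligible_predictors(data: Dict[str, List[Optional[float]]],
                        max_missing: float = MAX_MISSING) -> List[str]:
    """Names of candidate predictors whose missing share is at most max_missing."""
    return [name for name, column in data.items()
            if missing_fraction(column) <= max_missing]


# Hypothetical columns: None marks a missing response.
candidates = {
    "sleep_hours": [6, 7, None, 5, 8, 7, None, 6, 7, 5],              # 20% missing
    "nap_minutes": [None, None, None, 30, None, 60, 0, None, 20, 0],  # 50% missing
}
print(eligible_predictors(candidates))  # prints: ['sleep_hours']
```

Variables that pass the filter would then go to the multiple-imputation procedure, so the imputation model never has to reconstruct a predictor that is mostly missing.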

General characteristics of the data set

We used chi-square tests for dichotomous predictor variables, independent-samples t-tests for continuous variables and Mann-Whitney U tests for ordered categorical variables.
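For a dichotomous predictor cross-tabulated against depression, the chi-square test reduces to a 2×2 table. A minimal sketch of Pearson's statistic without continuity correction, using the one-degree-of-freedom tail P(χ² > x) = erfc(√(x/2)); the counts are hypothetical. R's chisq.test applies Yates' continuity correction to 2×2 tables by default, so its p-values can differ slightly.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]] of counts, e.g. rows = predictor level,
    columns = depressed / not depressed. Returns (statistic, p_value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p = pearson_chi_square_2x2([[10, 20], [20, 10]])  # hypothetical counts
print(round(stat, 3), round(p, 4))
```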

LASSO regression analysis combined with binary logistic regression for analytical screening of predictor variables

We randomly divided the dataset into a training set and a validation set in a 7:3 ratio. In the training set, we performed LASSO regression using the R package “glmnet” (version 4.2.2). The variables retained by this initial screening were then entered into binary logistic regression analyses in SPSS (version 26).
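The split and the LASSO screening step can be sketched in plain Python. glmnet minimises the same penalised objective (mean logistic loss plus λ times the sum of absolute coefficients, with alpha = 1) using cyclical coordinate descent on standardised predictors; the sketch below swaps in proximal gradient descent (ISTA), a different algorithm for the same objective, and assumes the predictors are already standardised. The seed, toy data and cross-validation numbers are hypothetical. lambda_1se reproduces the one-standard-error rule that defines lambda.1se in cv.glmnet.

```python
import math
import random


def train_validation_split(ids, train_fraction=0.7, seed=2024):
    """Random 7:3 split of participant IDs (the seed is an arbitrary choice)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(v, t):
    # Proximal operator of t*|v|: shrink toward 0 and set small values exactly to 0.
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_logistic(X, y, lam, n_iter=2000):
    """Minimise mean logistic loss + lam * sum(|beta_j|) by proximal gradient (ISTA).

    X is a list of rows of (already standardised) predictors, y holds 0/1 labels.
    The intercept is not penalised. Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    # 1/L is a safe step size: L bounds the Lipschitz constant of the loss gradient.
    lipschitz = 0.25 * sum(1.0 + sum(v * v for v in row) for row in X) / n
    step = 1.0 / lipschitz
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            residual = _sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
            g0 += residual
            for j in range(p):
                g[j] += residual * row[j]
        b0 -= step * g0 / n
        beta = [_soft_threshold(beta[j] - step * g[j] / n, step * lam) for j in range(p)]
    return b0, beta


def lambda_1se(lambdas, cv_mean, cv_se):
    """Largest lambda whose CV error is within one SE of the minimum (the 1-SE rule)."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[best] + cv_se[best]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= limit)
```

Predictors whose coefficients are exactly zero at the chosen λ are dropped; a larger λ (such as lambda.1se rather than lambda.min) gives a sparser model.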

Model construction

After selecting the predictor variables for the depression prediction model for middle-aged and elderly Chinese people with physical disabilities, we fitted the model in RStudio using the lrm function of the “rms” package and visualized it using the “nomogram” function of the same package.
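The way a nomogram turns a fitted logistic model into points can be sketched as follows. Each predictor's contribution β·x is rescaled so that the predictor with the widest range of contributions spans 0 to 100 points, and the total points map back to a probability through the logistic function. This is a simplified illustration of the usual construction, not the internals of rms::nomogram; the predictor names, 0/1 coding and coefficients are hypothetical.

```python
import math


def nomogram_scale(coefs, ranges):
    """Points scale in the style of a logistic-regression nomogram.

    coefs: {predictor: beta}; ranges: {predictor: (lowest, highest) value}.
    The predictor whose effect spans the widest range of log-odds gets 0-100 points.
    Returns the lowest log-odds contribution of each predictor and points per log-odds unit."""
    lowest = {k: min(coefs[k] * ranges[k][0], coefs[k] * ranges[k][1]) for k in coefs}
    widest = max(abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs)
    return lowest, 100.0 / widest


def points(coefs, values, lowest, points_per_logit):
    """Points read off each predictor's axis for one person."""
    return {k: points_per_logit * (coefs[k] * values[k] - lowest[k]) for k in coefs}


def probability_from_total(total_points, intercept, lowest, points_per_logit):
    """Map total points back to the linear predictor and then to a probability."""
    linear_predictor = intercept + sum(lowest.values()) + total_points / points_per_logit
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical model with three binary predictors coded 0/1.
coefs = {"female": 0.5, "rural": 0.3, "alcohol": 0.4}
ranges = {k: (0, 1) for k in coefs}
lowest, ppl = nomogram_scale(coefs, ranges)
person = {"female": 1, "rural": 1, "alcohol": 0}
pts = points(coefs, person, lowest, ppl)
print(pts, probability_from_total(sum(pts.values()), -0.8, lowest, ppl))
```

Because the mapping is linear, reading total points off the bottom axis gives exactly the probability that the fitted logistic model would compute directly.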

Comprehensive evaluation of the model

Model evaluation

In binary classification problems, the concordance index (C-index) is commonly used to measure a model's ability to rank positive cases above negative ones; for a binary outcome, it equals the area under the receiver operating characteristic (ROC) curve. The C-index typically ranges from 0.5, indicating random prediction, to 1, indicating perfect prediction, and a higher C-index indicates better discrimination [29]. A calibration curve is a graphical tool for assessing calibration: by comparing observed outcomes with model predictions, it shows how accurate the predicted probabilities are across different probability ranges. We calculated the C-index of the model to assess discrimination and plotted calibration curves to assess calibration. However, it is difficult to specify a C-index value that guarantees a useful prediction model, and equally difficult to decide what degree of miscalibration should lead to rejecting a model. We therefore also used decision curve analysis (DCA), a graphical tool that evaluates the net benefit of a classification model across decision thresholds and helps identify the thresholds at which using the model is most worthwhile [30]. Finally, to assess other aspects of performance, we calculated recall (the proportion of depressed participants correctly identified), precision (the proportion of predicted cases that were truly depressed), the F1 score and the Brier score.
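The performance measures listed above can be computed from observed outcomes and predicted probabilities. A minimal sketch: the C-index is computed by pairwise comparison of cases and non-cases, which for a binary outcome gives the ROC AUC. The classification threshold of 0.5 used for recall, precision and F1 is an assumption, since the cut-off is not stated here.

```python
def c_index(y, p):
    """Share of (case, non-case) pairs in which the case has the higher predicted risk;
    ties count 1/2. For a binary outcome this equals the ROC AUC."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for a in cases:
        for b in controls:
            concordant += 1.0 if a > b else 0.5 if a == b else 0.0
    return concordant / (len(cases) * len(controls))


def brier_score(y, p):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def precision_recall_f1(y, p, threshold=0.5):
    """Classify p >= threshold as predicted depression (ASSUMED cut-off), then compare with y."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

As a consistency check on the reported values, precision 0.692 and recall 0.655 give F1 = 2 × 0.692 × 0.655 / (0.692 + 0.655) ≈ 0.673, in line with the reported 0.672 given rounding of the inputs.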

Internal validation

Internal validation is a necessary step in model development because it quantifies the predictive performance of the developed model. In this study, internal validation was performed by bootstrap resampling, a statistical method that estimates the sampling distribution of a statistic and the uncertainty of parameter estimates by repeatedly drawing random samples with replacement from the training set. This method is effective for assessing the stability and reliability of a model across different subsets of the data and for providing confidence intervals for parameter estimates [31]. We used 1000 bootstrap resamples to obtain more accurate estimates.
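Resampling with replacement, as described above, can be sketched as a percentile bootstrap for any performance statistic (for example a C-index function passed as statistic). This illustration assumes percentile intervals, fixed predictions and redrawing of resamples that contain only one outcome class; it is not necessarily the procedure used in this study. Harrell-style optimism correction, which rms::validate implements, additionally refits the model in every resample.

```python
import random


def bootstrap_percentile_ci(y, p, statistic, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for statistic(y, p), resampling participants with replacement.

    Resamples containing only one outcome class are redrawn, because
    statistics such as the C-index are undefined without both classes."""
    rng = random.Random(seed)
    n = len(y)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # draw n indices with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue
        values.append(statistic(yb, [p[i] for i in idx]))
    values.sort()
    lower = values[min(n_boot - 1, round(n_boot * alpha / 2))]
    upper = values[max(0, round(n_boot * (1 - alpha / 2)) - 1)]
    return lower, upper
```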

External validation

The model was externally validated in the validation set using the C-index and calibration curves.
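The calibration mean absolute error reported for the training, internal and external validation sets summarises the gap between predicted and observed risk. rms::calibrate derives this from a smoothed calibration curve; the sketch below substitutes a simpler equal-sized binning of predicted risk, so its numbers will not match exactly. Bin count and inputs are assumptions for illustration.

```python
def calibration_table(y, p, n_bins=10):
    """Sort by predicted risk, cut into equal-sized bins and return
    (mean predicted, observed proportion, bin size) for each bin."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(pi for pi, _ in chunk) / len(chunk)
            observed = sum(yi for _, yi in chunk) / len(chunk)
            table.append((mean_pred, observed, len(chunk)))
    return table


def calibration_mae(y, p, n_bins=10):
    """Size-weighted mean absolute gap between predicted and observed risk."""
    table = calibration_table(y, p, n_bins)
    total = sum(size for _, _, size in table)
    return sum(size * abs(pred - obs) for pred, obs, size in table) / total
```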

Results

General characteristics of the data set

A total of 1052 middle-aged and older adults with physical disabilities were included in this study, of whom 497 did not suffer from depression (47.2%) and 555 suffered from depression (52.8%). The distribution of predictor variables is shown in Table 1.

Table 1 General characteristics of the data set

Correlation between age and depression

The middle-aged group comprised 648 people, of whom 335 (51.7%) had depression, and the older age group comprised 404 people, of whom 220 (54.5%) had depression. The association between age and depression estimated with the restricted cubic spline regression model is shown in Fig. 2. In the middle-aged group, ages 45 to 50 years were a protective factor and ages above 50 years a risk factor, and the association strengthened with increasing age. In the older age group, the association weakened with increasing age.

Fig. 2

Restricted cubic spline regression model. Note: Solid lines indicate ORs and shaded areas indicate 95% CIs. OR, odds ratio; CI, confidence interval

LASSO regression analysis combined with binary logistic regression analysis to screen predictor variables

The LASSO selection path plot (Fig. 3A) marks two specific values of λ, lambda.min and lambda.1se. The LASSO coefficient path plot (Fig. 3B) shows that as the penalty increases, the coefficients shrink toward zero and fewer predictors remain in the model. Based on lambda.1se in Fig. 3A, we retained 22 predictor variables. Of these, 18 were statistically significant in SPSS, and their collinearity diagnostics (variance inflation factors) were all below 10, indicating no serious multicollinearity among the 18 variables. The 18 predictor variables were then entered into a binary logistic regression analysis, with the ordered categorical variables entered as covariates; the results are shown in Table 2. The Hosmer-Lemeshow test gave P = 0.819, indicating no evidence of lack of fit. Using a threshold of P < 0.10 [32], we selected nine variables from Table 2 to construct the model: Gender, Location of Residential Address, Shortsightedness, Hearing, Any possible helper in the future, Alcoholic in the Past Year, Difficulty with Using the Toilet, Difficulty with Preparing Hot Meals, and Unable to work due to disability.
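The Hosmer-Lemeshow test reported above compares observed and expected numbers of cases within groups of predicted risk. A minimal sketch, assuming ten equal-sized groups (SPSS's grouping rule for tied risks may differ) and using the exact chi-square tail for even degrees of freedom so that only the standard library is needed; with ten groups the test has 8 degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic over groups of sorted predicted risk; df = groups - 2.

    Predicted probabilities must lie strictly between 0 and 1."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(pi for pi, _ in chunk)  # expected number of cases in the group
        observed = sum(yi for _, yi in chunk)  # observed number of cases in the group
        mean_risk = expected / len(chunk)
        stat += (observed - expected) ** 2 / (expected * (1.0 - mean_risk))
    return stat, chi2_sf_even_df(stat, groups - 2)
```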

Fig. 3

LASSO regression analysis combined with binary logistic regression for screening predictor variables. (A) LASSO selection path plot: the vertical dashed line on the left indicates the Log(λ) giving the minimum cross-validation error (lambda.min), and the vertical dashed line on the right indicates the Log(λ) one standard error from the minimum error (lambda.1se). Binomial Deviance: the binomial loss function of the model computed on each fold during cross-validation. (B) LASSO path plot: the regression coefficients plotted against Log(λ); the coefficients shrink progressively as λ increases. Coefficients: the regression coefficient of each independent variable. L1 Norm: the sum of the absolute values of all model coefficients

Table 2 Results of binary logistic regression analysis

Constructing and validating a predictive model of depression for middle-aged and elderly persons with physical disabilities in China

After determining the final variables, we constructed the depression prediction model for Chinese middle-aged and elderly persons with physical disabilities in RStudio using the “rms” package and generated a nomogram with the package's “nomogram” function (Fig. 4A). The area under the ROC curve of the model was 0.714 (95% CI: 0.673–0.751), as shown in Fig. 4B. The recall of the model was 0.655, the precision 0.692, the F1 score 0.672 and the Brier score 0.213. The mean absolute error of the calibration curve was 0.016, as shown in Fig. 4C. The DCA curves showed that, over a wide range of thresholds, the model containing all nine predictor variables yielded a higher net benefit than models based on single predictor variables, suggesting that the model provides a greater clinical benefit (Fig. 4D). The net benefit curve (Fig. 4E) shows, for each high-risk threshold, the number of individuals per 1,000 whom the model classifies as high risk for depression.
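The decision-curve quantities reported above can be sketched with the standard net-benefit formula, NB = TP/n − (FP/n) × p_t/(1 − p_t), where p_t is the high-risk threshold. Treating no one has a net benefit of 0, and treating everyone is the other reference strategy. Dividing by prevalence to obtain a standardised net benefit is our assumption about how the vertical axis of Fig. 4D was scaled, and the inputs are hypothetical.

```python
def net_benefit(y, p, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y, threshold):
    """Reference strategy: classify everyone as high risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def standardized_net_benefit(nb, y):
    """Net benefit divided by prevalence (1.0 is the maximum achievable)."""
    return nb / (sum(y) / len(y))


def high_risk_per_1000(p, threshold):
    """Number of people per 1,000 classified as high risk at this threshold."""
    return 1000.0 * sum(1 for pi in p if pi >= threshold) / len(p)
```

A model is useful at a given threshold when its net benefit exceeds both reference strategies; the threshold odds p_t/(1 − p_t) encode how many false positives one is willing to accept per true positive.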

Fig. 4

Construction and validation of the depression prediction model for middle-aged and elderly physically disabled people in China. (A) Nomogram: the points for each predictor are summed, and the total points correspond to the predicted probability of depression. (B) ROC plot of the model: the horizontal axis is the false positive rate (the proportion of negative samples classified as positive), and the vertical axis is the sensitivity (the proportion of positive samples classified as positive). Together they form the ROC curve, which assesses the model's ability to classify positive and negative samples correctly across classification thresholds. (C) Calibration curves: the horizontal axis is the probability of the event predicted by the model, and the vertical axis is the proportion of events that actually occur within that predicted probability range. The calibration curve assesses the agreement between predicted and observed probabilities. (D) DCA chart: the horizontal axis is the high-risk threshold, and the vertical axis is the standardised net benefit calculated for models built from the different predictor variables and for the reference strategies. The plot shows how far each model outperforms or underperforms the reference strategies under different decision scenarios. (E) Net benefit curve: the horizontal axis shows the high-risk threshold and the corresponding cost-benefit ratio, which weighs the costs of using the model against its benefits. The vertical axis shows the number of individuals, out of 1,000, judged to be at high risk of depression at the selected threshold. The curve can be used to compare the benefits of models built from different predictor variables, helping policymakers choose optimal strategies

Internal validation of the depression prediction model for middle-aged and elderly people with physical disabilities in China

The area under the ROC curve for internal validation was 0.716, see Fig. 5A. The mean absolute error of the calibration curve was 0.018, see Fig. 5B.

Fig. 5

Internal validation of the depression prediction model for middle-aged and elderly Chinese people with physical disabilities. (A) ROC plot of internal validation; see Fig. 4B for panel annotations. (B) Calibration curves of internal validation; see Fig. 4C for panel annotations

External validation of the depression prediction model for middle-aged and elderly people with physical disabilities in China

We externally validated the model in the validation set. In the validation set, the area under the ROC curve of the depression prediction model for Chinese middle-aged and elderly persons with physical disabilities was 0.716 (95% CI: 0.660–0.772), as shown in Fig. 6A, and the mean absolute error of the calibration curve was 0.016, as shown in Fig. 6B.

Fig. 6

External validation of the depression prediction model for Chinese middle-aged and elderly persons with physical disabilities. (A) ROC plot of the model in the validation set; see Fig. 4B for panel annotations. (B) Calibration curves of the model in the validation set; see Fig. 4C for panel annotations

Discussion

The CHARLS investigators used multistage sampling with probability proportional to size, first selecting 150 county-level units from all counties in China and then sampling down to the community level. This design ensures the representativeness and reliability of the sample and supports statistical inference for the Chinese population aged 45 years and older [11]. Table 1 shows that the prevalence of depression among middle-aged and elderly persons with disabilities in China exceeds 50%, and more than 80% of the patients live in rural areas, where medical resources are limited. The prevalence of depression was higher in the older group than in the middle-aged group. In the middle-aged group, ages 45 to 50 years were a protective factor and ages above 50 years a risk factor, with risk increasing with age; in the older group, age was a risk factor, but the risk decreased with increasing age. In summary, depression is a serious problem for middle-aged and elderly Chinese people with disabilities.

In the training set, LASSO regression combined with binary logistic regression identified nine variables strongly associated with depression. The areas under the ROC curve for the model and for internal and external validation were all above 0.70, the calibration mean absolute errors were all below 0.02, and recall and precision both exceeded 0.65, indicating acceptable discrimination, good calibration and consistent performance across datasets. The DCA and net benefit curves indicate that the model offers a net benefit in predicting depression. The F1 score of 0.672 indicates that the model balances recall and precision. The Brier score of 0.213 is lower than the value of about 0.25 expected from a non-informative model that assigns every participant the overall prevalence of depression (0.528 × 0.472 ≈ 0.249), indicating that the model's predicted probabilities are more accurate than that baseline.

The nomogram shows that being female, living in a rural area, having poor vision and/or hearing, lacking help from others, drinking alcohol, having difficulty using the toilet and preparing food, and being unable to work due to disability are risk factors for depression among middle-aged and older adults with physical disabilities. Female sex has long been recognized as an independent risk factor for depression, and epidemiological data show that the prevalence of depression in women is almost twice that in men [33]. Proposed causes include genetically determined vulnerability, hormonal fluctuations related to various aspects of reproductive function, and heightened sensitivity to the hormonal fluctuations that mediate depressive states [33]. Data from the Korean Longitudinal Study of Aging showed that women with physical disabilities had more depressive symptoms than men with physical disabilities, a difference not observed in the non-disabled group [34]. The higher prevalence and greater severity of depression in rural than in urban older adults may reflect under-recognition and inadequate treatment [35]. Income-related factors play an important role in the development of depression in rural residents [36], and middle-aged and older adults with physical disabilities are likely to have lower incomes than those without. The Spanish National Health Survey showed that the prevalence and severity of depression were higher in adults with both visual and hearing impairments than in those with either impairment alone [26]. The nomogram suggests that hearing and visual impairment contribute substantially to the predicted risk of depression in middle-aged and older adults with disabilities. Social relationships influence not only the onset but also the severity of depression [28].
Good social relationships can protect against the onset of depressive symptoms in old age [28]. However, middle-aged and elderly people with physical disabilities may be more likely than those without disabilities to lack good social relationships, which may be one reason for the high prevalence of depression in this population. Epidemiological studies have shown that people with physical disabilities have higher alcohol intake [37], and chronic excessive alcohol intake may affect brain neurological function and metabolism, increasing the prevalence and severity of depression [24]. Consistent with this, the nomogram indicates that alcohol consumption plays an important role in depression among middle-aged and elderly Chinese people with physical disabilities. Difficulties in using the toilet and preparing food reflect dependence on others for the most basic activities of daily living. The relationship between depression and ADLs is unclear in the general population [27]; however, in people with disabilities, disability-induced declines in ADLs can exacerbate depression [27]. Finally, inability to work due to disability directly reduces income, which plays an important role in the onset of depression [36].

These predictors indicate that middle-aged and elderly women with disabilities living in rural areas are at high risk of depression and warrant particular attention. We constructed a credible prediction model of depression for Chinese middle-aged and elderly persons with disabilities, and its nomogram can serve as an efficient screening tool for depression in this population.

Our study has limitations. First, the CHARLS database contains a large number of missing values on the presence of physical disability, which prevented us from calculating the prevalence of physical disability. Second, our model was developed using data from China, so the findings may not generalize to other countries.

Conclusion

This study showed that being female, living in a rural area, having poor vision and/or hearing, lacking assistance from others, drinking alcohol, having difficulty using the toilet and preparing food, and being unable to work due to disability were risk factors for depression among middle-aged and older adults with physical disabilities. We identified predictor variables strongly associated with depression among middle-aged and elderly Chinese people with physical disabilities and used them to construct a prediction model that estimates the probability of depression in this population. Middle-aged and elderly people may be able to reduce their risk of depression by addressing the modifiable risk factors themselves, and clinical staff can use the nomogram of the prediction model to quickly identify people with physical disabilities at high risk of depression, enabling early identification, early intervention and early treatment. Because our data come from a nationally representative survey, the findings underline the urgent need for depression prevention and treatment among middle-aged and elderly women with physical disabilities living in rural areas.