Introduction

In rapidly aging societies, the increased risk of chronic diseases among middle-aged and older adults has become a major public health issue. In China, the population aged above 60 is projected to reach 402 million (28% of the total population) by 2040, and over 75% of older adults have at least one chronic disease [1, 2]. The coexistence of two or more chronic conditions in the same person is referred to as “multimorbidity”. Worldwide, over one third of people with chronic diseases develop multimorbidity [3], and the risk increases with age. Multimorbidity accounts for 0.15 years lived with disability (YLDs) [4], increases the cost of medical care [5, 6], and complicates clinical treatment and medication [7, 8]. Yet, so far, few effective treatments for multimorbidity have been found, which makes it important to prevent the development of multiple conditions at an early stage. It is therefore critical to identify the development trajectories among chronic conditions.

Despite the high prevalence of multimorbidity, research evidence on the trajectories of chronic disease development has remained limited. The global prevalence of multimorbidity was 30% among people aged 45 to 64 and 65% among those aged over 65 [9]. In China, the prevalence of multimorbidity among middle-aged and older adults is similar to these rates [10] and higher than that in India and Brazil [11, 12]. However, most existing research on multimorbidity is cross-sectional, leaving the trajectory patterns among multiple chronic diseases largely unexplored. Jensen et al. (2014) conceptualized ‘disease trajectories’ in terms of temporal correlations between diseases [13], and the data-driven longitudinal approach they developed has since been widely used to explore disease trajectories in general populations as well as in clinical samples of specific diseases [14,15,16]. In a sample of 6.3 million people in Denmark, gout and chronic obstructive pulmonary disease were identified as key diseases in disease trajectory progression [13]. Patients with septic sweat glands were found to be susceptible to type 1 diabetes and likely to subsequently develop acute myocardial infarction, pneumonia and chronic obstructive pulmonary disease [15]. In the literature on the Chinese older population, Meng et al. (2022) reported that cardiovascular and cerebrovascular diseases, cancers, chronic respiratory diseases and diabetes were the four chronic diseases with the highest mortality rates in China (1615.68, 759.98, 383.27 and 89.44 per 100,000 people, respectively [17]). However, little is known about disease trajectories in Chinese populations, which may differ from those found in European or other populations because of differences in geography, lifestyle and genetics.

To fill this research gap, this study conducted a trajectory network analysis using national cohort data to explore the chronic disease trajectories of people aged 45 and above in China. The aim is to unravel the chronic condition trajectories among Chinese middle-aged and older adults. The results should advance our understanding of multimorbidity development in middle-aged and older generations, providing new insight into potential mechanisms and causes and supporting self-care and disease prevention among people living with chronic conditions.

Materials and methods

Study design and data collection

This study used data from the China Health and Retirement Longitudinal Study (CHARLS), a nationally representative survey of people aged 45 and above. The baseline survey was conducted in 2011, with three follow-ups in 2013, 2015, and 2018. Using multistage probability sampling, participants were randomly selected from 450 villages/urban communities in 150 counties/districts of 28 provinces in China. Respondents were interviewed face to face using computer-assisted personal interviewing. The survey covers demographic characteristics, health status and physical functioning, as well as health care and insurance information. Further details about CHARLS are available in previous publications [18,19,20]. The current analysis included all follow-up waves since baseline. After excluding participants who did not take part in the follow-ups (n = 1,226) or had missing values on demographic variables (n = 586), 15,895 participants were included in the final analysis (Fig. 1).

Fig. 1

Flow chart of participant selection

Disease selection

The diagnosis and onset time of 14 common chronic diseases were collected at baseline and at every follow-up through a questionnaire. Participants were first asked, "Have you been diagnosed with …(disease) by a doctor?". After a positive answer, the interviewer then asked, "When (year or age) was the condition first diagnosed or known by yourself?". The diseases were hypertension, dyslipidemia, diabetes, cancer, chronic lung disease, liver disease, heart diseases, stroke, kidney disease, digestive disease, emotional and mental problems (EMP), memory-related disease (MRD), arthritis, and asthma. In the follow-ups, participants were asked whether their most recently reported disease status was accurate, and we used this question to correct for possible recall bias. To capture disease trajectories, the first chronic disease was defined as the earliest diagnosed of the 14 diseases. The age at onset of each chronic disease was also recorded.
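The onset-age and first-disease definitions above can be illustrated with a short sketch. It is a minimal Python example under stated assumptions, not the study's SAS code: the record layout (each diagnosed disease mapped to a reported calendar year or age) and the helper names `onset_age` and `first_chronic_disease` are hypothetical, and converting a calendar year to an age by subtracting the birth year ignores the birth month.

```python
from typing import Dict, Optional, Tuple

# Hypothetical record layout: each diagnosed disease maps to what the respondent
# reported, either ("year", calendar_year) or ("age", age_at_diagnosis).
Report = Tuple[str, int]


def onset_age(report: Report, birth_year: int) -> int:
    """Convert a reported diagnosis year or age into an age at onset.

    Year-based ages are approximate because the birth month is ignored.
    """
    kind, value = report
    if kind == "age":
        return value
    if kind == "year":
        return value - birth_year
    raise ValueError(f"unknown report type: {kind!r}")


def first_chronic_disease(
    reports: Dict[str, Report], birth_year: int
) -> Optional[Tuple[str, int]]:
    """Return (disease, onset age) of the earliest-diagnosed disease, or None.

    Ties (same onset age) are resolved by the insertion order of `reports`.
    """
    if not reports:
        return None
    ages = {disease: onset_age(r, birth_year) for disease, r in reports.items()}
    first = min(ages, key=lambda d: ages[d])
    return first, ages[first]


if __name__ == "__main__":
    example = {
        "hypertension": ("year", 2008),  # 2008 - 1956 = 52
        "arthritis": ("age", 48),
        "diabetes": ("year", 2012),      # 2012 - 1956 = 56
    }
    print(first_chronic_disease(example, birth_year=1956))  # ('arthritis', 48)
```

Handling ties explicitly matters here, because participants who report two diseases at the same age cannot be assigned a single first disease without some rule.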

Covariates included gender, age, education, type of residence, smoking, and drinking status. Education was categorized as “low education” (junior high school or below) or “high education” (high school or above). Type of residence was categorized as “urban” or “rural”. Smoking status was categorized as “currently smoking” or “not smoking”, and drinking status as “drinking” (more than once a month) or “not drinking” (less than once a month or none).

Statistical analysis

SAS 9.4 was used for data analysis. Descriptive results are presented in Table 1. Variables were weighted by individual-level weights with household and non-response adjustment. Box plots were used to describe the median and interquartile range (IQR) of the onset age of the first chronic disease. Survival curves were used to describe the development of chronic diseases during the survey, and stream plots were used to show changes in the proportion of individuals with each chronic disease. Considering gender differences in the 14 chronic diseases, we also conducted all the analyses described below separately by gender (see Supplementary Table 2 and Figs. S1, S2 and S3). Finally, binomial tests were used to assess the direction of disease pairs, and conditional logistic regression was used to explore the associations between chronic diseases. All charts were produced in R 4.2.1.
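For a binary characteristic such as smoking, the weighted descriptive statistics in Table 1 reduce to a weighted proportion. The Python sketch below shows only that point estimate; it is not the study's SAS code, the function name is hypothetical, and it does not reproduce design-based standard errors, which would also require the CHARLS strata and clusters.

```python
from typing import Sequence


def weighted_proportion(indicator: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted share of participants with indicator == 1.

    Point estimate only: sum(w_i * x_i) / sum(w_i). Design-based standard
    errors would need a survey procedure (e.g. SAS PROC SURVEYFREQ).
    """
    if len(indicator) != len(weights):
        raise ValueError("indicator and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * x for x, w in zip(indicator, weights)) / total


if __name__ == "__main__":
    smoker = [1, 0, 0, 1, 0]            # hypothetical participants
    weight = [1.2, 0.8, 1.0, 0.5, 1.5]  # hypothetical individual-level weights
    print(f"{100 * weighted_proportion(smoker, weight):.2f}%")  # 34.00%
```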

Table 1 Characteristics of the study population

Binomial test

The 14 chronic diseases form 14 × 13 / 2 = 91 possible pairs. Each pair of diseases had two possible directions of disease progression: disease 1 (D1) → disease 2 (D2) or D2 → D1. Let N be the number of individuals who developed both diseases consecutively (excluding those who developed the two diseases simultaneously), of whom N1 developed D1 → D2 and N2 developed D2 → D1 (N = N1 + N2). Under the null hypothesis of no preferred direction, N1 follows a binomial distribution with N trials and probability 0.5. When the null hypothesis was rejected, the order in which more people developed the two diseases was taken as the direction of the disease pair. Following Han et al. (2021) [16], the p value threshold was Bonferroni-corrected (0.05 divided by the number of tests) to reduce type I errors.
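The direction test above can be sketched as an exact two-sided binomial test. This is a minimal Python illustration, not the study's SAS code; the function names are hypothetical, and dividing 0.05 by the 91 possible pairs is an assumption of this sketch about the Bonferroni denominator.

```python
from math import comb
from typing import Optional, Tuple

N_DISEASES = 14
N_PAIRS = comb(N_DISEASES, 2)  # 91 unordered disease pairs
ALPHA = 0.05 / N_PAIRS         # Bonferroni threshold, assuming one test per pair


def binom_two_sided_p(n1: int, n: int) -> float:
    """Exact two-sided binomial test of H0: P(D1 -> D2) = 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided p value is
    twice the smaller tail probability, capped at 1.
    """
    if not 0 <= n1 <= n:
        raise ValueError("need 0 <= n1 <= n")
    k = min(n1, n - n1)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # exact integer arithmetic
    return min(1.0, 2 * tail)


def pair_direction(n12: int, n21: int, alpha: float = ALPHA) -> Optional[Tuple[str, float]]:
    """Return the dominant direction and its p value, or None if not significant.

    n12: people who developed D1 before D2; n21: people who developed D2 before D1.
    Simultaneous diagnoses are assumed to have been excluded beforehand.
    """
    n = n12 + n21
    if n == 0:
        return None
    p = binom_two_sided_p(n12, n)
    if p >= alpha:
        return None
    return ("D1->D2" if n12 > n21 else "D2->D1"), p


if __name__ == "__main__":
    print(N_PAIRS)                  # 91
    print(pair_direction(120, 60))  # significant, direction D1->D2
    print(pair_direction(55, 45))   # None: not significant after correction
```

Because the null probability is exactly 0.5, doubling the smaller tail gives the same result as summing all outcomes at least as extreme, which keeps the sketch free of external statistics libraries.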

Conditional logistic regression

After the direction of a disease pair had been determined by the binomial test, e.g., from D1 to D2, D2 was treated as the outcome and D1 as the exposure of interest. To control for selection bias and examine these associations, we adopted a nested case–control design. Participants who developed D2 during follow-up were assigned to the case group and matched with controls (who never had D2 and D1 at baseline) by sex, age, and incidence density sampling at a ratio of 1:3. Conditional logistic regression was then used to calculate the odds ratio (OR) of D2 after the onset of D1. To minimize bias from small sample sizes, we used 1,000 bootstrap samples to calculate the average ORs and their confidence intervals [21, 22], and finally retained the disease pairs whose confidence interval did not contain 1 and whose OR was greater than 1 [16].
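The matched analysis above can be sketched for a single binary exposure (D1 diagnosed before the case's D2 index date). This is a minimal Python illustration, not the study's SAS procedure: it assumes the 1:3 matched sets have already been formed by incidence density sampling, fits the conditional likelihood by Newton-Raphson, and takes a simple percentile interval from bootstrap resamples of whole matched sets. The function names, the seed, and the percentile method are assumptions of this sketch; the source does not state how its bootstrap confidence intervals were computed.

```python
import math
import random
from typing import List, Sequence, Tuple

# One matched set: (exposure of the case, exposures of its matched controls),
# where exposure = 1 if D1 was diagnosed before the case's D2 index date.
MatchedSet = Tuple[int, Sequence[int]]


def clogit_log_or(sets: Sequence[MatchedSet], max_iter: int = 50, tol: float = 1e-10) -> float:
    """Maximize the conditional likelihood for one binary exposure (Newton-Raphson).

    Each set contributes beta * x_case - log(sum_j exp(beta * x_j)).
    Sets in which every member has the same exposure carry no information.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for x_case, x_controls in sets:
            xs = [x_case, *x_controls]
            ws = [math.exp(beta * x) for x in xs]
            total = sum(ws)
            mean = sum(w * x for w, x in zip(ws, xs)) / total
            mean_sq = sum(w * x * x for w, x in zip(ws, xs)) / total
            score += x_case - mean       # first derivative of the log-likelihood
            info += mean_sq - mean ** 2  # minus the second derivative
        if info <= 0:
            raise ValueError("no informative matched sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("did not converge (possible complete separation)")


def bootstrap_or(sets: List[MatchedSet], n_boot: int = 1000, seed: int = 1) -> Tuple[float, float, float]:
    """Resample matched sets with replacement; return (mean OR, 2.5th, 97.5th percentile)."""
    rng = random.Random(seed)
    ors = []
    for _ in range(n_boot):
        resample = [rng.choice(sets) for _ in range(len(sets))]
        try:
            ors.append(math.exp(clogit_log_or(resample)))
        except (ValueError, RuntimeError):
            continue  # skip degenerate resamples
    if not ors:
        raise RuntimeError("every bootstrap resample was degenerate")
    ors.sort()
    lower = ors[int(0.025 * (len(ors) - 1))]
    upper = ors[int(0.975 * (len(ors) - 1))]
    return sum(ors) / len(ors), lower, upper


if __name__ == "__main__":
    # Synthetic 1:3 matched sets (illustrative counts only).
    sets = ([(1, [0, 0, 0])] * 40 + [(1, [1, 0, 0])] * 30
            + [(0, [1, 0, 0])] * 25 + [(0, [0, 0, 0])] * 50)
    print(round(math.exp(clogit_log_or(sets)), 3))
    mean_or, lo, hi = bootstrap_or(sets, n_boot=200)
    print(round(mean_or, 2), round(lo, 2), round(hi, 2))
    # Retain the pair only if the interval excludes 1 and the OR exceeds 1.
    print(lo > 1 and mean_or > 1)
```

Resampling whole matched sets rather than individual participants keeps each case with its own controls, which the conditional likelihood requires.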

Construction of disease trajectory network

The construction of the disease trajectory network has been described previously [23]. The network was built by connecting the common nodes (i.e., shared diseases) between disease pairs to form disease trajectories. For example, if D1 → D2 and D2 → D3 were found, the trajectory D1 → D2 → D3 was generated; if D3 → D4 was also found, the trajectory became D1 → D2 → D3 → D4. The disease trajectory network was visualized with Cytoscape version 3.9.1. Each node represents a disease, and the color intensity of a node represents the number of patients with that disease. The thickness of an edge represents the OR between the two diseases (Fig. 6).
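The chaining rule above (join pairs that share a disease, then extend the path) can be sketched in Python as follows. The adjacency-map representation, the `max_len` cap, the rule that no disease repeats within a path, and the function names are assumptions for illustration; the study itself drew the network in Cytoscape.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # a significant directed pair (earlier disease, later disease)


def build_network(pairs: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Adjacency map: disease -> set of diseases that significantly follow it."""
    graph: Dict[str, Set[str]] = {}
    for d1, d2 in pairs:
        graph.setdefault(d1, set()).add(d2)
        graph.setdefault(d2, set())
    return graph


def start_nodes(graph: Dict[str, Set[str]]) -> Set[str]:
    """Diseases with outgoing but no incoming edges (network starting points)."""
    targets = {d for successors in graph.values() for d in successors}
    return {d for d, successors in graph.items() if successors and d not in targets}


def trajectories(graph: Dict[str, Set[str]], start: str, max_len: int = 4) -> List[List[str]]:
    """Chain pairs that share a node into maximal paths (no disease repeated)."""
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        nexts = [d for d in sorted(graph.get(path[-1], set())) if d not in path]
        if not nexts or len(path) == max_len:
            paths.append(path)
            return
        for d in nexts:
            extend(path + [d])

    extend([start])
    return paths


if __name__ == "__main__":
    net = build_network([("D1", "D2"), ("D2", "D3"), ("D3", "D4"), ("D2", "D5")])
    print(start_nodes(net))         # {'D1'}
    print(trajectories(net, "D1"))  # [['D1', 'D2', 'D3', 'D4'], ['D1', 'D2', 'D5']]
```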

Linear disease trajectory

A linear disease trajectory was defined as a sequence of three consecutive diseases in the disease trajectory network that participants followed. The number of participants on each linear trajectory was calculated and used to rank the trajectories. The length between nodes represents the median time patients took to progress from one disease to the next.
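The definition above (three consecutive network diseases, ranked by participant count, with median intervals between stages) can be expressed as a short Python sketch. The input format (significant directed edges plus a per-participant map of onset ages), the strict-ordering rule that drops simultaneous diagnoses, and the function name are assumptions for illustration.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]


def linear_trajectories(
    edges: Iterable[Pair], onsets: List[Dict[str, float]]
) -> List[Tuple[Triple, int, float, float]]:
    """Count participants following each A -> B -> C path in the network.

    A participant follows the path if onset(A) < onset(B) < onset(C); ties
    (simultaneous diagnoses) are not counted. Returns
    (triple, n, median A->B interval, median B->C interval), sorted by n.
    """
    edge_list = sorted(set(edges))
    triples = [(a, b, c) for a, b in edge_list for b2, c in edge_list if b == b2 and c != a]
    results = []
    for a, b, c in triples:
        first_gaps, second_gaps = [], []
        for person in onsets:
            if all(d in person for d in (a, b, c)) and person[a] < person[b] < person[c]:
                first_gaps.append(person[b] - person[a])
                second_gaps.append(person[c] - person[b])
        if first_gaps:
            results.append(((a, b, c), len(first_gaps), median(first_gaps), median(second_gaps)))
    results.sort(key=lambda r: r[1], reverse=True)  # stable: ties keep edge order
    return results


if __name__ == "__main__":
    edges = [("A", "B"), ("B", "C"), ("B", "D")]
    onsets = [  # hypothetical onset ages per participant
        {"A": 45, "B": 52, "C": 58},
        {"A": 50, "B": 55, "C": 57, "D": 60},
        {"A": 48, "B": 60, "D": 63},
        {"A": 47, "B": 56, "C": 62},
    ]
    for row in linear_trajectories(edges, onsets):
        print(row)
```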

Results

Characteristics of participants

Table 1 shows the baseline characteristics of the study participants. Among the 15,895 participants, 47.93% were male and 52.07% were female. Only 14.33% had obtained high school education or above. Overall, 30.43% were smokers and 33.18% were drinkers. The prevalence of multimorbidity (more than two diseases) was 49.20%.

First-onset age of each chronic disease

Figure 2 shows the median and IQR of first-onset age by chronic disease. EMP, asthma, liver diseases, and digestive diseases tended to have earlier onset (all below age 50), whereas the onset ages of stroke, hypertension, diabetes, and MRD were all above 50 years.

Fig. 2

The median and IQR of onset age of individuals’ first chronic diseases

Cumulative occurrence of the chronic diseases

Figure 3 shows the cumulative incidence of the 14 chronic diseases, estimated with the Kaplan–Meier method. To explore how disease risk varies with age, two age subgroups were compared. In the 45–64 age group, the cumulative incidences of hypertension, arthritis, and dyslipidemia were the highest, whereas among those aged 65 years and above, hypertension, heart diseases, and arthritis had the highest cumulative incidences. Compared with those aged 45 to 64, people aged 65 years and above had higher cumulative incidences of all diseases, with the most notable increases for heart diseases and hypertension.
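The Kaplan–Meier curves behind Fig. 3 can be sketched as below. This Python illustration is not the code used in the study; it assumes follow-up time in years since baseline as the time scale and a participant-level event indicator for one disease, and it ignores survey weights. Running it separately for the 45–64 and 65+ subgroups would give the two age-group curves compared above.

```python
from typing import List, Sequence, Tuple


def km_cumulative_incidence(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of cumulative incidence, 1 - S(t), at each event time.

    times: years from baseline to diagnosis (event = 1) or to last follow-up (event = 0).
    Censored observations tied with events at the same time stay in the risk set
    for that time, following the usual Kaplan-Meier convention.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, 1 - survival))
        at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    years = [2, 3, 3, 5, 7, 7]       # hypothetical follow-up times
    diagnosed = [1, 1, 0, 1, 0, 1]   # 1 = disease diagnosed, 0 = censored
    for t, cum_inc in km_cumulative_incidence(years, diagnosed):
        print(t, round(cum_inc, 4))
```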

Fig. 3

Cumulative incidence of the 14 chronic diseases

Distribution of the 14 diseases by age and gender

Figure 4 shows how the number of cases of each of the 14 chronic diseases varies with age. Except for EMP, asthma, cancer, and MRD, the number of cases began to rise markedly after age 40, with the most conspicuous increases in arthritis and digestive diseases. Overall, chronic diseases began to develop frequently from age 40, with incidence peaking at ages 53–55.

Fig. 4

Age distribution of the onset of 14 diseases

The sequence of disease development

The results of the binomial test are shown in Supplementary Table 1, Additional File 1. In the total sample, 51 disease pairs were selected; they are presented in descending order of OR in Fig. 5. The three disease pairs with the highest ORs were dyslipidemia to diabetes (OR = 2.88, 95% CI = [2.31–4.42]), dyslipidemia to MRD (OR = 2.56, 95% CI = [1.73–4.42]), and kidney diseases to MRD (OR = 2.55, 95% CI = [1.47–5.02]).

Fig. 5

Statistically significant disease pairs

Disease trajectory network

Figure 6 shows the trajectory network of chronic diseases. Arthritis was the starting point of the trajectory network, whereas stroke and MRD were located at its end. Hypertension, digestive diseases, heart diseases, and dyslipidemia were at the center of the network, indicating that they were associated with, or served as transitions into, most other diseases. In addition, the risk of disease tended to increase along the disease trajectory network. For instance, the ORs for progression from arthritis, heart diseases, and hypertension to dyslipidemia were 1.54 (95% CI: 1.17–2.14), 1.44 (95% CI: 1.11–2.00), and 1.84 (95% CI: 1.48–2.42), respectively, whereas the ORs for progression from dyslipidemia to stroke, diabetes, and MRD were larger: 2.45 (95% CI: 1.83–3.71), 2.88 (95% CI: 2.31–4.42), and 2.56 (95% CI: 1.73–4.42), respectively. No significant relationship was observed between EMP or cancer and the other diseases.

Fig. 6

Disease trajectory network. Each node represents a disease; the color intensity of a node represents the number of patients with that disease; the thickness of an edge represents the OR between the two diseases.

Linear trajectory

Trajectories containing three diseases were identified based on the number of patients and the onset age, as shown in Table 2. The linear trajectory analysis showed that people with arthritis, hypertension, or digestive diseases were more likely to develop multimorbidity: the top 10 linear trajectories all started with one of these three diseases. In addition, the average duration from the first-stage disease to the intermediate-stage disease was 9.5 years, whereas the average duration from the intermediate-stage disease to the subsequent disease was 5.1 years, suggesting faster progression of chronic disease at later stages. The three linear trajectories with the largest numbers of patients were Arthritis-Hypertension-Dyslipidemia (n = 446), Arthritis-Hypertension-Heart diseases (n = 389), and Arthritis-Hypertension-Diabetes (n = 328), all of which started from arthritis.

Table 2 Top 10 linear trajectories

Discussion

By analyzing data from a large cohort of middle-aged and older adults in China, we constructed a trajectory network for the progression of chronic diseases. The trajectory network analysis revealed a general pattern of chronic disease development, helping to fill the gap in existing research on multimorbidity trajectories in China.

The results indicated that arthritis served as a starting point for the development of multimorbidity in this population. This finding is consistent with Castillo et al. (2021), who reported an early onset of arthritis in middle-aged and older populations [24]. According to these authors, arthritis matters not only because it emerges earlier than other chronic diseases but also because it increases the risk of other diseases. A plausible explanation is that premature senescence of the immune system in patients with arthritis may lead to multimorbidity [25]. In addition, arthritis is likely to reduce patients’ physical activity, elevating the risk of other diseases. Meanwhile, hypertension, heart diseases, dyslipidemia and digestive diseases were at the center of the trajectory network and were involved in various disease courses, suggesting that these four diseases are crucial in the development of chronic diseases among middle-aged and older people. For example, dyslipidemia could be an outcome of cardiovascular disease and has also been associated with the onset of metabolic and neurological disorders [26,27,28]. Digestive diseases have been related to polypharmacy in multimorbidity [29, 30]. Furthermore, digestive diseases might affect people’s diets and lifestyles, which may increase the risks of other diseases. Heart diseases may ensue from vascular embolism and follow a relatively protracted course. Studies have found that arthritis and hypertension may increase the risk of heart diseases [31, 32]. Our study also suggested that heart diseases may further lead to stroke, diabetes, and MRD, which is consistent with previous research [33,34,35,36,37,38]. Additionally, the risk of disease tended to increase along the trajectory network. These findings may provide critical insights into the time window for delivering disease prevention services.

The linear trajectory analysis showed that the progression of multimorbidity was relatively slow at the beginning compared with the later stage. After the onset of the first chronic disease, it took more years for the second disease to occur; once the second disease emerged, progression to other diseases tended to be faster. Therefore, more attention should be paid to preventing further disease development, particularly after the onset of the first chronic condition, because once progression has started, multimorbidity develops faster and is more difficult to prevent. In other words, prevention-focused strategies may be more important and beneficial than treatment-focused strategies in controlling the development of chronic diseases. The findings help clarify the prevention priorities for multimorbidity in middle-aged and older people in China and have practical implications for developing individualized prevention strategies to reduce the medical burden and improve the quality of life of patients with chronic conditions.

Our study is based on a nationally representative cohort of Chinese middle-aged and older people. In addition, temporal disease trajectory analysis was used to visualize patterns of chronic condition progression. Most importantly, our results can provide insights into the time window and focus of chronic disease control services. Some limitations should also be acknowledged. First, the small sample size in our study may limit the generalizability of the results to the wider population. We acknowledge that a larger population would provide more statistical power, although we employed rigorous statistical analysis to support the validity of our results. In future research, sample size will be a primary consideration. Second, the prevalence of chronic diseases may have been affected by participants’ recall bias, given the self-reported nature of the data. Future studies should consider incorporating medical records into the analysis. Third, the entire follow-up period lasted only seven years, which may not have captured the occurrence of diseases with slow onset and complex causes. In addition, since only 14 common chronic diseases were included in the survey, it is difficult to compare our results with those of studies from other countries; more chronic conditions should be included in future studies. Finally, the disease classifications were broad, and aggregating diverse diseases into 14 categories may obscure differences between specific conditions. Future research should consider more granular disease classifications.

Conclusion

Using disease trajectory network analysis, we found that arthritis was a key disease that tended to precede the occurrence and development of various other diseases. In addition, patients with heart diseases, hypertension, digestive diseases, or dyslipidemia were at higher risk of developing other chronic conditions. The results highlight that, for patients with multimorbidity, early prevention may reduce the risk of developing chronic diseases with poorer prognoses, such as stroke, MRD, and diabetes. By identifying the trajectory network of chronic diseases, this study provides critical insights for developing early prevention and individualized services to better control the progression of multimorbidity among middle-aged and older adults, reduce disease burden, and improve their quality of life.