Background

Rifampicin-resistant tuberculosis (RR-TB) has become a serious public health problem that poses a growing threat to individuals and communities, owing to its long infectious period, prolonged course of illness, and heavy economic burden on patients’ families and society. RR-TB is difficult to diagnose, requires lengthy and costly treatment, causes frequent adverse effects, and is associated with a high rate of loss to follow-up [1]. According to the Global Tuberculosis Report 2022 by the World Health Organization (WHO), there were an estimated 450,000 (95% CI: 399,000 to 501,000) cases of RR-TB in 2021. Under the latest WHO estimation method, China alone accounted for approximately 33,000 cases of RR-TB in 2021, or 7.3% of global cases, ranking third in the world. The drug resistance rate was 3.4% (95% CI: 3.3–3.5) among new patients and 19% (95% CI: 19–20) among relapsed patients, making China one of the countries with a high burden of RR-TB globally [2, 3].

Yunnan Province is one of the regions in China with a high burden of RR-TB. According to the Tuberculosis Management Information System (TBMIS), more than 25,000 cases of pulmonary tuberculosis were reported in Yunnan Province in 2022, with a drug resistance rate of 2.6% among new patients and 15.7% among recurrent patients. According to the WHO standard formula, there are approximately 750 RR-TB patients in Yunnan Province each year. Although the detection rate of RR-TB patients in the province has reached 80%, nearly 30% of RR-TB patients have not received standardized treatment, and 50% have not completed the full course of treatment. These gaps have contributed to the continued insidious spread of RR-TB within the community. Community residents who become infected can present directly as RR-TB patients once the disease develops, further exacerbating the RR-TB epidemic in the province.

Previous studies have indicated that tuberculosis incidence exhibits a long-term time trend effect and a cyclic seasonal pattern [4, 5]. Predicting incidence trends is critical in the prevention and treatment of tuberculosis [6]. In this paper, we utilized a SARIMA model to quantitatively predict the monthly incidence of RR-TB in Yunnan Province. Our study provides information on the temporal distribution of RR-TB, which can guide government health administration departments and the Centers for Disease Control and Prevention (CDC) in preventing and controlling the RR-TB epidemic. Our findings can serve as a basis for making informed decisions to manage the spread of RR-TB and mitigate its impact on public health.

Methods

Study setting

The study was conducted in Yunnan Province, on the southwestern border of China. The province is characterized as an “old, minority, border, and poor” area (a region of former revolutionary base areas, ethnic minority areas, border areas, and impoverished areas), and it comprises 16 prefectures and 129 counties [7]. Approximately 74% of the province’s population of 46 million people live in remote mountainous rural areas with very limited access to transportation. Yunnan Province covers an area of about 394,000 km², of which more than 94% is mountainous [8]. The province has an underdeveloped economy, poorly allocated medical resources, numerous ethnic minorities, and weak public awareness of infectious disease prevention [9]. These conditions make Yunnan Province an ideal location for studying RR-TB incidence and prevalence and for investigating effective prevention and control measures.

The prevention and treatment of RR-TB in Yunnan Province faced several challenges, including inadequate funding and resources, outdated diagnostic techniques, low screening rates, and low detection rates, especially after Global Fund support ended in 2015. In 2019, Yunnan Province implemented the Action Plan for Curbing Tuberculosis (2019 to 2022) to address these challenges and promote the prevention and treatment of RR-TB in the region. The plan aimed to enhance laboratory testing services at all levels, promote the use of rapid diagnostic tools, establish a rapid drug-resistance detection process, and improve RR-TB screening at the county level. From 2019, Yunnan CDC used central and provincial special funds for TB prevention and control to provide free Gene-Xpert instruments and the corresponding reagents to all county-level TB laboratories, and Gene-Xpert became a routine test. TB care facilities first carried out sputum smear and sputum culture tests for people with suspected TB, and then conducted free Gene-Xpert RR-TB screening for bacteriologically positive TB patients (sputum smear or sputum culture positive). Since 2019, the province’s RR-TB screening rate has remained above 90%, which has significantly improved RR-TB detection in Yunnan Province.

Data collection

This study used routine surveillance data from the Yunnan Provincial Infectious Disease Network Surveillance and Reporting System, part of a national system introduced in 2004. The system has greatly improved the timeliness and accuracy of infectious disease reporting in China, placing reporting on a standardized management track and helping to better control the occurrence and prevalence of infectious diseases [10]. Monthly incidence data on RR-TB were collected from January 2019 to December 2022. RR-TB included rifampicin mono-resistant TB, multidrug-resistant TB, and extensively drug-resistant TB. To ensure data representativeness, at least 80% of the province’s registered and reported bacteriologically positive TB patients are expected to have received drug-susceptibility testing with valid results reported [3]. The system requires medical institutions to enter each TB patient’s basic information, case classification, disease classification, and dates of onset and visit within 24 h of confirming a TB patient or identifying a suspected TB patient.

Data analysis

Seasonal time series analysis

The seasonal autoregressive integrated moving average (SARIMA) model is suitable for data with seasonal patterns. Modeling is generally divided into three stages: model identification, parameter estimation, and model diagnosis. By iterating through these three stages, the “optimal” model for forecasting or control is finally determined [11] (Fig. 1).

Fig. 1 Seasonal time series modeling process

Model identification: check whether the original series shows obvious seasonality and periodicity, and assess the stationarity of the time series, which requires that the mean and variance of the series do not change over time. A non-stationary series must be made stationary by first-order or second-order differencing to remove its trend.
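The differencing step described above can be illustrated with a minimal Python sketch. The `difference` helper and the monthly counts are hypothetical and are not the study data; the example only shows how one seasonal difference (lag 12) followed by one non-seasonal difference (lag 1) removes a fixed yearly pattern and a linear trend.

```python
from typing import List


def difference(series: List[float], lag: int = 1) -> List[float]:
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly counts (NOT the study data): a linear downward trend
# plus a fixed 12-month seasonal pattern, over 36 months.
seasonal_pattern = [20, 10, 5, 8, 15, 6, 0, -4, -8, -10, -6, 2]
monthly_cases = [80 - 0.5 * t + seasonal_pattern[t % 12] for t in range(36)]

# Seasonal difference (D = 1, period 12) removes the repeating yearly pattern;
# the non-seasonal difference (d = 1) then removes what is left of the trend.
seasonal_diff = difference(monthly_cases, lag=12)
stationary = difference(seasonal_diff, lag=1)

print(len(monthly_cases), len(seasonal_diff), len(stationary))
```

Each difference shortens the series by its lag, so 36 months leave 23 usable values after both steps. Real surveillance data would also contain noise, so the differenced series would fluctuate around zero rather than equal it.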

Parameter estimation and model testing: the values of the model parameters p, d, q and P, D, Q were initially determined by plotting the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the series. Given the non-seasonal orders p, d, q and the seasonal orders P, D, Q, the autoregressive and moving average coefficients were estimated by maximum likelihood or least squares, and their statistical significance was tested. The Box-Ljung statistic was used to check whether the residual series is white noise, i.e., whether the model satisfies the principle of uncorrelated residuals. If the residuals are correlated, modeling returns to the first stage and the model is re-specified. If several models have statistically significant parameters and white noise residuals, their goodness-of-fit statistics, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), can be compared. The smaller the value, the better the model fit.
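As a rough illustration of the diagnostics named above, the following Python sketch computes the sample autocorrelation, the Box-Ljung (Ljung-Box) statistic Q = n(n + 2) Σ r_k² / (n − k), and the standard AIC and BIC formulas. The function names and the residual values are hypothetical. The sketch does not convert Q into a P value, which would require the chi-square distribution, and it is not the procedure used in R or SPSS.

```python
import math
from typing import List


def autocorrelation(x: List[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return numer / denom


def ljung_box_q(residuals: List[float], max_lag: int) -> float:
    """Ljung-Box statistic: n(n+2) * sum over k=1..max_lag of r_k^2 / (n-k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical residuals (NOT from the fitted model in this study).
example_residuals = [0.4, -1.1, 0.7, 0.2, -0.5, 1.3, -0.9, 0.1, 0.6, -0.3, -0.8, 0.5]
print(ljung_box_q(example_residuals, max_lag=3))

# Hypothetical log-likelihood and parameter count for a model fitted to 48 months.
print(aic(-100.0, 3), bic(-100.0, 3, 48))
```

A small Q relative to the chi-square reference distribution supports the white noise hypothesis. When comparing candidate models fitted to the same data, the one with the lower AIC and BIC is preferred.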

Model prediction application: after the model is repeatedly revised and improved, an appropriate prediction model is finally established and predictions are made for the values of a certain period in the future.

Statistical analysis

The RR-TB incidence data from Yunnan Province for 2019 to 2022 were analyzed using R and SPSS 25.0. The linear correlation between the incidence of RR-TB and the incidence of all TB was assessed with Pearson correlation analysis. The annual trend in the proportion of RR-TB was tested using the chi-square test for linear trend. A SARIMA time series model was used to predict the number of monthly RR-TB cases in Yunnan Province in 2023, and the model was validated using time series plots, seasonal and non-seasonal differencing, autocorrelation and partial autocorrelation analysis, and white noise tests. The model’s predicted values were compared with the actual data to assess its accuracy.
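For readers unfamiliar with the Pearson coefficient used here, a minimal Python sketch follows. The `pearson_r` helper and both monthly series are hypothetical illustrations, not the study data or the SPSS procedure.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical monthly counts (NOT the study data).
all_tb_cases = [2500, 2300, 2400, 2100, 2000, 1900]
rr_tb_cases = [60, 50, 55, 45, 48, 40]

r_example = pearson_r(all_tb_cases, rr_tb_cases)
print(round(r_example, 3))
```

A value of r close to 1 means the two series rise and fall together. It does not by itself show that one drives the other.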

Results

Model identification

The monthly incidence of RR-TB in Yunnan Province between January 2019 and December 2022 was used as modeling data to plot the original time series. Over the entire series, the number of RR-TB cases in the province showed a declining trend, with a period of high values from late 2019 to mid-2020. The trend of RR-TB incidence was similar to that of all TB incidence (r = 0.749, P < 0.001). From 2019 to 2022, the number of reported cases of all TB in Yunnan Province decreased from 29,800 to 24,000. However, there was no significant change in the proportion of RR-TB among all TB cases, which remained within the range of 2.0–2.5% (χ² for trend = 0.107, P = 0.744) (Fig. 2).

Fig. 2 The distribution of RR-TB incidence and all TB incidence. a The linear correlation between the incidence of RR-TB and all TB; b the proportion of RR-TB among all TB cases from 2019 to 2022

Decomposition of the time series revealed clear seasonal, periodic, and random components. The Chinese national standard “Climate Seasonal Division”, implemented from February 1, 2023, defines March to May as spring, June to August as summer, September to November as autumn, and December to February as winter. In terms of seasonal incidence, the number of cases in winter (December-March) ranged between 70 and 90, with a decreasing trend. January had the highest incidence of the year. In spring (April-June), the number of cases fell slightly from the previous peak, ranging between 50 and 70, but remained high for a period, and the second peak of the year was observed in May. From summer to autumn (July-November), the number of cases showed a decreasing trend, ranging between 10 and 50. The original series was non-stationary, as shown by the augmented Dickey-Fuller (ADF) test (t = -2.425, P = 0.135), so it had to be differenced and re-examined. After one non-seasonal difference and one seasonal difference, the series fluctuated around 0. The ADF test then gave t = -6.924, P < 0.05, so the differenced series can be considered stationary and meets the conditions for SARIMA modeling (Fig. 3).

Fig. 3 Time series decomposition of RR-TB cases from January 2019 to December 2022

Parameter estimation and model testing

After differencing the original time series, d = 1 and D = 1 were determined, giving a model of the form SARIMA (p, 1, q) (P, 1, Q)12. To determine p, q, P, and Q, the cut-off and tailing-off behavior of the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots was examined (Fig. 4). The parameters were verified step by step from lower to higher orders, and p = 1, q = 1, P = 1, and Q = 0 were finally selected based on the fit and residuals of each model, together with the minimum AIC and BIC. The SARIMA (1, 1, 1) (1, 1, 0)12 model was chosen, and its parameter estimates were statistically significant (t = -2.792, P = 0.009). A white noise test was then conducted on the residual series: the residual autocorrelation and partial autocorrelation coefficients fell within the 95% confidence interval, and the Box-Ljung test gave a statistic of 14.237 with P = 0.508, indicating that the residual series can be considered white noise (Fig. 5).
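For reference, a SARIMA (1, 1, 1) (1, 1, 0)12 model can be written in the standard multiplicative form below, using the backshift operator B (so that B y_t = y_{t-1}). The symbols φ1, Φ1, θ1, and ε_t are notation introduced here for illustration. The fitted coefficient values are not reported in the text, and the sign convention for θ1 differs between software packages.

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,\varepsilon_t
```

Here y_t is the monthly number of RR-TB cases, (1 − B) and (1 − B^{12}) are the non-seasonal and seasonal differences (d = 1, D = 1), φ1 and Φ1 are the non-seasonal and seasonal autoregressive coefficients (p = 1, P = 1), θ1 is the moving average coefficient (q = 1), and ε_t is white noise.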

Fig. 4 The ACF and PACF graphs used for parameter estimation. a The ACF graph of the raw data (d = 0 and D = 0); b the PACF graph of the raw data (d = 0 and D = 0); c the ACF graph of the data after first-order trend and first-order seasonal differencing (d = 1 and D = 1); d the PACF graph of the data after first-order trend and first-order seasonal differencing (d = 1 and D = 1)

Fig. 5 The ACF and PACF graphs of the residual series. The ACF is on the left and the PACF is on the right

Model prediction application

The SARIMA (1, 1, 1) (1, 1, 0)12 model was used to predict the incidence of RR-TB in Yunnan Province from 2019 to 2023. The predicted seasonal trend of RR-TB incidence in 2019–2023 was similar to the actual data. For 2019–2022, the prediction accuracy exceeded 80%, and the actual values all fell within the 95% CI (Fig. 6). The model predicted 392 cases of RR-TB from January to December 2023, a decrease from 498 cases in 2022, with incidence trending downward over the year as in previous years. In reality, however, there was a gap between the actual incidence and the predicted values in 2023: the actual incidence increased by 12.4% compared with 2022, and the prediction accuracy was only 70% in 2023 (Table 1).
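The text does not define how the percentage accuracy was calculated. One reading that is consistent with the reported figures takes 392 as the model's forecast total for 2023 and derives the actual 2023 total from the stated 12.4% increase over the 498 cases in 2022. The short Python check below works through that arithmetic. The variable names and the ratio definition are assumptions, not the authors' stated method.

```python
# Figures quoted in the text.
cases_2022 = 498          # cases in 2022
predicted_2023 = 392      # assumed here to be the model's forecast total for 2023
reported_increase = 0.124  # actual 2023 incidence relative to 2022

# Derived quantity (an assumption, not a figure reported in the text).
implied_actual_2023 = cases_2022 * (1 + reported_increase)

# Assumed accuracy definition: predicted total divided by actual total.
accuracy_ratio = predicted_2023 / implied_actual_2023

print(round(implied_actual_2023), round(accuracy_ratio, 2))  # prints: 560 0.7
```

Under these assumptions the ratio comes out at about 70%, which matches the accuracy reported for 2023.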

Fig. 6 Trend forecast of the number of drug-resistant tuberculosis cases in Yunnan Province, 2019–2023. The red and blue lines represent the actual and predicted values, respectively; the grey dotted lines represent the 95% confidence intervals; to the right of the black vertical line, the blue line represents the predicted values of RR-TB for 2023

Table 1 Trend values and confidence intervals for the number of RR-TB incidence in Yunnan Province in 2023

Discussion

Predicting the future incidence of tuberculosis is crucial for the prevention and control of the disease [11,12,13,14,15]. Although there have been many time series analyses of tuberculosis globally, studies on RR-TB are rare, and knowledge of the incidence pattern of RR-TB is limited. This study is the first to conduct a time series analysis and incidence prediction of RR-TB in Yunnan Province, which has a high burden of RR-TB. Accurate prediction of the incidence trend of RR-TB can provide early warning of epidemics or outbreaks [16]. The predicted incidence can inform the priorities and investment of CDCs in prevention and control, the stocking of diagnostic reagents and consumables at sentinel medical institutions, and the procurement of therapeutic drugs. It can also help health administrative departments allocate public health resources more rationally.

From 2019 to 2022, the number of RR-TB cases reported among all TB cases in Yunnan Province showed an overall downward trend. The trend of RR-TB was similar to that of all TB: the incidence of RR-TB decreased as the incidence of all TB decreased. There was no significant change in the proportion of RR-TB among all TB cases, which remained within 2.5%, while the RR-TB screening rate remained consistently above 90%. This finding suggests that RR-TB incidence was directly related to the downward trend of all TB incidence in Yunnan Province and that the number of RR-TB cases depended on the overall incidence of all TB. The period of high values from late 2019 to mid-2020 can be attributed to the gradual rollout of molecular testing in Yunnan Province from 2019, through which many previously undetected RR-TB patients with prolonged bacterial shedding were rapidly identified at the grassroots level. This result is also in line with the prediction of experts from the Chinese CDC, who concluded through time series modeling that the rate of RR-TB in China and the estimated number of RR-TB patients are both decreasing year by year [3]. In addition, the COVID-19 pandemic severely affected the province’s healthcare system in 2020–2022. On the one hand, traffic control and travel restrictions prevented some TB patients from reaching sentinel health facilities in a timely manner; on the other hand, the whole population began wearing masks and reduced its mobility. Together, these factors affected the course of the TB epidemic [17,18,19]. Despite this, annual screening for RR-TB in Yunnan Province continued as normal, with screening rates remaining above 90%, and was not significantly affected by the COVID-19 epidemic.

In terms of monthly and quarterly distribution, the incidence of RR-TB in Yunnan Province from 2019 to 2022 showed a clear seasonal effect, similar to the cyclical changes of non-RR-TB [16, 20] and closely matching earlier analyses of the temporal distribution of non-RR-TB in Yunnan Province [6, 21]. The number of cases in winter (December-March) fell sharply after January each year, probably because the Chinese Spring Festival holiday usually falls in January-February and, by custom, the vast majority of patients whose symptoms are not severe do not visit hospitals during the festival [21, 22], so clinicians could not detect RR-TB patients. In addition, there was a peak in spring (April-June), when cases were more numerous and more concentrated. This is probably because of the large population movements during the Spring Festival holiday, which falls between winter and spring: the weather is cold and people gather for parties and dinners in enclosed, poorly ventilated places such as homes, which greatly facilitates the transmission of RR-TB bacteria [23, 24]. A few months later, from around April, those infected begin to develop the disease or are detected through physical examinations. Places where people congregate, such as schools and custodial facilities, should strengthen epidemic warning and regular TB screening and inspection. Compared with other acute respiratory diseases, the onset of RR-TB was markedly delayed, with a distinct peak in spring and summer rather than the winter peak seen in most respiratory diseases.

The SARIMA (1, 1, 1) (1, 1, 0)12 model predicted that the number of RR-TB cases in 2023 would continue to decline compared with 2022, with an annual trend similar to that of previous years. However, the actual number of cases increased rather than continuing to decline, which is an important finding. With the end of the COVID-19 pandemic, population mobility, mask-wearing, and the accessibility of medical services changed markedly, and RR-TB patients who had had difficulty obtaining medical treatment and had been spreading infection in the community were diagnosed. This suggests that COVID-19 had to some extent masked the true incidence of RR-TB and that the RR-TB epidemic remains serious. Such unanticipated changes are non-linear in nature and cannot be effectively captured by the SARIMA model [25].

However, this study has some limitations. First, although the time series method can identify the trend and pattern of a series over a short period, it is limited to short-term prediction: as the forecast horizon lengthens, the 95% confidence interval of the predicted values widens and the accuracy of the prediction decreases. Second, this study included only morbidity data, but morbidity is influenced by many factors, such as outbreaks or pandemics of infectious diseases, population, environment, meteorology, and the economy [26,27,28,29]. Predictions will become more accurate only if more data are included and the model is continuously revised and improved [30, 31], and more influencing factors should be included for in-depth analysis in future research.

Conclusion

We found that the incidence of RR-TB followed the overall incidence of all TB in Yunnan. The SARIMA model successfully predicted the seasonal incidence trend of RR-TB in Yunnan Province from 2019 to 2023, but its precision can be affected by factors such as outbreaks or pandemics of new infectious diseases, social issues, environmental challenges, and other unknown risks. These results indicate that the RR-TB epidemic in Yunnan Province remains relatively severe. CDCs should continue to monitor trends in the incidence of RR-TB and all TB, and pay special attention to the continuing and post-epidemic effects of outbreaks or pandemics of new infectious diseases on the accessibility of social resources and medical services. Monitoring and early warning should be maintained, and disease prediction models should be further optimized, to control RR-TB.