Introduction

Influenza epidemic patterns vary across climates. For example, in temperate climates, seasonal epidemics mainly occur during winter [1]; in subtropical climates, influenza often shows two annual peaks, in winter and summer; and in tropical climates, influenza outbreaks may occur irregularly throughout the year [1, 2]. These differences in epidemic patterns may directly or indirectly affect response strategies such as vaccination and the allocation of medical resources [3].

Many factors affect the spread of influenza, including human mobility [4], humidity [5, 6], non-pharmaceutical interventions [7], air pollution [8], virus type, and population immunity [9]. Of these, ambient pollutants have received increasing attention. One study in Australia showed that high concentrations of ozone (O3) and PM10 were significant risk factors for pediatric influenza [8]. In Beijing, China, ambient PM2.5 concentrations were significantly associated with the risk of influenza-like illness (ILI) during the flu season across multiple age groups [10]. Compared with the evidence on influenza occurrence, evidence on the relationship between ambient pollution and influenza transmissibility remains limited. Influenza transmissibility can be characterized by the time-varying reproduction number (\({R}_{t}\)), defined as the average number of secondary infections caused by a typical single infectious individual at time t; a higher \({R}_{t}\) indicates higher transmissibility. Ali et al. reported that O3 is a significant driver of influenza transmissibility in Hong Kong and has an L-shaped relationship with \({R}_{t}\), based on data for all influenza types/subtypes [11]. However, because Hong Kong is located in the subtropics, differences in climate may alter this relationship in other regions. Exploring the relationship between O3 and \({R}_{t}\) in different climates is therefore urgently needed to clarify the potential impact of O3 on influenza transmissibility.

In China, most of the northern provinces have a temperate climate, while most of the southern provinces are subtropical. Therefore, in this study, we selected four provinces each in northern and southern China to examine the impact of O3 on influenza transmission.

Methods

Data source: 5 years of data from 2013 to 2018

Hourly ambient temperature and dew point temperature data for each province were obtained from the National Centers for Environmental Information (Global Surface Summary of the Day, GSOD; https://www.ncei.noaa.gov/access/search/data-search/global-hourly?bbox=40.563,115.742,39.252,117.052&pageNum=1, Accessed 4 August 11 2021). Using the R package “humidity” (R software, version 4.2.1), we calculated hourly relative and absolute humidity from these hourly temperature and dew point values. Daily temperature, relative humidity (RH) and absolute humidity (AH) were calculated as the arithmetic means of the respective hourly values for each day.
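As a minimal sketch of the humidity step, the Python code below computes hourly relative and absolute humidity from air temperature and dew point and then takes daily arithmetic means. It assumes the Magnus approximation for saturation vapour pressure (coefficients 6.112 hPa, 17.62 and 243.12 °C) and the ideal gas law for water vapour; the R package “humidity” may use a different saturation vapour pressure formula, so its values can differ slightly. All function names are hypothetical.

```python
import math


def saturation_vapour_pressure_hpa(temp_c):
    """Magnus approximation of saturation vapour pressure over water (hPa)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity(temp_c, dew_point_c):
    """Relative humidity (%) from air temperature and dew point (both in deg C)."""
    return 100.0 * saturation_vapour_pressure_hpa(dew_point_c) / saturation_vapour_pressure_hpa(temp_c)


def absolute_humidity(temp_c, dew_point_c):
    """Absolute humidity (g/m^3): the actual vapour pressure equals the
    saturation vapour pressure at the dew point; convert with the ideal gas law."""
    e_pa = 100.0 * saturation_vapour_pressure_hpa(dew_point_c)  # hPa -> Pa
    r_v = 461.5                                                  # J/(kg K), gas constant of water vapour
    return 1000.0 * e_pa / (r_v * (temp_c + 273.15))             # kg/m^3 -> g/m^3


def daily_means(hourly_temp_c, hourly_dew_point_c):
    """Daily arithmetic means of temperature, RH and AH from one day of hourly records."""
    pairs = list(zip(hourly_temp_c, hourly_dew_point_c))
    rh = [relative_humidity(t, td) for t, td in pairs]
    ah = [absolute_humidity(t, td) for t, td in pairs]
    n = len(pairs)
    return sum(hourly_temp_c) / n, sum(rh) / n, sum(ah) / n
```

Averaging the hourly RH and AH values, rather than recomputing them from daily mean temperature and dew point, mirrors the order of operations described in the text.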

Daily O3 concentrations in the eight provinces were obtained from the China High Air Pollutants (CHAP) dataset [12], a long-term, full-coverage, high-resolution, high-quality dataset of ground-level air pollutants in China. CHAP provides daily O3 concentrations on a 10 km × 10 km grid, derived by artificial intelligence from big data (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations). Compared with ground-station measurements, its daily O3 estimates had a cross-validation coefficient of determination of 0.87, a root-mean-square error (RMSE) of 17.10 \(\mu g/{m}^{3}\) and a mean absolute error (MAE) of 11.29 \(\mu g/{m}^{3}\) [13, 14]. At the provincial level, the daily O3 concentration was calculated as the arithmetic mean of the values of all 10 km × 10 km grid cells.

Information on holiday-related school closures, including public holidays, summer holidays, Chinese New Year holidays and winter holidays, was also collected. Weekly ILI and viral-detection rate data for each province were obtained from the Chinese National Influenza Surveillance Network. Following previous studies [15,16,17,18,19,20], a proxy for the weekly incidence rate, referred to as the influenza rate, was obtained by multiplying the percentage of ILI among patients visiting sentinel hospitals by the proportion of influenza-positive specimens. This proxy is considered an accurate representation of influenza activity [21, 22]. We multiplied the weekly influenza rate by a constant (10,000), representing the inverse of the coverage of the sentinel sites in the studied provinces, and rounded the results to the nearest integer to obtain a time series of weekly counts (ILI + counts) [23].
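The construction of the influenza rate and ILI + counts can be sketched as follows. This is a hypothetical illustration: it assumes that both the ILI percentage and the positivity rate are supplied as proportions (e.g., 0.03 rather than 3%) and uses half-up rounding; the scaling constant of 10,000 is the one stated in the text.

```python
import math


def influenza_rate(ili_proportion, positivity_proportion):
    """Proxy for weekly influenza incidence: proportion of ILI among sentinel
    outpatients multiplied by the proportion of influenza-positive specimens."""
    return ili_proportion * positivity_proportion


def ili_plus_counts(ili_proportions, positivity_proportions, scale=10_000):
    """Weekly ILI + counts: influenza rate times `scale`, rounded half up to an integer.
    `scale` stands in for the inverse of sentinel-site coverage."""
    return [
        int(math.floor(influenza_rate(p_ili, p_pos) * scale + 0.5))
        for p_ili, p_pos in zip(ili_proportions, positivity_proportions)
    ]
```

For example, a week with 3% ILI and 25% positivity gives an influenza rate of 0.0075 and 75 ILI + counts.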
Because the epidemiological characteristics of influenza differ between southern and northern China [5, 24], we conducted the analysis by region, restricted to eight provinces and municipalities (Fig. S1). These locations were selected based on the availability of influenza surveillance data and O3 concentration data during the study period. Beijing, Tianjin, Shanghai and Jiangsu have relatively high O3 concentrations, whereas Hunan, Guangdong, Liaoning and Gansu have relatively low O3 concentrations [12].

Influenza epidemics were defined as periods exceeding the epidemic threshold for at least seven consecutive weeks. The epidemic threshold was set at the 50th percentile of all non-zero weekly incidence rate counts over the study period [23]. Cubic spline interpolation was used to convert the weekly influenza rate and ILI + counts into daily values, which were then used to estimate transmissibility [22, 23]. Cubic spline interpolation fits a cubic polynomial between each pair of consecutive weekly data points; the resulting curve passes through every weekly value, and adjacent pieces join with continuous first and second derivatives, so the interpolated daily values vary smoothly.
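A minimal Python sketch of the epidemic definition and the weekly-to-daily interpolation is shown below. It assumes that weekly observations are placed at days 0, 7, 14 and so on, that "exceeding" the threshold means strictly greater than it, and it uses SciPy's `CubicSpline` with its default (not-a-knot) end conditions. The text does not specify these choices, nor whether interpolated values were rescaled to daily totals; this sketch does not rescale, and it clips negative interpolated values to zero.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def epidemic_threshold(weekly_counts):
    """50th percentile of the non-zero weekly ILI + counts."""
    counts = np.asarray(weekly_counts, dtype=float)
    return float(np.percentile(counts[counts > 0], 50))


def find_epidemics(weekly_counts, min_weeks=7):
    """(start, end) week indices, inclusive, of every run of at least
    `min_weeks` consecutive weeks strictly above the threshold."""
    above = np.asarray(weekly_counts, dtype=float) > epidemic_threshold(weekly_counts)
    epidemics, start = [], None
    for week, is_above in enumerate(above):
        if is_above and start is None:
            start = week                            # a run above threshold begins
        elif not is_above and start is not None:
            if week - start >= min_weeks:           # run is long enough to count
                epidemics.append((start, week - 1))
            start = None
    if start is not None and len(above) - start >= min_weeks:
        epidemics.append((start, len(above) - 1))   # run lasting to the end of the series
    return epidemics


def weekly_to_daily(weekly_values):
    """Cubic-spline interpolation of weekly values onto a daily grid."""
    weekly = np.asarray(weekly_values, dtype=float)
    week_days = np.arange(len(weekly)) * 7          # weekly points at days 0, 7, 14, ...
    spline = CubicSpline(week_days, weekly)         # default not-a-knot end conditions
    days = np.arange(week_days[-1] + 1)
    return np.clip(spline(days), 0.0, None)         # counts cannot be negative
```

Because the spline interpolates the knots exactly, every seventh daily value reproduces the corresponding weekly value.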

\({R}_{t}\) and adjusted \({R}_{t}\) estimation

\({R}_{t}\), our measure of transmissibility, was estimated using the Bayesian framework proposed by Cori et al. [25], which is based on a branching process model. The serial interval was assumed to follow a gamma distribution with a mean of 2.6 days and a standard deviation of 1.5 days [26]. As an epidemic progresses, the number of susceptible individuals in the population declines, which causes \({R}_{t}\) to decrease gradually. To account for this, we calculated an adjusted \({R}_{t}\) following the method of Ali et al. [23]. Further details of the \({R}_{t}\) estimation are provided in the Supplementary Material.
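The core of the Cori et al. estimator can be sketched in a few lines of Python. Under a gamma prior with shape \(a\) and scale \(b\), the posterior for \({R}_{t}\) over a sliding window is gamma with shape \(a + \sum I_s\) and rate \(1/b + \sum \Lambda_s\), where \(\Lambda_s = \sum_{k \ge 1} I_{s-k} w_k\) is the total infectiousness and \(w_k\) is the discretised serial interval. The 7-day window, the Gamma(shape 1, scale 5) prior and the simple discretisation \(w_k = F(k) - F(k-1)\) of the gamma serial interval (mean 2.6 days, SD 1.5 days) are assumptions made for illustration; dedicated implementations such as the R package EpiEstim discretise the serial interval differently. The adjusted \({R}_{t}\) is not reproduced here.

```python
import numpy as np
from scipy.stats import gamma


def discretised_serial_interval(mean=2.6, sd=1.5, max_days=20):
    """w[k-1] = P(k-1 < SI <= k) for k = 1..max_days, renormalised to sum to 1."""
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    cdf = gamma.cdf(np.arange(max_days + 1), a=shape, scale=scale)
    w = np.diff(cdf)
    return w / w.sum()


def estimate_rt(incidence, window=7, prior_shape=1.0, prior_scale=5.0, si=None):
    """Posterior mean of R_t for sliding windows ending on day t (NaN before the first window)."""
    incidence = np.asarray(incidence, dtype=float)
    w = discretised_serial_interval() if si is None else np.asarray(si, dtype=float)
    n = len(incidence)
    # Total infectiousness: Lambda_t = sum_{k>=1} I_{t-k} * w_k
    lam = np.zeros(n)
    for t in range(1, n):
        k = np.arange(1, min(t, len(w)) + 1)
        lam[t] = np.sum(incidence[t - k] * w[k - 1])
    rt = np.full(n, np.nan)
    for t in range(window, n):
        s = slice(t - window + 1, t + 1)
        shape = prior_shape + incidence[s].sum()    # posterior shape
        rate = 1.0 / prior_scale + lam[s].sum()     # posterior rate
        rt[t] = shape / rate                        # posterior mean
    return rt
```

With constant daily incidence the estimate settles close to 1, and with exponentially growing incidence it exceeds 1, as expected from the renewal equation.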

Exploratory data analysis using \({R}_{t}\)

To account for lags of 0 to 14 days, we identified the best functional relationship between \({R}_{t}\) and each potential driver in each province by fitting both exponential and power univariate regression models [22, 23]. Significant drivers and their best-fitting functional forms were selected based on differences in the Akaike information criterion (ΔAIC):

$$\Delta_i={\Delta\mathrm{AIC}}_{i}={\mathrm{AIC}}_{i}-{\mathrm{AIC}}_{\min}$$

where \(i=\) exponential or power form of the association and:

$${\mathrm{AIC}}_{\min}=\min\left({\mathrm{AIC}}_{\mathrm{exponential}},\ {\mathrm{AIC}}_{\mathrm{power}}\right)$$
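The lag and functional-form search can be illustrated with the Python sketch below. For illustration only, it assumes that the exponential form is \(\log R_t = b_0 + b_1 x + b_2 x^2\) and the power form is \(\log R_t = b_0 + b_1 \log x + b_2 (\log x)^2\), both fitted by ordinary least squares; the quadratic terms allow U- or L-shaped curves. AIC is computed from the Gaussian likelihood up to a constant, and every lag is fitted on the same rows so that the AIC values are comparable before \(\Delta_i\) is taken.

```python
import numpy as np


def ols_aic(X, y):
    """Gaussian AIC of an OLS fit, up to an additive constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * (k + 1)    # +1 for the error variance


def design(x, form):
    """Assumed design matrices for the two candidate functional forms."""
    if form == "exponential":                   # log R_t = b0 + b1*x + b2*x^2
        z = x
    elif form == "power":                       # log R_t = b0 + b1*log(x) + b2*log(x)^2
        z = np.log(x)
    else:
        raise ValueError(f"unknown form: {form}")
    return np.column_stack([np.ones_like(z), z, z ** 2])


def best_fit(rt, driver, max_lag=14):
    """Fit both forms at lags 0..max_lag; return the best (form, lag)
    and Delta_i = AIC_i - AIC_min for every candidate."""
    rt, driver = np.asarray(rt, float), np.asarray(driver, float)
    y = np.log(rt[max_lag:])                    # same rows for every lag
    aic = {}
    for lag in range(max_lag + 1):
        x = driver[max_lag - lag: len(driver) - lag]   # driver value `lag` days earlier
        for form in ("exponential", "power"):
            aic[(form, lag)] = ols_aic(design(x, form), y)
    aic_min = min(aic.values())
    delta = {key: value - aic_min for key, value in aic.items()}
    return min(delta, key=delta.get), delta
```

The power form requires a strictly positive driver, which holds for O3 concentrations.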

In addition, we pooled the data from the eight provinces to construct a general model of the association between \({R}_{t}\) and its drivers. We then performed a permutation analysis, fitting the regression models to 1,000 dummy (null) scenarios to determine whether the relationship between \({R}_{t}\) and O3 could have arisen by chance, and compared the results with those obtained from the true time series.
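The permutation analysis can be sketched as follows. Because the text does not describe how the dummy series were generated, this example assumes that each null scenario is a random shuffle of the O3 series; it refits a quadratic regression of \(\log R_t\) on O3 for each scenario and reports the share of null \({R}^{2}\) values at least as large as the observed one. Shuffling destroys the autocorrelation of O3, so block-based or phase-randomised surrogates would give a more conservative test.

```python
import numpy as np


def quadratic_r_squared(x, y):
    """R^2 of an OLS fit of y on (1, x, x^2)."""
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def permutation_test(o3, rt, n_perm=1000, seed=0):
    """Compare the observed R^2 with R^2 values from shuffled (null) O3 series."""
    rng = np.random.default_rng(seed)
    o3 = np.asarray(o3, dtype=float)
    y = np.log(np.asarray(rt, dtype=float))
    observed = quadratic_r_squared(o3, y)
    null = np.array([quadratic_r_squared(rng.permutation(o3), y)
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, p_value
```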

Quantifying the impacts of drivers on \({R}_{t}\)

We constructed three multivariable regression models to explore the impacts of the different drivers on \({R}_{t}\). “Model 1” evaluated the impacts of the depletion of susceptibles over time and/or inter-epidemic effects on \({R}_{t}\); “Model 2” added the effect of O3; and “Model 3” further accounted for school holidays, temperature, relative humidity, and absolute humidity. Using the best lag model and distributed lag non-linear models (DLNMs), we calculated \({R}^{2}\) to quantify the proportion of the variation in \({R}_{t}\) explained by these factors. The DLNM is specified as

$$\log\left(E\left[{y}_{t}\right]\right) = \alpha + {\beta }_{1}{T}_{t,l}\left({temp}_{t}\right) + {\beta }_{2}{T}_{t,l}\left({AH}_{t}\right) + {\beta }_{3}{T}_{t,l}\left({RH}_{t}\right) + {\beta }_{4}{T}_{t,l}\left({O3}_{t}\right) + ns\left({cum\_inci}_{t},\ df=3\right) + ns\left({holidays}_{t},\ df=3\right)$$
(1)

where \({y}_{t}\) is the expected \({R}_{t}\) on day t; \({\beta }_{i}\) is the regression coefficient of each factor on \({R}_{t}\); \({T}_{t,l}\) is the cross-basis function of each factor (temperature, relative humidity, absolute humidity and O3) at day t and lag l, using a polynomial (“poly”) basis for the exposure dimension and a natural spline with 3 degrees of freedom for the lag dimension; ns is the natural spline basis function; and df is the degrees of freedom. The effect of the depletion of susceptibles (\(cum\_inci\)) was controlled using a natural cubic spline with 3 df.
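To make the cross-basis \({T}_{t,l}\) concrete, the sketch below builds one in Python from a lagged-exposure matrix: each exposure basis function is applied to \(x_{t-l}\), weighted by each lag basis function, and summed over lags 0 to 14. It assumes a quadratic polynomial exposure basis (the degree is not stated in the text) and a natural cubic spline lag basis with knots at lags 0, 7 and 14 (3 df including the intercept), written in the truncated-power form given by Hastie, Tibshirani and Friedman. The R package “dlnm” parameterises and places its bases differently, so these columns will not match its `crossbasis()` output exactly, although they play the same role in the regression. Function names are hypothetical.

```python
import numpy as np


def natural_spline_basis(x, knots):
    """Natural cubic spline basis in truncated-power form, including the
    intercept; returns len(knots) columns."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)

    def d(j):
        return (np.clip(x - k[j], 0, None) ** 3
                - np.clip(x - k[-1], 0, None) ** 3) / (k[-1] - k[j])

    cols = [np.ones_like(x), x] + [d(j) - d(len(k) - 2) for j in range(len(k) - 2)]
    return np.column_stack(cols)


def cross_basis(exposure, max_lag=14, poly_degree=2):
    """Cross-basis matrix (n x poly_degree*3); rows without a full lag history are NaN."""
    x = np.asarray(exposure, dtype=float)
    n = len(x)
    lags = np.arange(max_lag + 1)
    lag_basis = natural_spline_basis(lags, knots=[0, max_lag / 2, max_lag])
    # Lagged exposure matrix: Q[t, l] = x[t - l], NaN where unavailable
    Q = np.full((n, max_lag + 1), np.nan)
    for l in lags:
        Q[l:, l] = x[: n - l]
    # Exposure basis x, x^2, ... combined with the lag basis by summing over lags
    return np.column_stack([(Q ** j) @ lag_basis for j in range(1, poly_degree + 1)])
```

The resulting columns would enter the regression in place of \({T}_{t,l}\) for each factor, with the first 14 days dropped because their lag history is incomplete.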

The best lag model is specified as

$${y}_{t}={e}^{{\beta }_{0}}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{1}^{2}+{\beta }_{3}{x}_{2}+{\beta }_{4}{x}_{2}^{2}+\cdots$$
(2)

where \({y}_{t}\) is the expected \({R}_{t}\) on day t. \({\beta }_{0}\) is the intercept term. \({\beta }_{1}\) is the coefficient for the factor \({x}_{1}\). \({\beta }_{2}\) is the coefficient for the squared term of factor \({x}_{1}\). Similarly, each factor has two associated coefficients: one for the factor itself and another for its squared term.

The best lag model (i.e., the model at the specific lag with the largest \({R}^{2}\) value) and the DLNMs (R package “dlnm”, version 2.4.7) were used to compute \({R}^{2}\) values. The difference between the \({R}^{2}\) values of Model 1 and Model 2 quantified the effect of O3 on \({R}_{t}\), and the \({R}^{2}\) of Model 3 measured the combined impact of all factors on \({R}_{t}\). The DLNM captures the overall effect distributed across multiple lag days, rather than presenting results only for the single best lag. Because its lag window spans at least the average generation time, this distributed approach also accounts for infections acquired on previous days when assessing transmissibility (\({R}_{t}\)).
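The \({R}^{2}\) comparison between nested models can be sketched as below. Here `base_terms` stands in for the Model 1 terms (e.g., a spline of cumulative incidence) and `extra_terms` for the added O3 terms; both are hypothetical placeholders, and ordinary least squares is used instead of the full DLNM fit. Because the models are nested, adding terms never lowers the in-sample \({R}^{2}\), so the gain is non-negative.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on an intercept plus the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def delta_r_squared(base_terms, extra_terms, y):
    """Return (R^2 of the larger model, R^2 gain over the base model),
    i.e. the analogue of Model 2 minus Model 1."""
    r2_base = r_squared(base_terms, y)
    r2_full = r_squared(np.column_stack([base_terms, extra_terms]), y)
    return r2_full, r2_full - r2_base
```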

Results

Background characteristics by province

As shown in Fig. 1, a total of 54 distinct influenza epidemics were identified during the study period (2013–2018): seven in Beijing, six in Tianjin, six in Liaoning, five in Gansu, eight in Shanghai, eight in Jiangsu, eight in Guangdong, and six in Hunan, with different lengths and patterns (i.e., single or double peaks). Table 1 presents summary statistics of the influenza rate, ILI + counts, \({R}_{t}\), O3, daily temperature, and humidity in the eight provinces. Ranked from high to low by daily ILI + counts, the provinces were Jiangsu, Guangdong, Beijing, Hunan, Shanghai, Tianjin, Liaoning and Gansu. Ranked from high to low by median daily O3 concentration, they were Shanghai, Gansu, Jiangsu, Liaoning, Beijing, Tianjin, Guangdong and Hunan; ranked by the 75th percentile of daily O3 concentration, however, they were Shanghai, Beijing, Tianjin, Jiangsu, Liaoning, Gansu, Guangdong and Hunan. The median \({R}_{t}\) across all epidemics was 1.0, with minimum values ranging from 0.7 to 0.8 and maximum values from 1.23 to 2.0. The climate was colder and drier in the northern provinces.

Fig. 1

Weekly influenza activity as ILI + counts (blue lines) along with the predefined epidemics (gray shaded area) in eight different provinces in China from 2013 to 2018

Table 1 Descriptive statistics (median and min–max for \({R}_{t}\); median and IQR for influenza rate, ILI + counts, O3, daily temperature, and humidity) across eight provinces in northern and southern China from 2013 to 2018

Univariate regression model

We first fitted two univariate non-linear regression functions (exponential and power forms) to explore the association between each driver and \({R}_{t}\) at lags of 0–14 days in each province. ΔAIC was used to select the best-fitting function for each driver (Table S1). In all provinces, the exponential form fitted the association between each factor and transmissibility better than the power form. On this basis, we identified the significant drivers of influenza transmissibility and incorporated them into the multivariable regression models.

As shown in Fig. 2, the best-fitting association between O3 and influenza transmissibility was either U-shaped or L-shaped. In provinces with relatively high O3 concentrations (maximum > 180 µg/m³), such as Shanghai, Beijing, Tianjin, and Jiangsu, the association between O3 and \({R}_{t}\) tended to be U-shaped. In provinces with relatively low O3 concentrations (maximum < 180 µg/m³), such as Liaoning, Gansu, Hunan, and Guangdong, it tended to be L-shaped. To explore whether this pattern was general, we repeated the analysis using the pooled data for the eight provinces, which revealed a persistent U-shaped association (Fig. 3). The permutation test showed that the true O3 time series explained significantly more of the variance in \({R}_{t}\) than the null (dummy) O3 time series (Table S2), indicating that O3 is a significant driver of influenza transmissibility.

Fig. 2

The association between O3 and influenza transmissibility (\({R}_{t}\)) in different provinces (A–H). (A–D) Four provinces in northern China (Beijing, Tianjin, Liaoning and Gansu); (E–H) four provinces in southern China (Shanghai, Jiangsu, Guangdong and Hunan). The blue line shows the estimated \({R}_{t}\), and the gray shading shows its 95% confidence interval

Fig. 3

(A) The predicted general U-shaped association (blue line), with 95% CI (shaded region), between O3 and influenza transmissibility; (B) violin plot of pooled O3 concentrations across all eight provinces

Quantifying the impacts of different drivers on \({R}_{t}\)

The multivariable regression models explained 28%–68% of the observed variation in \({R}_{t}\). Notably, a considerable part of the variation was explained by Model 1, which includes the depletion of susceptibles and/or inter-epidemic effects (Table 2). Adding O3 in Model 2 slightly improved the model fit (\({R}^{2}\)), explaining an additional 1%–13% (\(\%\Delta {R}^{2}\)) of the variance in \({R}_{t}\) compared with Model 1 (Table 2). To control for the depletion of susceptibles, we repeated the three multivariable regression analyses with adjusted \({R}_{t}\). O3 significantly improved the prediction of residual \({R}_{t}\), whereas further inclusion of other factors only marginally improved the model fit. Of the two methods used to assess the explanatory power of the drivers, the DLNMs explained a higher proportion of the variation in \({R}_{t}\) than the best lag regression model (Table S3).

Table 2 Percentage of the variance of the instantaneous reproduction number (\({R}_{t}\)) explained by the drivers, across respective provinces from 2013 to 2018. The results are based on the distributed lag non-linear models (DLNMs) with lags of 0–2 weeks

Discussion

Using data from eight provinces for 2013–2018, our study revealed substantial variation in influenza epidemics and a significant association between O3 concentrations and influenza transmissibility. In areas with high O3 levels, we observed a U-shaped relationship with \({R}_{t}\), whereas an L-shaped association was observed in regions with lower O3. The influence of O3 in all provinces underscores its pervasive role in influenza dynamics. Our multivariable regression analysis highlighted the important effect of O3 on \({R}_{t}\), even after accounting for other factors. These findings improve our understanding of the relationship between ambient pollutants, especially O3, and influenza transmission, and may inform the prevention and control of influenza epidemics.

Our results support earlier evidence [11] of a significant negative association between ambient O3 and influenza transmissibility. The pooled analysis of the eight provinces showed a U-shaped association between O3 and \({R}_{t}\); this U-shaped association was also observed in Tianjin, Beijing, Shanghai, and Jiangsu, whereas in Gansu, Liaoning, Guangdong, and Hunan the association was L-shaped. The L-shaped association is consistent with the findings of Ali et al. [11], who studied the association between \({R}_{t}\) and ambient O3 across all influenza types/subtypes. To our knowledge, this is the first study to report a U-shaped association between O3 and \({R}_{t}\).

The differences in the shape of the association between O3 and \({R}_{t}\) across provinces may be related to differences in ambient O3 concentrations. At low concentrations, O3 and \({R}_{t}\) are more likely to show an L-shaped association. For example, the maximum O3 concentration in Hong Kong did not exceed 140 µg/m³ in the study by Ali et al. [11], and the maximum O3 concentration did not exceed 180 µg/m³ in Gansu, Liaoning, Guangdong, and Hunan. In contrast, the U-shaped association between O3 and \({R}_{t}\) may become more apparent at high O3 concentrations: the maximum O3 concentrations exceeded 180 µg/m³ in Shanghai and Jiangsu and 200 µg/m³ in Beijing and Tianjin, and all four showed a U-shaped association. Our findings are consistent with the conclusion of Wang et al. that exposure to both low and extremely high concentrations of ambient O3 increased the risk of influenza [27]. Together, these results suggest a U-shaped concentration–response relationship between O3 and influenza risk and transmissibility, and indicate that attention should be paid to the causes of high ambient O3 levels, as measures to reduce them may help reduce influenza transmission.

Similar to previous research [11], our multivariable regression analysis showed that a large proportion of the variance in \({R}_{t}\) was explained by the intrinsic factors in Model 1, and ambient O3 contributed a further 1%–13% of the total variance. Two main mechanisms may underlie the U-shaped association between O3 and influenza transmissibility. First, reduced influenza transmissibility may be related to the virucidal activity of O3 and its effects on host defense. In vitro studies have reported that O3 can inactivate the influenza virus within hours [28], and animal toxicological studies demonstrate that inflammation, injury, and oxidative stress are reduced following exposure to O3 at concentrations as low as 300 ppb (642 \(\mu g/{m}^{3}\)) for up to 72 h [29]. In mice, continuous exposure to 0.5 ppm (1072 \(\mu g/{m}^{3}\)) O3 could reduce influenza-induced lung injury by activating immune-suppression mechanisms [30]. Inhalation of ambient O3 has also been shown to enhance type-2 immune responses that promote allergy- and asthma-related responses in healthy human subjects and susceptible populations [31]. Second, increased influenza transmissibility may reflect the positive association between short-term exposure to high O3 concentrations and respiratory infection. For example, previous animal studies provide evidence of increased susceptibility to pneumonia after exposure to high concentrations of O3 (2 ppm: 4284 \(\mu g/{m}^{3}\)) [32], and animal studies also report increased injury markers and inflammatory responses following O3 exposure at concentrations of 1 ppm (2142 \(\mu g/{m}^{3}\)) or more [33, 34].
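The ppb/ppm conversions quoted in this paragraph follow from \(C_{\mu g/m^3} = C_{ppb} \times M / V_m\), where \(M\) is the molar mass of O3 (about 48.00 g/mol) and \(V_m\) the molar volume of air. The figures quoted (e.g., 1 ppm ≈ 2142 µg/m³) correspond to \(V_m \approx 22.41\) L/mol, i.e., 0 °C and 1 atm, which we assume here; at 25 °C (\(V_m \approx 24.47\) L/mol) each value would be about 8% lower. A minimal sketch, with hypothetical function names:

```python
O3_MOLAR_MASS = 47.997          # g/mol
MOLAR_VOLUME_0C_1ATM = 22.414   # L/mol, ideal gas at 0 deg C and 1 atm (assumed reference state)


def ppb_to_ugm3(ppb, molar_mass=O3_MOLAR_MASS, molar_volume=MOLAR_VOLUME_0C_1ATM):
    """Convert a mixing ratio in ppb (nmol/mol) to a mass concentration in ug/m^3."""
    return ppb * molar_mass / molar_volume


def ppm_to_ugm3(ppm, **kwargs):
    """Convert ppm to ug/m^3 (1 ppm = 1000 ppb)."""
    return ppb_to_ugm3(1000.0 * ppm, **kwargs)


if __name__ == "__main__":
    for ppm in (0.3, 0.5, 1.0, 2.0):
        print(f"{ppm} ppm = {ppm_to_ugm3(ppm):.0f} ug/m^3")
```

Passing a different `molar_volume` shows how sensitive the converted values are to the assumed temperature.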

This study has some potential limitations. First, the seasonal influenza data were collected from sentinel surveillance hospitals and varied between years, which may have biased the results. Second, the study was restricted to eight provinces in China; observations from other parts of the world would help evaluate these associations in other climatic settings and populations. Third, we interpolated daily incidence rates from weekly data, which may artificially reduce variability and lead to underestimated effects. Where available, daily data on influenza-positive ILI rates would therefore be preferable.

Conclusions

From 2013 to 2018, 54 influenza epidemics were studied across eight provinces. A significant association was found between O3 concentrations and influenza transmissibility: high-O3 regions showed a U-shaped relationship with transmissibility, whereas low-O3 regions showed an L-shaped association. The U-shaped finding is novel and emphasizes the role of O3 in influenza dynamics. By covering different climatic conditions, this study provides additional evidence on the impact of O3 on influenza and enriches research on the environmental factors driving influenza variation. These findings could inform public health strategies, suggesting a need to monitor and manage ambient O3 levels to mitigate influenza spread.