Introduction

Influenza is a common acute respiratory viral infectious disease caused by influenza virus [1]. It has spread worldwide, poses a considerable disease burden on human health and economic development and is currently one of the greatest public health challenges. In 2019, the Global Burden of Disease Study (GBD) estimated that influenza-related lower respiratory infections caused approximately 145,000 deaths across all age groups [2], carrying a heavy mortality burden [2, 3]. In China, the reported incidence of influenza surged to 253.36 per 100,000 people in 2019. Although in the context of COVID-19, the incidence rate of influenza in China decreased to 81.58/100,000 and 47.40/100,000 in 2020 and 2021, respectively [4], the number of reported influenza cases in southern China in the summer of 2022 showed an unusually high incidence [4]. To date, there is no clear evidence of the influencing factors involved.

Meteorological factors (MFs) and air quality are closely related to human health [7,8,9,10,11,12,13,14,15]. They may influence the spread of influenza by affecting the survival, stability and transmissibility of the virus, by damaging the respiratory system and by increasing host susceptibility [5, 6]. However, the published findings are not entirely consistent. For instance, one study showed that low humidity within a week was associated with a greater incidence of influenza-like illness [13], whereas Zhu et al. showed that both high and low humidity increased the risk of influenza [8]. In a study by Yang et al., the air pollutants PM2.5, PM10, SO2, NO2 and CO increased the incidence of influenza-like illness, while O3 was negatively correlated with it [9]. However, another study showed that short-term exposure to O3 was positively correlated with the risk of hospitalization among men and the risk of death among women from respiratory diseases [15].

These differences in the effects on influenza transmission are related to influenza subtype, and the incidence of different subtypes (such as A (H1N1) and A (H3N2)) varies by season and region [16].

The spread of influenza is also related to various other important factors. For example, despite decades of testing and of drug and nondrug interventions against influenza viruses, seasonal influenza viruses continue to cause annual outbreaks worldwide, which may be associated with continuous mutations in their strains [17]. In addition, the immunity of the population significantly affects the influenza epidemic. The decrease in acquired immunity can lead to antigen mutations and the invasion of viruses that have not undergone antigen mutations [18], resulting in influenza outbreaks.

The outbreak of infectious diseases can be divided into the following periods: prodrome period, rising period, stable period and falling period. Early detection refers to the detection of abnormal outbreaks or epidemic information from a series of complex factors that drive the occurrence of infectious diseases and pathogen transmission, such as changes in the environment, case discovery, and an increase in the number of cases. Due to the limited evidence on the driving factors in the prepandemic period, it is difficult to predict infectious disease outbreaks accurately in advance, which poses great difficulties for risk assessment and prevention in practical work.

Xiamen city is located on the southeast coast of China in southern Fujian Province (Figure S1). The city has a mild and rainy climate characterized by a subtropical marine monsoon climate. Xiamen, as an economic special zone, has a large population, developed economy, and complete infectious disease reports. In addition, Xiamen has an area of only 1700.61 square kilometres, making meteorological and air quality monitoring more representative.

The random forest (RF) machine learning model and the long short-term memory (LSTM) deep learning model are often used by researchers for prediction and yield good results [8, 19]. However, few researchers have combined these methods to achieve better predictions of disease. In addition, unlike previous studies, this study predicted influenza cases not only from the observed risk factor data for the day but also from forecasted risk factor values. This allows for earlier prediction and thus earlier intervention.

Thus, we aimed to explore the potential risk factors for influenza outbreaks by studying the driving factors in the prodrome period of an abnormal influenza epidemic in Xiamen in 2022 and make early predictions based on the RF-LSTM algorithm to provide a basis for early warning and intervention.

Materials and methods

Data sources

At present, there is no definition or standard for an abnormal influenza epidemic (high incidence of influenza). In this series of studies, it is uniformly defined as the occurrence of influenza outbreaks in a certain region during non-seasonal epidemic periods, or the intensity of influenza activity during seasonal epidemic periods is significantly higher than the historical normal level in the region, due to potential impacts of global warming and climate change such as extreme weather events, pathogen mutations, human behavior, and other factors.

The influenza case data, including clinical cases and laboratory confirmed cases, from 2013 to 2022 for Xiamen reported by medical and health institutions at all levels according to the diagnostic criteria for epidemic disease cases (WS 285–2008) [20], as well as the resident population data used to calculate influenza vaccination rates, were all downloaded from the China Disease Prevention and Control Information System. The influenza pathogen data for this study was obtained from the Fujian Provincial Influenza Monitoring Network Laboratory, and samples were collected from influenza-like illness (ILI) patients by the Influenza Surveillance Sentinel Hospital in Xiamen. ILI is a nonspecific disease that can be caused by various pathogens, including influenza virus, adenovirus, coronavirus, and rhinovirus. Among them, influenza virus is one of the most common pathogens, especially during the flu season. All influenza cases in this study were counted based on the date of onset. The floating population data were obtained from the Yearbook of the Xiamen Special Economic Zone, which is officially published on a yearly basis by the Xiamen Municipal Bureau of Statistics (https://tjj.xm.gov.cn/tjnj/publish/). The influenza vaccination data were obtained from the Fujian Provincial Immunization Planning Information Management System.

The MF data were monitored by Xiamen Meteorological Station and provided by the Fujian Climate Center. The MFs in this study included 6 indicators, of which air pressure (PRS, hPa), relative humidity (RHU, %), air temperature (TEM, °C), and wind speed (WIN, m/s) were the average values in a day, precipitation (PRE, mm) was the accumulated precipitation in a day, and sunshine duration (SSD, h) was the accumulated sunshine hours in a day.

Extreme weather events: The state of the weather deviates significantly from its climatic state. One type is when the hourly, daily, monthly, or annual values of meteorological elements reach a recurrence interval of 25 years or higher. Another type is when the difference between meteorological elements during a certain period and the 30-year average climate state of the same period in previous years exceeds twice the standard deviation. The measurement of extreme weather events includes two dimensions: extremely low or extremely high measured values; extremely long or extremely short duration. And the extreme weather involved in this study comes from the annual climate bulletins of the Climate Center in Fujian Province and Xiamen City.
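The second criterion above (a deviation of more than twice the standard deviation from the 30-year climate state) can be sketched in a few lines of Python. This is only an illustration: the function name `is_extreme`, the toy climatology, and the use of the sample standard deviation are assumptions, and the extreme events analysed in this study were taken from the climate bulletins rather than computed this way.

```python
from statistics import mean, stdev


def is_extreme(value, climatology, k=2.0):
    """Flag `value` if it deviates from the climatological mean by more than k SDs.

    `climatology` holds the values of the same period in previous years
    (hypothetical input; the sample standard deviation is an assumption).
    """
    mu = mean(climatology)
    sd = stdev(climatology)
    return abs(value - mu) > k * sd


# Toy example: mean 20.0, sample SD about 0.79, so the threshold is about 1.58.
toy_climatology = [20.0, 21.0, 19.0, 20.5, 19.5]
print(is_extreme(22.0, toy_climatology))
print(is_extreme(21.0, toy_climatology))
```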

The daily average concentrations of air quality indicators, including CO (mg/m3), NO2 (µg/m3), O3 (µg/m3), PM10 (µg/m3), PM2.5 (µg/m3) and SO2 (µg/m3), were obtained from the China National Environmental Monitoring Centre (http://106.37.208.233:20035/).

The first case of COVID-19 was reported in Fujian Province on January 22, 2020. The sharp decline in influenza activity under the public health and social measures implemented during the COVID-19 pandemic significantly affected long-term multi-year analyses of the association between meteorological factors and influenza, as well as the prediction of influenza [21]. To avoid this serious interference, we excluded the data from the COVID-19 period and extended the buffer period for the influenza epidemic by two days. Therefore, the impact of MFs and air quality on influenza during the non-COVID-19 epidemic period (2013/01/01-2020/01/24) was analysed, and based on these characteristics, the factors causing an abnormally high incidence of influenza in 2022 were explored. The number of influenza cases from January 1, 2022, to August 31, 2022, was subsequently predicted.

Statistical data analysis

The map of China in Figure S1 was sourced from the Standard Map Service Network (http://bzdt.ch.mnr.gov.cn/). The map of Fujian Province in Figure S1 was drawn using ArcGIS software (version 10.3; ESRI, Redlands, CA, USA).

We first calculated the proportion of influenza virus subtypes in the surveillance results of the Xiamen Influenza Surveillance Outpost Hospital and then reverse calculated the daily number of influenza subtypes reported by Xiamen medical and health institutions through the China Disease Prevention and Control Information System.
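The back-calculation described above amounts to multiplying the daily reported case count by the subtype proportions observed in sentinel surveillance. A minimal Python sketch follows; the function name `allocate_by_subtype` and the toy counts are hypothetical, and returning zeros when no sentinel sample is positive is an assumption, since the handling of such days is not stated here.

```python
def allocate_by_subtype(reported_cases, sentinel_positives):
    """Split a reported case count across subtypes by sentinel proportions.

    `sentinel_positives` maps subtype name -> number of positive samples.
    Returns estimated (possibly fractional) case counts per subtype.
    """
    total = sum(sentinel_positives.values())
    if total == 0:
        # Assumption: no positives means no allocation is possible.
        return {subtype: 0.0 for subtype in sentinel_positives}
    return {
        subtype: reported_cases * count / total
        for subtype, count in sentinel_positives.items()
    }


# Toy example: 100 reported cases, sentinel positives split 3:1.
estimate = allocate_by_subtype(100, {"A(H3N2)": 3, "B(Victoria)": 1})
print(estimate)
```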

The HA gene sequences of 33 influenza A (H3N2) strains isolated from Xiamen were sequenced by the Fujian Provincial Center for Disease Control and Prevention (FJCDC) or downloaded from the Global Initiative on Sharing All Influenza Data (GISAID) database. Among them, 11 strains (8 sequenced by the FJCDC, 3 downloaded from GISAID) were isolated in 2022, and 23 strains (17 sequenced by the FJCDC, 6 downloaded from GISAID) were isolated from 2014 to 2021.

Viral nucleic acid was extracted using a GeneRotex 96 fully automatic nucleic acid extractor (Xi’an Tianlong Technology Co., Ltd., China) and its supporting reagents. The influenza virus HA gene was amplified with universal primers and RT‒PCR reagents (SuperScript™ III One-Step RT‒PCR System with Platinum™ Taq High Fidelity DNA Polymerase; Invitrogen, Cat. No.: 12574035) for reverse transcription and amplification of the viral RNA.

After purification, the amplified products were subjected to high-throughput sequencing using an Ion Torrent S5 sequencer (Thermo Fisher Scientific) and related reagents. The sequences were spliced with a CLC Genomics Workbench (QIAGEN, Ver.11.0). The clades of the strains were analysed through the GISAID FluSurver online website. A phylogenetic tree was constructed with MEGA 6.06 software (neighbour joining method, bootstrap = 1000 resamples).

R software (version 4.2.2, R Foundation for Statistical Computing, Vienna, Austria) was used to visualize and statistically analyse the data. The effects of MFs and air quality on influenza were analysed using Spearman correlation analysis and distributed lag nonlinear models (DLNMs).

Measurement data that followed a normal distribution were described using the mean and standard deviation; data that did not follow a normal distribution were described using the daily average, minimum, median (M), quartiles (P25, P75), and maximum. When the measurement data followed a normal distribution and had homogeneity of variance, Pearson correlation analysis, the independent-sample t-test, and one-way analysis of variance were used; when they followed a skewed distribution or did not satisfy homogeneity of variance, Spearman correlation analysis and the nonparametric rank sum test were used. Differences were considered statistically significant at P < 0.05.
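For readers unfamiliar with Spearman correlation, it is the Pearson correlation computed on ranks, with tied values sharing their average rank. The Python sketch below shows only the coefficient (no P value); it is an illustration with hypothetical function names, not the R code used for the analysis, and it assumes neither input is constant.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```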

The DLNM is based on a generalized linear model and uses a cross-basis transformation to model nonlinear and lagged (delayed) effects simultaneously. Influenza, MF, and air quality factors were included in the DLNM in this study.

The basic model was as follows:

$$\log\left[E\left(Y_{t}\right)\right]=\alpha+\sum\beta_{i}x_{i}+NS\left(time_{t},\;df\right)+\sum NS\left(Z_{j},\;df\right)+Dow$$
(1)

Yt was the number of influenza cases on day t, α was the constant term, βi was the coefficient of the explanatory variable in the regression model, xi was the influencing factor (the meteorological and air quality factors included in this study), timet was used to control for seasonal and long-term trends, df was the degrees of freedom, Zj was used to control for the confounding effect of other MFs and air quality factors, Dow was the dummy variable for the day of the week (controlling for the day-of-the-week effect), and NS (…) was a natural spline function (used to control for seasonal and long-term trends, the day-of-the-week effect and other MFs and air quality factors) [8, 22]. The df was determined by the minimum Akaike information criterion (AIC) principle: different df values for one term were compared while the df of the other terms were held fixed, and the df with the lowest AIC value was selected to establish the model and to conduct sensitivity analysis and comparison [23, 24]. Based on the incubation period and epidemic characteristics of influenza and to capture more complete information on impact effects, we determined a maximum lag time of 21 days (d).
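As a rough illustration of the lag structure described above, the sketch below builds the matrix of lagged exposures that a cross-basis transformation starts from. It is a minimal Python sketch rather than the R code used in this study; the function name `lag_matrix` and the toy series are hypothetical, and the spline bases applied in the exposure and lag dimensions are omitted.

```python
def lag_matrix(series, max_lag=21):
    """Build rows [x_t, x_(t-1), ..., x_(t-max_lag)] for every t >= max_lag.

    Days without a full lag history (t < max_lag) are dropped.
    """
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in range(max_lag + 1)])
    return rows


# Toy example with a maximum lag of 2 days instead of 21.
print(lag_matrix([1, 2, 3, 4], max_lag=2))
```

With a maximum lag of 21 days, each row has 22 columns (lag 0 to lag 21), and the first 21 days of the series contribute no rows of their own.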

FZ-MOS system was used to predict the weather in Xiamen. The core of the Model Output Statistics (MOS) forecasting model is to use fine grid meteorological elements such as the European Centre for Medium-Range Weather Forecasts (ECMWF) as forecasting factors, select historical data with the same or similar changing backgrounds, and use statistical methods to establish forecasting models. FZ-MOS was built by conducting daily forecasts of meteorological elements such as precipitation, humidity, and temperature for the next 1–10 days at the Fujian Provincial Meteorological Station over the years and designing various training period schemes for verification and improvement.

In this study, based on the intelligent grid comprehensive forecast results of FZ-MOS, we calculated the meteorological forecast values for the next 1 to 10 days at the grid point where Xiamen Station was located starting at 20:00 every day. This manuscript displays the predicted results only for Day 1, Day 4, Day 7, and Day 10.

The mean absolute deviation (MAD) was used to evaluate the effectiveness of meteorological forecasting. The MAD calculation formula was as follows:

$$\text{MAD}=\frac{1}{n}\sum\nolimits_{i=1}^{n}\left|x_{i}-m\left(x\right)\right|$$
(2)
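Equation (2) can be computed directly, as in the short Python sketch below. The symbols are not defined in the text, so the sketch assumes that m(x) is the arithmetic mean of the values unless another central value is supplied; the function name and the toy values are hypothetical.

```python
def mean_absolute_deviation(x, center=None):
    """Mean absolute deviation of `x` around `center`.

    Assumption: when `center` is None, m(x) is taken to be the arithmetic mean.
    """
    n = len(x)
    m = sum(x) / n if center is None else center
    return sum(abs(v - m) for v in x) / n


# Toy values: mean is 3, absolute deviations are 2, 1, 0, 1, 2.
print(mean_absolute_deviation([1, 2, 3, 4, 5]))
```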

TensorFlow 2.8.0 software (Google Brain Team, Mountain View, CA, USA) and Python 3.8.13 software (Python Software Foundation, Delaware, USA) were used to predict cases of influenza through the RF-LSTM method.

An RF algorithm combines multiple trees through ensemble learning. Its basic unit is the decision tree, and it is a major branch of machine learning [25]. It is an extended variant of bagging.

The operation process includes five steps. The first step is feature splitting, which is used to split the data and construct supervised learning data. Then, random sampling is performed: N samples are obtained by drawing N times from the original dataset with replacement. Third, a decision tree is constructed and trained on each sampled dataset. The fourth step is decision-making, in which each tree makes its own decisions based on the data. Finally, the decisions are aggregated, with the average of the tree predictions taken as the final result [26]. The RF process is shown in Figure S2.

The final output of the RF is calculated as follows.

$$D=\left\{\left(x_{1},y_{1}\right),\left(x_{2},y_{2}\right),\dots,\left(x_{n},y_{n}\right)\right\}$$
(3)
$$h_{i}\left(x\right)=\zeta\left(D,D_{bs}\right)$$
(4)
$$H\left(x\right)=\frac{1}{T}\sum\nolimits_{i=1}^{T}h_{i}\left(x\right)$$
(5)

(Note: D: all the data; x: feature; y: label; hi(x): the output of each base model; \(\zeta\): base learning algorithm; Dbs: sample set generated by bootstrap sampling; T: number of base models.)
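The bagging idea behind Eqs. (3) to (5) can be sketched in plain Python: draw a bootstrap sample of the data, fit one base learner per sample, and average the outputs. The sketch is illustrative only. A one-split regression stump on a single feature stands in for a full decision tree, the random feature selection of a real RF is omitted, and all names and the toy data are hypothetical.

```python
import random


def fit_stump(sample):
    """Fit a one-split regression stump on (x, y) pairs (stand-in for a tree)."""
    xs = sorted({x for x, _ in sample})
    overall = sum(y for _, y in sample) / len(sample)
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in sample if x <= thr]
        right = [y for x, y in sample if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    # If the sample had a single distinct x, thr is None and the stump is constant.
    return lambda x: ml if thr is None or x <= thr else mr


def bagged_predictor(data, n_trees=25, seed=0):
    """Average the outputs of base learners fitted on bootstrap samples."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_trees):
        d_bs = [rng.choice(data) for _ in data]  # N draws with replacement
        learners.append(fit_stump(d_bs))
    return lambda x: sum(h(x) for h in learners) / len(learners)


toy = [(1, 10.0), (2, 10.0), (3, 10.0), (8, 50.0), (9, 50.0), (10, 50.0)]
H = bagged_predictor(toy)
print(H(1), H(10))
```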

LSTM is a deep learning algorithm that is suitable for time series data analysis [8]. The whole process is divided into three parts. First, the sigmoid layer of the “forget gate” determines what information should be discarded from the cell state. Second, the “input gate” determines what new information is stored in the cell state. Finally, the “output gate” determines the output information.

The single cell of LSTM is displayed in Figure S2.

The LSTM calculation formulas are as follows:

$$f_{t} = \sigma (W_{f} \cdot [h_{t-1},\ x_{t}] + b_{f})$$
(6)
$$i_{t} = \sigma(W_{i} \cdot [h_{t-1},\ x_{t}] + b_{i})$$
(7)
$$\tilde{\boldsymbol{C}}_{t} = \text{tanh} (W_{c} \cdot [h_{t-1},\ x_{t}] + b_{c})$$
(8)
$$C_{t} = f_{t} \cdot C_{t-1} + i_{t} \cdot \tilde{\boldsymbol{C}}_{t}$$
(9)
$$o_{t} = \sigma (W_{o} \cdot [h_{t-1},\ x_{t}] + b_{o})$$
(10)
$$h_{t} = o_{t} \cdot \text{tanh} (C_{t})$$
(11)

(Note: ht−1: the output at the previous time step; xt: the input at the current time step; σ: the sigmoid function; W and b: the weights and bias terms of each gate; Ct−1: the previous cell state, which is updated to Ct.)
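To make Eqs. (6) to (11) concrete, the sketch below evaluates one LSTM step in plain Python for the simplest possible case, a scalar input and a scalar hidden state. It is not the TensorFlow model used in this study; the function names, the dictionary layout of the weights, and the toy values are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step with scalar input and scalar hidden state.

    W[g] = (weight on h_(t-1), weight on x_t) and b[g] is the bias
    for g in {"f", "i", "c", "o"} (hypothetical layout).
    """
    def pre(g):
        w_h, w_x = W[g]
        return w_h * h_prev + w_x * x_t + b[g]

    f_t = sigmoid(pre("f"))              # Eq. (6): forget gate
    i_t = sigmoid(pre("i"))              # Eq. (7): input gate
    c_tilde = math.tanh(pre("c"))        # Eq. (8): candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # Eq. (9): updated cell state
    o_t = sigmoid(pre("o"))              # Eq. (10): output gate
    h_t = o_t * math.tanh(c_t)           # Eq. (11): new output
    return h_t, c_t


# With all weights and biases at zero, every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
W0 = {g: (0.0, 0.0) for g in "fico"}
b0 = {g: 0.0 for g in "fico"}
print(lstm_step(1.0, 0.0, 2.0, W0, b0))
```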

RF-LSTM was used for time series prediction in this study. The goal of this method was to fully utilize the advantages of the RF and LSTM models to improve the accuracy and stability of time series prediction.

The specific steps were as follows:

  • Data preparation: Time series data, which included time, climate data and influenza incidence data, were prepared into a format suitable for model training; that is, the data were divided into input sequences and target sequences. The input sequence was the historical time step data used for prediction, while the target sequence was the future time step data after the input sequence.

  • RF feature extraction: The RF model was used to extract features from the input sequence. Through the RF training process, the meteorological factors that affected the number of influenza cases were ranked by importance, redundant information was eliminated, and the retained features were subsequently used as the new feature combination input to the model.

  • LSTM model training: The LSTM model was trained using input sequences and target sequences.

  • Prediction: The trained LSTM model was used to predict the test time series.
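The data preparation step above, splitting a time series into input sequences and the target sequences that follow them, can be sketched as a sliding window. This is a minimal Python illustration; the function name `make_windows`, the window lengths, and the toy series are hypothetical and do not reflect the settings used in this study.

```python
def make_windows(series, n_in, n_out=1):
    """Split a series into (input window, following target window) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets


# Toy example: 3 past steps are used to predict the next step.
X, y = make_windows([1, 2, 3, 4, 5], n_in=3, n_out=1)
print(X)
print(y)
```

In a multivariate setting each element of the series would itself be a vector of features (for example the retained MFs plus the case count), and the same windowing applies unchanged.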

Figure S2 shows the RF-LSTM process.

Four evaluation indicators were used to quantify the accuracy of the model predictions, namely, the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and symmetric mean absolute percentage error (SMAPE) [8]. The smaller their values are, the higher the prediction accuracy, and the higher the confidence level (best value = 0; worst value = +∞) [8, 27,28,29].

The calculation formulas of the four evaluation indicators are as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum\nolimits_{i=1}^{n}\left(P_{i}-X_{i}\right)^{2}}$$
(12)
$$\text{MAE} = \frac{1}{n}\sum\nolimits_{i=1}^{n}\left|P_{i}-X_{i}\right|$$
(13)
$$\text{MAPE} = \frac{100\%}{n} \sum\nolimits^{n}_{i=1} \left|\frac{P_{i} - X_{i}}{P_{i}}\right|$$
(14)
$$\text{SMAPE} = \frac{100\%}{n} \sum\nolimits^{n}_{i=1} \frac{\left|P_{i} - X_{i}\right|}{\left(\left|P_{i}\right| + \left|X_{i}\right|\right)/2}$$
(15)

(Note: Pi: the observed daily incidence of influenza cases on day i; Xi: the predicted daily incidence of influenza cases on day i, where i = 1, …, n [8].)
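The four indicators can be written out in a few lines of Python, following the definitions in the note (Pi observed, Xi predicted). The sketch is illustrative, with hypothetical function names and toy values. Note that MAPE is undefined on days with zero observed cases and SMAPE is undefined when both values are zero; how such days were treated in this study is not stated, so the sketch does not handle them.

```python
import math


def rmse(obs, pred):
    return math.sqrt(sum((p - x) ** 2 for p, x in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    return sum(abs(p - x) for p, x in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    # Expressed in percent; requires every observed value to be non-zero.
    return 100.0 / len(obs) * sum(abs((p - x) / p) for p, x in zip(obs, pred))


def smape(obs, pred):
    # Expressed in percent; requires |P_i| + |X_i| > 0 for every day.
    return 100.0 / len(obs) * sum(
        abs(p - x) / ((abs(p) + abs(x)) / 2) for p, x in zip(obs, pred)
    )


observed = [10.0, 20.0]
predicted = [8.0, 25.0]
print(rmse(observed, predicted), mae(observed, predicted))
print(mape(observed, predicted), smape(observed, predicted))
```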

Results

Descriptive statistics

A total of 50,278 influenza cases, including clinical cases and laboratory confirmed cases, were reported by medical and health institutions in Xiamen from 2013 to 2022, with 330, 602, 431, 799, 586, 3994, 8437, 4710, 954, and 29,435 cases, respectively. Among them, laboratory confirmed cases accounted for 33.04%, 86.41%, 90.57%, 81.78%, 64.34%, 61.63%, 48.33%, 49.75%, 81.86%, and 91.23%, respectively. The number of cases in 2022 was particularly high, accounting for 58.54%. The proportion of laboratory confirmed cases exceeded 90.00% in 2015 and 2022. Figure S3 shows the clinical cases and laboratory confirmed cases of influenza. The correlation between the number of influenza cases and the proportion of laboratory confirmed cases was not significant (r = 0.15, P = 0.68).

A total of 2760 influenza patients were diagnosed by the network laboratory between 2013 and 2022. Among them, the numbers of those positive for influenza viruses A (H3N2), B (Victoria), A (H1N1) pdm09, B (Yamagata), B (lineage not determined), A (H7N9), and A (unsubtyped) were 919, 742, 679, 409, 6, 3, and 2, respectively. Figure S4 shows the detailed weekly distribution. The numbers of A (H3N2), B (Victoria), A (H1N1) pdm09, and B (Yamagata) virus infections calculated based on the daily proportion of each virus subtype and the daily influenza count reported by health care institutions were 28,699, 11,998, 7362, and 1529, respectively. In the winter and spring epidemic seasons of 2017–2018 and 2021–2022 in Xiamen, B (Yamagata) and B (Victoria) were dominant, accounting for 53.37% and 100.00% of all pathogens, respectively. During the winter epidemic season of 2018–2019, influenza A accounted for 96.52%, with A (H3N2) and A (H1N1) pdm09 accounting for 65.35% and 30.72%, respectively. During the winter epidemic season of 2019–2020, there was a tripartite pattern of A (H1N1) pdm09, A (H3N2), and B (Victoria), accounting for 26.71%, 33.30%, and 39.99%, respectively. B (Victoria) was the predominant type of influenza in the spring and summer of 2019, accounting for 91.37%. A (H3N2) dominated the 2022 summer epidemic season, accounting for 95.60%. The influenza cases in the summer of 2022 accounted for 83.72% of the cases reported that year and 49.02% of all influenza cases reported from 2013 to 2022. The A (H3N2) cases in the summer of 2022 accounted for 83.90% of all A (H3N2) cases reported from 2013 to 2022. Figure 1 shows the detailed daily distribution.

Fig. 1

Cluster bar chart of the daily distribution of influenza pathogens in Xiamen City, 2013–2022

The total number, daily average, minimum, P25, median, P75, and maximum of A (H3N2) cases were 19,282, 1.99, 0, 0, 0, 4 and 101, respectively, from January 1, 2013, to January 24, 2020, and 23,578, 64.60, 0, 0, 0, 2 and 1460, respectively, in 2022. Compared with the period from January 1, 2013, to January 24, 2020, the maximum PRE in 2022 was markedly lower (70.00 mm vs. 172.70 mm), the daily average was similar, and the P75 was larger (1.00 mm vs. 0.60 mm). The daily average, minimum, median, and maximum O3 concentrations in 2022 were 91.95, 30.00, 90.00, and 213.00 µg/m³, respectively, which were higher than the corresponding values of 82.76, 12.00, 80.00, and 178.00 µg/m³ from January 1, 2013, to January 24, 2020. Figure 2 shows more information on MFs and air quality.

Fig. 2

Box chart of MFs and air quality

The daily values of PRS, RHU, SSD, TEM, CO, PM10, and PM2.5 exhibited significant seasonal periodicity, while CO, NO2, PM10, PM2.5, and SO2 exhibited downward trends from year to year. Figure 3 shows more information on MFs and air quality.

Fig. 3

Time series of MFs and air quality

The incidence of A (H3N2) increased sharply from week 23 to week 30 in 2022, with a peak significantly higher than the same period from 2013 to 2019, and significantly higher than previous peak periods (week 51 to week 4 of the following year). The highest value occurred in the 25th week, with an average of 1240 cases per day, compared to less than 2 cases in the same period in previous years.

The PRE and RHU during the high incidence and precursor periods of A (H3N2) anomalies in 2022 were higher than those in 2013–2019, while SSD and TEM were lower. The CO, NO2, PM10, PM2.5, and SO2 concentrations in 2022 were almost consistently lower than the average for the same period from 2013 to 2019 throughout the year. Compared with the average values in previous years, the PRS, WIN, and O3 values during the high-incidence period of influenza A (H3N2) showed no special changes in 2022. Figure 4 shows more information on the weekly distributions of A (H3N2), MFs, and air quality.

Fig. 4

Daily average values of weekly distribution of A (H3N2), MFs, and air quality

Correlation analysis

The connecting line on the upper right side of Fig. 5 shows a significant correlation between A (H3N2) in 2022 and all the MFs and air quality factors in this study, while the correlations between A (H3N2) and PRS, RHU, WIN, and O3 from January 1, 2013, to January 24, 2020, were not significant.

Fig. 5

Heatmap of Spearman correlation analysis of A (H3N2), MFs, and air quality

The heatmap at the bottom left of Fig. 5 shows a strong positive correlation between PM2.5 and PM10 (r = 0.91, P < 0.01). There was a strong negative correlation between PRS and TEM (r = -0.86, P < 0.01) and a negative correlation between PRE and SSD (r = -0.56, P < 0.01). The detailed correlations between MFs and air quality and A (H3N2) are presented in Fig. 5.

DLNM

The cumulative effect of low PRE (20–50 mm) on the risk of infection with A (H3N2) first increased and then decreased with increasing lag time, with 46 mm having the most significant impact (RR = 1.99, accumulated lag = 8 d, 95% CI = 0.04-100.78). However, the impact of high PRE (> 100 mm) on A (H3N2) appeared after a lag of 2 weeks, but there was no cumulative effect.

The cumulative effect of RHU (51–60%) on the risk of infection with influenza A (H3N2) decreased with increasing lag time and was no longer significant after 4 days. The cumulative effect of RHU (70–78%) on the risk of infection with influenza A (H3N2) gradually increased after 12 days of lag (peak: RHU = 73%, accumulated lag = 21 d, RR = 1.25, 95% CI = 0.48–3.27).

The cumulative effect of SSD on the risk of infection with influenza A (H3N2) first increased and then decreased with increasing lag time. Shorter SSDs (≤ 3 h) and longer SSDs (≥ 7 h) had more significant cumulative effects on the risk of infection with influenza A (H3N2) (peak: SSD = 0 h, accumulated lag = 20 d; RR = 3.49, 95% CI = 0.01–1.60 × 10³).

The lower the TEM (≤ 21 °C), the greater the risk of infection with influenza A (H3N2). The cumulative effect first increased and then decreased with increasing lag days (peak: TEM = 4 °C, accumulated lag = 4 d, RR = 18.97, 95% CI = 0.11–3.36 × 10³); nevertheless, it rapidly increased again after a delay of 12 days. The higher the TEM was (≥ 26 °C), the greater the risk of infection with influenza A (H3N2), but the risk decreased sharply after 3 days of accumulation.

The greater the CO concentration was (≥ 1.1 mg/m3), the greater the risk of infection with influenza A (H3N2). The cumulative effect first increased and then decreased with increasing lag days (peak: CO = 1.1 mg/m3, accumulated lag = 16 days, RR = 2.68, 95% CI = 0.15–47.66).

NO2 (≥ 60 µg/m3) was a risk factor for infection with influenza A (H3N2), and the risk effect increased with increasing NO2 concentration (peak: NO2 = 127 µg/m3, lag = 21 d, RR = 2.78, 95% CI = 0.81–9.53); however, the cumulative effect decreased sharply and was no longer significant after 2 days.

O3 (≤ 30 µg/m3) was a risk factor for infection with influenza A (H3N2), and the cumulative effect first increased and then decreased with increasing O3 concentration and lag days (peak: O3 = 12 µg/m3, accumulated lag = 20 d, RR = 1.60, 95% CI = 0.18–14.36). The larger the O3 concentration was (> 85 µg/m3) and the longer the lag time, the greater the cumulative effect.

With increasing concentrations of high PM10 (> 107 µg/m3) and PM2.5 (> 93 µg/m3), the cumulative risk effect on influenza A (H3N2) increased. However, the impact of high PM10 on influenza A (H3N2) was short term, so the cumulative effect was not significant. Moreover, medium PM10 concentrations (30–60 µg/m3) were also a risk factor for influenza A (H3N2), and the cumulative effect increased with increasing lag days (peak: PM10 = 46 µg/m3, accumulated lag = 21 d, RR = 1.39, 95% CI = 0.55–3.51).

SO2 (14–42 µg/m3) was a risk factor for influenza A (H3N2) infection, and the cumulative effect first increased and then decreased with increasing SO2 concentration and lag time (peak: SO2 = 26 µg/m3, accumulated lag = 11 d, RR = 3.62, 95% CI = 1.40–9.35).

Additional characteristics of MFs and air quality related to influenza A (H3N2) are presented in Fig. 6.

Fig. 6

3-D plots and cumulative lag effect plots of the impacts of MFs and air quality on the risk of A(H3N2), 2013/01/01-2020/01/24

Interaction between influencing factors

The interaction between TEM and SSD was statistically significant (t = -4.09, P < 0.01). With increasing SSD, the number of influenza A (H3N2) cases in the low-TEM group first increased and then decreased.

There was also an interaction effect between TEM and SO2 (t = -8.09, P < 0.01). When the air SO2 concentration was low, the number of influenza A (H3N2) cases showed a faster downward trend with increasing TEM, but this downward trend became more gradual with increasing air SO2 concentration. In addition, there were interaction effects between RHU and PM10, RHU and PM2.5, SSD and SO2, TEM and NO2, CO and SO2, NO2 and SO2, PM10 and SO2, PM10 and PM2.5, and PM2.5 and SO2. Figure S5 and Table 1 show the interactions between the influencing factors studied.

Table 1 Interaction between the influencing factors studied, 2013/01/01-2020/01/24

Prodrome period of the abnormal influenza epidemic in 2022

The daily values of PRE, RHU, SSD, and TEM in the prodrome period of the abnormal influenza A (H3N2) epidemic (weeks 19–22) in the summer of 2022 were significantly different from the average values of the same period from 2013 to 2019 (P < 0.05). Detailed information is presented in Table 2.

Table 2 Descriptive statistics and difference tests of various factors during the prodrome period of the abnormal influenza epidemic in 2022 (daily within 19–22 weeks) and the same period from 2013 to 2019

In addition, during the prodrome period of the abnormal A (H3N2) epidemic in the summer of 2022, there were two instances of continuous precipitation, each lasting for 9 days, with a daily maximum value of 47.7 mm. RHU showed a trend of first decreasing and then increasing, with a minimum value of 70.5%. Meanwhile, TEM and SSD were lower than the average values of the same period in previous years (2013–2019), with the lowest TEM value being 16.0 °C and no sunlight exposure for 9 consecutive days. The O3 concentration first increased and then decreased, reaching a maximum value of 164 µg/m3. The ranges of these factors were consistent with the risk factor ranges for A (H3N2). Detailed information is presented in Figs. 3, 4 and 6.

The concentrations of CO, NO2, PM10, PM2.5 and SO2 during the prodrome period of the abnormal A (H3N2) epidemic in the summer of 2022 were significantly lower than the average levels of the same period in previous years (2013–2019). The ranges of these factors did not fall within the risk factor ranges for A (H3N2). Detailed information is presented in Table 2.

Floating population and vaccination

From 2013 to 2022, the number of influenza vaccine doses administered in Xiamen showed an upward trend, but the vaccination rate remained relatively low (only 2.85% in 2022), while the floating population in Xiamen (around 3,000,000) did not change significantly over the years. There was no significant correlation between the number of A (H3N2) cases and the number of influenza vaccine doses (r = 0.38, P = 0.28) or between the number of A (H3N2) cases and the size of the floating population (r = 0.15, P = 0.70) in Xiamen from 2013 to 2022. Figure S6 shows the annual changes in influenza vaccine doses, the floating population, and the number of influenza A (H3N2) cases in Xiamen from 2013 to 2022.

Pathogenic variation evolution

Phylogenetic analysis revealed that the A (H3N2) influenza virus prevalent in Xiamen during the 2022 epidemic season was similar to the vaccine strain A/Cambodia/e0826360/2020 (H3), which was recommended for the 2021–2022 epidemic season and belongs to clade 3C.2a1b.2a.1a.

The evolutionary branch of influenza A (H3N2) virus that was prevalent in Xiamen before COVID-19 was 3C.2a1b.2, while the evolutionary branch that was prevalent in 2022 was 3C.2a1b.2a. The detailed results are shown in Fig. 7.

Fig. 7

Genetic evolution analysis of influenza A (H3N2) viruses circulating in Xiamen, 2022

MFs prediction and prediction of A (H3N2) based on MFs

The MAD values of the PRE predictions for days 1, 4, 7, and 10 in 2022 were 6.18 mm, 7.50 mm, 8.08 mm, and 10.02 mm, respectively; for RHU, the values were 7.75%, 10.36%, 10.52%, and 10.43%, respectively; and for TEM, the values were 1.14 °C, 2.14 °C, 2.46 °C, and 2.49 °C, respectively. The trend test showed a significant trend in the MAD value of PRE (F = 46.25, P = 0.02). Figure 8 and Table 3 show the predicted and actual values of the MFs.
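The MAD values above can be understood with a short sketch. It assumes that MAD denotes the mean absolute deviation between predicted and observed daily values; this section does not spell out the definition, and the function name and sample numbers are hypothetical.

```python
from typing import Sequence


def mean_absolute_deviation(predicted: Sequence[float],
                            observed: Sequence[float]) -> float:
    """Mean of |predicted - observed| over paired daily values
    (assumed definition of MAD for this illustration)."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total / len(predicted)


# Hypothetical predicted and observed temperatures (°C), for illustration only.
mad_example = mean_absolute_deviation([20.0, 22.0, 25.0], [20.0, 23.0, 23.0])
```

Computing this separately for each forecast horizon (days 1, 4, 7, and 10) gives one MAD value per horizon, which is the form in which the results are reported.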

Table 3 Prediction of MAD values using different MFs and Cox Stuart trend testing
Fig. 8

Predicted and actual daily values of MFs and A (H3N2) from Jan to Aug 2022 (Note: The predicted values of A (H3N2) were based on the actual MFs values including PRE, RHU, and TEM, as well as the predicted values for days 1, 4, 7, and 10.)

Based on the actual values of the MFs, including PRE, RHU, and TEM, the RMSE, MAE, MAPE, and SMAPE of the influenza A (H3N2) prediction were 82.42, 30.40, 0.60, and 0.90, respectively. Table 4 shows the evaluation of the influenza A (H3N2) prediction based on the MF prediction values. Figure 8 shows the predicted and actual values of influenza A (H3N2) cases.
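The four evaluation indicators can be sketched as follows. The exact formulas used in the study are not given in this section, so the definitions below are assumptions: MAPE is averaged only over days with a non-zero actual count, and SMAPE uses the form 2|p − a| / (|a| + |p|) with a zero term when both values are zero. Both are returned as fractions rather than percentages.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float],
                    predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, MAPE and SMAPE under the assumed definitions above."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    mae = sum(abs_err) / n
    # MAPE: skip days with zero actual cases to avoid division by zero.
    ape = [e / abs(a) for e, a in zip(abs_err, actual) if a != 0]
    mape = sum(ape) / len(ape) if ape else float("nan")
    # SMAPE: a day with actual = predicted = 0 contributes zero error.
    sape = [2 * e / (abs(a) + abs(p)) if (a != 0 or p != 0) else 0.0
            for e, a, p in zip(abs_err, actual, predicted)]
    smape = sum(sape) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "SMAPE": smape}


# Hypothetical daily case counts, for illustration only.
scores = forecast_errors([10, 20, 0], [12, 18, 0])
```

Because daily case counts are often zero outside epidemic periods, the treatment of zero denominators can noticeably change MAPE and SMAPE, so values computed under other conventions may not be directly comparable.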

Discussion

The results indicate that influenza in Xiamen has two epidemic seasons, one spanning the end of each year and the beginning of the following year (December to February) and one in summer (May to July), with sporadic cases occurring between epidemics. This study shows that during the 2022 summer epidemic, influenza activity significantly exceeded historical levels, and A (H3N2) was overwhelmingly dominant (A (H3N2) accounted for 95.60% of influenza during the 2022 summer epidemic season, which in turn accounted for 83.90% of all A (H3N2) from 2013 to 2022). In China, screening for respiratory tract infection pathogens was strengthened during the COVID-19 epidemic, and the proportion of laboratory-confirmed influenza cases in Xiamen exceeded 80% in 2021 and 2022 (in 2020, laboratory-confirmed cases accounted for 49.75%, because influenza cases that year mainly occurred before COVID-19), which may have increased the number of reported influenza cases. However, laboratory-confirmed cases accounted for 81.86% of influenza cases in 2021 without an accompanying high incidence of influenza, and the correlation between laboratory-confirmed cases and the number of influenza cases from 2013 to 2022 was not significant, so this impact was likely limited. Moreover, there were differences in the distributions of the pathogenic subtypes of influenza and heterogeneity in the dominant strains during the epidemic and interepidemic phases (Figure S4). Therefore, it is crucial to explore the influencing factors and predict influenza outbreaks at the pathogenic subtype level to address the variable and hard-to-characterise nature of influenza epidemics.

Under the influence of the driving factors in the early phase of the abnormally high-incidence period of influenza, the density of infected people gradually increased, human-to-human transmission became the dominant driving factor, and the number of infected people rose exponentially during the outbreak period to an abnormally high level. The outbreak-period data therefore largely reflect drivers other than the early-stage factors. To avoid significant bias in the association analysis caused by these abnormal data, the DLNM was not used in this study to analyse the effects of MFs and air quality on influenza A (H3N2) in 2022.

The correlations between influenza A (H3N2) and PRS and WIN from January 1, 2013, to January 24, 2020, were not significant (Fig. 5). Moreover, there were no abnormal occurrences of PRS or WIN during the precursor period of abnormally high incidence in 2022 compared to the weekly average values from 2013 to 2019 (Fig. 4). Therefore, PRS and WIN were excluded from the subsequent analysis in this study.

The DLNM analysis in this study revealed a positive correlation between PRE (20–46 mm) and influenza A (H3N2) during the non-COVID-19 epidemic period, which is consistent with the inverted U-shaped relationship reported by Lu et al. [30]. However, their study found that weekly (not daily) precipitation of 10–97.5 mm was positively correlated with influenza overall (not A (H3N2) specifically), which differs from the findings of this study. In addition, a study that did not distinguish virus subtypes suggested a monotonically increasing positive correlation between precipitation and influenza, which is inconsistent with the results of this study [31]. The relationship between RHU and influenza A (H3N2) had an inverted U shape, with RHU of 70–78% having the most significant cumulative effect on the risk of influenza A (H3N2) infection, and this effect increased with the number of lag days. Zhou et al. also showed that low RHU had no significant effect on the incidence of influenza A (H3N2) whereas relatively high RHU was associated with it, which further supports the view that persistently high relative humidity may be related to an increased risk of influenza A (H3N2). However, unlike this study, Zhou et al. reported that high RHU also increases the risk of influenza A (H3N2) infection; the different models used (GAM and DLNM) may have contributed to this inconsistency. In addition, the difference in the data types (weekly vs. daily) used by Zhou et al. and in this study may also have affected the results [32].

Studies have shown that humid and rainy conditions are conducive to the spread of influenza A (H3N2) in low-latitude areas. Respiratory droplets are the main route of influenza transmission, and their abundance of salt and protein is beneficial for the survival of A (H3N2). A low-temperature, humid environment affects the evaporation rate of respiratory droplets, thereby indirectly affecting the survival and transmission of A (H3N2) [5]. However, some studies suggest that precipitation does not affect the onset of influenza directly but rather acts through its effect on relative humidity [33]. Relevant mechanistic studies have shown that the humid tropical rainy season can promote the spread of influenza by increasing virus deposition on surfaces and enhancing virus survival in droplets [34].

In this study, there was sustained precipitation in the early stages of the abnormal A (H3N2) epidemic (May–June) in 2022 (Fig. 3), which is consistent with the statement in the Xiamen Climate Bulletin of 2022 that “Xiamen City experienced sustained long-term precipitation from May to late June 2022, with abnormally high precipitation during this period” [35]. The weekly average PRE during this period was also significantly higher than the average level from 2013 to 2019 (Fig. 4). In addition, compared with the same period in previous years (2013–2019), there was a significant difference (P < 0.05) in the daily values of PRE and RHU during the prodrome period (weeks 19–22) of this abnormal epidemic, during which two precipitation events lasting for 9 days occurred and RHU first decreased and then increased (Figs. 3 and 4; Table 2). The ranges of these factors are consistent with the risk factor ranges for A (H3N2). Therefore, the extreme precipitation event caused by continuous precipitation and the sustained high relative humidity may have been driving factors for the abnormal epidemic of influenza A (H3N2) in Xiamen in 2022.

The results of this study indicated that TEM was negatively correlated with the onset of influenza A (H3N2). At low temperatures (≤ 21 °C), the risk of infection with influenza A (H3N2) decreased with increasing temperature. Zhang et al. also reported that low temperature contributes to the transmission efficiency of influenza A (H3N2) [6]. Low temperature can reduce the body’s immune function through mechanisms such as vasoconstriction, changes in metabolic rate, and decreased antioxidant capacity, leading to a greater incidence rate. In addition, some researchers suggest that temperatures between 18 °C and 23.5 °C may lead to larger outdoor gatherings and an increased risk of disease transmission [29].

In this study, low SSD (≤ 3 h) and high SSD (≥ 7 h) had more significant cumulative impacts on the risk of influenza A (H3N2) infection. However, Du et al.’s study showed negative correlations of average temperature and sunshine duration with influenza, although it did not target specific subtypes [36]. Regarding sunshine duration, some studies suggest that a shorter duration of sunlight does not sufficiently promote the synthesis of vitamin D in the human body and thus fails to provide an immune-boosting effect [36]. Conversely, increased exposure to sunlight can raise the risk of infection by increasing people’s activity [37].

During the prodrome period of the abnormal A (H3N2) epidemic in the summer of 2022, TEM and SSD were lower than the average values for the same period in previous years (2013–2019), with a lowest TEM of 16.0 °C and no sunshine for 9 consecutive days (Fig. 3; Table 2). The 2022 Climate Bulletin of Xiamen also reported extreme weather events with low temperatures and abundant rainfall during this period [35]. Thus, low TEM and low SSD may be among the important reasons for this anomaly.

Although all six air pollutants in this study were risk factors for influenza A (H3N2) (Fig. 6), the concentration ranges of CO, NO2, PM2.5, PM10, and SO2 did not match those of the precursor period of the abnormally high incidence in Xiamen (Fig. 3; Table 2), and the weekly average values from May to June 2022 were lower than those from 2013 to 2019 (Fig. 4), which suggests that these air quality factors were not the main cause of this outbreak.

The weekly average O3 concentration in the pre-peak period of influenza was lower than that from 2013 to 2019 (Fig. 4). According to the DLNM analysis, when O3 was at a high concentration (> 85 µg/m3), the cumulative effect became more pronounced as the lag period lengthened. As a secondary pollutant, O3 at high concentrations has been shown to increase the risk of viral infection by generating strong oxidative free radicals and disrupting metabolism and other mechanisms in the human body [38]. In addition, this study revealed that a low concentration of O3 (≤ 30 µg/m3) was also a risk factor for the incidence of influenza A (H3N2). With increasing lag days, the incidence risk first decreased and then increased with increasing O3 concentration. Ali et al. suggested that the link between ambient ozone and decreased influenza transmission may be due to the antiviral activity of ozone and its effect on host defence [39]. At low concentrations, however, the antiviral effect of O3 was not significant, so the risk of influenza onset was higher, whereas O3 at medium concentrations was shown to reduce the risk of influenza, which may explain the reduced risk of influenza A (H3N2) at medium O3 concentrations. Yang et al.’s study also reported a U-shaped relationship between O3 and influenza transmission, which is consistent with the results of this study [40]. During the abnormally high incidence of influenza in Xiamen, the O3 concentration was in the medium-to-high range, with a weekly average lower than that in previous years. The O3/NOx balance is affected by UV intensity and NOx concentration, and in the early stage of the abnormally high incidence, the temperature and sunshine duration in Xiamen were relatively low, resulting in a decrease in the O3 concentration. Due to the reaction balance, the NOx concentration increased, and research has shown that NOx increases the risk of disease through a decrease in virus-specific immunity and an increase in cellular inflammation [41]. The O3 concentration range in this study was not sufficient to confirm an impact of O3 on the abnormally high incidence. The increase in NOx (non-NO2) concentration caused by the reaction equilibrium may have had an impact on the incidence, although NO2 itself was not a driving risk factor for the abnormally high incidence in Xiamen in the summer of 2022.

This study revealed interactions between MFs and air pollutants, suggesting that their influences are not independent but interrelated. For instance, the interactions among TEM, SO2, and SSD were statistically significant (P < 0.05). The interaction between low-to-medium TEM (3.9–22.29 °C) and low-to-medium SSD (0–4 h) enhanced the impact on A (H3N2), which may be due to a decrease in human immunity and an increase in virus stability. Low sunshine hours can lead to lower temperatures, at which the survival time of the virus is prolonged. In addition, low sunshine hours can indirectly affect vitamin D in the human body, leading to a decrease in immunity [36]. The interaction between the two increases the risk of contracting A (H3N2). Low-concentration SO2 (1.0–6.9 µg/m3) interacted with low-to-medium SSD (0–4 h), significantly increasing the influence on the incidence of influenza A (H3N2). Several studies have shown that low concentrations of sulfur dioxide do not result in acidic environments, which to some extent benefits the survival of viruses and reduces their transmission [14]. In addition, short sunshine duration may increase atmospheric stability and limit the diffusion of SO2 [42]. The interaction between the two increases the risk of A (H3N2) infection. During this precursor period of abnormally high incidence in Xiamen, both SO2 and SSD were at low levels, and these interactions may have contributed to the abnormally high incidence of influenza in Xiamen in 2022.

This study involved a systematic evolutionary analysis of the HA gene of the A (H3N2) influenza virus and revealed that the administration of the vaccine component A/Cambodia/e0826360/2020 (H3) before the abnormally high-incidence period may have had a good protective effect against the prevalent A (H3N2) influenza virus at that time.

According to the data from the 37th weekly report of the National Influenza Center of China in 2022, 86.4% of the A (H3N2) influenza viruses circulating in China at that time were similar to the egg-propagated A/Cambodia/e0826360/2020 (H3) strain [43], but only 77.2% were similar to the egg-propagated A/Darwin/9/2021 (H3) strain [44]. This finding indicates that the results of the phylogenetic analysis are consistent with those of the antigenic analysis.

In addition to antigenic matching between vaccine strains and epidemic strains, the vaccination rate is an important factor affecting the protective effect of vaccines. Compared with that in developed countries and regions, the influenza vaccination rate in mainland China is extremely low [45] and reached only 2.84% in Xiamen in 2022, indicating that vaccination cannot form a sufficient immune barrier in the population, which will contribute to the spread and prevalence of the virus. In addition, this study showed that there was no significant correlation between the floating population or the vaccination rate and the influenza incidence rate (P > 0.05). Therefore, from a demographic and statistical perspective, the vaccination rate and floating population may not have been the main driving factors for the abnormally high incidence of influenza.

Influenza A (H3N2) was not detected domestically in China during the COVID-19 epidemic, but a variant strain appeared in 2022. First, the mutated strain was imported from overseas and reached a population with insufficient immune barriers, which is conducive to rapid transmission. However, the abnormally high incidence of influenza occurred in only the southern provinces, despite the presence of populations with similar immune barriers in the north; thus, the driving effect of the mutated strain on the abnormally high incidence of influenza was limited. Second, the immune barrier formed by the influenza epidemic before COVID-19 may not have provided effective protection against the virus that was prevalent during this time. Even if the immune barrier formed by this natural infection had a certain cross-immune effect, the 2-year span of the COVID-19 epidemic may have allowed this effect to disappear.

However, with MFs as the main driving forces, low vaccination rates, the emergence of new mutant strains and insufficient immune barriers formed by natural infections may have further strengthened the driving force behind this abnormally high incidence.

The low MAD values in Table 3 indicate a good meteorological prediction effect in Xiamen, but the MAD gradually increased as the number of predicted days increased (although only the MAD trend across the 1st, 4th, 7th, and 10th days for PRE was significant, P < 0.05), indicating an increase in prediction error. For influenza A (H3N2), the four evaluation indicators obtained with the actual meteorological values were greater than those obtained with the meteorologically predicted values, and the differences in the four evaluation indicators among the four sets of meteorologically predicted values were not significant (Table 4). This suggests that using meteorologically predicted values to predict influenza A (H3N2) was more accurate than using actual meteorological values, and that the prediction effect did not decrease with an increasing number of meteorologically predicted days.

Table 4 Evaluation effect of A (H3N2) prediction based on MFs from Jan to Aug 2022

Both meteorologically predicted values and true values are important features of influenza A (H3N2) prediction, but the predicted values integrate historical, current, and future meteorological information, which has temporal characteristics and contains richer information. Moreover, MFs that affect diseases often have a certain lag effect, and meteorologically predicted values can capture long-term meteorological trends and changes, such as seasonal or periodic changes, which can alleviate the lag effect to a certain extent. Obtaining meteorological values with temporal characteristics does not involve disease data and can be seen as a process of enriching feature information. Therefore, using meteorologically predicted values will have a better effect on predicting influenza A (H3N2) than using only the meteorological values of the day (meteorologically true values). These findings suggest that in the future, additional meteorological predictions can be used for predicting influenza A (H3N2) to achieve earlier intervention goals.

Currently, the climate is constantly changing worldwide, and climate change will cause extreme weather, such as continuous precipitation, high temperatures and low temperatures [46]. Relevant review studies have shown that climate change and the potential vulnerability of a country or region (weak coping ability in health and social fields) will intensify each other, amplifying the negative impact on people [47, 48]. Therefore, a study of the sensitivity and tolerance of influenza to climate change, combined closely with predictions, is beneficial for better addressing the challenges of climate change.

This study has some limitations. First, the daily subtype-specific incidence of influenza was calculated from pathogen monitoring data and influenza data reported by Xiamen medical and health institutions. Because the influenza data in the reporting system of the China Center for Disease Control and Prevention include laboratory-confirmed cases and a small number of clinically diagnosed cases, using the subtype proportions monitored by sentinel hospitals to back-calculate the clinically diagnosed cases may introduce some deviation. Second, because of the interaction effect of SSD and TEM on the incidence of influenza A (H3N2), this study predicted the incidence of influenza A (H3N2) using three MFs other than SSD and historical data on influenza A (H3N2), without the predicted value of SSD. If SSD prediction values can be obtained, the prediction of influenza A (H3N2) may improve. Third, pre-existing immunity in the population might have offered reduced protection against the A (H3N2) variant genotype 3C.2a1b.2a.1a, thereby increasing the transmission capacity of A (H3N2), but evidence for this is currently lacking. In addition, data on the newly evolved A (H3N2) in the prediction training set were very limited; therefore, the prediction results of this study may underestimate the true values.

Conclusion

Extreme weather conditions, including continuous rainfall (15–70 mm), low TEM (below 23 °C), high RHU (70–78%), and low SSD, may have been important driving factors for the abnormal influenza A (H3N2) epidemic in the summer of 2022 in Xiamen. The low vaccination rate, the emergence of new mutant strains, and the insufficient immune barrier formed by natural infections may have further strengthened the epidemic intensity. A high concentration of air pollutants was also a risk factor for influenza A (H3N2), but it may not have been a driving factor for this abnormally high incidence. However, the reaction equilibrium between O3 and NOx, as well as the interactions of SSD with TEM and SO2, may have had an impact on this incidence. The combination of the LSTM and RF models for influenza A (H3N2) prediction had good accuracy, and using meteorologically predicted values gave better predictions than using only meteorologically true values. This study can help relevant departments prepare for influenza outbreaks during extreme weather, provide a scientific basis for prevention strategies and risk warnings, better adapt to climate change, and improve public health.