Introduction

In early April 2009, cases of human infection with an influenza A (H1N1) virus emerged in Mexico and the United States (Real-Time 2009). The virus then spread quickly and widely around the world, causing great damage to public health. As of 1 August 2010, more than 214 countries and regions worldwide had reported laboratory-confirmed cases of pandemic influenza H1N1 2009, including over 18,449 deaths (WHO 2010). Of these, 128,051 confirmed cases and 805 deaths were from China (CDC 2010). Quantitatively assessing the drivers of this novel H1N1 virus will provide a useful reference for optimizing prevention and control measures in public health.

Owing to the pronounced seasonal and geographic patterns that influenza outbreaks display (Jiang et al. 2012), meteorological factors are considered to be among the potential drivers of influenza transmission. Previous work has pointed out that lower temperature and lower relative humidity favor both influenza virus survival (McDevitt et al. 2010; Tang 2009) and disease transmission (Lowen et al. 2007, 2008; Polozov et al. 2008). Cold weather and precipitation may also increase the likelihood of indoor activities, which promotes virus transmission through social contacts (Mikolajczyk et al. 2008). Subsequent research on influenza A (H1N1) suggested that the transmission of this influenza is significantly associated with geographical differences in minimum temperature and specific humidity (Chowell et al. 2012).

Recently, studies have indicated that absolute humidity modulates both the survival and transmission of the influenza virus much more strongly than relative humidity does (Lipsitch and Viboud 2009; Shaman and Kohn 2009; Shaman et al. 2010). Although analyses of influenza transmission have gradually begun to devote more attention to absolute humidity, research on the determinants of influenza A (H1N1), especially absolute humidity and the lag time of its effects on outbreaks, still needs more quantitative evidence.

Both influenza cases and meteorological factors display marked variation in geographic distribution over time. However, the common methods for correlation analysis of influenza drivers typically rely on nonspatial statistical approaches or multivariate regression models (Barreca and Shimshack 2012; Chowell et al. 2012), which neglect the spatial information and temporal sequence of influenza outbreaks. Spatio-temporal statistics are increasingly used to investigate the transmission and dispersal of diseases (Riley 2008; Wang et al. 2010; Yu et al. 2011). Given the spatio-temporal characteristics of both influenza transmission and meteorological factors, it is necessary to explore the relationship between them through spatio-temporal methods.

In this study, we first characterized the trajectory of the occurrence and dispersal of the influenza A (H1N1) pandemic in mainland China by a series of spatial mass centers of the disease. A spatio-temporal similarity index was then applied to evaluate the meteorological driving factors of the pandemic, with particular attention to the effect and time lag of the key driver. By exploring the spatio-temporal similarity between the influenza A (H1N1) pandemic and meteorological factors, we aimed to provide a better understanding of the relationship between them, which should benefit surveillance strategies.

Materials and methods

Influenza surveillance data processing

Our estimates relied on daily laboratory-confirmed cases from the nationwide internet-based disease reporting system, the China Information System for Disease Control and Prevention (CISDCP) (Wang et al. 2008). The CISDCP covers all the provincial, prefectural and county centers for disease control and prevention, as well as 95.3 % of the provincial, prefectural, and county hospitals (9084 in total), and 84.0 % of township clinics (38,175 in total) in mainland China (Fang et al. 2012). All the information on patients with influenza-like illness during the pandemic period was provided by this database, including age, sex, occupation, residential address, onset date, diagnosis date, location and clinical outcome.

In our study, the pandemic period was divided into 51 seven-day time slices from 10 May 2009 to 30 April 2010. The start date of each time slice is listed in Table 1. Mainland China was divided into 2866 counties, which are political subdivisions of provinces. To allocate each case to the correct county, we located all the cases on the county-level geographical map of China in ArcGIS 10.2 (ESRI Inc.). The weekly total number of confirmed cases in each county was then calculated based on diagnosis date for the subsequent analysis.
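The weekly aggregation described above can be illustrated with a minimal Python sketch. It assumes that week 1 starts on 10 May 2009 and that each case is a pair of a county identifier and a diagnosis date; the function names and the record layout are hypothetical and are not taken from the CISDCP.

```python
from collections import Counter
from datetime import date
from typing import Optional

STUDY_START = date(2009, 5, 10)  # first day of week 1, as stated in the text
N_WEEKS = 51                     # number of 7-day time slices


def week_index(diagnosis_date: date) -> Optional[int]:
    """Return the 1-based 7-day slice containing the date, or None if outside the study period."""
    offset = (diagnosis_date - STUDY_START).days
    if offset < 0:
        return None
    week = offset // 7 + 1
    return week if week <= N_WEEKS else None


def weekly_counts(cases):
    """Count cases per (county, week); `cases` is an iterable of (county_id, diagnosis_date)."""
    counts = Counter()
    for county_id, diagnosis_date in cases:
        week = week_index(diagnosis_date)
        if week is not None:
            counts[(county_id, week)] += 1
    return counts
```

With this convention, 30 April 2010 falls in week 51, the last time slice.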

Table 1 Correspondence between week numbers and their start calendar dates

Meteorological data processing

We relied on the China Surface Climate Dataset (Daily) of the China Meteorological Data Sharing Service System (China Meteorological Data Sharing Service System 2008) for daily meteorological data during the study period. This dataset was compiled from monthly national surface information-based reports from each province of China, following the statistical method rules for national surface climate data (1961–2000) and the surface weather observing regulation.

In our study, potentially influential indicators were recorded by 673 weather stations across mainland China, including temperature (converted from °C to K), relative humidity, atmospheric pressure, wind speed, precipitation and sunshine hours. Weekly meteorological data were calculated by averaging the daily values of each meteorological variable over the same 7-day time slices as the case data, from 10 May 2009 to 2 May 2010. The meteorological data for each county were taken from the nearest weather station, based on the distances between the center of the county and all weather stations.
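The nearest-station assignment can be sketched as follows. The text does not state which distance measure was used, so the great-circle (haversine) distance here is an assumption, and the station identifiers and coordinates are purely illustrative.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_station(county_center, stations):
    """Return the id of the station closest to the county center.

    county_center: (lat, lon); stations: dict mapping station id -> (lat, lon).
    """
    lat, lon = county_center
    return min(stations, key=lambda sid: haversine_km(lat, lon, *stations[sid]))
```

For example, with two hypothetical stations `{"A": (39.9, 116.4), "B": (31.2, 121.5)}`, a county centered at `(40.0, 116.0)` is assigned to station `"A"`.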

Absolute humidity was not included in the raw meteorological dataset we obtained, but it could be derived from the existing meteorological data. In contrast to relative humidity, which is defined as the ratio of vapor pressure to saturation vapor pressure with respect to water (Dai 2006), absolute humidity is defined as the vapor density or vapor concentration, that is, the water vapor content itself. In other words, in a system of moist air, it is the ratio of the mass of water vapor to the volume occupied by the mixture. Absolute humidity is calculated by the following equations (Xu et al. 2014).

$$\rho_{\omega } = \frac{e}{4.615 \cdot T}$$
(1)
$$e = r \cdot e_{s}$$
(2)
$$e_{s} = \left\{ \begin{aligned} 6.1078\exp \left[ {\frac{{17.2693882\left( {T - 273.16} \right)}}{T - 35.86}} \right], \quad {\text{ if }}T > 273.16 \hfill \\ 6.1078\exp \left[ {\frac{{21.8745564\left( {T - 273.16} \right)}}{T - 7.66}} \right], \quad {\text{ otherwise}} \hfill \\ \end{aligned} \right.$$
(3)

where \(\rho_{\omega}\) is the absolute humidity (kg/m³), \(e\) is the vapor pressure (hPa), \(e_{s}\) is the saturation vapor pressure (hPa), \(T\) is temperature (K), and \(r\) is the relative humidity (hPa/hPa).
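Equations (1)–(3) translate directly into code. The following minimal Python sketch uses the constants given above; the function names are illustrative, and relative humidity is assumed to be supplied as a fraction between 0 and 1 rather than as a percentage.

```python
import math


def saturation_vapor_pressure_hpa(t_kelvin):
    """Saturation vapor pressure e_s in hPa, Eq. (3)."""
    if t_kelvin > 273.16:
        return 6.1078 * math.exp(17.2693882 * (t_kelvin - 273.16) / (t_kelvin - 35.86))
    return 6.1078 * math.exp(21.8745564 * (t_kelvin - 273.16) / (t_kelvin - 7.66))


def absolute_humidity(t_kelvin, rel_humidity):
    """Absolute humidity in kg/m^3 from temperature (K) and relative humidity (fraction)."""
    e = rel_humidity * saturation_vapor_pressure_hpa(t_kelvin)  # Eq. (2), hPa
    return e / (4.615 * t_kelvin)                                # Eq. (1)
```

As a rough check, at 293.15 K (20 °C) and 50 % relative humidity the sketch gives about 0.0086 kg/m³, i.e. roughly 8.6 g/m³.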

On the basis of these equations, we calculated the absolute humidity and examined its relationship with the number of confirmed cases during the pandemic. By comparing the impact of absolute humidity on the outbreaks with that of the other meteorological indicators, the main determinants of the influenza were assessed quantitatively.

Analysis of spatio-temporal transmission by mass center method

To identify the overall high-risk areas at the nationwide scale during the pandemic, we calculated the county-level influenza mass centers using the mass center method, which is briefly described below.

Given a series of discrete points \((x_{i}, y_{i})\), \(i = 1, 2, 3, \ldots, n\), each with an attribute value \(q_{i}\) in a time slice, the location of the weighted mass center \((x, y)\) in this time slice can be calculated as follows:

$$x = \frac{{\sum\limits_{i = 1}^{n} {q_{i} x_{i} } }}{{\sum\limits_{i = 1}^{n} {q_{i} } }}, \, y = \frac{{\sum\limits_{i = 1}^{n} {q_{i} y_{i} } }}{{\sum\limits_{i = 1}^{n} {q_{i} } }}$$
(4)

where \(q_{i}\) is the total number of influenza cases for county \(i\) in 1 week. Therefore, the weight \(\frac{{q_{i} }}{\sum q }\) of county \(i\)'s centroid is the ratio of infected cases in that county to the total infected cases across mainland China. This method was used to obtain the influenza mass center for every time slice. Then, by connecting all the mass centers in order, an epidemic curve was created to assess the general spatial transmission and overall trend of influenza A (H1N1) over time. In addition, the meteorological distribution curves were constructed in the same way for the similarity analysis with the epidemic curve.
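Equation (4) amounts to a weighted average of county centroids. A minimal Python sketch is given below; the function name and the input layout (a list of centroid coordinates and a parallel list of weekly case counts) are assumptions made for illustration.

```python
def mass_center(points, weights):
    """Weighted mass center (x, y) of county centroids, following Eq. (4).

    points:  list of (x_i, y_i) county centroid coordinates
    weights: list of q_i, the weekly case count for each county
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("mass center is undefined when all weights are zero")
    x = sum(q * px for (px, _), q in zip(points, weights)) / total
    y = sum(q * py for (_, py), q in zip(points, weights)) / total
    return x, y
```

For two counties at (0, 0) and (10, 0) with 1 and 3 cases, the mass center is (7.5, 0.0), pulled toward the county with more cases. Applying the function to each time slice in order yields the sequence of points that forms the curve.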

Spatio-temporal analysis of indicators and the time lag analysis of key indicators

The Fréchet distance and similarity index

The relationship between the occurrence of influenza cases and meteorological factors was estimated by a similarity index based on the Fréchet distance approach (Alt and Godau 1995).

The Fréchet distance is a measure of similarity between two curves, which takes into account both the location and order of points along the curves. A popular illustration of the Fréchet distance is as follows: suppose a man is walking his dog on a leash. He walks along one curve while the dog walks along the other. Both may vary their speed, but neither is allowed to go backwards. The Fréchet distance is then the length of the shortest leash that is sufficient for both to traverse their separate curves from start to end (Alt and Godau 1995). In our study, the dog represents the disease while the man represents its driving force. The dynamic association between the influenza and its determinants changes in space and time, just as the leash between the man and the dog does.

The function of this distance was first defined by Fréchet (Fréchet 1906). Given two curves in a metric space, \(f:[a,a'] \to V\) and \(g:[b,b'] \to V\), their Fréchet distance \(\delta_{F}(f,g)\) is defined as follows:

$$\delta_{F} (f,g) = \mathop {\inf }\limits_{\begin{subarray}{l} \alpha [0,1] \to [a,a'] \\ \beta [0,1] \to [b,b'] \end{subarray} } \mathop {\hbox{max} }\limits_{t \in [0,1]} \left\| {f[\alpha (t)] - g[\beta (t)]} \right\|$$
(5)

where \(\alpha(0) = a\), \(\alpha(1) = a'\), \(\beta(0) = b\), \(\beta(1) = b'\), and \(\alpha\) and \(\beta\) are arbitrary continuous nondecreasing functions. The difference between the two curves \(f\) and \(g\) is evaluated by \(\delta_{F}(f,g)\) as defined above.

However, since direct computation of the Fréchet distance is complicated, a discrete variation was proposed to simplify the approach. This discrete Fréchet distance \(\delta_{dF}(f,g)\) provides a good approximation to the continuous \(\delta_{F}(f,g)\). By computing the coupling distance instead of the exact Fréchet distance between two arbitrary curves, this measure can be computed efficiently through a simple recursive formula (Eiter and Mannila 1994).
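For illustration, the discrete Fréchet (coupling) distance between two polygonal curves can be computed with the dynamic-programming recurrence of Eiter and Mannila. The Python sketch below is a minimal iterative version, not necessarily the implementation used in this study; it assumes each curve is a non-empty list of coordinate tuples and uses the Euclidean distance between points.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between curves p and q (lists of coordinate tuples)."""
    if not p or not q:
        raise ValueError("both curves must contain at least one point")
    n, m = len(p), len(q)
    # ca[i][j] is the coupling distance between p[:i + 1] and q[:j + 1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # advance along p, along q, or along both, keeping the shortest leash
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]
```

Two parallel three-point segments one unit apart, such as `[(0, 0), (1, 0), (2, 0)]` and `[(0, 1), (1, 1), (2, 1)]`, have a discrete Fréchet distance of 1.0, and identical curves have a distance of 0.0. The table is filled in time proportional to the product of the two curve lengths.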

On the other hand, it is also difficult to specify a threshold against which the Fréchet distance can be compared to decide whether two curves are similar. Therefore, we standardized the distance by dividing the discrete Fréchet distance by the largest Fréchet distance, so that the degree of similarity can be judged within a bounded range. The similarity between the two curves is derived as follows based on the definition of the Fréchet distance (Wang et al. 2013).

$$SI(f,g) = \frac{{\mathop {\inf }\limits_{\begin{subarray}{l} \alpha [0,1] \to [a,a'] \\ \beta [0,1] \to [b,b'] \end{subarray} } \mathop {\hbox{max} }\limits_{t \in [0,1]} \left\| {f[\alpha (t)] - g[\beta (t)]} \right\|}}{{\mathop {\sup }\limits_{\begin{subarray}{l} \alpha [0,1] \to [a,a'] \\ \beta [0,1] \to [b,b'] \end{subarray} } \mathop {\hbox{max} }\limits_{t \in [0,1]} \left\| {f[\alpha (t)] - g[\beta (t)]} \right\|}}$$
(6)

where the denominator is the maximum Fréchet distance and the numerator is the discrete Fréchet distance. \(SI(f,g) \in [0, 1]\); a value of SI near 1 indicates great similarity between the two curves, while a value near 0 indicates less similarity between them.

The curves of case numbers and meteorological indicators were normalized to the same range [0, 1]. The similarity between the influenza and its drivers was then measured by the index \(SI(f,g)\) above to evaluate the impact of each indicator. The same method was applied to calculate the time lag, by shifting the curve of the key meteorological driver a specific number of weeks ahead of the pandemic curve. By identifying the lag with the highest similarity, we could determine the time lag with which influenza A (H1N1) responded to variation in the key meteorological factor.
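The time-lag search can be sketched as a simple scan over candidate lags. In the Python sketch below, the similarity measure is passed in as a function so that any index, such as the SI defined above, can be plugged in. The function name `best_lag`, the way the two curves are truncated to a common length, and the toy similarity in the usage note are assumptions made for illustration, not details given in the text.

```python
def best_lag(case_curve, driver_curve, similarity, max_lag):
    """Scan lags 0..max_lag (in weeks) and return (best_lag, scores).

    For a lag of k weeks, the driver curve at week t is paired with the case
    curve at week t + k, i.e. the driver leads the cases by k weeks.
    `similarity` is any function of two equal-length curves where a larger
    value means more similar.
    """
    scores = {}
    for lag in range(max_lag + 1):
        cases = case_curve[lag:]
        driver = driver_curve[:len(driver_curve) - lag]
        n = min(len(cases), len(driver))
        if n == 0:
            break  # no overlap left between the shifted curves
        scores[lag] = similarity(cases[:n], driver[:n])
    if not scores:
        raise ValueError("curves do not overlap for any candidate lag")
    best = max(scores, key=scores.get)
    return best, scores
```

As a toy check with one-dimensional curves, if the case series is the driver series delayed by two steps and the similarity is the negative mean absolute difference, the scan returns a lag of 2.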

Results

Overall description of the pandemic

During the pandemic period from 10 May 2009 to 30 April 2010, a total of 174,675 cases with confirmed influenza A (H1N1) virus infection were reported, of which 127,793 (73.2 %) were laboratory confirmed. The total number of confirmed cases varied greatly among 2699 counties in mainland China, ranging from 1 (in 143 counties) to 4235 (in Liuyang City, Hunan Province). The overall provincial-level distribution of the total number of influenza cases across China is presented in Fig. 1. During the pandemic, all 31 provinces were affected by the influenza to differing degrees. The high-prevalence areas were mainly distributed in the eastern and southeastern coastal areas and the northwestern part of China, while Northeast China was a relatively low-risk region. Shaanxi, Hunan, Zhejiang, Guangxi, Beijing and Guangdong were the six provinces with the largest numbers of confirmed cases, far more than the other provinces. This marked regional variation in occurrence reveals the spatial character of the influenza outbreaks, which should be included in the analysis of transmission.

Fig. 1 Distribution of total influenza A (H1N1) cases in 31 provinces across mainland China, from 10 May 2009 to 30 April 2010

The time profile of the reported daily total cases (Fig. 2) shows that the number of confirmed cases increased rapidly at the beginning of September. There was then a marked decline in October, followed by resumed rapid growth that peaked by the end of November 2009. Of all the 174,675 cases, 114,061 (65.3 %) were confirmed from November 2009 to January 2010, when weather conditions were cold and dry, while only 3883 (2.2 %) were confirmed from June to August 2009. The apparent seasonal occurrence of influenza A (H1N1) cases suggests that temporal information and the potential driving force of meteorological indicators influence the transmission of the disease.

Fig. 2 Daily number of influenza A (H1N1) cases from 10 May 2009 to 30 April 2010

General trend of spatio-temporal transmission of the pandemic

The weekly mass center curve of influenza A (H1N1) cases during the study period is shown in Fig. 3, and represents the general trend of the spatial spread of the high-risk areas over time. All the mass centers of influenza A (H1N1) were distributed between 26.39°N–35.38°N and 99.36°E–117.82°E, following an east-to-west spreading pattern. From the track of the influenza, we concluded that the epidemic areas were sporadic in Eastern China in the early phase, and then shifted to the southeast of China during the first 21 weeks from 10 May 2009. In central China, in regions such as the northern part of Shaanxi, Hubei, Hunan, and the northern part of Jiangxi, a similar pattern of southerly movement occurred between the 22nd and 47th weeks, except that the amplitude of the transmission was reduced. In the late phase of the pandemic period, the high-risk areas shifted slightly westwards and spread to their largest extent in the 48th week. Thereafter, the mass centers of the influenza moved quickly back to South China at the end of the pandemic.

Fig. 3 The mass center curve of influenza A (H1N1) during the 51-week pandemic period

The similarity indices between influenza A (H1N1) and meteorological indicators

The similarity indices between the curves of influenza A (H1N1) and potential meteorological drivers were calculated (Fig. 4). Among all the indicators, absolute humidity had the highest similarity value with the number of influenza cases (0.55), which indicated that it played a predominant role in the occurrence and transmission of the influenza. Compared with absolute humidity, temperature had a more modest association with the influenza, with a smaller similarity value (0.43). The similarity indices of sunshine hours, precipitation, atmospheric pressure and relative humidity were all approximately between 0.3 and 0.4, suggesting a relatively weak driving force on the outbreaks. Wind speed had the weakest relationship with the disease, with a similarity index of less than 0.1.

Fig. 4 Values of similarity index between the number of influenza A (H1N1) cases and meteorological indicators

The time lag of influenza A (H1N1) compared to absolute humidity

Absolute humidity could be considered the dominant determinant, since its similarity index was much higher than those of all the other meteorological factors (at least 0.12 larger). Therefore, the time lag of its effect was further evaluated (Fig. 5).

Fig. 5 Similarity index between the number of influenza A (H1N1) cases and absolute humidity. The X-axis denotes the time lag in weeks

As shown in Fig. 5, the similarity between influenza A (H1N1) and absolute humidity showed an overall decreasing tendency as the time lag increased, indicating that the effect of absolute humidity declined as the time gap widened. The association first strengthened gradually and remained strong up to a time lag of 5 weeks. The similarity index then declined, with small fluctuations and rebounds, and finally reached its lowest value. The effect of absolute humidity on influenza A (H1N1) transmission could thus be divided into three phases.

Phase 1: coupled period with a time lag of up to 5 weeks. In this phase, the influenza cases and absolute humidity were strongly coupled, since the similarity index between them was consistently high (over 0.5). The similarity index between the two curves increased gradually from a delay of 0 to 3 weeks and reached a peak value of 0.59 at the 3-week delay. This indicated that the influence of absolute humidity on the occurrence of influenza A (H1N1) was greatest at a 3-week time lag.

Phase 2: 10 weeks with relatively weak effect. With a time lag from 6 to 15 weeks, the similarity index between the two mass center curves fluctuated between 0.2 and 0.5 and decreased gradually, suggesting a weak and unstable association between influenza A (H1N1) and absolute humidity.

Phase 3: recession period from the 15th week. When the time lag was longer than 15 weeks, the effect of absolute humidity on the disease became slight, as the similarity between them dropped rapidly and finally fell to 0.026.

Discussion

Our study described the spatial and temporal transmission pattern of the influenza A (H1N1) pandemic from 10 May 2009 to 30 April 2010 in mainland China, on the basis of the influenza surveillance data collected from CISDCP. This epidemic spanned a large geographical scale and varied significantly across both regions and periods. Given this variation, the mass center method, which considers the spatial and temporal characteristics of confirmed cases, was used to identify the trend of high-risk areas across mainland China. Our analysis indicates that the regions with the highest risk of infection are distributed in the southeastern coastal part, while Northeastern China has a low risk. The general direction of the spread was from east to west, with a similar pattern of southerly movement in each phase. A spatio-temporal similarity index was also applied to evaluate the role that meteorological factors played in influenza outbreaks and the time lag of the key factor. The results indicated that the spatial transmission of influenza A (H1N1) is associated with multiple meteorological variables, and is largely driven by absolute humidity with a 3-week delay. Among all the meteorological factors, absolute humidity has the highest similarity with the outbreaks, implying that it exerts a much stronger constraint than the others, especially relative humidity. The findings also suggest modest associations between the disease and temperature, sunshine hours, precipitation, atmospheric pressure and relative humidity. These conclusions are consistent with animal experiments on seasonal influenza virus and with previous related findings (Lowen et al. 2007; Shaman et al. 2011; Shaman and Kohn 2009). Furthermore, the 3-week delay in the occurrence of influenza A (H1N1) cases in response to absolute humidity is reasonable. It is approximately equal to the sum of the time the virus takes to respond to variation in absolute humidity, the influenza incubation period (median 4 days), the duration of symptoms (median 7 days) (Tuite et al. 2010), and the time taken for patients to be reported to China CDC.

Geoinformatics plays an increasingly important role in studies of the transmission and dispersal of disease outbreaks. As has been recognized, both influenza cases and meteorological factors display marked geographical variation across regions over time, which should be accounted for in related analyses. The mass center method and a similarity index of two curves were applied in our study. Compared with nonspatial statistics, this approach takes into consideration both spatial location and temporal order, and thus has advantages in the analysis of influenza with spatio-temporal attributes. The mass center method abstracts the spread into a curve with high-risk points ordered along it. However, substantive details of cases are lost in this abstraction, as the mass center is regarded as representative of the whole of mainland China in each time slice. Further study should explore methods that offer more accurate mass centers, such as weighting the location of each county according to its population. Spatial approaches for hot spot detection, such as the local indicators of spatial association (LISA) or the Gi and Gi* statistics, could also be applied to identify the high-risk areas (Anselin 1995; Getis and Ord 1992).

As the overall description of the pandemic showed, the daily count of confirmed cases increased rapidly at the beginning of September, which has been attributed to students starting a new school term (Yu et al. 2012). Previous work also highlighted the importance of school cycles in the transmission dynamics of influenza A (H1N1), as well as population density, public holidays and travel flow (Fang et al. 2012; Yu et al. 2013). The driving force of these social factors deserves to be explored in further work, especially in combination with the meteorological indicators studied above.

Understanding the pattern of transmission and the driving force of meteorological factors on the disease will contribute to the provision of effective prevention and control measures. The finding of a 3-week delay in the response to absolute humidity variation helps bridge the gap between meteorological forecasting and disease control. Moreover, it makes it possible to predict influenza A (H1N1) from variation in absolute humidity, which should be studied further.