1 Introduction

Several studies have demonstrated the adverse effects of air pollution on morbidity and mortality [1,2,3]. The daily count of patients can be described as the result of a series of air pollution exposure events of different intensities over the preceding days. The health effects of air pollution exposure on a given day may persist into the future and may vary with the intensity and the lag period. Gasparrini [4] referred to this type of dependency as the exposure–lag–response association.

The distributed lag nonlinear model (DLNM) [4,5,6] was developed to quantify this association. The model is based on the definition of a cross-basis, which is obtained by combining two linear or nonlinear functions that model the exposure–response and lag–response relationships, respectively. Researchers commonly use DLNM in combination with the generalized linear model (GLM) [7] or the generalized additive model (GAM) [8]. Within this modeling framework, the main complexity of quantifying the association lies in model selection: choosing among candidate confounders, smoothing functions, and parameters of the GAM smooth terms, as well as the lag–response functions, exposure–response functions, and parameters of the DLNM cross-basis. Sensitivity analysis is also required when a priori options are used in the models.

To our knowledge, the effect of air pollution on respiratory disease morbidity has seldom been quantified with DLNM, especially in China, which suffers from severe smog pollution [9]. Smog pollution has led to public displays of dissatisfaction and has become a major political issue. According to the 2014 Chinese Environmental State Report published by the Ministry of Environmental Protection of China, only eight of the 74 monitored cities (10.8%) met urban air quality standards in 2014 in terms of annual average values. In other words, 89.2% of the monitored cities experienced air pollution of varying severity.

In this paper, we aim to quantify the association between air pollution and daily respiratory disease admissions in Chongqing City. The most significant confounders are identified by model selection. The relative risk of air pollution is estimated with DLNM. The sensitivity of the results to the DLNM functions and parameters, especially the lag range, is investigated.

2 Data

Chongqing City is the main urban area of Chongqing, a municipality in southwest China, with an area of 5473 km² and a population of 8.18 million in 2014. The city lies between the Qinghai–Tibet Plateau and the Middle-Lower Yangtze plains, with complex geographic features of mountains and hills, and is also called the city of hills. A well-known “stove,” Chongqing has hot summers, with temperatures between 27 and 38 °C in July and August and a recorded maximum temperature of 43.8 °C. Chongqing is also a foggy city, with an average of 104 foggy days per year, more than London and Tokyo.

A total of 8841 patients with respiratory diseases were hospitalized at two top-level hospitals in Chongqing City from January 1, 2008 to December 31, 2013. One hospital is the Chongqing General Hospital, which has 736 beds and admitted 19,713 patients in 2013. The other is the Second Affiliated Hospital of Chongqing Medical University, which has 1380 beds and served approximately 45,000 inpatients in 2013. Patients were selected according to the ICD-10 code of their primary diagnosis in the electronic medical record (EMR). All respiratory diseases appearing as an inpatient primary diagnosis, except tumors, are included, such as pneumonia, rhinitis, sinusitis, bronchitis, chronic obstructive pulmonary disease, and asthma. The ICD-10 codes used are A15, A36, A37, J00-J06, J11, J12, J15, J18, J20, J21, J30-J32, J40-J46, J67, and J82.

Daily meteorological data from January 1, 2008 to December 31, 2013 is obtained from the Chongqing Meteorological Bureau. The data includes the minimum, maximum, and mean temperature; the maximum wind speed; the minimum, maximum, and mean atmospheric pressure; and the mean relative humidity.

Daily air pollution data for the same period were collected by the Chongqing Environmental Protection Bureau. Daily air pollution intensities, including those of inhalable particulate matter (PM10), sulfur dioxide (SO2), and nitrogen dioxide (NO2), were measured at 29 monitoring stations distributed across the city and averaged. The actual levels of air pollution intensity are not published. Instead, the intensity of each air pollutant is represented as an air pollution index (API), which is calculated by piecewise linear interpolation according to the following equation:

$$ \mathrm{API}=\frac{I-{I}_i}{I_{i+1}-{I}_i}\left({\mathrm{API}}_{i+1}-{\mathrm{API}}_i\right)+{\mathrm{API}}_i. $$
(1)

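In Eq. (1), \( I \) represents the air pollutant intensity, and \( I_i \) and \( I_{i+1} \) are the adjacent breakpoints that bracket it (\( I_i \le I < I_{i+1} \)). The breakpoints \( I_i \) and the corresponding \( \mathrm{API}_i \) are parameters shown in Table 1, which are defined in the ambient air quality standards of China (GB 3095-1996) [10].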
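The interpolation in Eq. (1) can be sketched in a few lines of Python. This is only an illustration: the function name and, in particular, the breakpoint values below are hypothetical placeholders, not the values of Table 1, and the study itself was carried out in R.

```python
from bisect import bisect_right

# Hypothetical breakpoints for illustration only; the real values are in Table 1.
INTENSITY_BREAKPOINTS = [0.0, 0.05, 0.15, 0.35]  # I_i (arbitrary units)
API_BREAKPOINTS = [0, 50, 100, 200]              # API_i


def api_from_intensity(intensity,
                       i_breaks=INTENSITY_BREAKPOINTS,
                       api_breaks=API_BREAKPOINTS):
    """Piecewise-linear interpolation of Eq. (1) between bracketing breakpoints."""
    if not i_breaks[0] <= intensity <= i_breaks[-1]:
        raise ValueError("intensity outside the tabulated range")
    # Index i of the segment with I_i <= intensity < I_{i+1}; the clamp puts
    # the topmost breakpoint into the last segment.
    i = min(bisect_right(i_breaks, intensity) - 1, len(i_breaks) - 2)
    frac = (intensity - i_breaks[i]) / (i_breaks[i + 1] - i_breaks[i])
    return frac * (api_breaks[i + 1] - api_breaks[i]) + api_breaks[i]
```

With these placeholder breakpoints, an intensity halfway between two breakpoints maps to an API halfway between the corresponding API values.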

Table 1 API standard

The daily meteorological and air pollution data have no missing values. A summary of the data is provided in Table 2.

Table 2 Descriptive statistics of the daily respiratory disease admission, meteorological, and air pollution data of Chongqing City from Jan 1, 2008 to Dec 31, 2013

3 Methods

A Poisson distribution and a log link are used during fitting. This study uses GAM and DLNM to estimate the effect of air pollution on daily admissions in the following two steps.

First, confounders are controlled through a forward stepwise search. The candidate confounders listed in Table 3 are added to the model one at a time. In each iteration, the confounder that most reduces the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) of the model is selected. Once a confounder or one of its moving averages has been selected, neither the confounder nor its other moving averages are considered again; in other words, a confounder and its moving average terms are never in the model simultaneously. For each meteorological variable, the moving-average window length n (in days) that fits the data best is set by model selection. The search stops when all candidate confounders have been used or when no remaining confounder decreases the AIC and BIC of the model, which yields the baseline model shown in Eq. (2):
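The search described above can be sketched as a greedy loop. The sketch below is an assumption-laden illustration rather than the procedure actually run in R: it uses a single criterion to minimise (the text uses AIC and BIC together), and `forward_select`, `toy_score`, the term names, and the gain values are all hypothetical.

```python
def forward_select(families, score):
    """Greedy forward search over confounder families.

    families: dict mapping a confounder name to its candidate terms
              (the raw series and its moving averages).
    score:    callable taking a list of terms and returning the criterion
              to minimise (for example AIC); lower is better.
    """
    selected = []
    remaining = dict(families)
    best = score(selected)
    while remaining:
        # Try every term of every family not yet represented in the model.
        trials = [(score(selected + [term]), name, term)
                  for name, terms in remaining.items() for term in terms]
        if not trials:
            break
        trial_score, name, term = min(trials)
        if trial_score >= best:
            break  # no candidate improves the criterion
        selected.append(term)
        best = trial_score
        del remaining[name]  # a family contributes at most one term
    return selected, best


# Toy criterion with made-up numbers: each term lowers the score by a fixed
# gain, and every added term costs a penalty of 2 (mimicking the 2k in AIC).
GAIN = {"temp_ma8": 30.0, "temp": 12.0, "hum": 5.0, "wind": 1.0}


def toy_score(terms):
    return 100.0 - sum(GAIN[t] for t in terms) + 2.0 * len(terms)


families = {"temp": ["temp", "temp_ma8"], "hum": ["hum"], "wind": ["wind"]}
selected, best = forward_select(families, toy_score)
```

In a real analysis, `score` would refit the Poisson GAM with the given terms and return its information criterion; the toy criterion only makes the control flow visible.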

$$ \log\left(E\left(y_t\right)\right)=\alpha +\beta_1{\mathrm{Hol}}_t+\beta_2{\mathrm{DOW}}_t+s(t)+s\left(\mathrm{Max\_Temp}_{t,8}\right)+s\left({\mathrm{Hum}}_t\right). $$
(2)

Table 3 All candidate confounders

In Eq. (2), \( t \) is the date sequence. \( y_t \) is the observed number of hospital admissions on day \( t \), and \( E(y_t) \) denotes its expected value. \( \alpha \) is the intercept. \( \mathrm{Hol}_t \) denotes the holiday indicator, and \( \mathrm{DOW}_t \) is the day of week indicator. The last three terms are the smooth terms of the selected confounders. All smooth terms are defined by thin plate regression splines, and the smoothing parameters are automatically selected with generalized cross validation (GCV). The term \( s(t) \) is the smooth term of the date sequence. The term \( s(\mathrm{Max\_Temp}_{t,8}) \) is the smooth term of the last 8-day moving average of the maximum temperature. The term \( s(\mathrm{Hum}_t) \) is the smooth term of the mean relative humidity. The descriptive statistics of the original maximum temperature and mean relative humidity data are shown in Table 2. Equation (2) is a generalized additive model (GAM) with five explanatory variables; such models are often combined with a distributed lag nonlinear model (DLNM) to explore nonlinear exposure–lag–response associations, as in [4, 5, 12].
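As an illustration of the moving-average confounder terms, the following sketch computes a trailing n-day mean. It assumes that the window includes the current day and the n - 1 preceding days; the text does not state whether the current day is included, and the function name is hypothetical.

```python
def trailing_moving_average(values, n):
    """Mean of the current value and the n - 1 preceding ones.

    Returns None for the first n - 1 days, where the window is incomplete.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    out = []
    window_sum = 0.0
    for t, v in enumerate(values):
        window_sum += v
        if t >= n:
            window_sum -= values[t - n]  # drop the value that left the window
        out.append(window_sum / n if t >= n - 1 else None)
    return out
```

Calling it with `n=8` on the daily maximum temperature series would give a covariate of the same kind as the 8-day term in Eq. (2), with the first seven days undefined.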

Then, a cross-basis matrix obtained by applying DLNM to the air pollutant intensity is added to the right side of the baseline model. For air pollutant \( a_k \) (\( a_k \in \{\mathrm{PM10}, \mathrm{NO2}, \mathrm{SO2}\} \)), the cross-basis matrix \( M(f(a_{k,t}), w(l)) \) combines the exposure–response function \( f(a_{k,t}) \) and the lag–response function \( w(l) \), where \( a_{k,t} \) is the observed intensity of \( a_k \) on day \( t \) and \( 0 \le l \le L \). \( L \) is the lag range in days, which is set a priori to 40. The candidates for \( f(a_{k,t}) \) are linear and B-spline functions. The knots of the B-spline for \( f(a_{k,t}) \) are placed at the 25th, 50th, and 75th percentiles of \( a_{k,t} \). The candidates for \( w(l) \) are piecewise constant functions, linear functions, and B-spline functions with or without intercepts. The knots of the B-spline for \( w(l) \) are placed at 1/3 and 2/3 of \( L \). The B-spline without an intercept left-constrains the smooth lag–response curve to start from 0. This constraint implies that air pollution exposure on a given day has no health effect on that same day. A total of 10 models are tested for each \( a_k \). The model with the minimum AIC and BIC is selected to estimate the health effects of the air pollutant.
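For the simplest candidate pair, a linear exposure–response function and a linear lag–response function, the cross-basis can be written out by hand. The sketch below assumes f(x) = x and a lag basis (1, l) with an intercept, without any scaling or centering; it is a hand-rolled illustration, not the implementation of the dlnm package, and the function name is hypothetical.

```python
def linear_crossbasis(x, max_lag):
    """Cross-basis for f(x) = x and lag basis (1, l), l = 0..max_lag.

    Row t holds [sum_l x[t-l], sum_l l * x[t-l]]; rows with an incomplete
    exposure history (t < max_lag) are None.
    """
    rows = []
    for t in range(len(x)):
        if t < max_lag:
            rows.append(None)
            continue
        lagged = [x[t - l] for l in range(max_lag + 1)]  # x_t, x_{t-1}, ...
        rows.append([sum(lagged),
                     sum(l * v for l, v in enumerate(lagged))])
    return rows
```

Each row summarises the exposure history of one day in two numbers, so the regression needs only two coefficients for the whole exposure–lag–response surface. Spline bases follow the same pattern with more columns.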

To test the sensitivity of DLNM to different parameters, we change the lag range \( L \), the B-spline degrees, and the knots when building the cross-basis matrix \( M(f(a_{k,t}), w(l)) \). For each parameter configuration, the candidate models are compared by their AIC and BIC, and the best-fitted model of each configuration is then compared with those of the other configurations.

All modeling and analysis are implemented in R (version 3.0.2), mainly with the R packages mgcv (version 1.8-6) and dlnm (version 2.1.3).

4 Results

The best-fitted baseline model is represented by Eq. (2). The confounders, their coefficients, and the significance of the terms are listed in Table 4. The AIC and BIC of the baseline model are 9153.037 and 9568.598, respectively. The significant confounders are the date sequence, day of the week indicator, holiday indicator, maximum temperature of the previous 8 days, and mean relative humidity.

Table 4 Coefficients and significance of baseline model terms

The cross-basis matrices of DLNM are added to the baseline model to quantify the health effects of air pollution exposure. Table 5 presents the models whose cross-bases combine different \( f(a_{k,t}) \) and \( w(l) \). The fit results for PM10 are expressed by AIC and BIC. Model 2 achieves the best performance.

Table 5 Models with different combinations of \( f(a_{k,t}) \) and \( w(l) \), with the AIC and BIC values of alternative models to estimate the exposure–lag–response association between PM10 and daily admissions

The relative risk (RR) of respiratory disease admissions associated with PM10 exposure is calculated relative to PM10 API = 0. As shown in Fig. 1, Model 2 indicates a linear decrease of the relative risk along the lag days for all PM10 intensities. The greater the PM10 intensity, the more quickly the relative risk decreases. The relative risk is higher than 1.0 from 0 to 23 days after exposure, and in this period it increases with the PM10 intensity. Conversely, the relative risk is lower than 1.0 after 23 days, and in this period it decreases as the PM10 intensity increases.

Fig. 1

Relative risk (RR) of respiratory disease admissions associated with PM10 exposure of 0–200 API and a lag period of 0–40 days. (1.1) 3-D surface of the exposure–lag–response association estimated from model 2. (1.2) Lag–response curves for PM10 exposure of 30, 80, 100, 130, and 180 API. (1.3) Exposure–response curves at lags of 5, 15, 20, 25, and 35 days. The two boldface lines in (1.1) correspond to the dotted lines in (1.2) and (1.3), which represent the lag–response curve for a PM10 level of 100 API and the exposure–response curve at lag 20, respectively. Relative risk is calculated relative to PM10 API = 0.

Some results of the sensitivity analysis are shown in Table 6. The best-fitted models are consistent across the different functions and parameters used to fit the cross-basis, which suggests that DLNM is robust in this respect for our study design. When we change the lag range to 45 days or more, the fit results become unstable, which suggests that DLNM is sensitive to a lag range greater than 40 days for our study design.

Table 6 AIC and BIC values of models with different parameter configurations

The fit results of SO2 and NO2 are unstable for a lag range of 5–40 days. The t tests show that the linear or smooth terms of SO2 and NO2 on each exposure day are insignificant (results not shown).

A Q-Q plot of the deviance residuals of the model versus a standard normal distribution is shown in Fig. 2 and indicates that the model fits the data well. The autocorrelation plots of the raw data and of the deviance residuals are shown in Figs. 3 and 4, respectively. Although the raw data show strong positive autocorrelation, the residuals appear random. The result of the Ljung-Box test also supports the adequacy of the model. For a significance level of 0.05 and 40 lags, the test statistic is 55.704, which is less than the critical value \( {\chi}_{1-0.05,40}^2=55.758 \); the p value is 0.051, so the null hypothesis of no residual autocorrelation is not rejected.
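For reference, the Ljung-Box statistic can be computed directly from a residual series. The sketch below uses the standard definition of the statistic; the function name is hypothetical, and the constant simply restates the critical value quoted above. It is not the R routine used in the study.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k), k = 1..max_lag."""
    n = len(residuals)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and n - 1")
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    if denom == 0:
        raise ValueError("residuals are constant")
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(centred[t] * centred[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


CHI2_95_DF40 = 55.758  # chi-square critical value (0.95, 40 df) quoted in the text
```

Under the null hypothesis of no autocorrelation, Q is compared with a chi-square critical value such as `CHI2_95_DF40`; a value of Q below it, as reported here, means the null hypothesis is not rejected at the 0.05 level.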

Fig. 2

Q-Q plot of the residuals of the best-fitted model

Fig. 3

Autocorrelation plot of the daily admissions

Fig. 4

Autocorrelation plot of the residuals of the best-fitted model

5 Discussion

In this paper, we use DLNM and GAM to estimate the risk of air pollution exposure for respiratory disease admission in Chongqing City, China. The results suggest that the exposure–lag–response association between the PM10 intensity and the risk can be represented by a bi-dimensional cross-basis. This cross-basis combines two linear functions. One models the exposure–response curve along the PM10 intensity and the other models the lag–response curve along the lag days. The results do not suggest a direct linear relationship between the relative risk and the PM10 intensity or PM10 exposure lag days because the coefficients of the two linear functions are different at different PM10 intensities or lag days. The respiratory disease morbidity surface is skewed as shown in Fig. 1(1.1).
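A short worked form may help to see how a cross-basis built from two linear functions produces the skewed surface. The sketch assumes that the lag basis consists of an intercept and a linear term, (1, l), and writes \( \theta_1 \) and \( \theta_2 \) for the two cross-basis coefficients; both the parameterisation and the symbols are illustrative assumptions rather than quantities reported for the fitted model. With the reference exposure at API = 0, the log relative risk at exposure \( x \) and lag \( l \) is then

$$ \log \mathrm{RR}\left(x,l\right)={\theta}_1x+{\theta}_2xl=x\left({\theta}_1+{\theta}_2l\right),\kern2em \frac{\partial \log \mathrm{RR}}{\partial x}={\theta}_1+{\theta}_2l,\kern2em \frac{\partial \log \mathrm{RR}}{\partial l}={\theta}_2x. $$

Under this assumption the slope along the exposure axis depends on the lag, and the slope along the lag axis depends on the exposure. If \( \theta_1>0 \) and \( \theta_2<0 \), the relative risk exceeds 1 for \( l<-\theta_1/\theta_2 \) and falls below 1 afterwards, with a crossover lag that does not depend on \( x \) and a decline that is steeper for larger \( x \). This would be consistent with the pattern described for Fig. 1, where the crossover occurs at about 23 days.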

A basic assumption of DLNM is that the exposure–response and lag–response functions are independent [4]. This assumption is not supported by our results because the coefficients of the exposure–response function change at different lag days and the coefficients of the lag–response function change with different PM10 intensities. This trend may cause the undercoverage of confidence intervals and inflated type I errors for tests, as suggested by Gasparrini [4].

Sensitivity analysis shows that the best-performing fit is robust when different functions and parameters are used to model the exposure–lag–response surface. However, the fit is unstable when the lag range is 45 days or more. This result suggests that the health effect of PM10 exposure may be too small to estimate efficiently after 45 days in our study design. A lag range of 30 or 40 days is widely used in this context [4, 5, 11, 12]. A shorter lag period of a week or less was also suggested by Schwartz et al. [13] for long-term trends or seasonal terms. A good practice would be to select an appropriate lag range by sensitivity analysis when DLNM is used.

One interesting result of the best-fitted model is that the relative risk estimated for low PM10 intensities and high lag days is relatively high as shown in the nearest corner of Fig. 1(1.1). This result may exhibit the so-called mortality displacement effect [14]. This effect decomposes the lag window into three successive periods. The first period represents the positive effect on morbidity or mortality for short lags of air pollution exposure. The second period shows a zero or negative effect for longer lags because of the depletion of vulnerable individuals who have had their death or admission brought forward by a few days or weeks by air pollution. The last period demonstrates a positive effect when the depletion effect subsides, or when vulnerable individuals are recruited because of the net result of aging and environmental factors, such as infection, smoking, and occupation. The relatively higher risk at the end of the lag window may correspond to the last period of the mortality displacement effect.

The health effects of NO2 and SO2 on daily respiratory disease morbidity are insignificant. These results are influenced by the fact that the PM10 intensity is much higher than the SO2 and NO2 levels on most days in the study period (1794 of the 2192 days, 81.84%). The variation of the SO2 and NO2 intensities is also smaller, as shown in Table 2. This result agrees with that of Oftedal et al. [15], who suggested that small variations in air pollutant intensity reduce the precision of the estimated health effects.

The health effects of other air pollutants, e.g., fine particulate matter (PM2.5), have been explored in many studies [2, 9, 16], some of which draw conclusions different from ours. For instance, NO and NO2 were reported as the two main air pollutants positively associated with respiratory diseases [17], as were CO and NO2 in Qatar [18] and benzene in Drammen, Norway [15]. However, data on PM2.5, NO, CO, and benzene were not collected in China until 2015, so they are not available for the study period and their health effects are not estimated in this study. Moreover, those air pollutants may show significant health effects only in particular local areas or environments. In Chongqing, the effect of PM10 is more significant than those of NO2 and SO2, as shown in our results.

Since this study focuses on estimating the exposure–lag–response association between air pollution and local respiratory disease morbidity on a daily basis, some potentially relevant factors, including gender, age, and smoking status, are not investigated. The subgroup distribution of these factors is assumed not to vary from day to day within a particular city.

In addition, some confounders that are significant in other papers, e.g., the seasonal indicator, are not included in the best-fitted model. A seasonal subseries plot of the daily admissions is shown in Fig. 5. The means for each month are relatively close and show no obvious pattern. Chongqing is a hot city: the mean ± SD of the daily maximum temperature is 22.74 ± 9.30 °C. Meanwhile, a correlation test between the 8-day moving average of the maximum temperature and the seasonal indicator suggests a strong association, with a Pearson correlation coefficient of 0.828 ± 0.013 (95% confidence interval). The Augmented Dickey-Fuller (ADF) test shows that the two time series are nonstationary and cointegrated. This may explain why the fitted confounder term is a smooth term of the last 8-day moving average of the maximum temperature, which may be more significant than the seasonal indicator.
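One common way to obtain a confidence interval for a Pearson correlation is Fisher's z-transformation, sketched below. Whether this is the method behind the reported ± 0.013 is an assumption, as is the sample size n = 2192 (one record per study day); the function name is hypothetical.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson correlation via Fisher's z.

    z_crit is the standard normal quantile (default: two-sided 95%).
    """
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    z = math.atanh(r)                       # Fisher transformation
    half_width = z_crit / math.sqrt(n - 3)  # z_crit times the standard error of z
    return math.tanh(z - half_width), math.tanh(z + half_width)


# n = 2192 daily records is an assumption, not a value stated for this test.
low, high = pearson_ci(0.828, 2192)
```

Under these assumptions the interval runs from roughly 0.81 to 0.84, a half-width close to the reported 0.013. Because daily series are autocorrelated, an interval of this kind should be read as approximate.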

Fig. 5

Seasonal subseries plot of the daily admissions

6 Conclusion

The modeling framework of DLNM combined with GAM is commonly used to estimate the health effects of protracted exposure events of varying intensities sustained over time. In this paper, we demonstrate the fitting and testing processes used to quantify the health risk of air pollution on respiratory disease morbidity in Chongqing City, China. With careful model selection and result analysis, our results suggest that the model fits the data well. Other related studies report different results and conclusions; these differences may stem from the different environments of the research areas. The resulting model can be used to predict the relative risk of increased air pollution or the cumulative health effects of a history of exposure to air pollution in Chongqing within a limited lag range of 40 days.