Introduction

Numerous studies have reported an association between temperature and human health, which raises public health concerns because the climate has changed drastically worldwide in recent years as a result of global warming (Basu 2009; Gasparrini et al. 2010). After accounting for climate change and other factors, how hot and cold weather, or their delayed effects, contribute to human death has been widely discussed in different regions, including the USA (Curriero et al. 2002; Mills et al. 2015), Europe (Baccini et al. 2008), and Northeast Asia (Chung et al. 2015). In addition to temperature, it has also been documented that exposure to air pollutants, which include particulate matter (PM), ozone (O3), nitrogen dioxide (NO2), and sulfur dioxide (SO2) according to the 2005 WHO Air Quality Guidelines, has adverse effects on human health, especially on respiratory and cardiovascular diseases. Several studies have examined the relationship between PM10, PM2.5, and daily mortality. Some showed that exposure to polluted air over a period harms health, for example through the development of lung or heart diseases, where the pollution sources include ambient air, second-hand smoke, ozone, or particulate matter (Dai et al. 2004; Janssen et al. 2013).

Gasparrini et al. (2010) proposed the distributed lag non-linear model (DLNM) to evaluate the lag effects of predictors. The DLNM fits the non-linear association between the outcome variable and the predictors. A cross-basis function simultaneously describes the exposure–response relationship along the predictor space and the lag–response relationship along the lag space. In 2018, a new approach assessed both the same-day and 1-day lag mortality in the DLNM (Chen et al. 2018). Associations that involve both lag outcomes and lag exposures therefore need more attention to describe such a complex structure.

This research collected weather and air pollution data as predictors and daily mortality as the health outcome in Taipei City from 2012 to 2016. Since the DLNM is widely adopted in public health and environmental research (Vicedo-Cabrera et al. 2016), we aim to extend the DLNM, with a Poisson regression model and natural cubic splines (Bhaskaran et al. 2013), to model cumulative mortality outcomes using lag predictors. The validity and performance of the new methods are evaluated in a simulation study based on permutation techniques. Finally, a real data application shows a significant improvement attributable to the new method.

Materials and methods

Anonymous daily mortality counts are the outcome of interest. Hence, no patients were involved in this research. All-cause mortality in Taipei City was obtained from the Cause of the Death Database published by the Ministry of Health and Welfare.

The Institutional Review Board (IRB) of National Yang-Ming University approved the use of anonymous mortality data (IRB number YM107045E). Daily mean temperature measures were obtained from Taipei Weather Station, available through the Central Weather Bureau Observation Data Inquiry System website (CWB n.d.). Data on air pollution were obtained from the Taipei Air Quality Monitoring System, available through the Environmental Protection Administration Executive Yuan website (EPAEY n.d.), where we collected daily mean ozone concentration and daily mean PM2.5 concentration (Table 1). Although some air pollution measurements were missing, we omitted these observations because the missing rate is very low and has a negligible impact on the analyses.

Table 1 Daily all-cause deaths and three primary parameters, mean temperature, ozone concentration, and PM2.5 concentration, in Taipei from 2012 to 2016

The DLNM is defined as follows:

$$ \log \left({\mu}_t\right)=\upalpha +\mathrm{s}\left({x}_t,l,\beta \right)+{\beta}_{O_3}{O_3}_t+{\beta}_{PM2.5} PM{2.5}_t+\kern0.5em \sum \limits_{i=1}^pf\left({z_t}^i;\theta \right) $$

The exposure variable \( {x}_t \) is the daily mean temperature, and the pollutant variables (\( {O_3}_t \) and \( PM{2.5}_t \)) are treated as potential confounders. The outcome \( {\mu}_t \) is the expected all-cause mortality on day t. The DLNM is fitted through a cross-basis function \( \mathrm{s}\left({x}_t,l,\beta \right) \) that simultaneously describes the effect of the daily mean temperature \( {x}_t \) and its lag structure, with maximum lag l, on the expected mortality. Daily mean ozone concentration \( {O_3}_t \) and daily mean PM2.5 concentration \( PM{2.5}_t \) are fixed effects. A natural cubic spline \( f\left({z_t}^i;\theta \right) \) with 8 degrees of freedom per year adjusts for the seasonal effect. We selected 10, 20, and 30 days for the maximum exposure lag l. The cross-basis consists of a quadratic B-spline for temperature, with knots placed at the 10th, 75th, and 90th percentiles, and a natural cubic spline for the lag with 5 degrees of freedom, which corresponds to three internal knots equally spaced on the log scale.

To extend the DLNM to accommodate lag mortalities, we propose five different multivariate (MV) approaches that transform the lag outcomes (n × l) into a one-dimensional dependent variable (n × 1) that the DLNM can model.

For illustration purposes, assume that the Y matrix consists of 4 days of mortality with two lag days. Hence, the dimension of Y is (4 × 3). The first column of Y is the current-day mortality, the second column is the 1-day lag mortality, and the third column is the 2-day lag mortality.

Let \( Y=\left[\begin{array}{ccc}60& 51& 62\\ {}73& 60& 51\\ {}61& 73& 60\\ {}55& 61& 73\end{array}\right] \), with \( eigenvector=\left[\begin{array}{ccc}a& d& g\\ {}b& e& h\\ {}c& f& i\end{array}\right], eigenvalue=\left[\begin{array}{ccc}{\lambda}_1& 0& 0\\ {}0& {\lambda}_2& 0\\ {}0& 0& {\lambda}_3\end{array}\right] \) in the principal component analysis (PCA).
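As a minimal sketch of how such a lag outcome matrix could be assembled, the Python snippet below builds Y from a chronological series of daily counts. The function name `lag_outcome_matrix` and the six-day series are hypothetical; the series was chosen only so that the result matches the illustrative Y.

```python
def lag_outcome_matrix(series, max_lag):
    """Row for day t holds [y_t, y_{t-1}, ..., y_{t-max_lag}]."""
    return [
        [series[t - k] for k in range(max_lag + 1)]
        for t in range(max_lag, len(series))
    ]

# Hypothetical chronological daily death counts (oldest first),
# chosen so that the output equals the illustrative matrix Y.
daily_deaths = [62, 51, 60, 73, 61, 55]

Y = lag_outcome_matrix(daily_deaths, max_lag=2)
for row in Y:
    print(row)
```

The first `max_lag` days have no complete lag history and are dropped, which is why six daily counts yield four rows.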

Method 1:

MVsum: The most straightforward idea is to sum the mortalities from the current day back to the maximal lag day. The new Y matrix (4 × 1) contains these row sums:

$$ {MV}_{sum}={\left[\begin{array}{c}60+51+62\\ {}73+60+51\\ {}61+73+60\\ {}55+61+73\end{array}\right]}_{4\ast 1}={\left[\begin{array}{c}173\\ {}184\\ {}194\\ {}189\end{array}\right]}_{4\ast 1} $$
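The row sums can be checked with a few lines of Python. This is only an illustrative sketch on the example matrix; the variable names are not taken from the authors' code.

```python
# Illustrative lag outcome matrix: columns are the current day,
# the 1-day lag, and the 2-day lag mortality.
Y = [
    [60, 51, 62],
    [73, 60, 51],
    [61, 73, 60],
    [55, 61, 73],
]

# MVsum collapses each row into the total over the current and lag days.
mv_sum = [sum(row) for row in Y]
print(mv_sum)  # [173, 184, 194, 189]
```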

Method 2:

MVAR: A commonly used longitudinal structure is autoregressive (AR). The lag mortalities can be integrated with the current mortality through a weighted summation in which the further back the lag day, the less its mortality contributes. We assumed a geometric progression with three different ratios (0.8, 0.9, and 0.98):

  1. i.

    The n-day lag mortality is multiplied by the coefficient \( {0.8}^n \)

    $$ {MV}_{AR1}={\left[\begin{array}{ccc}60& 51& 62\\ {}73& 60& 51\\ {}\begin{array}{c}61\\ {}55\end{array}& \begin{array}{c}73\\ {}61\end{array}& \begin{array}{c}60\\ {}73\end{array}\end{array}\right]}_{4\ast 3}\ast {\left[\begin{array}{c}{0.8}^0\\ {}{0.8}^1\\ {}{0.8}^2\end{array}\right]}_{3\ast 1}={\left[\begin{array}{c}140.48\\ {}153.64\\ {}157.8\\ {}150.52\end{array}\right]}_{4\ast 1} $$
  2. ii.

    The n-day lag mortality is multiplied by the coefficient \( {0.9}^n \)

    $$ {MV}_{AR2}={\left[\begin{array}{ccc}60& 51& 62\\ {}73& 60& 51\\ {}\begin{array}{c}61\\ {}55\end{array}& \begin{array}{c}73\\ {}61\end{array}& \begin{array}{c}60\\ {}73\end{array}\end{array}\right]}_{4\ast 3}\ast {\left[\begin{array}{c}{0.9}^0\\ {}{0.9}^1\\ {}{0.9}^2\end{array}\right]}_{3\ast 1}={\left[\begin{array}{c}156.12\\ {}168.31\\ {}175.3\\ {}169.03\end{array}\right]}_{4\ast 1} $$
  3. iii.

    The n-day lag mortality is multiplied by the coefficient \( {0.98}^n \)

    $$ {MV}_{AR3}={\left[\begin{array}{ccc}60& 51& 62\\ {}73& 60& 51\\ {}61& 73& 60\\ {}55& 61& 73\end{array}\right]}_{4\ast 3}\ast {\left[\begin{array}{c}{0.98}^0\\ {}{0.98}^1\\ {}{0.98}^2\end{array}\right]}_{3\ast 1}={\left[\begin{array}{c}169.5248\\ {}180.7804\\ {}190.164\\ {}184.8892\end{array}\right]}_{4\ast 1} $$
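The geometric weighting can be recomputed with a short Python sketch. The helper name `mv_ar` is hypothetical, and the snippet only illustrates the weighted sum on the example matrix.

```python
Y = [
    [60, 51, 62],
    [73, 60, 51],
    [61, 73, 60],
    [55, 61, 73],
]

def mv_ar(matrix, ratio):
    """Weight the k-day lag column by ratio**k and sum across each row."""
    return [
        sum(value * ratio**k for k, value in enumerate(row))
        for row in matrix
    ]

for ratio in (0.8, 0.9, 0.98):
    # Rounded to four decimals for display only.
    print(ratio, [round(v, 4) for v in mv_ar(Y, ratio)])
```

A ratio of 1 recovers the plain summation of Method 1, so MVsum can be viewed as the limiting case of this weighting.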

Method 3:

MVPCA: Principal component analysis (PCA) (Jolliffe and Cadima 2016) is an unsupervised method for reducing the dimensionality of a large set of variables. The first component explains the maximum variability. Therefore, the first attempt uses only the first component. The second attempt employs all eigenvectors so that all of the variability is retained.

  1. i.

    Only multiply the first eigenvector (to obtain the first principal component):

    $$ {MV}_{PCA1}={\left[\begin{array}{ccc}60& 51& 62\\ {}73& 60& 51\\ {}\begin{array}{c}61\\ {}55\end{array}& \begin{array}{c}73\\ {}61\end{array}& \begin{array}{c}60\\ {}73\end{array}\end{array}\right]}_{4\ast 3}\ast {\left[\begin{array}{c}a\\ {}b\\ {}c\end{array}\right]}_{3\ast 1}\to {\left[\begin{array}{c}60a+51b+62c\\ {}73a+60b+51c\\ {}61a+73b+60c\\ {}55a+61b+73c\end{array}\right]}_{4\ast 1} $$
  2. ii.

    Multiply by all eigenvectors (to obtain all principal components) and weight each component by its proportion of the total eigenvalue:

    $$ {MV}_{PCA2}={\left[\begin{array}{ccc}60& 51& 62\\ {}73& 60& 51\\ {}61& 73& 60\\ {}55& 61& 73\end{array}\right]}_{4\ast 3}\ast {\left[\begin{array}{ccc}a& d& g\\ {}b& e& h\\ {}c& f& i\end{array}\right]}_{3\ast 3}\ast {\left[\begin{array}{c}\frac{\lambda_1}{\left({\lambda}_1+{\lambda}_2+{\lambda}_3\right)}\\ {}\frac{\lambda_2}{\left({\lambda}_1+{\lambda}_2+{\lambda}_3\right)}\\ {}\frac{\lambda_3}{\left({\lambda}_1+{\lambda}_2+{\lambda}_3\right)}\end{array}\right]}_{3\ast 1} $$
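The following Python sketch illustrates why a principal component is awkward as a count outcome. It assumes that the columns of Y are centered before projection, as many PCA routines do by default (the text does not state this), and it uses a plain power iteration in place of a library eigen-solver. Under that assumption the scores sum to zero, so some of them are necessarily negative.

```python
import math

Y = [
    [60, 51, 62],
    [73, 60, 51],
    [61, 73, 60],
    [55, 61, 73],
]
n, p = len(Y), len(Y[0])

# Center each column (an assumption; typical PCA default).
means = [sum(row[j] for row in Y) / n for j in range(p)]
Yc = [[row[j] - means[j] for j in range(p)] for row in Y]

# Sample covariance matrix of the lag outcomes (p x p).
cov = [
    [sum(r[i] * r[j] for r in Yc) / (n - 1) for j in range(p)]
    for i in range(p)
]

def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeated multiplication converges to the
    eigenvector with the largest eigenvalue (sign is arbitrary)."""
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

first_vector = leading_eigenvector(cov)

# First principal component scores: centered data times the eigenvector.
scores = [sum(r[j] * first_vector[j] for j in range(p)) for r in Yc]
print([round(s, 3) for s in scores])
print("any negative score:", min(scores) < 0)
```

Because a Poisson outcome must be a non-negative count, scores of this kind cannot be passed directly to a Poisson model.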

Method 4:

MVadjust: Separate the current mortality from the lag mortalities. Create a reduced vector that sums over the L lag mortalities but excludes the current mortality, \( sum\left({x}_{sL}\right)={\left[\begin{array}{c}51+62\\ {}60+51\\ {}73+60\\ {}61+73\end{array}\right]}_{4\times 1}={\left[\begin{array}{c}113\\ {}111\\ {}133\\ {}134\end{array}\right]}_{4\times 1} \), and include \( sum\left({x}_{sL}\right) \) as a covariate in the DLNM.

$$ {\left[\begin{array}{c}60\\ {}73\\ {}61\\ {}55\end{array}\right]}_{4\times 1}\mathrm{is}\ \mathrm{the}\ \mathrm{outcome}\kern0.5em \& sum\left({x}_{sL}\right)={\left[\begin{array}{c}113\\ {}111\\ {}133\\ {}134\end{array}\right]}_{4\times 1}\mathrm{is}\ \mathrm{adjusted}\ \mathrm{in}\ \mathrm{the}\ \mathrm{DLNM} $$

Method 5:

MVDLNM: Similar to Method 4, but instead of treating the sum of the previous lag mortalities as a covariate, \( sum\left({x}_{sL}\right)={\left[\begin{array}{c}113\\ {}111\\ {}133\\ {}134\end{array}\right]}_{4\times 1} \) is used as the offset of the current mortality in the DLNM.

$$ {\left[\begin{array}{c}60\\ {}73\\ {}61\\ {}55\end{array}\right]}_{4\times 1}\mathrm{is}\ \mathrm{the}\ \mathrm{outcome}\kern0.5em \& sum\left({x}_{sL}\right)={\left[\begin{array}{c}113\\ {}111\\ {}133\\ {}134\end{array}\right]}_{4\times 1}\mathrm{is}\ \mathrm{the}\ \mathrm{offset}\ \mathrm{in}\ \mathrm{the}\ \mathrm{DLNM} $$
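The inputs shared by Methods 4 and 5 can be sketched in Python as follows. The variable names are illustrative; taking the logarithm of the lag sum follows the offset term in the MVDLNM formula given in the Discussion, and the snippet stops short of fitting any model.

```python
import math

Y = [
    [60, 51, 62],
    [73, 60, 51],
    [61, 73, 60],
    [55, 61, 73],
]

# Outcome: current-day mortality (first column).
current = [row[0] for row in Y]

# Sum of the lag mortalities, excluding the current day.
lag_sum = [sum(row[1:]) for row in Y]

# Method 4 would enter lag_sum as an ordinary covariate;
# Method 5 enters log(lag_sum) as an offset with coefficient fixed at 1.
log_offset = [math.log(s) for s in lag_sum]

# With a log-scale offset, the model describes the ratio current / lag_sum.
ratio = [c / s for c, s in zip(current, lag_sum)]

print(current)   # [60, 73, 61, 55]
print(lag_sum)   # [113, 111, 133, 134]
print([round(v, 3) for v in log_offset])
print([round(v, 3) for v in ratio])
```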

To validate the performance of the above approaches, we conducted a simulation study under the null hypothesis with 1000 repetitions. The null distribution was generated by permuting the mortality records so that the outcome mortality and the temperature measures were uncorrelated. The validity of each model was then assessed: if the proportion of repetitions that reject the null hypothesis does not exceed the significance level of 5%, the proposed strategy is a valid test.
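The permutation scheme can be sketched in Python as below. This is not the authors' simulation: the data are synthetic, the function names are hypothetical, and a simple correlation test (Fisher's z approximation) stands in for the overall test from a fitted DLNM so that the example stays self-contained.

```python
import math
import random

def correlation_p_value(x, y):
    """Two-sided p value for Pearson's r via Fisher's z (normal approx.).
    Stand-in for the overall p value of a fitted model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def type_one_error(outcome, exposure, p_value_fn,
                   n_perm=1000, alpha=0.05, seed=1):
    """Share of permuted data sets in which the null is rejected."""
    rng = random.Random(seed)
    shuffled = list(outcome)
    rejections = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any outcome-exposure link
        if p_value_fn(exposure, shuffled) < alpha:
            rejections += 1
    return rejections / n_perm

# Synthetic one-year series (assumed values, not the Taipei data).
rng = random.Random(0)
temperature = [20 + 8 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 2)
               for d in range(365)]
deaths = [rng.randint(40, 80) for _ in range(365)]

rate = type_one_error(deaths, temperature, correlation_p_value)
print("empirical type I error:", rate)
```

A rate close to 0.05 indicates a valid test under this scheme; a rate well above it signals inflation.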

All statistical analyses and simulations were conducted in R (R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing), using the package "dlnm" by Gasparrini et al. (2010).

Results

Based on the permuted samples, the observed type I error rates of MVsum are presented in Table 2. Methods 1 and 2 both sum the previous outcomes and differ only in the weights. The results of MVAR1, MVAR2, and MVAR3 are therefore similar to those of MVsum and are not shown.

Table 2 Type I errors of MVsum

Because the principal components contain negative values, MVPCA1 and MVPCA2 violated the assumption of the Poisson model, and the dlnm package returned no results. Hence, type I errors were not obtained for these two methods. Note that the type I error rates of MVsum, MVAR1, MVAR2, and MVAR3 were much larger than the nominal level of 0.05, and the inflation increased with the number of lag outcomes. Therefore, these methods are not valid, although the idea is simple and easy to implement.

Table 3 shows that the type I error rate of MVadjust was between 0.058 and 0.067 when the lag exposure was up to 10 days. The type I error rate rose to 0.083–0.183 when the lag exposure was up to 20 days and to 0.076–0.308 when it was up to 30 days. Although its type I error rate was consistently larger than 0.05, MVadjust yielded much smaller type I errors than the summation-based methods.

Table 3 Type I errors of MVadjust

Finally, Table 4 shows that the type I error rate of MVDLNM was smaller than 0.05 when the lag exposure was 10 days. The type I error rate ranged from 0 to 0.078 when the lag exposure was 20 days and from 0 to 0.102 when it was 30 days. Therefore, the results indicate that MVDLNM is the only valid test among the proposed methods. For lag exposures of 10, 20, and 30 days, the cumulative outcome mortality can incorporate up to 10, 10, and 13 days, respectively.

Table 4 Type I errors of MVDLNM

This research aims to provide a novel method with a valid type I error and sufficient statistical power. Therefore, in addition to the type I error simulations, we ran power simulations to compare the performance of the DLNM and the MVDLNM, with 1000 repetitions for each scenario. The R function for power simulation is freely available. Researchers can apply it to datasets with different environmental factors and structures from other countries to examine whether the lag outcomes and lag exposures jointly demonstrate a significant association.

In the power simulations, we kept the temperature and air pollution structures of Taipei from 2012 to 2016. We simulated the outcome variable from a Poisson distribution whose mean parameter λ equals the daily mean temperature. In this way, the temperature determines the number of deaths, and a true association exists. Because the statistical power of both methods is 100% when the study period exceeds 1 year, the scenarios covered study periods from 120 days to 1 year. Besides statistical power, we recorded the percentage of repetitions in which the p value of the MVDLNM was smaller than the p value of the DLNM. The simulation results revealed that the MVDLNM outperforms the conventional DLNM in the scenarios we examined (Table 5). The percentage of repetitions in which the MVDLNM gives the more significant result is higher than 50%, and the power of the MVDLNM is consistently higher than that of the DLNM.
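The data-generating step of the power simulation can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R function: the temperature series is synthetic, the Poisson sampler is Knuth's multiplication method, and a simple correlation test stands in for the DLNM and MVDLNM fits.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's method; adequate for moderate positive means."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def correlation_p_value(x, y):
    """Stand-in test: Fisher's z approximation for Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    z = math.atanh(sxy / math.sqrt(sxx * syy)) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def estimate_power(temperature, n_rep=200, alpha=0.05, seed=1):
    """Share of simulated data sets in which the association is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # Outcome drawn with mean equal to that day's temperature.
        deaths = [poisson_draw(t, rng) for t in temperature]
        if correlation_p_value(temperature, deaths) < alpha:
            hits += 1
    return hits / n_rep

# Synthetic 120-day temperature series (assumed, strictly positive).
rng = random.Random(0)
temperature = [20 + 8 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 2)
               for d in range(120)]

power = estimate_power(temperature)
print("estimated power:", power)
```

Replacing the stand-in test with the two fitted models and comparing their p values within each repetition would give the percentage comparison described above.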

Table 5 Power simulations with 1000 repetitions

Our previous work on six major cities in Taiwan (Guo et al. 2014) reported a significant temperature impact on mortality. In this research, data for recent years are available only for Taipei City. Nevertheless, the MVDLNM provided significant overall p values (Table 6). For temperature measures up to 10 lag days, significant p values were observed when four or more lag mortalities were incorporated in the model. For temperature measures up to 20 and 30 lag days, the results suggest significant associations when five or more lag mortalities are used in the MVDLNM. Hence, the cumulative outcomes contribute to the association with lag exposures. Figures 1, 2, 3, 4, and 5 display the overall relative risk (RR) for 30 lag days. The RR on the current day is approximately 2.3, and the RR increases with the lag effects. For the lag of 10 days, the RR is as high as 5. Figures for the remaining lag days are not shown because they are numerous and the pattern is already apparent from these five figures.

Table 6 Real data application of MVDLNM: overall p value

Fig. 1 Overall RR on the current day

Fig. 2 Overall RR on the 5th lag day

Fig. 3 Overall RR on the 10th lag day

Fig. 4 Overall RR on the 20th lag day

Fig. 5 Overall RR on the 30th lag day

Since the MVDLNM extends the DLNM with an offset, the MVDLNM models mortality rates, whereas the DLNM models mortality counts. The cross-basis and all covariate structures of the MVDLNM are identical to those of the DLNM. This property is also an advantage of the new approach.

Discussion

Conventional DLNMs can be interpreted as modeling how one day's exposure influences the outcome over several subsequent days, as discussed in various publications by Gasparrini. However, the DLNM only considers the outcome on the current day. In this research, several strategies were proposed to explore the possibility of extending the DLNM to incorporate the outcomes of previous days. Looking at the specific models proposed, one could read this research as saying that mortality on one day depends on mortality on several previous days, not only on exposures. Most statisticians would consider these autoregressive models. This research therefore also provides epidemiological motivation by noting potential reasons for such autocorrelation, which may be introduced by unmeasured slow-changing covariates such as infectious diseases (Imai et al. 2015). Simple autoregression (Brumback et al. 2000) has been considered in the environmental time series literature, but the various types of models proposed here have received little discussion, with some exceptions (e.g., Imai et al. 2015).

From a different point of view, all of the models proposed in this research could be considered DLNMs for the dependence of mortality on earlier mortality: (1) MVsum is equivalent to a stratum-constrained DLNM; (2) MVPCA is the same as a DLNM with lag weights determined by PCA; (3) MVadjust is a different stratum-constrained DLM; (4) MVDLNM could be considered as MVadjust with the coefficient constrained to 1.

Through simulation studies, we examined several novel approaches to characterize the effect of delayed mortality and lag temperature measures. The results suggested that most of the methods are invalid, although the underlying statistical concepts are intuitive and easy to implement. These negative findings can help researchers avoid such types of analyses. Fortunately, there is one valid model, the MVDLNM, in which the log-transformed summation of the delayed mortalities is treated as an offset in the DLNM. The MVDLNM model is \( \log \left({\mu}_t\right)=\upalpha +\mathrm{s}\left({x}_t,l,\beta \right)+{\beta}_{O_3}{O_3}_t+{\beta}_{PM2.5} PM{2.5}_t+\kern0.5em {\sum}_{i=1}^pf\left({z_t}^i;\theta \right)+ offset\left(\log \left( sum\left({x}_{sL}\right)\right)\right) \).
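A short derivation may help show why the offset turns the count model into a rate model. As shorthand introduced here only for illustration, write \( {\eta}_t \) for the linear predictor of the conventional DLNM (the intercept, cross-basis, pollutant, and seasonal terms) and \( {S}_t \) for \( sum\left({x}_{sL}\right) \) on day t. Moving the offset to the left-hand side gives

$$ \log \left({\mu}_t\right)={\eta}_t+\log \left({S}_t\right)\kern0.5em \Longleftrightarrow \kern0.5em \log \left(\frac{\mu_t}{S_t}\right)={\eta}_t $$

so the expected current mortality is modeled relative to the total mortality over the preceding lag days. Equivalently, if the lag sum entered the model as a log-scale covariate with a hypothetical coefficient \( \gamma \) (an assumption; the scale of the MVadjust covariate is not stated), the offset corresponds to fixing \( \gamma =1 \):

$$ \log \left({\mu}_t\right)={\eta}_t+\gamma \log \left({S}_t\right),\kern1em \gamma =1 $$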

Because the new method MVDLNM cannot be easily implemented in the dlnm package, we provide R functions so that researchers can apply the MVDLNM with little effort. The example data from Taipei City are also enclosed. Please see the supplementary materials for details. The R code generates the plots for relative risks. In addition, we prepared another R function for power simulations so that researchers can assess whether their environmental data benefit from incorporating the lag outcomes in addition to the lag exposures.

The real data analysis of Taipei City from 2012 to 2016 confirmed that the delayed mortality records can significantly increase the association signal along with the lag temperature measures, which matches the conclusions we previously reported (Guo et al. 2014). This new strategy is a handy tool and could be adopted in various research fields when the cumulative outcome provides a more significant signal than the current one.

In public health research, the exposure may pose a delayed effect, and the outcome of interest may also carry a lag effect. This methodological study provides a simple yet valid test that jointly models the lag exposure and the delayed mortality records to enhance the ability to discover such a complex association structure.

In summary, this research proposed a novel strategy to account for cumulative mortality together with the distributed lag temperature records. According to computer simulations, the new model MVDLNM demonstrated a much more significant association than the conventional method, which uses only the current mortality.

The simulation results revealed that the type I error of MVDLNM does not exceed the nominal level of 5% within ten lag mortalities. Since 10 days is an intuitive interval, we recommend incorporating up to 10 days of lag outcomes in the new approach. In conclusion, the new approach MVDLNM models lag outcomes within 10 days and lag exposures up to 1 month and provides valid results.

Strengths and limitations

This research proposed several novel statistical models that account for daily mortality on previous days. Although the concept is intuitive and the methods can be implemented quickly, four of the five methods were not valid tests. However, these negative findings could keep researchers from using such erroneous models. Compared with the conventional analysis model that only assesses the current mortality, the new approach MVDLNM yielded a much more significant association. The maximum RR increases from Fig. 1 to Fig. 5, which provides explicit evidence that both the lag exposures and the lag outcomes contribute to the statistical significance.

The data used in this study are limited to Taipei, the capital of Taiwan, while the relationship between temperature and mortality may follow different profiles in other regions. For example, the accessibility and quality of medical care may be different in smaller towns. In addition, we considered all-cause mortality because we could not further classify the causes of death into categories such as sudden cardiac death or myocardial infarction, which are more likely to be related to temperature and air pollution. As for the temperature, only the daily mean temperature was considered in this study. We did not explore how the daily highest temperature, the daily lowest temperature, or the intraday temperature variation contributes to human death. Finally, some researchers have proposed a threshold to differentiate the impact of hot and cold temperatures on mortality. In contrast, we used continuous temperature measures with spline functions and polynomials.