Extensions of the distributed lag non-linear model (DLNM) to account for cumulative mortality

The effects of meteorological factors on health outcomes have gained popularity due to climate change, resulting in a general rise in temperature and abnormal climatic extremes. Instead of the conventional cross-sectional analysis that focuses on the association between a predictor and the single dependent variable, the distributed lag non-linear model (DLNM) has been widely adopted to examine the effect of multiple lag environmental factors health outcome. We propose several novel strategies to model mortality with the effects of distributed lag temperature measures and the delayed effect of mortality. Several attempts are derived by various statistical concepts, such as summation, autoregressive, principal component analysis, baseline adjustment, and modeling the offset in the DLNM. Five strategies are evaluated by simulation studies based on permutation techniques. The longitudinal climate and daily mortality data in Taipei, Taiwan, from 2012 to 2016 were implemented to generate the null distribution. According to simulation results, only one strategy, named MVDLNM, could yield valid type I errors, while the other four strategies demonstrated much more inflated type I errors. With a real-life application, the MVDLNM that incorporates both the current and lag mortalities revealed a more significant association than the conventional model that only fits the current mortality. The results suggest that, in public health or environmental research, not only the exposure may post a delayed effect but also the outcome of interest could provide the lag association signals. The joint modeling of the lag exposure and the delayed outcome enhances the power to discover such a complex association structure. The new approach MVDLNM models lag outcomes within 10 days and lag exposures up to 1 month and provide valid results. Supplementary Information The online version contains supplementary material available at 10.1007/s11356-021-13124-0.


Introduction
Extensive studies have indicated the association between temperature and human health, which arouses public health concerns as the climate has changed drastically on a worldwide scale due to global warming in recent years (Basu 2009;Gasparrini et al. 2010). After accounting for climate changes and other factors, how hot and cold weather, or their delayed effects, trigger human death were widely discussed in different areas, including the USA (Curriero et al. 2002;Mills et al. 2015), Europe (Baccini et al. 2008), and Northeast Asia (Chung et al. 2015). In addition to temperature, it has also been documented that exposure to air pollutants, which includes particulate matter (PM), ozone (O 3 ), nitrogen dioxide (NO 2 ), and sulfur dioxide (SO 2 ) according to the 2005 WHO Air Quality Guidelines, leads to adverse effects on human health, especially the respiratory and cardiovascular diseases. Several types of research have examined the relationship between PM 10 , PM 2.5 , and daily mortality. Some showed that exposure to polluted air in a period would harm health conditions such as the development of lung or heart diseases, where the sources of pollution come from air, second-hand smoke, ozone, or particle matters (Dai et al. 2004;Janssen et al. 2013).
In 2010, Gasparrini et al. (2010) carried out the distributed lag non-linear model (DLNM) to evaluate predictors' lag effect. The DLNM fits the non-linear association between the outcome variable and predictors. A cross-basis function simultaneously depicts the exposure-response relationship and the predictor space and lag-response relationship along with the lag space. In 2018, a new approach assessed both the same-day and 1-day lag mortality in DLNM (Chen et al. 2018). Therefore, associations in both lag outcomes and exposures need more attention to describe such a complex structure.
This research collected both weather and air pollution data as predictors and daily mortality as the health outcome in Taipei City from 2012 to 2016. Since the DLNM is widely adopted in public health and environmental research (Vicedo-Cabrera et al. 2016), we aim to extend the DLNM with Poisson link function and natural cubic splines (Bhaskaran et al. 2013) to model the cumulative mortality outcomes using lag predictors. The new methods' validity and performance would be evaluated by the simulation study based on permutation techniques. Finally, a real data application shows a significant improvement attributable to the new method.

Materials and methods
Anonymous daily mortality counts are the outcome of interest. Hence, no patients were involved in this research. All-cause mortality in Taipei City was obtained from the Cause of the Death Database published by the Ministry of Health and Welfare.
The Institutional Review Board (IRB) of National Yang-Ming University approved the use of anonymous mortality data (IRB number YM107045E). Daily mean temperature measures were obtained from Taipei Weather Station, available through the Central Weather Bureau (CWB n.d.) Observation Data Inquiry System website (CWB n.d.). Data on air pollution were obtained from Taipei Air Quality Monitoring System, available through Environmental Protection Administration Executive Yuan website (EPAEY n.d.), where we collected daily mean ozone concentration and daily mean PM 2.5 concentration (Table 1). Although some air pollutions were missing, we could omit these observations since the missing rate is dismal, with an ignorable impact on the analyses.
The DLNM model is defined as the following: The independent variable (x t ) is daily mean temperature and other pollutant variables (O 3t and PM2.5 t ) are treated as potential confounders. The outcome variable (μ t ) was allcause mortality. The DLNM model was fitted through a cross-basis function s(x t , l, β) simultaneously describing the effect of the daily mean temperature x t and its lag structure with maximum lag l on the expected mortality. Daily mean ozone concentration O 3t and daily mean concentration PM2.5 t are fixed effects. A natural cubic spline f(z t i ; θ) with 8 degrees of freedom for each year is used to adjust for the seasonal effect. We selected 10, 20, and 30 days for the maximum exposure lag l. The cross-basis consists of a quadratic Bspline for temperature with the knots placed at 10th, 75th, and 90th percentiles and a natural cubic spline for the lag with 5 degrees of freedom, indicating three internal knots are equally spaced in the log scale.
In order to extend the DLNM to accommodate the lag mortalities, we propose five different multivariate (MV) approaches to transform the lag outcomes (n × l) into a one-dimensional dependent variable (n × 1) to be integrated by the DLNM.
For illustration purposes, assume that the Y matrix consists of 4 days of mortality with two lag days. Hence, the dimension of Y is (4 × 3). The second column of Y is the 1-day lag mortality. The third column of Y represents the 2-day lag mortality.  MV sum : The most straightforward idea is to obtain the total mortalities from today to previous lag days. The new Y matrix (4 × 1) contains the summation of mortalities from the current day to the maximal lag day: Method 2: MV AR : A commonly used longitudinal structure is autoregressive (AR). The lag mortality could be integrated into the current mortality by this weighted summation. The earlier a day lags, the less impact of mortality would contribute. We assumed a geometric progression with different ratios (0.8, 0.9, and 0.98): i. The n-day lag mortality is multiplied by coefficients 0.  ii. Multiply all eigenvector (to obtain all principal components) and the corresponding percentage:  is the offset in the DLNM To validate the above approaches' performance, we conducted a simulation study under the null hypothesis 1000 times. The null distribution was generated by permutations of mortality such that the outcome mortality and temperature measures were not correlated. The validity of each model was assessed. If the proportion of rejecting the null hypothesis does not exceed the significance level of 5%, the proposed strategy is a valid test.
All the statistical analyses and simulations were conducted by the software R (R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing), equipped with the package "dlnm" by Gasparrini et al. (2010).

Results
According to the permuted samples, the observed type I error for MV sum is presented in Table 2. Methods 1 and 2 are based on the summation of previous outcomes but with different weights. Therefore, the results of MV AR1 , MV AR2 , and MV AR3 are similar and not shown.
Due to a negative value in principal components, MV PCA1 and MV PCA2 failed to satisfy the Poisson distribution model's assumption and did not generate any DLNM package results. Hence, type I errors were not obtained. Note that type I error rates for MV sum , MV AR1 , MV AR2 , and MV AR3 were much larger than the nominal level of 0.05. Inflation is increasing for the number of lag outcomes. Therefore, these methods are not valid, although the idea is simple and could be easily implemented. Table 3 shows that the type I error rate of MV adjust was between 0.058 and 0.067 when the lag exposure was up to 10 days. The type I error became 0.083-0.183 when the lag exposures were up to 20 days. Finally, type I errors are 0.076-0.308 for 30 lag temperature measures. Although the type I error rate was consistently larger than 0.05, MV adjust yielded much smaller type I errors than summation-based methods.
Finally, Table 4 shows that the type I error rate of MV DLNM was smaller than 0.05 when the lag exposure was 10 days. The type I error would range from 0 to 0.078 if the lag exposure were 20 days. When the lag exposure was 30 days, the type I error ranges from 0 to 0.102. Therefore, the results indicated that MV DLNM is the only valid test. For the 10, 20, and 30 lag exposures, the cumulative outcome mortality could be implemented up to 10, 10, and 13 days, respectively.
The research aims to provide a novel method with an ensured valid type I error and sufficient statistical power. Therefore, in addition to type I error simulations, we examined computer simulations to compare the performance between the DLNM and the MV DLNM . We conducted 1000 repetitions for each scenario. The R function for power simulation is freely available. Researchers could use various datasets with different environmental factors and structures in other countries worldwide to confirm that both the lag outcomes and exposures could demonstrate a significant association.
In power simulations, we kept the temperature and air pollution structures in Taipei from 2012 to 2016. According to the Poisson distribution, we simulated the outcome variable with the mean parameter λ equals the daily mean temperature. In this way, the temperature determines the number of mortality, and the association is significant. Scenarios included different lengths of the study period from 120 days to 1 year since the statistical power is 100% for both methods with a sample size of more than 1 year. Therefore, besides statistical power, we recorded the percentage when the p value of the MV DLNM is smaller than the p value of the DLNM. Simulation results revealed that the MV DLNM outperforms the conventional DLNM in the scenarios we examined ( Table 5). The percentage when the MV DLNM reveals a more significant result is higher than 50%, and the power of the MV DLNM is consistently higher than the power of the DLNM. Our previous work using six major cities in Taiwan (Guo et al. 2014) reported a significant temperature impact on mortality. In this research, only Taipei City is available for recent years. However, the MV DLNM could provide significant overall p values (Table 6). Regarding the temperature measure up to 10 lag days, significant p values were observed for four or more lag mortalities incorporated in the model. For temperature in 20 and 30 lag days, if five or more lag mortalities are used in the MV DLNM , the result would suggest significant associations. Hence, the cumulative outcomes could contribute to the association with lag exposures. In Figs. 1, 2, 3, 4, and 5, the overall relative risk (RR) for 30 lag days is displayed. The RR on the current day is approximately 2.3, but the RR increases with respect to the lag effects. For the lag of 10 days, the RR is as high as 5. The figures on the rest lag days were not shown. There are too many figures, and the pattern was observed according to the five figures.
Since the MV DLNM extends the DLNM with an offset, the MV DLNM models the mortality rates comparing to the DLNM  This phenomenon is also an advantage of the new approach.

Discussion
Conventional DLNMs can be interpreted as one day's exposure influences outcome over several subsequent days, discussed in various publications by Gasparrini. However, the DLNM only considers the outcome on the current day. In this research, several strategies were proposed to explore further the possibilities of extending the DLNM to incorporate the previous days' outcomes. Looking at the specific models proposed, one would think that what this research means is that mortality on one day depends on mortality on several previous days (not just exposures). Most statisticians would consider these as autoregression models. Therefore, this research also provides epidemiological motivation, noting the potential reasons for such autocorrelation, which may be introduced by unmeasured slow-changing covariates, such as infectious diseases (Imai et al. 2015). Simple autoregression (Brumback et al. 2000) has been considered in the environmental time series literature, but not much discussion of the various types of models proposed here (though there is some-e.g., Imai et al. 2015).
In a different point of view, all of the models proposed in this research could be considered as DLNMs for the dependence of mortality on earlier mortality: (1) MV sum is equivalent to stratum-constrained DLNM; (2) MV PCA is the same as DLNM with lag weights determined by PCA; (3) MV adjust is a different stratum-constrained DLM; (4) MV DLNM could be considered as MV adjust but with coefficient constrained to 1.
Through simulation studies, we examined several novel approaches to characterize the effect of the delayed mortality and lag temperature measures. Results suggested that most The column "MV DLNM Wins" represents the percentage when the p value of the MV DLNM is smaller than the p value of the DLNM  Table 4 methods are invalid, although these statistical concepts are intuitive and could be implemented effortlessly. The negative findings could provide researchers a great idea to avoid such types of analyses. Fortunately, there is one valid model, the MV DLNM , where the log-transformed summation of the delayed mortalities is treated as an offset in the DLNM model. The MV DLNM model is logð Þ . Because the new method MV DLNM could not be easily implemented in the DLNM package, we provide the R functions for researchers to utilize the MV DLNM effortlessly. The example data in Taipei City is also enclosed. Please see the supplementary materials for details. The R code generates the plots for relative risks. Besides, we prepared another R function for power simulations such that researchers could assess if their environmental data have the advantage of incorporating the lag outcomes in addition to the lag exposures. We have made the corresponding changes accordingly.
The illustration of real data analysis of Taipei City from 2012 to 2016 confirmed that the delayed mortality records Fig. 1 Overall RR on the current day Fig. 2 Overall RR on the 5th lag day could significantly increase the association signal along with lag temperature measures, which matches the conclusions as we previously reported (Guo et al. 2014). Nevertheless, this new strategy is a handy tool and could be adopted by various research fields when the cumulative outcome provides a more significant signal than the current one.
In public health research, the exposure may post a delayed effect, but the outcome of interest could signal the lag effect. This methodological study provides a simple yet valid test that jointly models the lag exposure and the delayed mortality records to enhance the ability to discover such a complex association structure.
In summary, this research proposed a novel strategy to account for cumulative mortality in the distributed lag temperature records. According to computer simulations, the new model MVDLNM demonstrated a much more significant association than the conventional method with the current mortality.
The simulation results revealed that the type I error of MV DLNM does not exceed the nominal level of 5% within Fig. 3 Overall RR on the 10th lag day Fig. 4 Overall RR on the 20th lag day ten lag mortalities. Since 10 days is an intuitive interval, we recommend incorporating up to 10 days of lag outcomes in the new approach. In conclusion, the new approach MV DLNM models lag outcomes within 10 days and lag exposures up to 1 month and provide valid results.

Strengths and limitations
This research proposed several novel statistical models accounting for daily mortality in previous days. Although the concept is intuitive and one could quickly implement the methods, the four methods were not valid tests. However, the negative findings could prevent researchers from such types of erroneous models. Comparing to the conventional analysis model that only assesses the current mortality, the new approach MV DLNM yielded a much more significant association. The RR's maximum value in Fig. 1 increases to the RR in Fig. 5 and showed explicit evidence that the lag exposure and outcome of interests contribute to the statistical significance.
The data used in this study are limited to Taipei, the capital of Taiwan, while the relationship between temperature and mortality may consist of various profiles in other regions. For example, the accessibility and quality of medical care may be different in smaller towns. In addition, we considered the all-cause mortality since we could not further classify death causes into more categories, such as sudden cardiac death or myocardial infarction, which are more likely to be related to temperature and air pollution. As for the temperature, only daily mean temperature was considered in this study. We did not explore the highest, lowest temperature, and intraday temperature variation in the contribution to human death. Finally, some researchers proposed a threshold to differentiate the impact of hot and cold temperatures on mortality. In contrast, we use the continuous temperature measures to employ spline functions and polynomials.
Acknowledgments Mortality data were supported by the research grant at National Yang-Ming University (YM107C005).
Availability of data and materials Public data are available from the websites indicated in the manuscript.
Authors' contributions C-Y.G. proposed the study design, supervised the research project, and drafted the manuscript. X-Y.H. carried out analysis, simulations, and generated results. P-C.K.: substantial edits of the manuscript. Y.-H.C.: funding support and edited the manuscript.
Funding No funding was received for this research.

Declarations
Ethics approval and consent to participate The IRB at National Yang-Ming University approves the use of anonymous mortality data. The data satisfy ethics approval. The ID of the IRB approval is "YM107045E." All authors have approved the manuscript.
Competing interests The authors declare that they have no conflicts of interest. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.