Introduction

Atmospheric pollution is representing globally the highest environmental risk for the human health. The WHO has published an estimation showing that 4.2 million deaths every year as a result of exposure to ambient (outdoor) air pollution and 3.8 million deaths every year as a result of household exposure to smoke from dirty cookstoves and fuels (WHO 2020). Large longitudinal studies (Fisher et al. 2015) were capable of demonstrating that the exposure over a long period of time to air pollutants can be linked to nonaccidental and cause-specific mortality. A recent catalog and synthesis review of the relationships of air pollution with long-term morbidity (Lipfert 2018) has listed 489 publications which covered respiratory, cardiovascular, cancer, neurological, diabetic, and birth outcomes in relation to 12 air pollutants. In approximately 50% of these publications, the relationships that have been found were statistically significant. Also short time exposure to air pollution can cause health effects, e.g., it may trigger an asthma attack (Noh et al. 2016) or worsen a preexisting medical condition. Short-term effects of pollution may trigger or exacerbate the diseases of the eye and adnexa (Song et al. 2019), respiratory diseases (Xie et al. 2019), and cardiovascular diseases (Ma et al. 2017) and may also cause premature death (Dastoorpoor et al. 2018). “The time series, case-crossover, and panel studies are best suited for estimating the acute effects of air pollution” (Peng and Dominici 2008). When it comes to evaluating hospitalizations or healthcare visits or mortality as outcomes of a given study, the data structure and the type of effects that need to be studied typically lead to the model selection. Often the choice falls on regression models. Both time-series (Bell et al. 2004; Zheng et al. 2015; Jie et al. 2018) and case-crossover designs are widely used (Taj et al. 2017; Leiser et al. 2018). Although, in addition to these consolidated study designs, for the same purposes, also other methods are being routinely used, e.g., cross-sectional studies or case-control studies (Nieuwenhuijsen 2015). In any case, time-series analyses are commonly used to investigate the association of pollution exposures (which vary in time) with the investigated event counts (e.g., hospitalizations), while the case-crossover design was originally designed to study “acute transient effects of intermittent exposures” (Maclure 1991). While originally most of such studies (Fung et al. 2003; Hajat 2003) used time-series analyses, progressively case-crossover studies have become more and more the alternative analytical approach (Duan et al. 2016). Both techniques have their own advantages and drawbacks. Only very few studies have compared the output of the two methodologies (Lin et al. 2002; Yuming et al. 2010; Kayo et al. 2009; Tong et al. 2012; Meng et al. 2017; Zheng et al. 2015), in particular on the respiratory hospitalizations and healthcare visits associated with air pollution. This paper compares the already published results of the impact of air pollution on hospital admissions with a focus on respiratory diseases performed with a time-series multicity analysis (Slama et al. 2019) with a case-crossover design performed on the same dataset.

Methods

The detailed description of the data sources, cities, health data, and pollutants data have been published previously (Slama et al. 2019). In brief, five cities in Poland have been chosen to provide the appropriate representation of the country data. Two large cities (Warsaw 2 M inhabitants and Krakow 0,8 M), one medium (Gdansk 0,5 M) and two smaller (Bielsko-Biała and Białystok 0,3 M); in total 3,9 M – approximately 10% of the total population of Poland, have been selected to represent different city-size demographics. In addition, historical data show different pollution levels, with Krakow, Warsaw, and Bielsko-Biała being significantly more polluted than the other cities. As a last point, their climatic and geographical location (costal Gdansk, continental Warsaw, Gdansk north, Krakow and Bielsko-Biała South, Białystok East) as well as availability of data, drove the ultimate choice. A map of the study area is reported in Fig. 1. The data related to the number of hospitalizations in the cities of Warsaw, Białystok, Bielsko-Biała, Krakow, and Gdansk were obtained from the reporting system of the NHF (in Polish: Narodowy Fundusz Zdrowia) and covered a period of almost 4 years (2014–2017, 1255 days). Data on the concentration of air pollution were obtained from the Chief Inspectorate for Environmental Protection (GIOS) and included NO, NOx, NO2, O3, SO2, PM2.5, PM10, PM10_24, and PM2.5_24. Daily (obtained from manual stations) and hourly data (obtained from automatic stations, coded as 24, e.g., PM10_24, for which the 24 h average have been recorded) have been used in the analysis. Meteorological data have been gathered from the Institute of Meteorology and Water Management (IMGW) that have beacons in the Polish cities and included temperature, main wind speed, and precipitations.

Fig. 1
figure 1

Map of Poland* and the cities chosen in both study design analysis. *The map is taken from the CIA World Factbook

Statistical analysis

The study has been designed to compare the results of two different study designs (time series and case-crossover).

  1. a.

    In the original, already published time series analysis (Slama et al. 2019), the hypothesized association between air pollution and respiratory hospitalizations has been analyzed using a simple correlation analysis, later combined with a DLNM model based on Almon method, to explore the lag effect. Normalization of the sample for factors like temperature and day of the week had to be put in place to adjust for seasonality and time dependent trends.

b. The case-crossover study design is per se designed to adjust for seasonality trends and time (Perrakis et al. 2014), so no normalization for temperature and/or day of the week had to be put in place. A “Gasparrini”-type (Gasparrini et al. 2010, 2012; Gasparrini 2011) DLNM model has been used to explore the association between air pollution and respiratory hospitalization with respect to the potential time delay, the so-called time lag. The lag assumption in the analysis was that the effect of increased air pollution may have a lag effect from 0,1,2… up to 10 days. The decision to stop the analysis at day 10 has been taken after analysis of the relevant literature, which insofar reported lag effects of few days only for short-term respiratory effects. In the study model, smooth temperature functions were included, as well as wind speed and precipitation. Natural cubic splines with three degrees of freedom were used for both the nonlinear weather factors effect and the lagged effect. The month, year, and national holidays as non-overlapping strata helped to avoid bias due to overlap. In addition, the day of the week was matched with control days. The analysis has been carried out utilizing the statistical software R version 3.5.1 with “dlnm” and “splines” package.

Like for the already published results utilizing a time-series analysis, all estimates have been displayed as percentage increase in the number of respiratory disease hospitalizations, associated with a 10-μg/m3 increase of the air pollution level.

Results

The summary of the air pollution data and the main meteorological data throughout the study period are displayed in graphical format in Fig. 2. For a more thorough review of these data, we revert to the published data referred to in the introduction. In general, the cities show a pollution level which, especially for the particulate matter and in some cases (Warsaw and Krakow) for NO2, is often higher than the EU limits.

Fig. 2
figure 2

Comparison of the air pollution mean concentrations, their range (±), and the corresponding EU Air Quality Standards

The results utilizing both the time-series analysis and the case-crossover design are displayed in Table 1, results which include the confidence interval (CI). The results are displayed in the form of expected percentage increase in hospitalizations per 10 unit increase in concentration of the air pollutant. The most significant increases in hospitalizations for respiratory disease have been identified (by both methods) in PM2.5_24 (Warsaw, 6.4 and 4.5, respectively, for time series (first value) and case-crossover (second value), and Białystok, 5.6 and 4.5, respectively), for SO2 (Krakow, 4.8 and 7.5, respectively, and Biełsko-Biala, 5.5 and 5.4, respectively), and for PM2.5 (Gdansk, 4.0 and 3.1, respectively).

Table 1 Comparison of percentage increase in hospitalization per each 10 units increase in pollutant concentration for both time-series and case-crossover study designs

In Fig. 3, the same results are displayed in graphical form, to ease the comparison of the two datasets.

Fig. 3
figure 3

Comparison of percentage increase in hospitalizations for respiratory disease (analysis time-series vs. case-crossover design)

Thirty-five values out of the total 40 show that the calculated % increases of hospitalizations from the two different methods overlap within their respective CI range or show a maximum of 0.5% difference between their respective CI minimums/maximums. As a descriptive example, for Bielsko-Biala, this happens for PM10 24, PM2.5 24, PM10, NOx, NO, NO2, O3, and SO2. In other words, 88% of the calculated values with both methods display a matching qualitative trend and a quantitative result displaying a CI overlap within a maximum of 0.5% difference in the hospitalization values.

The only major discrepancies of the results have been found in Warsaw (for PM2.5 24 and NOx), in Krakow (NOx), in Biełsko-Biala (O3), and in Białystok (SO2).

The relative risk (RR) plots, needed to analyze the lag effect for all pollutants and cities, are likewise reported in Fig. 4. The maxima in these graphs represent the highest RR for the event to occur, i.e., the highest probability at which the hospitalization increase is expected to happen.

Fig. 4
figure 4

Lag-specific relative risks (with 95% confidence interval) for respiratory hospitalizations associated with a 10-μg/m3 increase in air pollutant concentrations. The solid red line represents the predicted relative risk, and dashed lines indicate 95% confidence intervals

In Fig. 5, the results of the calculated lag where the maximum effect of increased hospitalizations is expected are reported for both case-crossover (Gasparrini DLNM) and time-series (Almon) analysis. From Fig. 5, for 28 out of total 40 calculated lags, the results (RR maxima, i.e., maximum probability of hospitalization after the pollution peak) do not differ for more than 1–2 days between the two methods.

Fig. 5
figure 5

Comparison of calculated maximum lag-specific relative risks for respiratory hospitalizations associated with a 10-μg/m3 increase in air pollutant concentrations or both the case-crossover (Gasparrini) and time-series (Almon) analysis

Discussion

Poland is facing, together with north of Italy and large metropolis (Fusco et al. 2001), some of the highest EU28 levels of air pollution exposures (EEA 2019; Badyda et al. 2016). The main sources of air pollution in Poland originate from exhausts from vehicles, heating (both coal and oil), coal-fired energy production, and industrial emissions. The findings of our previous study and the findings of this analysis both show that the daily average concentrations of the major air pollutants in these five polish cities, object of the analysis, were higher in the winter period and lower in the summer period (with exception of O3, which showed an inverse seasonality) and were often well above the Air Quality Standards set forth by the EU legislative frameworks.

Day of the week and temperature was shown in the previous time-series study to have an important effect on the hospitalizations and to remove the bias a normalization of the patients has been performed in the data sample preparation within that analysis. A challenge while performing case-crossover studies using air pollution data is trends and seasonality in the air pollution over time. Case-crossover design is based on the assumption that the exposure distribution is stationary (Peng and Dominici 2008), which is not the case in air pollution levels which fluctuate over time and have distinct trends and seasonal patterns. By matching the day of the week and season, case-crossover study allows control for these confounding by design. The year, month, and holidays as non-overlapping strata have been used to help avoid the bias due to overlap (Taj et al. 2017), matching the days of the week as control days.

Very few studies have been comparing these two methods in general epidemiologic air pollution studies on the same data sample and in particular in the respiratory illness field (Lin et al. 2002; Fung et al. 2003; Hajat 2003; Yuming et al. 2010; Kayo et al. 2009; Tong et al. 2012; Meng et al. 2017). In addition, we did not find so far any published study for which respiratory diseases hospitalizations and air pollution associations have been compared using these two study methodologies.

The difference between the two study results is summarized in Fig. 3, Taking into consideration the very highly different statistical methodologies, these results are encouraging. To a large extent, this could be probably the result of the relatively large number of data points constituting the sample (almost 4 years of hospitalization data). The few studies that did compare these two methodologies, although with different endpoints (e.g., mortality – Meng et al. 2017; Yuming et al. 2010) and not hospitalizations, reported a somehow similar range. In the Meng study (NB with a different end-point, i.e., mortality and not hospitalization), a “10 μg/m3 increment of SO2 levels was associated with 2.4% and 1.9% increase of relative risk of overall respiratory disease mortality in case-crossover and time-series analyses, respectively”, i.e., a difference of 0.5%. Kayo et al. (2009) (also looking at the mortality endpoint) reported results differing between the methods from 0.53 to 0.88% (i.e., 0.4%), and Tong et al. (2012) reported also differences between the methods in similar ranges, although the endpoint in this latter case was mortality with respect to heatwaves, so a profoundly different setting. In this paper, 88% of the calculated values show overlap of their respective CI’s by a maximum of 0.5%.

The main difference between the results of the two study methodologies seems to be related to the output of the relative magnitude of the CI’s. With the case-crossover design, the CI’s are generally larger. This is coherent with the same findings of the Meng study, where the confidence interval of the time-series analysis was narrower than the one in the case-crossover design and also to earlier literature (Guo et al. 2010).

Where the two study methodologies differed slightly more is in the identification of the maximum lag effect as seen in Table 1.

The distributed lag structure in hospitalizations for respiratory illness has shown to vary largely from maximum effects being recorded at lag = 0, to maximum effects recorded at lags as long as 10–14 days (Kim et al. 2012; Lall et al. 2010; Pun et al. 2015; Tao et al. 2014). The differences in such results have been explained with the typology of the pollutant, the sub-type of respiratory illness, and the stratification of the patient population (kids, elderly, etc.), but there does not seem to be a homogeneous pattern of lag effect predictability.

Like for the % increase in hospitalization, where the data arising from the 2 different methodologies where overlapping at a great extent, also for the lag at which the highest RR (Relative Risk) is displayed, 70% (28 out of 40) of the calculated data points result within 1–2 maximum lag day difference. Plotting the RR utilizing the case-crossover “Gasparrini” design resulted in an easier identification of the lags and a more intuitive understanding of the entity of the effect.

Study limitations and further research opportunities

The study limitations for the time-series analysis have been discussed in our earlier publication. Common to both methodologies and typical in similar studies in this research area, the data on ambient air pollutant concentrations and meteorological conditions were obtained by databases that utilize mainly fixed monitoring stations. The assumption is that these values represent appropriately the mean exposure for the population, which obviously leads to a measurement error as not all individuals are exposed to the same pollution level. The two study designs have different biases, and the analysis of both together offers the opportunity to look at the same dataset with two different methodologies and hence potentially reduce or test the bias. Typically, the case-crossover design assumes a step-like seasonal change (Zanobetti et al. 2003), while time-series studies assumes a smooth change in the seasonal pattern. Consistent results have been published using both the time-series and the case case-crossover analysis, which supports that both statistical analyses can be performed to estimate air pollution and hospitalization patterns producing robust and comparable results. Further analysis would be beneficial in general to understand better the lag effect structures and perhaps a third statistical method, perhaps using neural networks, could be used to further evaluate the data set. In addition, a further analysis on a lag window longer than 10 days could be performed, also in the light of new literature findings that show that in some cases there can be short time health effects recorded at longer lag intervals (Heresh et al. 2019).

Conclusions

These analyses show a positive association between air pollutants and hospitalizations for respiratory disease in 5 cities in Poland during the years 2014–2017. The case-crossover design study confirms the previous time-series findings on the same dataset, corroborating the finding that both methodologies are good for studies where case-related health effects are being investigated. By comparing the two methodologies, we found the time series to have narrower CIs. Also for the lag effect, the time-series analysis has shown a narrower time window between the lowest and highest potential lag.