Introduction

Ambient air pollution is recognized to adversely affect health (Arbex et al. 2012). Studies conducted in almost all parts of the world have found that day-to-day increases in pollution levels are associated with a range of pathologies, including respiratory tract disease (Kim et al. 2018; Cerezo Hernández et al. 2018), asthma (Zheng et al. 2015), increased COPD exacerbations (Moore et al. 2016), cardiovascular disease (Analitis et al. 2006), and cerebrovascular disease such as stroke (Tian et al. 2017). Several theories on the pathogenesis of these effects of ambient pollution have been put forward (Bernstein et al. 2004), but overall the area remains poorly understood, and there is no consensus on which constituents of air pollution (Kampa and Castanas 2008) are most harmful (Brunekreef and Holgate 2002). Large Polish cities (Nabrdalic and Samora 2018), and Eastern European cities in general, are recognized to have poorer air quality than other cities in Europe (Katsouyanni et al. 1996; Zmirou et al. 1998). Higher pollution levels in Polish cities are caused, in part, by coal-powered electricity generating stations and heating sources. Despite this recognition, only a few large-scale statistical analyses have so far been performed (Pac et al. 2013; Haluszka et al. 1998; Niepsuj et al. 1998) on the associations between day-to-day fluctuations in air pollution levels and hospitalizations. At this time, we know of no study that analyses data from more than 20 million hospitalizations and emergency department (ED) visits in the study area. ED visits that turn into hospitalizations, or that require a hospital diagnostic or a specialized medical visit or intervention, are logged in the database with an ICD-10 classification and are therefore reported here as “hospitalizations”; these data include specialized ambulatory care.
The goal of this analysis was to investigate the association between different air pollutants and hospitalizations in a multi-city time-series observation, considering the potential influence of key meteorological parameters. The analysis was made possible by access to the electronic registry of the Polish National Health Fund and to data from the Institute of Meteorology and Water Management and the Chief Inspectorate of Environmental Protection.

Methods

Core data source

Data on the number of hospitalizations in the cities of Warsaw, Białystok, Bielsko-Biała, Kraków, and Gdańsk were obtained from the reporting system of the National Health Fund (NHF; in Polish: Narodowy Fundusz Zdrowia) and covered a period of almost 4 years (2014–2017, 1255 days), between January 1, 2014, and August 1, 2017. The International Classification of Diseases, 10th revision (ICD-10) coding was used to identify the diagnoses at admission. The following ICD-10 categories were considered: F, mental and behavioral disorders; G, diseases of the nervous system; H, diseases of the eye and adnexa and of the ear and mastoid process; I, diseases of the circulatory system; J, diseases of the respiratory system; L, diseases of the skin and subcutaneous tissue; S/T, injury, poisoning, and certain other consequences of external causes.

Data on air pollutant concentrations were obtained from the Chief Inspectorate for Environmental Protection (GIOS) and included NO, NOx, NO2, O3, SO2, PM2.5, PM10, PM10_24, and PM2.5_24. Both daily data (obtained from manual stations) and hourly data (obtained from automatic stations, coded as 24) were used in the analysis. Meteorological data were obtained from the Institute of Meteorology and Water Management (IMGW), which operates measuring stations in the Polish cities, and included temperature, main wind speed, and precipitation.

Sample preparation

In this time-series analysis, to account for the large variability in the data across the days of the week (Faryar 2013; De Pablo Dávila et al. 2013; Sun et al. 2009; Tai et al. 2006), we normalized the sample by day of the week, season, and bank holidays, calculating the ratio of the observed number of patients to the mean number of patients on that particular day of the week or holiday. In addition, and specifically for the analysis explained in the “Cardiovascular and respiratory test” section below, 7-day averages of the weather and air pollution data were used, and holiday periods and bank holidays were omitted from the sample.
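The day-of-week normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `normalize_by_weekday`, the treatment of holidays as one separate group, and the toy counts are all hypothetical, and the seasonal component of the normalization is left out for brevity.

```python
from collections import defaultdict
from datetime import date, timedelta

def normalize_by_weekday(dates, counts, holidays=frozenset()):
    """Return observed/mean ratios, where the mean is taken over all days
    that share the same weekday (holidays are pooled into their own group)."""
    groups = defaultdict(list)
    keys = []
    for d, c in zip(dates, counts):
        key = "holiday" if d in holidays else d.weekday()
        keys.append(key)
        groups[key].append(c)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    return [c / means[k] for k, c in zip(keys, counts)]

# Toy data (invented): four weeks with a weekend dip and a slow upward trend.
start = date(2014, 1, 6)  # a Monday
dates = [start + timedelta(days=i) for i in range(28)]
counts = [100 + 5 * (i // 7) if d.weekday() < 5 else 40 + (i // 7)
          for i, d in enumerate(dates)]

ratios = normalize_by_weekday(dates, counts)
```

After this step every weekday group has a mean ratio of 1, so the weekend dip no longer dominates the series and only the within-group variation remains.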

A preliminary correlation analysis showed that, at least for some ICD-10 categories (mainly ICD-10 = J, respiratory diseases), there was a strong correlation between temperature (Chan et al. 2013) and hospitalizations. To account for this, we also normalized the data set by temperature.

No further significant correlations with other meteorological variables (wind speed, precipitation, pressure, and humidity) were found (Zhang et al. 2014), and therefore no further normalizations were applied to the dataset.

Correlation analysis and DLNM

On the normalized data set, the hypothesized association between air pollution and the number of hospitalizations was first analyzed using a simple correlation analysis. As the association between air pollution and respiratory illness may be delayed in time (Zhang et al. 2018; Sinclair and Tolsma 2004), a potential lag effect of 0 to 10 days was taken into consideration (Taj et al. 2017; Lall et al. 2011). To further explore the lag effect, for the data that showed a stronger potential association, the correlation analysis was combined with a distributed lag nonlinear model (DLNM) (Gasparrini et al. 2010, 2012). The cumulative lag effect was considered over all lags from 0 to 10 days. The chosen method was the Almon method (Almon 1965), which can handle DLNM (Almon lag model 2018), is widely used, and is supported by several openly available software packages. The distributed lag analysis was performed in Statistica software using the Almon lag model (Statistica 2018).
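The simple lagged correlation step can be illustrated with a short sketch. It assumes that admissions on day t are paired with the pollutant level on day t − lag; the function names and the synthetic series (with a response built in at a 4-day lag) are hypothetical and are not data from this study.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lagged_correlations(pollutant, admissions, max_lag=10):
    """Correlate admissions on day t with the pollutant level on day t - lag."""
    n = len(pollutant)
    return {lag: pearson(pollutant[: n - lag], admissions[lag:])
            for lag in range(max_lag + 1)}

# Synthetic illustration only: admissions respond to pollution 4 days earlier.
rng = random.Random(0)
pollutant = [rng.uniform(10, 100) for _ in range(200)]
admissions = [1.0] * 4 + [1.0 + 0.002 * p for p in pollutant[:-4]]

correlations = lagged_correlations(pollutant, admissions)
# Lag with the largest absolute correlation, as reported per pollutant and city.
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
```

On this synthetic series the largest absolute correlation falls at the built-in lag of 4 days, which is the kind of value that would be tabulated for each pollutant, city, and ICD-10 category.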

The model can be written briefly as

\( y(t)=\sum\limits_{i=0}^{k}\beta_i\,x(t-i)+\varepsilon(t) \)

where the x(t − i) predictor values of y represent observations made periodically over a continuous time period that begins some time before y was observed and ends at the time of observation of y. Models of this kind are known as distributed lag models and are useful when changes in the independent variable x affect the value of y over many samples of y. Typically, in this bivariate distributed lag model, x and y are observed at identical periods and at the same frequency, so that at each time t a bivariate observation of y(t) and x(t) is made. The percentage increase in the number of patients was calculated from the results of a multiple regression in which the response variable was the number of patients (normalized by the day) and the independent variables were pollution level and temperature. The increase in the number of patients was estimated as the regression coefficient (slope) for pollution level multiplied by 10 (number of units of pollutant).
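As a sketch of how the Almon approach reduces the number of free parameters in the distributed lag model above, the lag coefficients are assumed to lie on a low-order polynomial in the lag index. The polynomial degree p, the polynomial coefficients a_j, and the auxiliary series z_j are notation introduced here for illustration; the degree used in the analysis is not stated.

\( \beta_i=\sum\limits_{j=0}^{p}a_j\,i^{\,j},\qquad i=0,\dots ,k \)

Substituting this constraint into the model gives a regression on only p + 1 constructed series:

\( y(t)=\sum\limits_{j=0}^{p}a_j\,z_j(t)+\varepsilon(t),\qquad z_j(t)=\sum\limits_{i=0}^{k}i^{\,j}\,x(t-i) \)

The a_j can then be estimated by ordinary least squares and the β_i recovered from the polynomial, and the cumulative effect over all lags is \( \sum_{i=0}^{k}\beta_i \). If the day-normalized response is expressed as a ratio to its mean (an assumption about the scaling), a slope b per unit of pollutant corresponds to an increase of approximately 100 × 10 × b percent per 10 units.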

Cardiovascular and respiratory test

A subset of the data was selected (cardiovascular and respiratory diseases; ICD-10: I10–I15, I20–I24, I26, I40, I41, I44–I49, I50, I60–I68, I74, I80–I82, J00–J46) so that the analysis could first cover a broader dataset and then narrow down to data with a higher probability of association, as well as to test the sensitivity of the results. The pollutant levels (PM2.5 and PM10) and the weather data were computed as 7-day moving averages. Furthermore, data points for Saturdays, Sundays, and holidays were omitted. A regression analysis was then performed, separately for each city, using the following variables:

(a) Logarithms of the average particulate matter concentrations (PM10 and PM2.5) over the last 7 days, where, for the PM logarithm, Δy = (β/100) × %Δx

(b) Averages over the last 7 days of the weather data: temperature (in °C), maximum wind speed (in 10 m/s), humidity (in %), pressure (in hPa), and sum of precipitation (in 10 mm)

(c) Average squared values of the weather variables over the last 7 days

(d) Zero-one (dummy) variables for the days of the week
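The variable list above can be turned into one row of regressors per day, as in the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the choice of Monday as the baseline day, the reading of the squared weather terms as 7-day averages of squared daily values, and the toy inputs are all hypothetical.

```python
from math import log

# Saturdays, Sundays, and holidays were omitted, so only working days remain.
WORKING_DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

def trailing_mean(values, t, window=7):
    """Mean of the `window` values ending at index t (inclusive)."""
    chunk = values[t - window + 1 : t + 1]
    return sum(chunk) / len(chunk)

def design_row(t, pm, weather, weekday):
    """Regressors for day t (hypothetical layout, not the authors' code)."""
    row = [1.0]                                   # intercept
    row.append(log(trailing_mean(pm, t)))         # log of the 7-day PM average
    for series in weather.values():               # 7-day weather averages
        row.append(trailing_mean(series, t))
    for series in weather.values():               # 7-day averages of squares
        row.append(trailing_mean([v * v for v in series], t))
    # Zero-one weekday indicators; Monday is the omitted baseline (assumption).
    row.extend(1.0 if weekday == d else 0.0 for d in WORKING_DAYS[1:])
    return row

def effect_of_relative_increase(beta_log_pm, percent):
    """Change in the response when PM rises by `percent`, for a log-PM slope."""
    return beta_log_pm * log(1.0 + percent / 100.0)

# Toy inputs (invented values) for a single Wednesday.
pm10 = [50.0] * 7
weather = {"temperature": [2.0] * 7, "humidity": [80.0] * 7}
row = design_row(6, pm10, weather, "Wed")
delta = effect_of_relative_increase(100.0, 10.0)
```

With a log-transformed pollutant, the exact change for a 10% rise is β × ln(1.1), which is close to the (β/100) × 10 approximation for small relative changes. The rows for all retained days would then be passed to an ordinary least squares routine, fitted separately for each city.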

Results

Descriptive statistics of the study setting

The hospitalization statistics per ICD-10 category are displayed in Table 1.

Table 1 Mean hospitalizations per day per ICD-10 category

A proportionally large variability due to seasonality and day of the week was clearly observed (large SDs).

The air pollutant statistics are displayed in Table 2. The cities with the highest pollution levels were Kraków (mean NO2 65.34 ppb, PM2.5 68.38 μg/m3, PM10 89.85 μg/m3) and Warsaw (mean NO2 61.82 ppb, PM2.5 38.09 μg/m3, PM10 57.15 μg/m3). In several cities, air pollutant values often substantially exceeded the EU “Air Quality Standards” limits, and in some cases (particulate matter and NO2), the mean value over the study period was itself above these limits (Air Quality Standards 2008; Dąbrowiecki et al. 2018).

Table 2 Descriptive statistics of air pollution

Figs. 1 and 2 display the air pollution values and the hospitalization statistics, respectively, for the largest city, Warsaw. It was chosen as a representative city because it has the highest number of inhabitants and one of the highest pollution levels.

Fig. 1 Mean weekly pollutant concentrations in Warsaw. SDs are presented as error bars

Fig. 2 Mean number of patients per day in Warsaw based on weekly data for various types of ICD-10 categories. SDs are presented as error bars

The meteorological statistics are displayed in Table 3. The values are consistent with a continental climate with relatively little wind compared with other climates (e.g., Mediterranean), and therefore with a limited chance for the weather to dilute the air pollutants.

Table 3 Meteorological values statistics

Relationship of meteorological values with hospitalizations

To evaluate the relationships between weather variables and the number of hospitalized patients, we calculated the corresponding correlation coefficients, which indicate both the strength and the direction (negative or positive) of each relationship. The squared correlation coefficients (R2) were between 35 and 50% for temperature (depending on the ICD-10 category), 1–3% for wind speed, and only about 1% for precipitation. As an example, the correlation coefficients for all the measured meteorological variables for ICD-10 = J (i.e., respiratory diseases) are reported in Table 4.

Table 4 Correlation coefficients between weather variables and number of patients ICD-10 = J (normalized by day of the week)

Relationship of different pollutants and effects on hospitalizations

For each pollutant and each city, a correlation table was generated for the ICD-10 hospitalization diagnoses against the lag (0 to 10 days). The highest absolute value of the correlation coefficients obtained in this way is reported in Table 5; the highlighted values are the highest 25% of values. The highest recorded correlations are clearly found in the ICD-10 J column (i.e., respiratory diseases).

Table 5 Correlation coefficients between pollutants and patients hospitalized in the different ICD-10 categories

Respiratory disease hospitalizations

The more detailed analysis using the Almon model was applied to the respiratory disease subset, and the P values for the distributed lag model (Almon method) were plotted, with the pollutant as the independent variable (cause) and the number of events (hospitalizations) as the dependent variable (always with lags from 0 to 10 days). The lag day with the highest recorded correlation, together with the calculated percent increase per 10 units of increase in the pollutant at that lag, is displayed in Table 6.

Table 6 Percent increase of hospital admissions for respiratory disease/lag (days)

Several pollutants show a statistically significant, correlated increase in hospitalizations; the largest effect, which was also consistent across the cities, was that of particulate matter (PM2.5 and PM10).

Subset analysis on cardiovascular disease and respiratory disease hospitalizations

The subset analysis of cardiovascular and respiratory tract disease, using the 7-day average pollutant values and the method described in the “Cardiovascular and respiratory test” section, gave a similar result, displayed in Table 7. Fig. 3 shows a sample plot of the respiratory patients in Warsaw (ICD-10 = J) versus the 7-day average particulate matter concentration (linear and logarithmic).

Table 7 Percent increase in patients per each 10-unit increase in pollutant concentration

For all examined cities, the impact of changes in the average PM10 concentration over the last 7 days on hospital admissions due to respiratory diseases is statistically significant at the 0.01 significance level. With a 10% increase in PM10 concentration from the mean, for example, the number of patients increases on average by 25 (1.7% of the average number of patients) in Białystok, 26.8 (1.4%) in Gdańsk, 46 (1.4%) in Kraków, 18 (2.3%) in Bielsko-Biała, and 111.7 (2.1%) in Warsaw. The average PM10 concentration over the previous 7 days had a statistically significant impact on hospital admissions due to cardiovascular diseases in all cities studied except Warsaw. Although statistically significant, the correlations were weak. Similar dependencies apply to the model for PM2.5 and cardiovascular disease. The largest effect of a 10% increase in particulate matter concentration is a 0.9% increase in the number of patients, in Białystok.

Fig. 3 Relationships between the number of patients with respiratory diseases (J) and particulate matter concentration level in Warsaw (logarithm of 7-day average)

Discussion

Key results

In this analysis, we found positive associations between ambient levels of pollutants (mainly PM2.5 and PM10) and hospitalizations. A positive association between air pollution and acute respiratory disease health impact/hospitalization was to be expected (Zhang et al. 2018; Sinclair and Tolsma 2004; Sinclair et al. 2010; Vahedian et al. 2017). The pollution levels at which these results were recorded, although in a comparatively high range (Air Quality Standards; Directive 2008; WHO Air Quality 2005; WHO 2013), were still not in a “critical” range similar to that of the London smog of 1952 or the Asian smoke-haze event of 1997 (Bell and Davis 2001; Bell et al. 2004; Heil and Goldammer 2001) (see Table 2).

The peak increase in hospitalizations was found at a time lag of 3–6 (sometimes up to 7) days (see Table 6). This lag effect for respiratory disease hospitalizations is similar to earlier findings (Sinclair and Tolsma 2004; Sinclair et al. 2010; de Souza et al. 2014).

Several mechanisms have been suggested (Esposito et al. 2014) to explain the adverse effects of air pollutants. The most consistent and most widely accepted explanation (Chauhan and Johnston 2003; Arbex et al. 2012) is that, once in contact with the respiratory epithelium, high concentrations of oxidants and pro-oxidants in environmental pollutants, such as PM of various sizes and compositions, and in gases cause the formation of oxygen and nitrogen free radicals, which in turn induce oxidative stress in the airways. In other words, an increase in free radicals that are not neutralized by antioxidant defenses initiates an inflammatory response, with release of inflammatory cells and mediators (cytokines, chemokines, and adhesion molecules) that reach the systemic circulation. This leads to subclinical inflammation, which not only has a negative effect on the respiratory system but also causes systemic effects. These processes may take a number of days to lead to clinically relevant symptoms that are severe enough to require medical attention and/or hospitalization.

A weaker, though statistically significant, association was found between the different pollutant levels and cardiovascular disease (CVD). Several years ago, a study covering some Eastern European cities, based on a smaller sample (Katsouyanni et al. 1997) and looking at mortality rather than hospitalization rates, reported a somewhat similar trend (Samoli et al. 2001).

To some extent, the poorer correlation with CVD could be explained, first, by a high preexisting baseline, i.e., the relatively high underlying prevalence of cardiovascular disease, which produces a relatively high baseline rate of CVD hospitalizations (Szafraniec-Burylo et al. 2016) and makes the peaks that might be generated by excess exposure to pollutants less visible. Another possible influence on the study results is the source of the pollutant. In general, “smoke” pollution in Poland is particularly influenced by the widespread use of coal for heating (Nabrdalic and Samora 2018). Of all private homes in the European Union that use coal, 80% are in Poland. Scientific debate is currently ongoing (Hime et al. 2018) on how the health effects of particulate pollution depend on the source of the pollutant. Coal-burning fumes would contain PM particles with a higher SOx-bound component than, for example, diesel exhaust and might therefore have a greater influence on the respiratory tract (sulfur oxides are toxic irritants). In addition, other sources of bias, such as underlying morbidity and age, could have had an impact on the results.

Study limitations

Limitations of this study include the lack of stratification of hospitalizations by age group and gender, which could have helped identify clearer trends and allowed additional subanalyses. In addition, as the data source consisted of aggregated daily records of hospitalizations, the statistical strength of the results is reduced. Finally, while the temperature effect on hospitalization (e.g., flu season-related hospitalizations) was addressed by normalizing the sample data (as described in the “Sample preparation” section), such normalization does not totally separate the potential cause/effect or combined effect of pollution and season from the results.

Generalizability and further analysis

The overall number of hospitalizations captured by the analysis (over 20 million) is large compared with the literature (Moore et al. 2016; Zhang et al. 2018; Stieb et al. 2009), in particular in Europe, and the results could therefore also help predict hospitalization trends and quantify the need for environmental preventive measures to minimize the healthcare impact and costs associated with such events. It is important to note, however, that the use of aggregated daily records as the data source does constitute a limitation on the strength of the statistical analysis.

Also for this reason, further analysis could be performed on the same dataset using different statistical methodologies, e.g., a case-crossover setup (Lu and Zeger 2007) or artificial neural networks (Fang 2018; Polezer et al. 2018), to further test the sensitivity of these results.

Conclusions

Increases in ambient air pollution exposure were associated with an increase in hospitalizations due to respiratory tract diseases in a large time-series observation in five major Polish cities in the years 2014–2017. The most prominent effect was recorded for PM2.5 and PM10. There was weak evidence of short-term associations between peaks in air pollution concentrations and increased hospitalizations for cardiovascular diseases. Further work on a dataset enabling better stratification of the sample (e.g., by age and gender, as well as a detailed admission analysis with sub-strata for COPD, lower respiratory tract infection, asthma, etc.) would provide better insight into the subject matter.