Impact of air pollution on hospital admissions with a focus on respiratory diseases: a time-series multi-city analysis.

Together with the growing availability of data from the electronic records of healthcare providers and healthcare systems, an assessment of associations between different environmental parameters (e.g., pollution levels and meteorological data) and hospitalizations, morbidity, and mortality has become possible. This study aimed to assess the association between air pollution and hospitalizations using a large database comprising almost all hospitalizations in Poland. This time-series analysis was conducted in five cities in Poland (Warsaw, Białystok, Bielsko-Biała, Kraków, Gdańsk) over a period of almost 4 years (2014–2017, 1255 days), covering more than 20 million hospitalizations. The hospitalizations were extracted from the National Health Fund registries as daily summaries. Correlation analysis and distributed lag nonlinear models were used to investigate statistically relevant associations between air pollutants and hospitalizations, using various methods to minimize potential bias from atmospheric parameters, days of the week, bank holidays, etc. A statistically significant increase in respiratory disease hospitalizations was detected after peaks of particulate matter concentrations (particularly PM2.5, between 0.9 and 4.5% increase per 10 units of pollutant increase, and PM10, between 0.9 and 3.5% per 10 units of pollutant increase), with a typical time lag between the pollutant peak and the event of 2 to 6 days. For other pollution parameters and other types of hospitalizations (e.g., cardiovascular events, eye and skin diseases, etc.), weaker and non-uniform correlations were recorded. Increases in ambient air pollution exposure are associated with a short-term increase in hospitalizations due to respiratory tract diseases. The most prominent effect was recorded for PM2.5 and PM10.
There is only weak evidence indicating that such short-term associations exist between peaks of air pollution concentrations and increased hospitalizations for other (e.g., cardiovascular) diseases. The obtained information could be used to better predict hospitalization patterns and costs for the healthcare system and perhaps trigger additional vigilance on particulate matter pollution in the cities.


Introduction
Ambient air pollution is recognized to adversely affect health (Arbex et al. 2012). Several studies conducted in almost all parts of the world have found that day-to-day increases in pollution levels are associated with different pathologies, including respiratory tract disease (Kim et al. 2018; Cerezo Hernández et al. 2018), asthma (Zheng et al. 2015), increased COPD exacerbations (Moore et al. 2016), cardiovascular diseases (Analitis et al. 2006), and cerebrovascular diseases such as stroke (Tian et al. 2017). Several theories on the pathogenesis of these effects of ambient pollution have been put forward (Bernstein et al. 2004), but overall, the area remains poorly understood, and there is no consensus on which constituents (Kampa and Castanas 2008) of air pollution are most harmful (Brunekreef and Holgate 2002). The large Polish cities (Nabrdalic and Samora 2018), and Eastern European cities in general, are recognized to have poorer air quality relative to other cities in Europe (Katsouyanni et al. 1996; Zmirou et al. 1998). Higher pollution levels in Polish cities are caused, in part, by sources that include coal-powered electricity generating stations and heating sources. Despite the recognition that Polish cities have perhaps poorer air quality than Western European cities, to date only a few large-scale statistical analyses have been performed (Pac et al. 2013; Haluszka et al. 1998; Niepsuj et al. 1998) on the potential associations between day-to-day fluctuations in air pollution levels and hospitalizations.
At this time, we know of no study that analyzes data from more than 20 million hospitalizations and ED visits in the study area. ED visits which turn into hospitalizations and/or ED visits which require a hospital diagnostic or specialized medical visit or intervention are logged into the database with an ICD-10 classification and therefore will be reported as "hospitalizations"; these data will include the specialized ambulatory care. The goal of this analysis was to investigate the association between different air pollutants and hospitalizations in a multi-city time-series observation, considering the potential influence of key meteorological parameters. This data analysis was possible thanks to access to the electronic registry of the Polish National Healthcare Fund and to the data of the Institute of Meteorology and Water Management and the Chief Inspectorate of Environmental Policy.

Core data source
The data related to the number of hospitalizations in the cities of Warsaw, Białystok, Bielsko-Biała, Kraków, and Gdańsk were obtained from the reporting system of the National Health Fund (NHF; in Polish: Narodowy Fundusz Zdrowia) and covered a period of almost 4 years (2014-2017, 1255 days). The International Classification of Diseases 10th revision (ICD-10) coding was used to identify the different diagnoses at admission for the period between January 1, 2014, and August 1, 2017. The following ICD-10 categories were considered: F, mental and behavioral disorders; G, diseases of the nervous system; H, diseases of the eye and adnexa, and of the ear and mastoid process; I, diseases of the circulatory system; J, diseases of the respiratory system; L, diseases of the skin and subcutaneous tissue; S/T, injury, poisoning, and certain other consequences of external causes.
Data on the concentration of air pollutants were obtained from the Chief Inspectorate for Environmental Protection (GIOS) and included NO, NOx, NO2, O3, SO2, PM2.5, PM10, PM10_24, and PM2.5_24. Daily data (obtained from manual stations) and hourly data (obtained from automatic stations, coded as 24) were used in the analysis. Meteorological data were gathered from the Institute of Meteorology and Water Management (IMGW), which has stations in the Polish cities, and included temperature, main wind speed, and precipitation.

Sample preparation
In this time-series analysis, to account for the great data variability encountered on the different weekdays (Faryar 2013; De Pablo Dávila et al. 2013; Sun et al. 2009; Tai et al. 2006), we normalized the data sample per weekday, season, and bank holidays, calculating the ratio of the observed number of patients to the mean number of patients on that particular day of the week or holiday. In addition, and specifically for the analysis explained in the "Cardiovascular and respiratory test" section below, 7-day averages for weather and air pollution data were used, and holiday periods and bank holidays were omitted from the sample.
From a preliminary data correlation analysis, it was evident that, at least for some ICD-10 categories (mainly ICD-10 = J, respiratory diseases), a strong correlation between temperature (Chan et al. 2013) and hospitalizations was present. To account for this, we also normalized the data set by temperature.
No further significant correlations with other meteorological values (wind speed, precipitation, pressure, and humidity) were found (Zhang et al. 2014), and therefore, no further normalizations were added to the dataset.
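The day-type normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `normalize_by_day_type`, the `holidays` argument, and the sample counts are hypothetical, and the seasonal and temperature adjustments are left out for brevity.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def normalize_by_day_type(dates, counts, holidays=frozenset()):
    """Return observed / mean ratios, with the mean taken over days of the same type.

    A day's type is its weekday (0 = Monday ... 6 = Sunday), or "holiday"
    when the date is in the (hypothetical) set of bank holidays.
    """
    def day_type(d):
        return "holiday" if d in holidays else d.weekday()

    # Collect the counts observed for each day type.
    groups = defaultdict(list)
    for d, c in zip(dates, counts):
        groups[day_type(d)].append(c)
    group_mean = {k: mean(v) for k, v in groups.items()}

    # Ratio of the observed count to the mean count for that day type.
    return [c / group_mean[day_type(d)] for d, c in zip(dates, counts)]


# Made-up data: two weeks starting on a Monday, with low weekend admissions.
start = date(2014, 1, 6)
dates = [start + timedelta(days=i) for i in range(14)]
counts = [100, 110, 105, 95, 90, 40, 30, 120, 110, 105, 95, 90, 60, 30]
ratios = normalize_by_day_type(dates, counts)
```

A ratio above 1 marks a day with more admissions than is typical for that weekday, which is what makes Mondays and Sundays comparable in the later correlation steps.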

Correlation analysis and DLNM
On this normalized data set, the hypothesized association between air pollution and the number of hospitalizations was first analyzed using a simple correlation analysis. As the association between air pollution and respiratory illness may be delayed in time (Zhang et al. 2018; Sinclair and Tolsma 2004), a potential lag effect from 0 to 10 days was taken into consideration (Taj et al. 2017; Lall et al. 2011). To further explore the lag effect, for the data that showed a larger potential association, the correlation analysis was combined with a distributed lag nonlinear model (DLNM) (Gasparrini et al. 2010, 2012). The cumulative lag effect was considered over all lags from 0 to 10 days. The chosen method was the Almon method (Almon 1965), which can handle DLNM (Almon lag model 2018), is widely used, and for which several open software packages are available. The distributed lag analysis was performed in Statistica software using the Almon lag model (Statistica 2018).
The model can be written briefly as

$$y = \sum_{i} \beta_i x_i + \varepsilon$$

where the x_i predictor variables of y used in the model represent observations made periodically during a continuous time period beginning at some time before y was observed and ending at the time of observation of y, the β_i are the corresponding regression weights, and ε is the error term. Models of this kind are known as distributed lag models and are useful when changes in the independent variable x have an effect on the value of y over many samples of y. Typically, in this bivariate distributed lag model, if x and y are observed at identical periods at the same frequency, t, bivariate observations will be made of y(t) and x(t).
The percentage increase in the number of patients was calculated from the results of a multiple regression in which the response variable was the number of patients (normalized by the day) and the independent variables were pollution level and temperature. The increase in the number of patients was estimated as the regression coefficient (slope) for pollution level multiplied by 10 (the number of units of pollutant).
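As an illustration of the approach described above, the following is a minimal sketch of an Almon (polynomial) distributed lag fit. It is not the Statistica routine used in the study. It assumes a second-degree polynomial for the lag weights, omits the temperature covariate, and runs on synthetic data; the helper names (`solve`, `ols`, `almon_fit`) are hypothetical. The final line assumes the response is the normalized patient ratio (close to 1), so that the slope times 10 times 100 reads as a percentage increase per 10 pollutant units.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def almon_fit(x, y, max_lag=10, degree=2):
    """Fit y_t = c + sum_i beta_i * x_{t-i}, where beta_i is a polynomial in i / max_lag."""
    powers = range(degree + 1)
    lags = range(max_lag + 1)
    rows, target = [], []
    for t in range(max_lag, len(x)):
        # The Almon constraint turns max_lag + 1 lag weights into degree + 1 regressors.
        z = [sum((i / max_lag) ** k * x[t - i] for i in lags) for k in powers]
        rows.append([1.0] + z)
        target.append(y[t])
    coef = ols(rows, target)
    intercept, a = coef[0], coef[1:]
    betas = [sum(a[k] * (i / max_lag) ** k for k in powers) for i in lags]
    return intercept, betas


# Synthetic example: lag weights peak at lag 5 and vanish at lags 0 and 10.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(300)]
true_betas = [0.001 * i - 0.0001 * i * i for i in range(11)]
y = [0.92 + sum(b * x[t - i] for i, b in enumerate(true_betas)) if t >= 10 else 1.0
     for t in range(300)]

intercept, betas = almon_fit(x, y)
# Assumption: y is a ratio near 1, so slope * 10 units * 100 gives a percentage.
pct_per_10_units = [100 * 10 * b for b in betas]
```

Scaling the lag index to the range 0 to 1 is a design choice made here only to keep the normal equations well conditioned; it does not change the fitted lag weights.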

Cardiovascular and respiratory test
A subset of the data was selected (cardiovascular diseases and respiratory diseases; ICD-10: I10-I15, I20-I24, I26, I40, I41, I44-I49, I50, I60-I68, I74, I80-I82, J00-J46) in order to analyze a broader dataset first and then narrow the analysis to data with a higher probability of association, as well as to test the sensitivity of the results. The pollutant (PM2.5 and PM10) levels and the data on weather conditions were computed as 7-day moving averages. Furthermore, data points for Saturdays, Sundays, and holidays were omitted. Regression analysis was then performed using the following variables, separately for each city:
(a) Logarithm of the average of the last 7 days for the particulate matter concentrations PM10 and PM2.5, where for the PM logarithm: Δy = (β/100) %Δx
(b) Average of the last 7 days for weather data: temperature (in °C), maximum wind speed (in 10 m/s), humidity (in %), pressure (in hPa), and sum of precipitation (in 10 mm)
(c) The values of average squares from the last 7 days for the weather variables
(d) Zero-one (dummy) variables for days of the week
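The construction of these regressors can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the 7-day window is assumed to include the current day, and Monday is arbitrarily taken as the reference weekday. The last function shows how a coefficient on a logarithmic regressor is read: for a small percentage change in x, the change in y is approximately (β/100) times that percentage.

```python
from math import log


def trailing_mean(values, window=7):
    """Mean of the current and the previous window - 1 values; None until a full window exists."""
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[t + 1 - window: t + 1]) / window)
    return out


def weekday_dummies(weekday, reference=0):
    """Zero-one indicators for the working days other than the reference day.

    weekday uses 0 = Monday ... 4 = Friday; weekends are assumed to be dropped already.
    """
    return [1 if weekday == d else 0 for d in range(5) if d != reference]


def expected_change(beta, pct_increase):
    """Change in y for a given percentage rise in x when y is regressed on ln(x)."""
    return beta * log(1 + pct_increase / 100)


pm10 = [30, 35, 40, 45, 50, 55, 60, 65]           # made-up daily values
pm10_avg7 = trailing_mean(pm10)                   # 7-day moving average
log_pm10 = [log(v) if v is not None else None for v in pm10_avg7]
wednesday = weekday_dummies(2)                    # Tuesday..Friday indicators
```

For example, with a hypothetical coefficient of 100 on the log regressor, `expected_change(100, 1)` is about 0.995, close to the (β/100) %Δx approximation of 1.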

Descriptive statistics of the study setting
The hospitalization statistics per ICD category are displayed in Table 1. A proportionally large variability due to seasonality and day of the week was clearly observed (large SDs).
The air pollutant statistics are displayed in Table 2. The cities that displayed the highest pollution index were Kraków (mean NO2 65.34 ppb, PM2.5 68.38 μg/m3, PM10 89.85 μg/m3) and Warsaw (mean NO2 61.82 ppb, PM2.5 38.09 μg/m3, PM10 57.15 μg/m3). For several cities, the air pollutant values often significantly exceeded the EU guideline "Air Quality Standards" level, and in some cases (particulate matter and NO2), the mean value during the study period was already above such limits (Air Quality Standards 2008; Dąbrowiecki et al. 2018).
In Figs. 1 and 2, the hospitalization statistics and the air pollution values are graphically displayed for the largest city, Warsaw. It has been chosen as a representative city because of the highest number of inhabitants and highest pollution grade.
The meteorological statistics are displayed in Table 3. The values are coherent with a continental climate region with relatively little wind (in comparison with other climates, e.g., Mediterranean) and therefore with a limited chance for the weather to dilute the air pollutants.

Relationship of meteorological values with hospitalizations
To evaluate the relationships between weather variables and the number of hospitalized patients, we calculated the corresponding correlation coefficients. Through this, we evaluated the strength of the relationships and the direction (negative or positive) of their influence. Squared values of the correlation coefficients (R2) were between 35 and 50% for temperature (depending on the ICD-10 category), 1-3% for wind speed, and only about 1% for precipitation. As an example, the correlation coefficients for all the measured meteorological variables for ICD-10 = J (i.e., respiratory diseases) are reported in Table 4.

Relationship of different pollutants and effects on hospitalizations
For each pollutant and for each city, a correlation table was generated plotting the ICD-10 hospitalization diagnosis versus the lag (days from 0 to 10). The highest absolute value of the correlation coefficients recorded by this methodology is reported in Table 5: the highlighted values represent the highest 25th percentile. The highest recorded correlation is clearly identified in the ICD-10 J column (i.e., respiratory diseases).
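The lag scan behind such a correlation table can be sketched as follows. This is a minimal illustration on synthetic data, assuming a Pearson correlation between admissions on day t and the pollutant level on day t minus the lag; the function names and the sample series are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def lagged_correlations(pollutant, admissions, max_lag=10):
    """Correlate admissions on day t with the pollutant on day t - lag, for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = pollutant[: len(pollutant) - lag]   # pollutant, shifted back by `lag` days
        y = admissions[lag:]                    # admissions on the matching later days
        result[lag] = pearson(x, y)
    return result


def peak_over_lags(corr_by_lag):
    """Return (lag, r) for the correlation with the largest absolute value."""
    lag = max(corr_by_lag, key=lambda k: abs(corr_by_lag[k]))
    return lag, corr_by_lag[lag]


# Synthetic series in which admissions follow the pollutant with a 3-day delay.
pollutant = [(i * 7) % 11 for i in range(60)]
admissions = [100 + 2 * pollutant[t - 3] if t >= 3 else 100 for t in range(60)]

corrs = lagged_correlations(pollutant, admissions)
best_lag, best_r = peak_over_lags(corrs)
```

Repeating this scan for each pollutant, city, and ICD-10 category, and keeping only the peak value, yields one cell of a summary table of the kind described above.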

Respiratory disease hospitalizations
The deeper analysis using the Almon model algorithm was applied to the respiratory disease sub-data, and the P values for the distributed lag model (Almon method) were plotted, where the pollutant was the independent variable (cause) and the number of events (hospitalizations) was the dependent variable (always with lags from 0 to 10 days). The highest recorded correlation and its lag day, as well as the calculated coefficient of % increase per each 10 units of increased pollutant at the specific identified lag, are displayed in Table 6.
Several pollutants show a statistically significant and correlated increase in hospitalizations, with the largest effect (which is also consistent among the different cities) being that of the particulate matter, PM2.5 and PM10.

Subset analysis on cardiovascular disease and respiratory disease hospitalizations
The subset data analysis on cardiovascular disease and respiratory tract disease, using the 7-day average pollutant values and the method described in the "Cardiovascular and respiratory test" section, provided a similar result, displayed in Table 7. Fig. 3 displays a sample plot of the respiratory patients in Warsaw (ICD-10 = J) versus the 7-day particulate matter concentration average (linear and logarithm).
For all examined cities, the impact of changes in the average PM10 concentration level from the last 7 days on hospital admissions due to respiratory diseases is statistically significant at the significance level of 0.01. With a 10% increase in PM10 concentration from the mean, for example, the number of patients increases on average by 25 patients (1.7% of the average number of patients) in Białystok, 26.8 (1.4% of mean) in Gdańsk, 46 (1.4%) in Kraków, 18 patients (2.3% of average) in Bielsko-Biała, and 111.7 people (2.1%) in Warsaw. The average PM10 concentration from the previous 7 days had a statistically significant impact on hospital admissions due to cardiovascular system

Key results
In this analysis, we found positive associations between ambient levels of pollutants (mainly PM2.5 and PM10) and hospitalizations. A positive association between air pollution and acute respiratory disease health impact/hospitalization was to be expected (Zhang et al. 2018; Sinclair and Tolsma 2004; Sinclair et al. 2010; Vahedian et al. 2017). The pollution levels at which these results were recorded, even though in a proportionally high pollution range (Air Quality Standards; Directive 2008; WHO Air Quality 2005; WHO 2013), were still not in a "critical" range similar to the London smog of 1952 or the Asian smoke-haze event of 1997 (Bell and Davis 2001; Bell et al. 2004; Heil and Goldammer 2001) (see Table 2). The peak effect on the increase in hospitalizations was found with a time lag of 3-6 (sometimes up to 7) days (see Table 6). This lag effect for respiratory disease hospitalizations shows similarities with earlier findings (Sinclair and Tolsma 2004; Sinclair et al. 2010; de Souza et al. 2014).
Several mechanisms have been suggested (Esposito et al. 2014) to explain the adverse effects of air pollutants. The most consistent and most widely accepted explanation (Chauhan and Johnston 2003; Arbex et al. 2012) is that, once in contact with the respiratory epithelium, high concentrations of oxidants and pro-oxidants in environmental pollutants such as PM of various sizes and compositions and in gases cause the formation of oxygen and nitrogen free radicals, which in turn induce oxidative stress in the airways. In other words, an increase in free radicals that are not neutralized by antioxidant defenses initiates an inflammatory response with release of inflammatory cells and mediators (cytokines, chemokines, and adhesion molecules) that reach the systemic circulation, leading to subclinical inflammation, which not only has a negative effect on the respiratory system but also causes systemic effects. These processes may take a certain number of days to lead to clinically relevant symptoms that require, due to their severity, medical attention and/or hospitalization. A more limited statistically significant association was found between the different pollutant levels and cardiovascular disease (CVD). Several years ago, a study covering some Eastern European cities, based on a smaller sample (Katsouyanni et al. 1997) and looking at mortality rates rather than hospitalization rates, reported a somewhat similar trend (Samoli et al. 2001).
Table notes: All p values are below 0.000, with the exception of the Gdańsk values of SO2 (p value 0.237) and PM10 (p value 0.054). %: % increase of hospitalizations per each 10 additional pollutant units. Lag days: intersection of the lowest P value from the Almon model and the strongest correlation coefficient. A: measurements not available. *Highlighted values represent a lower correlation coefficient (as seen in Table 5). a: The results for all cities together were calculated using multiple linear regression where city was treated as a dummy variable.
To some extent, the poorer correlation with CVD could be explained first by a high preexisting baseline, i.e., the relatively high underlying prevalence of cardiovascular disease, which produces a relatively high baseline rate of CVD hospitalizations (Szafraniec-Burylo et al. 2016) and renders the peaks that might be generated by excess exposure to pollutants less visible. Another possible influence on the study results is the source of the pollutant. In general, the "smoke" pollution in Poland is particularly influenced by a high degree of utilization of charcoal for heating (Nabrdalic and Samora 2018). In all the European Union, 80% of private homes using coal are in Poland. Scientific debate is currently ongoing (Hime et al. 2018) on the health effects of particulate pollution depending on the source of the pollutant. The charcoal burning fumes would contain PM particles with a higher SOx-bound component than, for example, diesel exhaust and might therefore have a greater influence on the respiratory tract (sulfur oxides are toxic urticants). In addition, other bias factors such as underlying morbidity, age, etc., could have had an impact on the results.
Fig. 3 Relationships between the number of patients with respiratory diseases (J) and PM particulate matter concentration level in Warsaw (logarithm of 7-day average)

Study limitations
The limitations of this study include the missing stratification of the hospitalizations by age group and gender, which could have helped to identify clearer trends and would have allowed additional subanalyses. In addition, as the data source consisted of aggregated daily records of hospitalizations, the statistical significance of the results is weaker. As a last point, while the temperature effect on hospitalization (e.g., flu season related hospitalizations) was addressed by normalizing the sample data (as described in the "Sample preparation" section), such normalization does not totally separate the potential cause/effect or combined effect of pollution/season from the results.

Generalizability and further analysis
The overall number of hospitalizations captured by the analysis (over 20 million) is large compared to the literature (Moore et al. 2016; Zhang et al. 2018; Stieb et al. 2009), in particular in Europe, and the results per se could therefore be of interest to help predict hospitalization trends and quantify the need for environmental preventive measures to help minimize the healthcare impact and costs associated with such events. It is important to note, however, that the use of aggregated daily records as the data source does constitute a limitation to the strength of the statistical analysis.
Also for this reason, on the same dataset, further analysis could be performed using different statistical methodologies, e.g., a case-crossover data setup (Lu and Zeger 2007) or artificial neural networks (Fang 2018;Polezer et al. 2018) to further test the sensitivity of these results.

Conclusions
Increases in ambient air pollution exposure were associated with an increase in hospitalizations due to respiratory tract diseases in a large time-series observation in five major Polish cities in the years 2014-2017. The most prominent effect was recorded for PM2.5 and PM10. There was weak evidence of short-term associations between peaks of air pollution concentrations and increased hospitalizations for cardiovascular diseases. Further work on a dataset enabling a better stratification of the sample (e.g., by age and gender, and likewise a detailed admission analysis, e.g., sub-strata for COPD, lower respiratory tract infection, asthma, etc.) would provide better insight into the subject matter.