Introduction

Lung cancer was the most common cancer worldwide, with an estimated 2.09 million cases in the latest (2018) World Health Organization (WHO) report. It ranked as the most common cancer in men (31.5 per 100,000) and the second most common in women (14.6 per 100,000), after breast cancer. Lung cancer contributed to 18.3% of all cancer deaths worldwide1. In Iran, the age-standardized incidence rate (ASR) of lung cancer in 2015 was 8.52 per 100,000, ranging from 12.37 per 100,000 in West Azerbaijan to 3.84 per 100,000 in Sistan and Baluchestan province. In Tehran, the capital, the ASR was 8.74 per 100,000, higher than the national rate2.

Air pollution is likely to be a risk factor for lung cancer. According to the WHO, about 24% of the global burden of disease and 23% of global deaths can be attributed to environmental factors1,3. Air pollution is a complex mixture of particles and gases4, in amounts that vary with the types of emission sources and atmospheric conditions. Typically, it includes pollutants such as soot, particulate matter (PM) ≤ 10 µm (PM10) and ≤ 2.5 µm (PM2.5), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3) and volatile organic compounds (VOCs)5. The latest International Agency for Research on Cancer (IARC) report identifies PM2.5 as a risk factor for lung cancer6, and the ELAPSE study in Europe showed that long-term ambient PM2.5 exposure may cause lung cancer even at concentrations lower than current European threshold values and the WHO Air Quality Guidelines7.

Humans are simultaneously exposed to a complex mixture of air pollutants. Many researchers have therefore adopted a multiple-pollutant approach to assessing air pollution exposure8, because in single-pollutant models it is not clear whether an observed association reflects the effect of the specific pollutant under study or the effect of co-occurring pollutants9. However, there is no consensus on the method for assessing multiple ambient air pollutants simultaneously. Previous reviews have classified the methods used to evaluate multiple pollutants into three groups: dimension reduction, variable selection, and grouping of observations4,8,10,11. Caban-Martinez et al.12 and Kolpacoff et al.13, however, used latent profile analysis (LPA) to identify subgroups of cancer patients with different multi-pollution profiles. LPA is a probabilistic, model-based variant of traditional cluster analysis that better handles outliers and unequal cluster sizes14. It identifies possible unobservable subgroups, or latent classes, in a population from a number of related observable variables. With this method, complex relations between groups of risk factors and disease outcomes such as cancer, which may not be well explained by a single-pollutant model, can be better understood. It also reduces the dimensionality of exposure data and the burden of multiple testing, while enhancing statistical power13.

Knowledge of the geographical patterns of multiple pollution helps policy makers target high-risk regions for more intensive interventions. Air pollution exacerbates health disparities among socioeconomic groups, because socioeconomically deprived areas are usually more polluted15,16. The relation between air pollution exposure and health outcomes can also, in theory, be modified by socioeconomic status, through differences in access to medical care and a healthy diet, and by biological factors such as age and psychological stress. However, the hypothesis that residents with a low socioeconomic position face more severe consequences from air pollution is debated16.

In the current study, we aimed to examine the association between single and multiple ambient air pollutants and lung cancer incidence in Tehran, Iran.

Methodology

Research location

This study was conducted in Tehran, a megacity and the most populous city in Iran and Western Asia, with a resident population of about 9 million and a daytime population of over 14 million people. According to the World Population Review report, Tehran's population in 2020 was estimated at 9,134,708. Tehran has the third-largest metropolitan area in the Middle East and is ranked 24th in the world by the population of its metropolitan area. The city covers about 730 km² and consists of 22 municipal districts with different concentrations of ambient air pollutants17.

Data sources

Lung cancer data

In total, 1850 patients residing in Tehran were diagnosed with lung cancer (trachea, bronchus and lung) between 2014 and 2016. The most recent residential address of each patient in Tehran was obtained from the Cancer Department of the Ministry of Health of Iran. Cancer registry officials reported that the recorded addresses are more than 90% accurate.

The geographical coordinates (longitude and latitude) of the patients' residential addresses were determined and marked on the GIS map of Tehran.

Exposure assessment

The annual mean concentrations of PM10, SO2, NO, NO2, NOx, benzene, toluene, ethylbenzene, m-xylene, p-xylene and o-xylene (BTEX), and total BTEX (TBTEX) in the 22 districts of Tehran were obtained from previously developed land use regression (LUR) models. The LUR models for PM10, SO2, NO, NO2 and NOx18,19 were developed from measurements conducted at 23 sites in Tehran in 2010. The models for volatile organic compounds (VOCs)20,21 were developed from measurements at 179 sampling sites from April 2015 to May 2016.

Confounding covariates

Population-based data were extracted from the Urban Health Equity Assessment and Response Tool (Urban HEART-2) study, which was conducted in the 22 districts of Tehran and serves as a data repository of many district-level variables collected in 2011, such as population density, per capita urban green space, smoking rates and life expectancy. A detailed description of the Urban HEART-2 study can be found elsewhere22.

The socio-economic development status of the 22 districts of Tehran was extracted from a study conducted by Sadeghi et al.23. In brief, sixteen economic and social indicators were combined to estimate the level of development in the 22 districts, based on exploratory factor analysis (EFA) and principal component analysis (PCA). These multivariate statistical techniques reduce the variables in a dataset to a smaller number of “dimensions” that explain most of the variation in the dataset, using a few estimated latent variables23.

In Sadeghi et al.’s study, the variables used for estimating the social dimension were the adult literacy level (among 30–59 year olds), the elderly literacy level (60 years and older), the proportion of university graduates in the total population, the proportion of males with university education, the proportion of females with university education, and the percentage of the population that uses the internet. The variables used for estimating the economic dimension were the women's economic participation rate, the proportion of employees with high-rank jobs, the proportion of households with cars, the proportion of households with computers, indicators of household access to public facilities, the proportion of households who own their home, the proportion of homes meeting civil standards, the average price per square meter of residential building land, the average selling price per square meter of residential building area, and the average monthly rent per square meter of residential area for each district. A detailed description of these variables can be found elsewhere23. A higher socio-economic development score indicated a higher socio-economic level. In that study23, the score had no specific unit and ranged from 36.6 to 67.4.

Statistical analyses

Latent profile analysis (LPA) was used to construct multiple-pollution profiles14. A series of LPA models was fitted, with two to seven latent profiles. The 12 air pollutants (PM10, SO2, NO, NO2, NOx, benzene, toluene, ethylbenzene, m-xylene, p-xylene, o-xylene and TBTEX) were used as LPA indicators. All 12 pollutants were grand-mean centered, that is, the overall mean of each pollutant was subtracted from its values24, to facilitate interpretation. The best-fitting model was identified from measures of relative statistical fit and the interpretability of the profile structure. Models with lower values of the Akaike information criterion (AIC), Bayesian information criterion (BIC) and sample-size-adjusted BIC (aBIC), and a significant bootstrap likelihood ratio test (BLRT), were preferred25. The log likelihood itself is not recommended for model selection, because it improves steadily as parameters are added to the model. Entropy was also evaluated for each model; values closer to 1 indicate better separation of the latent classes26. In addition, the interpretability and parsimony of the candidate models were compared. Models containing a profile with less than 5% of the sample were considered spurious27 and were not accepted. Data preparation was done in Stata version 14. LPA was performed with the mixture modeling procedure in Mplus version 7.4 (Muthén & Muthén, 1998–2015), using the robust maximum likelihood (MLR) estimator. Missing data were handled using full information maximum likelihood (FIML). To examine how the multiple-pollution profiles differed in each air pollution component, one-way ANOVA and post hoc tests were used.
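The quantities used in this model comparison can be illustrated with a minimal Python sketch. It is not the Mplus procedure used in the study: the function names are hypothetical, the formulas are the standard definitions of AIC, BIC, sample-size-adjusted BIC and relative entropy, and any numbers passed in are made up for illustration.

```python
import math


def grand_mean_center(values):
    """Subtract the overall mean from every value of one pollutant."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def information_criteria(log_likelihood, n_params, n_obs):
    """Standard AIC, BIC and sample-size-adjusted BIC (lower is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        # Adjusted BIC replaces n by (n + 2) / 24.
        "aBIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


def relative_entropy(posteriors):
    """Relative entropy of a classification (1 = perfectly separated classes).

    `posteriors` holds one row per unit with its posterior class probabilities.
    """
    n_units = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # the term p * ln(p) tends to 0 as p tends to 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n_units * math.log(n_classes))
```

Because the penalty terms grow with the number of parameters, a model with more profiles is preferred only when its gain in log likelihood outweighs the added complexity.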

Kolmogorov–Smirnov tests were used to assess the normality of the pollutant concentrations; because the data were normally distributed, Pearson’s correlation coefficient was used to estimate the correlation between pollutants. Because the number of lung cancer cases was over-dispersed, negative binomial (NB) regression was performed to estimate incidence rate ratios (IRR) and their 95% confidence intervals (CI) for each air pollutant and for the multiple-pollution profiles, adjusted for age, sex and smoking at the district level.
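As a hedged illustration of two steps in this paragraph, the sketch below checks for over-dispersion with a simple variance-to-mean ratio and converts a log-scale regression coefficient into an IRR with a Wald-type 95% CI. The function names and example values are hypothetical; the study's models were fitted in Stata, and the variance-to-mean ratio is only one informal way to detect over-dispersion.

```python
import math
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest
    over-dispersion relative to a Poisson model."""
    return variance(counts) / mean(counts)


def irr_with_ci(beta, se, increment=1.0, z=1.96):
    """IRR and Wald 95% CI for a given exposure increment.

    `beta` and `se` are the coefficient and standard error on the log scale
    for a one-unit increase; `increment` rescales them, for example to 10 units.
    """
    point = math.exp(beta * increment)
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return point, lower, upper


# Hypothetical district counts and coefficient, for illustration only.
ratio = dispersion_ratio([12, 40, 95, 23, 150, 61])
irr_10, low_10, high_10 = irr_with_ci(beta=0.02, se=0.008, increment=10)
```

Rescaling on the log scale before exponentiating is equivalent to raising the one-unit IRR to the power of the increment.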

To calculate the population attributable fraction (PAF), the risk estimates for air pollutants were taken from the results of the negative binomial (NB) regression analysis.

The PAF for air pollutants as continuous variables was estimated using the following equation28:

$$\mathrm{PAF}=\frac{\exp\left[\ln\left(RR_{\mathrm{unit}}\right)\times \overline{x}\right]-1}{\exp\left[\ln\left(RR_{\mathrm{unit}}\right)\times \overline{x}\right]}$$

In this equation, \(RR_{\mathrm{unit}}\) is the relative risk for each one-unit increment in exposure to the air pollutant and \(\overline{x}\) is the average exposure.
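The PAF equation can be evaluated directly, as in the minimal sketch below. The function name and the input values are hypothetical and are not results from this study. Note that the expression simplifies algebraically to \(1 - RR_{\mathrm{unit}}^{-\overline{x}}\), which gives a quick check on the output.

```python
import math


def paf_continuous(rr_unit, mean_exposure):
    """Population attributable fraction for a continuous exposure.

    rr_unit: relative risk per one-unit increase in exposure.
    mean_exposure: average exposure level in the population.
    """
    if rr_unit <= 0:
        raise ValueError("rr_unit must be positive")
    rr_at_mean = math.exp(math.log(rr_unit) * mean_exposure)
    return (rr_at_mean - 1.0) / rr_at_mean


# Hypothetical inputs: RR of 1.01 per unit and a mean exposure of 20 units.
example_paf = paf_continuous(1.01, 20.0)
```

A relative risk of 1 yields a PAF of 0, and the PAF approaches 1 as either the relative risk or the mean exposure grows.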

Statistical analyses were performed using Mplus version 7.4, Stata version 14 (Stata Corp LLC; College Station, TX, USA) and ArcGIS version 10.8.

Because the data were obtained in aggregated and anonymous form, informed consent from individuals or their families was not required. The data are not publicly available but can be obtained in aggregated and/or anonymous form by formal request to the Ministry of Health of Iran. The ethics approval code of this project was IR.KMU.REC.1398.230. All methods were carried out in accordance with relevant guidelines and regulations.

Results

Basic information about the study area is shown in Table 1. The total number of lung cancer cases in 2014–2016 was 1850 in all districts of Tehran. We had to exclude patients who lived in remote suburbs of Tehran, for which air pollutant estimation was not possible. Eventually, 1653 cases entered the final analyses. The distribution of lung cancer patients across the districts of Tehran is shown in Fig. 1. The highest numbers of patients per 100,000 population were in districts 12, 6 and 11, in that order.
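As a minimal illustration of the district-level rates mapped in Fig. 1, the sketch below computes a crude number of cases per 100,000 residents. The function name and the example figures are hypothetical and are not taken from the study data; it shows only the basic arithmetic, without any annualization or age standardization.

```python
def cases_per_100k(cases, population):
    """Crude number of cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical district: 120 cases among 400,000 residents.
example_rate = cases_per_100k(120, 400_000)
```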

Table 1 Description of the study area, air pollution level, and district-level covariates.

Figure 1 Spatial distribution of lung cancer patients (number of cases per 100,000) in different areas of Tehran in 2014–2016.

The spatial distributions and average levels of ambient air pollutants in the districts of Tehran are shown in Fig. 2. Districts with higher pollutant concentrations were mostly downtown (districts 6, 7, 11, 12 and 14), around the railway (districts 16 and 17), and in a few of the southern districts of the city (district 18). District 16 had the highest concentration of SO2, and districts 9, 2 and 6 had the highest concentrations of NO, NO2 and NOx during these years. District 12 had the highest concentration of VOCs.

Figure 2 Spatial distribution and average levels of ambient air pollutants in different areas of Tehran in 2014–2016.

As indicated in Fig. 3, there was a strong correlation (Pearson's r) between pollutants, most notably for benzene compounds (benzene, toluene, ethylbenzene, m-xylene, p-xylene, o-xylene and TBTEX) (p-value < 0.001). The positive correlation between NO and NOx was weaker (p-value < 0.001).
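For readers who want to reproduce a correlation matrix of this kind, the sketch below computes Pearson's r for one pair of pollutant series in plain Python. The function name and the example concentrations are hypothetical; a full matrix would apply the same function to every pair of pollutants.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical district-level concentrations of two pollutants.
benzene = [4.1, 5.3, 6.8, 7.2, 9.0]
toluene = [10.2, 12.9, 15.1, 17.8, 21.5]
r = pearson_r(benzene, toluene)
```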

Figure 3 Pearson correlation matrix of air pollutants in the 22 districts of Tehran in 2014–2016.

Fit indices for the different LPA models are displayed in Table 2. All solutions provided acceptable classification accuracy, as indicated by entropy values close to 1. Although models with four, five, six, and seven latent class profiles had lower AIC, BIC, and aBIC than two and three latent class profiles, these models included classes with less than 1% of the sample. Therefore, the three latent profile model was preferred. The multi-pollution profiles are shown in Fig. 4. Profile 1 had the lowest scores for all pollutants, except SO2. We labeled this profile as “low multiple-pollution”. Profile 3 had the highest scores of all pollutants. We labeled this profile as “high multiple-pollution”. Profile 2 was in between and was labeled “medium multiple-pollution”.
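The selection logic described here, discarding solutions that contain a very small profile and then preferring the best-fitting remaining model, can be sketched as follows. This is an illustrative simplification under stated assumptions: the function name, the use of BIC alone as the fit measure, the default 5% threshold and all candidate values are hypothetical and do not reproduce Table 2.

```python
def select_profile_model(candidates, min_share=0.05):
    """Pick the lowest-BIC model among those with no overly small profile.

    Each candidate is a dict with keys "k" (number of profiles), "bic",
    and "shares" (proportion of the sample in each profile).
    """
    eligible = [c for c in candidates if min(c["shares"]) >= min_share]
    if not eligible:
        raise ValueError("no candidate satisfies the minimum profile size")
    return min(eligible, key=lambda c: c["bic"])


# Hypothetical fit results, for illustration only.
candidates = [
    {"k": 2, "bic": 520.0, "shares": [0.60, 0.40]},
    {"k": 3, "bic": 480.0, "shares": [0.50, 0.30, 0.20]},
    {"k": 4, "bic": 450.0, "shares": [0.50, 0.30, 0.195, 0.005]},
]
best = select_profile_model(candidates)
```

In this made-up example the four-profile model has the lowest BIC but is excluded because one profile holds less than 1% of the sample, so the three-profile model is chosen. In practice, interpretability and the other fit indices would also be weighed.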

Table 2 Fit indices for different models with number of profiles ranging from 2 to 7.

Figure 4 Standard mean values of pollutants in the three latent profiles in different areas of Tehran in 2014–2016.

Summary statistics for each pollutant in different profiles are shown in Table 3. There was a significant difference between the means of all pollutants in the three profiles, except SO2.

Table 3 The mean of air pollutants in different profiles.

Table 4 shows the IRR estimates from the single-pollutant and multiple-pollutant multivariable negative binomial regression models. In single-pollutant models, p-xylene, o-xylene, ethylbenzene, benzene, m-xylene and TBTEX were significantly associated with increased lung cancer incidence in model 3, which was adjusted for age, gender, socioeconomic status, life expectancy and smoking prevalence.

Table 4 The estimated incidence rate ratios using negative binomial regression analyses for the effect of each 10 unit increase in air pollutants on lung cancer incidence in the districts of Tehran.

In multi-pollutant models, the high multiple-air-pollutants profile was associated with higher lung cancer incidence when compared with the low multiple-air-pollutants profile.

The fraction of lung cancer cases attributable to each air pollutant is shown in Fig. 5. The highest fractions were for m-xylene, o-xylene, and TBTEX.

Figure 5 Estimated fraction of all lung cancer incidence attributable to each 1 unit increase in air pollutants in Tehran in 2014–2016.

Discussion

This study was the first to investigate the association of single and multiple ambient air pollutants with lung cancer in Iran. The findings suggest that ambient air pollutants, especially p-xylene, o-xylene, ethylbenzene, benzene, m-xylene and TBTEX, were associated with lung cancer. Several previous studies have shown a strong association between air pollution and respiratory mortality29,30 and respiratory diseases31,32, including chronic obstructive pulmonary disease (COPD), asthma, bronchitis, and decreased lung function33. Recently, some studies have also shown an association between air pollution and lung cancer3.

Iran’s national cancer registry data indicate an approximately sevenfold increase in lung cancer incidence over a 27-year span (1990 to 2016), both in the whole country and in the capital city, Tehran34. In the past, lung cancer was mainly attributed to direct tobacco smoke exposure. However, its increasing incidence in never-smokers in recent years suggests that other risk factors remain to be identified35. Probable risk factors for lung cancer in never-smokers include environmental exposures such as air pollution, occupational carcinogens, radon and infections36. In Taiwan, air pollution was associated with the incidence of lung cancer in never-smokers37, and the results of lung cancer screening programs in China and the United States in 2018 showed that the incidence of lung cancer in never-smokers was significantly higher in China than in the United States. These data suggested that including ambient air pollution could improve lung cancer risk models, especially for non-smokers38.

Air pollutants have been reported to be correlated in many studies. Faridi et al. reported positive correlations between PM2.5, PM10, NO2, SO2, CO and O3 in Tehran39, and another study from Los Angeles County also reported correlations between multiple ambient air pollutants40,41. Studies from Spain have shown positive correlations between PM10 and PM2.5 and between nitrogen oxides (NO2 and NO). The correlations between nitrogen oxides (NO and NO2) and particulate matter (PM10) are probably due to the common sources of these pollutants: traffic, heating systems, industries, and other combustion processes42,43. High spatial correlations between exposure variables preclude multivariable-adjusted analysis in air pollution studies. In the present study, because of the high correlation between pollutants, we used LPA models to investigate the association between multiple pollutants and lung cancer incidence. Exposure profile modelling for multiple exposures has been used in previous epidemiologic studies on health outcomes such as blood pressure44, low birth weight40, total mortality45, respiratory mortality46, and lung cancer in nonsmokers47.

This method enables identifying possible unobservable subgroups, or latent classes, in a population using a number of related exposure variables and can help better understand the complex relations between risk factors and health outcomes, such as cancer, that may not be best explained by a single exposure13.

The concentration of BTEX in Tehran's ambient air between 2005 and 2018 was estimated in a meta-analysis and risk assessment conducted by Abtahi et al. The ranking of BTEX concentrations was benzene (149.18 µg/m³: 31%) > o-xylene (127.16 µg/m³: 27%) > ethylbenzene (110.15 µg/m³: 23%) > toluene (87.97 µg/m³: 19%). In the present study, in contrast, toluene, m-xylene and benzene had the highest concentrations among VOCs20.

The primary sources of benzene and toluene in the ambient air of Tehran include both mobile and stationary emission sources. According to Abtahi et al., the pooled concentrations of benzene (149.18 µg/m³) and o-xylene (125.57 µg/m³) in Tehran were higher than those in other regions around the world, such as Ontario (Canada), Orleans (France), Bari (Italy), Kuala Lumpur (Malaysia) and Beijing, and the population of Tehran is at considerable risk of exposure to carcinogens48. Temperature inversion, fossil fuel consumption by old vehicles, low-quality fuel, population congestion, high-traffic highways, and several factories in the south of Tehran, such as iron and steel plants, are other reasons for the high level of BTEX in the city48,49.

In the present study, about 70% of lung cancer patients were men, and the prevalence of smoking among men was about 16%. Even after adjustment for smoking prevalence, the association between air pollutants and lung cancer remained significant.

Su et al. conducted an ecological study of ambient air pollution and all-cancer incidence in Taiwan. Their results showed positive correlations between PM2.5, SO2, NOx and O3 levels and age-adjusted total cancer incidence rates50. A study conducted on data from 2002 and 2011 in Brazil showed that traffic density and NO2 were associated with an increased incidence of respiratory cancers51. In a large population of 16,209 Norwegian men followed for 27 years, the risk ratios for developing lung cancer attributed to NOx and SO2 exposure were 1.08 (95% CI: 1.06–1.11) and 1.03 (95% CI: 0.77–1.38), respectively52. A cohort study conducted from January 1998 to December 2009 in four northern Chinese cities (Tianjin, Shenyang, Taiyuan and Rizhao) showed that the combined effect of NO2 and PM10 resulted in a significant increase in mortality from lung cancer53. Bai et al.54, in a population-based cohort study in Ontario, Canada (2001–2015), showed positive associations of lung cancer incidence with PM2.5 (hazard ratio [HR] = 1.02 [95% CI: 1.01–1.05] per 5.3 µg/m³) and NO2 (HR = 1.05 [95% CI: 1.03–1.07] per 14 ppb); that is, each increase of about 5 µg/m³ in outdoor PM2.5 concentration was associated with a 2% (95% CI: 1%–5%) increased risk of lung cancer. However, no associations were observed between O3 or Ox and lung cancer. In our analysis, NO2 was associated with an increased risk of lung cancer as well. A meta-analysis of 14 studies55 also showed pooled relative risks (RR) for lung cancer mortality of 1.14 (95% CI: 1.07–1.21) for PM2.5 and 1.07 (95% CI: 1.03–1.11) for PM10.

A population-based cohort study using 2006–2007 data from the Korean National Health Insurance Service (NHIS) database found no increased risk of lung cancer with higher exposure to PM10 or NO2, at average concentrations of PM10 = 60.9 µg/m³ and NO2 = 32.1 ppm56. The largest population-based case–control study of lung cancer among never-smoking women in Xuanwei and Fuyuan, China, conducted by Song et al., showed that a cluster of 25 polycyclic aromatic hydrocarbons (PAHs) had the strongest association with lung cancer (OR = 2.21; 95% CI = 1.67–2.87), and nitrogen dioxide (NO2) was also directly associated with lung cancer (OR = 2.06; 95% CI = 1.19–3.49). However, neither benzo(a)pyrene (BaP) nor PM2.5 was associated with lung cancer in the multipollutant models57. Studies in Canada58,59, nine European countries60, the UK61 and the US62 also found that PM2.5, nitrogen oxides, nitrogen dioxide and sulfur dioxide were associated with a greater risk of lung cancer63. The ESCAPE study60, based in nine European countries, concluded that for every 10 µg/m³ increase in PM10, the risk of lung cancer increases by 22%.

Strengths and limitations

An important limitation of our study was the short time interval between the pollutant measurements and the lung cancer incidence data. However, these were the latest available data on lung cancer in Tehran at the time we started our study. In addition, the study design was ecological, with no individual-level data; the PAF would have been better estimated using individual-level data with adjustment for important confounding covariates. Nevertheless, the results of this study still have important implications for public health, underscoring the need to reduce air pollution.

Tehran has 22 districts, and the number of women with lung cancer was small in some districts, which prevented us from performing further analyses in gender subgroups. However, the analyses were adjusted for gender distribution. Also, pathological data on cancer cell types were not available, which prevented us from performing a separate analysis by pathological type.

A novelty of our study was estimating the simultaneous effect of several air pollutants on lung cancer incidence. This helps to have a holistic picture of the effect of complex air pollution mixtures on human disease.

Conclusion

This is the first study to examine the associations between multiple air pollutants and lung cancer incidence in Iran. The findings suggest that lung cancer was associated with ambient air pollution in Tehran, and that this association was stronger for p-xylene, o-xylene, ethylbenzene, benzene, m-xylene and TBTEX. Air pollution is a serious problem in Tehran, and decreasing the concentrations of air pollutants should be a key goal for policy makers seeking to reduce the number of lung cancer cases in the city.