Background

SOX2, also known as sex-determining region Y (SRY)-box 2, is an oncogenic transcription factor encoded on chromosome 3. It maintains pluripotency or self-renewal in embryonic stem cells and has been associated with several types of cancer [1,2,3]. In cancers, SOX2 promoter hypermethylation has consistently been correlated with an unfavorable clinical course and poor prognosis. For example, in esophageal squamous cell carcinoma, SOX2 promoter hypermethylation appeared to cause SOX2 silencing, and the loss of SOX2 expression was associated with poor prognostic outcome [3]. In lung cancer in particular, SOX2 overexpression has been associated with better prognosis and survival [2, 4, 5].

Lung cancer is one of the leading causes of global cancer mortality [6, 7]. In Taiwan, it was ranked the second incident cancer in 2014 and the top cause of mortality in 2016 [8]. Histologically, it is classified into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) [7]. SCLC and NSCLC respectively constitute about 15 and 85% of lung cancer [7]. SCLC is more common in former and current smokers even though certain uncommon cases have been reported in never-smokers [2, 6, 9,10,11]. NSCLC traditionally occurs in non-smokers and comprises three subtypes: adenocarcinoma (AC), squamous cell carcinoma (SCC), and large cell carcinoma (LCC) [7].

Several environmental, lifestyle, and genetic factors have been associated with lung cancer [6, 7, 11,12,13,14]. One such factor is air pollution resulting from increased industrialization and urbanization [14,15,16,17]. Because of heavier industrialization, PM2.5 pollution is higher in the central and southern regions of Taiwan than in the northern regions [16, 18,19,20]. The carcinogenicity of air pollutants has not been fully explained. However, it is thought that markers such as DNA methylation, mutation, and polymorphism could provide some insights into the mechanisms underlying this carcinogenicity [17].

SOX2 is believed to be a potential molecular marker for lung cancer [2]. In non-smokers exposed to air pollution, SOX2 promoter methylation might help in the early prediction of lung cancer risk. However, this question has not been extensively studied. Therefore, this study was conducted to investigate the association between SOX2 promoter methylation and area of residence in non-smoking Taiwanese adults living in areas with different levels of air pollution, especially PM2.5.

Results

The basic characteristics of the participants are shown in Table 1. The methylation levels in men (0.16310 ± 0.01230) and women (0.15740 ± 0.01240) were significantly different (P < 0.0001). The mean (± SD) concentrations of PM2.5 (μg/m3) from 2006 to 2011 in the northern and central/southern areas are shown in Table 2. PM2.5 concentrations were higher in the central/southern regions than in the northern regions, irrespective of the year. After adjusting for age, exposure to second-hand smoke (SHS), exercise, drinking, body fat, BMI, waist-hip ratio (WHR), asthma, and emphysema, the SOX2 promoter was significantly hypermethylated in both men and women residing in the central and southern areas compared with the northern areas (Tables 3 and 4). In men residing in the central/southern areas, the regression coefficient (β) was 0.00331; P = 0.0257 (Table 3). That is, the adjusted methylation level in male participants residing in the central/southern areas was 0.00331 higher than in those residing in the northern areas. Similarly, in women residing in the central/southern areas, the regression coefficient (β) was 0.00514; P < 0.0001 (Table 4). That is, the adjusted methylation level in female participants residing in the central/southern areas was 0.00514 higher than in those residing in the northern areas. Furthermore, the mean PM2.5 level (2006–2011) was significantly associated with higher SOX2 promoter methylation levels: β = 0.00042; P < 0.0001 (Table 5). That is, each 1 μg/m3 increase in PM2.5 was associated with a 0.00042 higher methylation level. Analysis of CpG sites both in and outside the SOX2 promoter region also showed significant hypermethylation in participants residing in the central/southern areas compared with the northern areas (Additional file 1: Tables S1-S3).
In addition, analysis of the KRAS gene promoter CpG sites also showed significant hypermethylation in participants residing in the central and southern areas compared with the northern areas (Additional file 1: Table S4).

Table 1 Basic characteristics of the study participants by sex
Table 2 Mean (± SD) concentrations of PM2.5 (μg/m3) from 2006 to 2011 in the northern and central/southern areas
Table 3 Multiple linear regression showing beta coefficients of SOX2 promoter methylation in men
Table 4 Multiple linear regression showing beta coefficients of SOX2 promoter methylation in women
Table 5 Multiple linear regression showing the association between mean PM2.5 (μg/m3) from 2006 to 2011 and SOX2 promoter methylation

Discussion

Little work has been done on the association between SOX2 promoter methylation and air pollution exposure. To our knowledge, the current study is the first to explore SOX2 promoter methylation in non-smoking Taiwanese adults based on exposure to air pollution. In both men and women living in the central and southern areas of Taiwan, the SOX2 promoter was significantly hypermethylated. The consistent hypermethylation in both sexes suggests that SOX2 promoter methylation could be a potential biomarker for industrial air pollution exposure. Estimating individual exposure to air pollution is challenging because validated tools are lacking. In most cases, individual exposure is estimated using air quality indices from nearby monitoring stations [17]. Exposure misclassification might have been minimized in the current study as participants were stratified based on the degree of exposure to air pollution. The central and southern areas have a heavier concentration of industries than the northern areas [16, 19, 21]. The Number 6 Naphtha Cracking Complex in Central Taiwan is one of the biggest petrochemical complexes in the world [22]. Moreover, the Formosa Petrochemical Corporation and many other industries are found in Central-Southern Taiwan [22]. These make the areas more prone to industrial emissions. Hence, they serve as natural settings for studying industrial air pollution. To minimize confounding due to smoking, only non-smokers were included in the final analysis and adjustments were made for exposure to SHS.

To combat smoking and its associated effects, e.g., non-communicable diseases, the Health Promotion Administration (HPA), Taiwan, aims to reduce smoking prevalence to 14% by 2025 from 20% in 2010 [23]. A decrease in smoking prevalence could subsequently lead to a decrease in smoking-related diseases. However, this does not eliminate the possibility of exposure to pollutants and their carcinogenic impacts. For example, NSCLC, which is most common in non-smokers, accounts for approximately 85% of lung cancer [7]. In addition, a large percentage of known lung cancer patients in Asia are never-smokers [11, 13]. Furthermore, smoking-related female lung cancer cases in Taiwan are few [14].

DNA methylation might help in the molecular staging of cancer as precancerous and cancerous states might be identified through this epigenetic modification [24]. This might also help to explain the mechanism for the initiation and progression of some tumors. For instance, lung cancer has been associated with epigenetic silencing [24,25,26]. Promoter hypermethylation is believed to be the molecular mechanism of this epigenetic gene silencing [24]. In esophageal cancer, SOX2 promoter hypermethylation was thought to be the main mechanism behind the silencing of the SOX2 gene [3]. Moreover, in esophageal and hypopharyngeal cancer, the absence of SOX2 expression was associated with poor prognosis [3, 27]. SOX2 silencing through hypermethylation might promote tumor initiation by allowing cells to evade cell-cycle arrest and resist apoptosis [3]. In the current study, SOX2 promoter hypermethylation was observed in individuals who resided in the central and southern areas of Taiwan, which have high levels of air pollution. Moreover, analysis of the KRAS gene promoter CpG sites also showed significant hypermethylation in participants residing in the central and southern areas compared with the northern areas. Exposure to air pollution predisposes to cancer [14,15,16]. SOX2 is believed to be a hallmark of lung cancer [2]. Therefore, in healthy non-smokers, SOX2 hypermethylation in air pollution areas might help to identify those at precancerous stages who have not been clinically diagnosed.

One of the limitations of this study is that the association between SOX2 promoter methylation and SOX2 gene expression was not determined. However, hypermethylation is believed to be associated with reduced gene expression [25]. Specifically, SOX2 promoter hypermethylation has been shown to be a vital epigenetic event that causes the silencing of the gene [3]. Moreover, it has been significantly associated with reduced expression of SOX2 RNA in choriocarcinoma and hydatidiform mole [28]. Another limitation is that only cancer-free individuals were included. As such, the results may not reflect the methylation patterns in individuals with cancer.

Conclusion

In conclusion, significant hypermethylation of the SOX2 promoter region was observed in both non-smoking men and women living in industrial areas with well-known higher levels of air pollution. SOX2 promoter methylation might serve as a biomarker for air pollution in industrial areas. Moreover, it might reflect predisposition to cancer. Hence, healthy non-smokers at precancerous stages who have not been clinically diagnosed could be identified and monitored. Subsequently, prompt actions could be taken against such a public health issue. Further studies are recommended to explore the relationship between SOX2 methylation and air pollution.

Methods

Data source

Data were retrieved from the Taiwan Biobank database (2008–2015). The Taiwan Biobank was established in 2005 with the aim of integrating the genetic and medical information of about 200,000 ethnic Taiwanese with no history of cancer [29, 30]. The enrollment of individuals in the Taiwan Biobank Project conforms to relevant regulations and guidelines. A total of 29 recruitment centers are distributed all over the country; each county or city possesses a minimum of one center [29]. A letter of consent was signed by each individual before data collection. Data were collected by well-trained researchers through questionnaires, physical examinations, and biochemical analyses of blood and urine samples.

Data collection and study participants

Participants’ data on SOX2 promoter methylation, residence, age, smoking, exposure to SHS, exercise, drinking, body fat, BMI, WHR, asthma, and emphysema were extracted from the Taiwan Biobank dataset (2008–2015). Residence, age, smoking, exposure to SHS, exercise, drinking, asthma, and emphysema were self-reported while DNA methylation, body fat, BMI, and WHR were measured.

Venous blood (9 ml) was collected into sodium citrate tubes and transported to the laboratory at 4 °C. DNA was extracted from blood using the Chemagic™ Prime™ instrument, an automated chemical extraction machine that uses magnetized rods to separate nucleic acids from solutions. The DNA length was determined with the Fragment Analyzer (Agilent) while the purity was assessed using the optical density (OD) ratio at 260/280. Samples with an OD 260/280 ratio of 1.6–2.0 were considered to be pure. Pure samples intended for long-term use were stored at −80 °C. DNA samples were subjected to sodium bisulfite treatment using the EZ DNA Methylation Kit (Zymo Research, CA, USA). DNA methylation at each CpG site was determined with the Infinium® MethylationEPIC BeadChip array (Illumina Inc.), which covers 850,000 methylation sites. Details on this platform are described elsewhere [31,32,33]. The methylation levels were expressed as beta-values (β), which range between 0 and 1. β-values were determined using the formula β = M/(M + U), where M is the methylated intensity and U is the unmethylated intensity. Quality control was done according to the Illumina® GenomeStudio® Methylation Module v1.8 [34]. Samples with detection P value > 0.05 and bead counts < 3 were eliminated. Dye-bias across batches was adjusted by normalization, and background correction was performed. Outliers were removed using the median absolute deviation method.
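The β-value formula and the median absolute deviation (MAD) filter described above can be illustrated with a minimal sketch. The intensities are made-up numbers, and the cutoff of 3 scaled MADs and the 1.4826 scaling constant are assumptions for illustration; the text does not state which threshold was used.

```python
from statistics import median

def beta_value(methylated, unmethylated):
    """beta = M / (M + U), as defined in the text; lies between 0 and 1."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("M + U must be positive")
    return methylated / total

def mad_keep_mask(values, threshold=3.0):
    """Flag values within `threshold` scaled MADs of the median as kept (True).

    threshold=3.0 and the 1.4826 scaling constant are assumptions,
    not values reported in the study.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [True] * len(values)
    scaled_mad = 1.4826 * mad
    return [abs(v - med) / scaled_mad <= threshold for v in values]

# Hypothetical (M, U) intensity pairs for one CpG site across five samples.
intensities = [(1600, 8400), (1500, 8500), (1700, 8300), (1550, 8450), (9000, 1000)]
betas = [beta_value(m, u) for m, u in intensities]
keep = mad_keep_mask(betas)
filtered = [b for b, k in zip(betas, keep) if k]
```

In this toy example the last sample (β = 0.9) lies far from the median of 0.16 and is the only one dropped.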

Participants were considered to be from an area if they lived there for at least 3 months. Where participants lived was considered to be where they were likely to be exposed to air pollution. The residences were grouped into northern and central/southern regions. This is because PM2.5 pollution in the northern areas is lower compared to the central and southern areas [16, 18,19,20]. The northern areas included Taipei and New Taipei Cities, while the central/southern areas included Taichung and Nantou Cities, Changhua and Yunlin Counties, Chiayi County, Tainan, and Chiayi Cities. PM2.5 statistics in these areas were obtained from the Air Quality Monitoring Database (AQMD) set up by the Environmental Protection Administration, Taiwan. There were 41 air quality monitoring stations in the study areas. Of these stations, 18 were located in the northern region while 23 were located in the central/southern regions. Annual PM2.5 readings (2006–2011) from the various stations were used to determine the annual average concentration for each region.
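The regional averaging of station readings can be sketched as follows. The station names and PM2.5 values are hypothetical placeholders rather than AQMD data, and the sketch assumes a simple unweighted mean across the stations in each region.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annual readings: (station, region, year, PM2.5 in ug/m3).
readings = [
    ("station_A", "northern", 2006, 30.0),
    ("station_B", "northern", 2006, 32.0),
    ("station_C", "central/southern", 2006, 40.0),
    ("station_D", "central/southern", 2006, 44.0),
    ("station_A", "northern", 2007, 28.0),
    ("station_B", "northern", 2007, 30.0),
    ("station_C", "central/southern", 2007, 41.0),
    ("station_D", "central/southern", 2007, 43.0),
]

def regional_annual_means(rows):
    """Average station readings within each (region, year) group."""
    grouped = defaultdict(list)
    for _station, region, year, pm25 in rows:
        grouped[(region, year)].append(pm25)
    return {key: mean(values) for key, values in grouped.items()}

annual_means = regional_annual_means(readings)
```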

Participants were classified as (1) non-smokers: never smoked or did not continuously smoke for 6 months or more; (2) former smokers: continuously smoked for at least 6 months but were currently not smoking; (3) current smokers: continuously smoked for 6 months or more and were currently smoking; (4) non-drinkers: no history of alcohol drinking or weekly drinking of less than 150 cc of alcohol continuously for 6 months; (5) former drinkers: abstained from drinking for over 6 months; (6) current drinkers: weekly drinking of at least 150 cc of alcohol continuously for 6 months; (7) physically active: exercised for > 150 min per week; and (8) exposed to SHS: exposure for at least 5 min per hour.
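The definitions above amount to a small set of decision rules, sketched below. The function and argument names are hypothetical and do not correspond to Taiwan Biobank questionnaire fields; the drinking rule is one possible reading of definitions (4) to (6).

```python
def smoking_status(smoked_continuously_6_months, currently_smoking):
    """Classify smoking status from two yes/no answers (hypothetical inputs)."""
    if not smoked_continuously_6_months:
        return "non-smoker"
    return "current smoker" if currently_smoking else "former smoker"

def drinking_status(drank_150cc_weekly_6_months, abstained_over_6_months):
    """Classify drinking status; an assumed reading of definitions (4)-(6)."""
    if not drank_150cc_weekly_6_months:
        return "non-drinker"
    return "former drinker" if abstained_over_6_months else "current drinker"

def is_physically_active(exercise_minutes_per_week):
    """Physically active: more than 150 min of exercise per week."""
    return exercise_minutes_per_week > 150

def is_exposed_to_shs(exposure_minutes_per_hour):
    """Exposed to SHS: at least 5 min of exposure per hour."""
    return exposure_minutes_per_hour >= 5
```

Only participants for whom `smoking_status` returns "non-smoker" would enter the final analysis.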

After excluding former and current smokers as well as those with incomplete information, a total of 461 non-smokers comprising 176 men and 285 women were included in our study. A total of 24 CpG sites located in the SOX2 promoter region were available in the Taiwan Biobank dataset. These sites were cg24513480, cg05664581, cg19258425, cg18148179, cg07747133, cg24782772, cg00666105, cg04948892, cg15106134, cg02573703, cg20106776, cg11129008, cg08062338, cg12930100, cg01023203, cg22530053, cg27331851, cg14783675, cg01340005, cg17051733, cg11142406, cg09530873, cg25933341, and cg08464053. For each participant, the β-values at all 24 SOX2 CpG sites were averaged to obtain the mean promoter methylation level. To assess whether SOX2 is a reliable biomarker of cancer risk in air pollution areas, the mean β-value of CpG sites at the promoter of KRAS, another lung cancer risk gene, was also determined. Ethical approval for this study was obtained from Chung Shan Medical University Institutional Review Board (CS2-17070).
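The per-participant averaging across promoter CpG sites can be sketched as below. For brevity only three of the 24 site identifiers are used, and the β-values are invented for illustration.

```python
from statistics import mean

# Three of the 24 SOX2 promoter CpG sites listed above (subset for brevity).
SOX2_PROMOTER_SITES = ["cg24513480", "cg05664581", "cg19258425"]

def mean_promoter_beta(site_betas, sites):
    """Average a participant's beta-values over the given CpG sites.

    Raises KeyError if any site is missing, so that an incomplete
    profile is never silently averaged.
    """
    return mean(site_betas[site] for site in sites)

# Hypothetical beta-values for one participant.
participant = {"cg24513480": 0.15, "cg05664581": 0.16, "cg19258425": 0.17}
promoter_mean = mean_promoter_beta(participant, SOX2_PROMOTER_SITES)
```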

Statistical analysis

Data management and analyses were performed with the SAS 9.4 software (SAS Institute, Cary, NC). Continuous variables were analyzed using t test and expressed as mean ± standard deviation (SD) while categorical variables were analyzed using chi-square test and reported as percentages (%). Methylation data were normalized using the Illumina® GenomeStudio V2011.1 software [34]. The correction of cell-type heterogeneity was done with the R software using the Reference-Free Adjustment for Cell-Type composition (ReFACTor) method [35].
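As a rough plausibility check of the sex difference in methylation reported in the Results, a two-sample t statistic can be recomputed from the published summary statistics alone. This sketch assumes the Welch (unequal variance) form and a normal approximation for the P value; the text does not state which t test variant was run in SAS, and the group sizes are those given in the Methods.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics (Welch form, assumed)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Men: 0.16310 +/- 0.01230 (n = 176); women: 0.15740 +/- 0.01240 (n = 285).
t_stat = welch_t(0.16310, 0.01230, 176, 0.15740, 0.01240, 285)

# Two-sided P value under a normal approximation (reasonable for n > 100).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
```

The statistic comes out at roughly 4.8, which is in line with the reported P < 0.0001.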

The association between SOX2 promoter methylation and residential area was determined using multiple linear regression. Multivariate adjustments were performed for age, exposure to SHS, exercise, drinking, body fat, BMI, WHR, asthma, emphysema, and cell-type composition. P values less than 0.05 were considered to be statistically significant.
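The role of the regression coefficient for residential area can be illustrated with a small ordinary least squares sketch. This is not the SAS procedure used in the study: it is a pure-Python solver applied to synthetic, noise-free data with only one covariate (age), and the coefficient values are chosen for illustration.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]

# Synthetic data: area = 1 for central/southern, 0 for northern (reference).
area = [0, 0, 0, 1, 1, 1]
age = [30, 40, 50, 35, 45, 55]
# Methylation generated as 0.15 + 0.005 * area + 0.0001 * age (no noise).
methylation = [0.15 + 0.005 * a + 0.0001 * g for a, g in zip(area, age)]

X = [[1.0, a, g] for a, g in zip(area, age)]  # intercept, area, age
intercept, beta_area, beta_age = ols(X, methylation)
```

Here `beta_area` recovers the built-in 0.005 difference between regions at a fixed age, which is how the coefficients in Tables 3 and 4 are read.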