Background

Cigarette smoking is the leading preventable risk factor responsible for a significant portion of premature deaths worldwide [1], causing about one of every five deaths in the United States each year [2]. Despite global efforts to eliminate smoking, tobacco still causes 8 million deaths each year, especially in low- and middle-income countries [3]. The link between smoking and a wide spectrum of diseases, such as cardiovascular diseases [4], chronic obstructive pulmonary disease (COPD) [5], and different types of cancer [6], is firmly established in the literature [7]. A significant number of tobacco-related fatalities are attributable to respiratory conditions such as lung cancer [8]. Smoking is the leading risk factor for risk-attributable cancer burden [9] and accounted for almost 30% of all cancer deaths in the United States in 2019 [10]. In addition, smoking explains up to 90% of lung cancer risk in men and 70–80% in women [11]; lung cancer is the leading cause of cancer incidence and mortality, with 2.1 million new cases and 1.8 million deaths worldwide in 2018 [12]. Although numerous mechanisms have been proposed to explain the link between tobacco use and the development of lung cancer [13, 14], recent advances across omics layers have contributed to a deeper understanding of this association [15, 16]. For instance, recent epigenomic analyses have provided a clearer understanding of the molecular pathways underpinning lung cancer development [17, 18].

Epigenetics is proposed as one of the mechanisms involved in the relationship between smoking and disease pathogenesis [19, 20], including lung cancer [17]. Epigenetics is an interface between environmental and genetic influences on disease risk, acting through gene expression, without changing the DNA sequence [21]. The major epigenetic mechanisms include DNA methylation, modification of histone proteins, and non-coding RNAs. DNA methylation is a known modulator of smoking-related alterations in cancer pathogenesis [17, 22], while non-coding RNAs have received far less attention. MicroRNAs (miRNAs) are a subset of non-coding RNAs that regulate post-transcriptional gene expression, targeting complementary messenger RNA, resulting in degradation and translational repression [23]. These small molecules are suggested to play a role in disease onset and progression [24,25,26] and have a potential role as biomarkers for numerous diseases [27,28,29,30], including smoking-related disorders [31].

Nevertheless, the effect of smoking on miRNA expression is not yet well established, as previous studies suffer from several limitations, including a targeted approach, limited sample size, or the use of arrays with lower coverage of miRNAs [32]. Moreover, smoking cessation is the primary preventive measure for reducing the risk of smoking-related diseases, such as lung cancer, yet it is unknown whether the effect of smoking on miRNA expression in plasma is reversible following cessation. Prior studies indicate that alterations in messenger RNAs are reversible following cessation [33, 34]. However, smoking-induced dysregulation of miRNAs in small airway epithelium tissue was not reversible after quitting smoking for 3 months [35]. These contradictory results, and the short cessation time in the previous study [35], call for larger studies investigating the reversibility of the smoking effect.

The present study aimed to investigate the smoking-related changes in plasma levels of miRNAs within the large population-based Rotterdam study cohort [36]. To this end, we tested the association between circulating miRNAs and smoking status, investigating i) current versus never smokers and ii) current versus former smokers. In addition, we explored the potential reversibility of the smoking effect following cessation by comparing current smokers with different smoking cessation time categories. Then, we examined the cumulative effect of smoking (pack-year) on miRNA expression. As smoking is the primary risk factor for lung cancer, we assessed if any of the smoking-associated miRNAs are linked to incident lung cancer in the Rotterdam study cohort.

Results

Study population

This study was embedded within the Rotterdam study (RS), a large prospective population-based cohort in the Netherlands [36]. Participants with smoking information and plasma miRNA levels were included in the study, resulting in 2686 independent individuals from three RS sub-cohorts (RS-I-4, RS-II-2, and RS-IV-1). Participants were categorized based on their smoking status (current, former, or never smokers), as obtained via questionnaires. The clinical characteristics, stratified by smoking status, are depicted in Table 1. In our study, 1534 (57.1%) participants were women and the mean age was 67.43 (± 11.07) years. A total of 921 (34.3%) individuals were classified as never, 1382 (51.5%) as former, and 382 (14.2%) as current smokers (Table 1).

Table 1 Clinical characteristics of the study population

Plasma miRNA levels associated with smoking habits

Multivariable linear regression models were used to test the association between smoking (current versus never smokers [reference group]) and 591 miRNAs well-expressed in plasma (log2 CPM), adjusting for age, sex, cohort, and body mass index (BMI). A recent study showed an association between circulating plasma miRNAs and obesity-related traits [37], and another study showed that higher adiposity causally influences smoking behavior [38]; hence, we decided to adjust our analyses for BMI. Out of the 591 miRNAs, 41 were differentially expressed at the Bonferroni-corrected significance threshold P < 8.46 × 10⁻⁵ (0.05/591) (Table 2 and Fig. 1), while 192 miRNAs were nominally associated (P < 0.05) (Additional file 1: Table S1). Out of the 41 smoking-miRNAs, 34 were upregulated, while 7 were downregulated in association with current vs. never smoking status (Fig. 1). The expression distributions of the significantly associated miRNAs are presented in Additional file 2: Fig. S1.

Table 2 MicroRNAs significantly associated with current versus never (reference) smoking
Fig. 1 Association of plasma microRNA levels with current versus never smoking. This volcano plot depicts the results from the linear regression model, where the dots represent miRNAs tested in the association of current versus never smoking (reference) status in the Rotterdam study. Blue depicts miRNAs negatively associated with smoking status, while red depicts miRNAs positively associated with smoking status. The top ten significantly associated miRNAs are annotated. The effect size per miRNA is shown on the X-axis, while the magnitude of significance is shown on the Y-axis

When comparing current (reference group) versus former smokers, 42 miRNAs were significantly associated (P < 0.05/591 = 8.46 × 10⁻⁵), while 177 miRNAs were nominally associated (P < 0.05) (Additional file 1: Table S2 and Additional file 2: Fig. S2). In total, 149 miRNAs were nominally associated (P < 0.05) in both the current versus former and the current versus never comparisons (Additional file 1: Table S3).

Smoking cessation and pack-years impact on plasma miRNA levels

To determine the reversibility of the smoking effect on miRNA expression levels upon smoking cessation, we calculated the time since smoking cessation in the former smokers (N = 1382) by subtracting the age at smoking cessation from the current age. This variable was further divided into three categories: i) cessation < 5 years, ii) cessation ≥ 5 and < 15 years, and iii) cessation ≥ 15 years. We tested the association between current smokers (reference group) and the three cessation time categories using three multivariable linear regression models, adjusting for age, sex, cohort, and BMI. Out of the 41 smoking-miRNAs that were differentially expressed between current and never smokers, 38 differed significantly (P < 0.05/41 = 1.22 × 10⁻³; another 3 miRNAs at P < 0.05) between current smokers and former smokers with ≥ 15 years of smoking cessation, while 19 miRNAs did so (P < 1.22 × 10⁻³; another 17 miRNAs at P < 0.05) for a cessation time of ≥ 5 and < 15 years (Table 3 and Additional file 2: Fig. S3). Interestingly, 2 miRNAs (P < 1.22 × 10⁻³; another 7 miRNAs at P < 0.05) already showed a significant change within 5 years of smoking cessation, including miR-326 and miR-6769b-3p (Table 3). The results for all 591 well-expressed miRNAs are presented in Additional file 1: Tables S2 and S4 and Additional file 2: Fig. S3.

Table 3 Current smokers (reference) versus the cessation time categories

Next, we investigated the association between the cumulative effect of smoking in current smokers (pack-years) as exposure and all miRNA levels as the outcome. Pack-years were calculated in current smokers as the number of cigarettes smoked per day, divided by 20, and multiplied by the total years of smoking. None of the miRNAs passed the Bonferroni-corrected threshold (P < 8.46 × 10⁻⁵) in association with pack-years in current smokers (n = 178), while 14 miRNAs were nominally significant (P < 0.05) (Additional file 1: Table S5).

Association of smoking-related miRNAs and incident lung cancer

As smoking is the main modifiable risk factor for lung cancer, we tested in an exploratory analysis whether the identified smoking-miRNAs are associated with the incidence of lung cancer. We used a subset of 1806 participants for whom data were available and who were free of lung cancer at the time of miRNA quantification; their clinical characteristics are presented in Additional file 1: Table S6. During a mean follow-up period of 8.8 (± 3.1) years, 37 participants developed lung cancer. Cox proportional hazards regression was used to estimate the hazard ratios (HRs) and 95% confidence intervals (CIs) for the association between miRNA expression and incident lung cancer, adjusted for age, sex, cohort, smoking, chronic diseases, BMI, red blood cells (RBC), white blood cells (WBC), alcohol consumption, and education. Out of the 41 smoking-related miRNAs, we identified eight miRNAs related to lung cancer (P < 0.05): miR-146b-5p, miR-6769b-3p, miR-1915-3p, miR-6085, miR-10a-5p, miR-100-5p, miR-149-3p, and let-7c-5p (Table 4 and Additional file 1: Table S7). Out of these, six miRNAs suggest an increased risk of lung cancer and two suggest a protective effect (Table 4).

Table 4 Association of smoking-related miRNAs with incident lung cancer

In silico analysis of target genes of smoking-related miRNAs

We examined whether predicted target genes of the smoking-related miRNAs are associated with smoking status. Focusing only on unique target genes reported by all three miRNA target gene prediction platforms (TargetScan [39], miRDB [40], and miRTarBase [41]), we extracted 407 predicted target genes for the 41 smoking-associated miRNAs (Additional file 1: Table S8). Seven target genes were previously associated with smoking in a genome-wide association study (GWAS) of smoking initiation [42], 140 genes have an annotated CpG that was identified in an epigenome-wide association study (EWAS) of smoking habits [43], and a single gene (CADM1) was identified in a transcriptome-wide association study (TWAS) of a smoking trait (cigarettes/day) [61] (Additional file 1: Table S9). Using miRPathDB 2.0 [44], we explored the KEGG pathways [45] for the identified miRNAs and observed that many of them were implicated in several cancer pathways (Additional file 2: Fig. S4). In addition, out of the eight miRNAs we identified as linked with lung cancer, two (miR-146b-5p and let-7c-5p) were implicated in several cancer-related pathways, and miR-146b-5p in both small cell lung cancer and non-small cell lung cancer.

Discussion

In this study, we investigated the association between smoking habits and 591 miRNAs well-expressed in plasma in a population-based cohort study. In total, 41 miRNAs were significantly associated with current versus never (reference) smoking after adjustment for multiple testing. Moreover, 42 miRNAs were significantly associated with current (reference) versus former smoking. To test the reversibility of the smoking effect on miRNA levels, we compared miRNA levels in current smokers (reference) with those of three cessation time categories. For 38 out of the 41 smoking-miRNAs, there was a significant difference between current smokers and former smokers with at least 15 years of smoking cessation. Finally, we found eight of the 41 smoking-related miRNAs to be associated with incident lung cancer.

An increasing number of studies have investigated the association of smoking with miRNA expression levels [32, 46, 47]. However, these studies were mainly conducted either on a subset of miRNAs using a qPCR-based method or in a modest sample size [32]. Despite this lack of methodological consensus, we replicated some of the findings of previous studies. For instance, in a large-scale study (whole blood qPCR assessed miRNAs, n = 5000) by Willinger et al. [47], 6 miRNAs (out of 283) were linked to smoking, of which miR-1285-5p and miR-342-3p, arising from the same precursors, were replicated in our study. In addition, miRNAs we have identified in this study (miR-126 and miR-195) were previously linked to smoking in small airway epithelium [35]. Some studies have demonstrated that the mature miRNA sequences from the 5’ and 3’ arms of the precursor duplex (assigned -5p or -3p) are often co-expressed [48, 49]. Along these lines, a comparison of plasma levels of qPCR-quantified miRNAs in smokers (n = 11) and non-smokers (n = 7) identified 43 differentially expressed miRNAs [46], of which eight pre-miRNAs overlap with our findings. Although other studies on smoking and miRNAs used different methodologies and/or tissues, which makes reproducibility between cohorts challenging, we were able to replicate some of their findings [35, 46, 47].

To the best of our knowledge, this is the first study that, in addition to identifying smoking-related miRNAs using a novel next-generation sequencing (NGS) platform covering a broad landscape of miRNAs, explores the potential reversibility of the changes in plasma miRNA levels. We assessed this potential by comparing current smokers (reference group) with former smokers at a population-based level. We found significant differences between current smokers and former smokers across the cessation time categories: smokers overall had higher miRNA levels than non-smokers, and these levels appeared to decrease with longer smoking cessation time. These findings may indicate that smoking-related changes in plasma miRNA levels are reversible upon smoking cessation. This conjecture is in line with previous studies on the reversibility of changes in gene expression (messenger RNAs) following smoking cessation [33, 34]. Nevertheless, future research with repeated measurements is needed to test these findings.

Many of the miRNAs we have identified herein have previously been linked to smoking-related diseases [50,51,52,53]. As an example, previous studies have shown our top finding, miR-150-5p, to be associated with chronic obstructive pulmonary disease (COPD) [50, 51]. Interestingly, miR-150-5p is frequently deregulated in cancer and regulates the expression of several cancer driver genes, both in smoking-related cancers, such as lung cancer and colorectal cancer, and in non-smoking-related cancer types [2], such as breast and prostate cancer [54,55,56,57,58,59,60]. Furthermore, the newly identified smoking-related miR-27a-3p was previously linked with lung cancer and COPD [52]. Molina-Pinelo et al. [53] also found three of the smoking-related miRNAs identified herein (miR-486-5p, miR-146b-5p, and miR-342-3p) to be associated with COPD and/or lung adenocarcinoma. Notably, we also identified miR-146b-5p in association with incident lung cancer. Our results may support previous literature suggesting that respiratory diseases share smoking as a common risk factor, potentially through pathways regulated partly by epigenetic mechanisms, including some of the miRNAs identified herein. In addition, our investigation of the cumulative effect of smoking (measured by pack-years) on miRNA levels did not reveal any significant associations. Nevertheless, these results might have been hindered by the limited sample size (n = 178).

Lastly, we investigated the putative target genes of smoking-associated miRNAs and identified links with smoking through previous (epi-)genetic and transcriptomic studies, including GWAS, epigenome- and transcriptome-wide association studies [42, 43, 61]. In addition, some of the identified smoking-related miRNAs are implicated in cancer pathways. These findings may suggest that the identified miRNAs could partially explain the link between smoking and lung cancer, which warrants confirmation by future experimental studies.

The major strengths of the present study are the use of data from a large-scale population-based cohort and the measurement of miRNA levels via a specific, sensitive, and reproducible targeted RNA-sequencing method [62]. Nevertheless, the findings presented in this manuscript should be interpreted with caution. Our results need to be replicated in an independent cohort, although part of our results overlap with previous studies despite the use of different methodologies and/or tissues for quantifying miRNA levels. In addition, as miRNAs are tissue-specific, other tissues, such as lung, might provide a better setting to confirm the smoking effect on miRNA levels linked to lung cancer. Furthermore, although we adjusted our models for possible confounders, we cannot exclude the possibility of residual confounding of smoking, as passive smoking and the smoking of other tobacco products (e.g., cigars, hookahs, pipes, and cannabis) were not captured by the questionnaires. Individuals' answers on smoking behaviour might also be affected by social desirability bias. In line with this, it is hard to determine what qualifies someone as a former smoker given the data collected, which might have affected our analysis of former smokers. Moreover, in-depth information on environmental or occupational exposures that can possibly affect miRNA expression, such as air pollution and asbestos, was not available for model adjustment. Further experimental validation is warranted to assess the impact of smoking on the identified miRNAs and the subsequent risk of disease attributable to smoking.

Conclusion

In summary, we present a large-scale population-based investigation of plasma circulating miRNAs in relation to cigarette smoking. The evidence from this study suggests that multiple miRNAs are associated with cigarette smoking and that many of these associations seem to be reversible following smoking cessation. In addition, we provide evidence for potential correlations between some of the smoking-related miRNAs and incident lung cancer. These results may lay the groundwork for further investigation of miRNAs as epigenetic modulators linking smoking, gene expression, and lung cancer.

Methods

Study population

This study was conducted using data from the RS, a large prospective population-based cohort initiated in 1989 in the city of Rotterdam, the Netherlands [36]. The RS has four sub-cohorts, and participants (aged > 40 years) are followed every 3–5 years. The first sub-cohort (RS-I) includes 7983 individuals (age ≥ 55 years). This was later extended by a second sub-cohort (RS-II, n = 3011, age ≥ 55 years) and a third sub-cohort (RS-III, n = 3932, age ≥ 45 years). The most recent extension, the fourth sub-cohort (RS-IV, age ≥ 40 years), includes 3005 individuals. A more in-depth description of the RS can be found elsewhere [36]. The RS has been approved by the Medical Ethics Committee of the Erasmus MC and by the review board of the Dutch Ministry of Health, Welfare and Sports (1068889-159521-PG).

Plasma levels of circulating miRNAs were measured in 2754 randomly selected individuals from three RS sub-cohorts (RS-I-4, RS-II-2, and RS-IV-1). One individual was excluded due to missing profiling on miRNAs, while 63 participants were excluded due to missing smoking data, resulting in a sample size of 2686 non-overlapping participants.

MicroRNA profiling

The HTG EdgeSeq miRNA Whole Transcriptome Assay (WTA) was used to measure the levels of miRNAs in plasma. Whole blood samples in the Rotterdam study were collected in PAXGene Tubes. A total volume of 50 μL of plasma, which allows for two re-measurements and is generally sufficient to obtain a valid result for all samples, was sent to HTG Molecular Diagnostics, Inc. (AZ, USA) for sequencing. Each sample was tagged individually with molecular barcodes; tagged samples were pooled and sequenced on an Illumina NextSeq sequencer (Illumina, San Diego, CA, USA). Data were provided as data tables of raw, quality control (QC) raw, counts per million (CPM), and median normalized counts. Log2 counts per million (log2 CPM) standardization was used to transform the counts and adjust for total reads within a sample. The initial miRNA list encompassed all 2083 miRNAs in the HTG EdgeSeq miRNA Whole Transcriptome Assay that were profiled in 2754 Rotterdam study participants. miRNAs with log2 CPM < 1.0 were considered not expressed in the samples. We implemented a lower limit of quantification (LLOQ) method to select well-expressed miRNAs. The LLOQ level was based on a monotonic decreasing spline curve fit (by R function ‘scam::scam’) between the mean and standard deviation per miRNA on the normalized value. All miRNAs of which > 50% of the values were above the LLOQ were considered well-expressed (n = 591).
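The two selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the optional pseudo-count, and the treatment of the LLOQ as a single precomputed number (rather than a value derived from the spline fit) are assumptions made for illustration only.

```python
import math


def log2_cpm(counts, pseudo_count=0.0):
    """Convert raw read counts of one sample to log2 counts per million.

    `counts` maps miRNA name -> raw count. `pseudo_count` is a hypothetical
    option (not described in the text) that avoids log2(0) for zero counts.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {
        mirna: math.log2((count + pseudo_count) / total * 1_000_000)
        for mirna, count in counts.items()
    }


def well_expressed(values_by_mirna, lloq, min_fraction=0.5):
    """Return miRNAs with more than `min_fraction` of values above the LLOQ.

    `values_by_mirna` maps miRNA name -> list of normalized values across
    samples. `lloq` is assumed to be precomputed elsewhere.
    """
    kept = []
    for mirna, values in values_by_mirna.items():
        if not values:
            continue
        above = sum(1 for value in values if value > lloq)
        if above / len(values) > min_fraction:
            kept.append(mirna)
    return kept
```

Under these assumptions, a miRNA above the LLOQ in exactly half of the samples is excluded, because the stated criterion is strictly more than 50%.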

Smoking and lung cancer assessment

Participants were categorized into smoking status categories (former, current, and never) based on the answers they provided in self-administered questionnaires. In former smokers, time since smoking cessation was calculated as the current age minus the cessation age. Due to the low response rate on “cessation age” at the cross-sectional visit, we used the cessation age reported at the previous time point for participants who were former smokers at both time points (i.e., who had not resumed smoking in the meantime). This variable was further categorized into i) cessation less than 5 years ago (< 5 years), ii) between 5 and 15 years (≥ 5 and < 15 years), and iii) 15 years or more (≥ 15 years).
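As an illustration of the categorization above, the following Python sketch maps a former smoker's current age and cessation age to one of the three cessation time categories. The function name and category labels are hypothetical; only the cut-offs (5 and 15 years) come from the text.

```python
def cessation_category(current_age, cessation_age):
    """Assign a former smoker to a cessation time category.

    Time since cessation is the current age minus the age at cessation,
    with cut-offs at 5 and 15 years as described in the text.
    """
    years_since_cessation = current_age - cessation_age
    if years_since_cessation < 0:
        raise ValueError("cessation age cannot exceed current age")
    if years_since_cessation < 5:
        return "<5 years"
    if years_since_cessation < 15:
        return ">=5 and <15 years"
    return ">=15 years"
```

For example, a 60-year-old who quit at age 55 falls into the middle category, because the lower bound of that category is inclusive.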

Additionally, for the cumulative effect of smoking on miRNA levels in current smokers, we computed pack-years (number of cigarettes smoked per day, divided by 20, multiplied by the total years of smoking). The smoking initiation age was not available in the cross-sectional setting but was taken from the previous visit for the subset of cohorts with available data. We calculated pack-years for participants who were current smokers at both time points. One participant was excluded due to initiation at the age of five, which we considered an outlier.
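The pack-year definition above is simple arithmetic, sketched here in Python. The function and parameter names are hypothetical, and deriving the years of smoking from the initiation age is left out because the text does not specify it in detail.

```python
def pack_years(cigarettes_per_day, years_smoked):
    """Pack-years: cigarettes per day, divided by 20, times years of smoking."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day / 20 * years_smoked
```

For instance, smoking 10 cigarettes per day for 30 years corresponds to 15 pack-years, the same as one pack (20 cigarettes) per day for 15 years.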

Lung cancer diagnoses were obtained from the general practitioner's medical records and through linkage with Dutch Hospital Data, the Netherlands Cancer Registry, and histology and cytopathology registries. Two physicians independently coded diagnoses according to the International Classification of Diseases, tenth revision (ICD-10). In case of discrepancy, consensus was sought through a physician specialized in internal medicine. Due to the small sample size, the histological lung cancer diagnosis included all lung cancer types. The lung cancer diagnosis date was based on the biopsy date or, if unavailable, on the hospital admission date or the discharge letter. Only pathology-confirmed lung cancers were included in the analysis. The follow-up for incident lung cancer was conducted until January 1, 2015. Participants were followed from study entry until the occurrence of cancer, death, the last health status update when they were known to be cancer-free, or January 1, 2015, whichever came first. Incident lung cancer was defined as any primary lung cancer. In the case of multiple cancers within one participant, we included only those whose first diagnosis was lung cancer for analysis, while the rest were excluded.
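The follow-up rule described above (earliest of cancer, death, last cancer-free status update, or January 1, 2015) can be sketched in Python as follows. The function name, argument names, and the tie-breaking choice (a cancer diagnosis on the end date counts as an event) are assumptions for illustration, not details given in the text.

```python
from datetime import date

# Administrative end of follow-up stated in the text.
ADMIN_CENSOR_DATE = date(2015, 1, 1)


def follow_up(entry, cancer=None, death=None, last_update=None):
    """Return (end_date, event, years) for one participant.

    Follow-up ends at the earliest available date among cancer diagnosis,
    death, last cancer-free status update, and the administrative
    censoring date. `event` is True only when cancer ends the follow-up.
    """
    candidates = [
        d for d in (cancer, death, last_update, ADMIN_CENSOR_DATE)
        if d is not None
    ]
    end = min(candidates)
    event = cancer is not None and cancer == end
    years = (end - entry).days / 365.25
    return end, event, years
```

A participant who enters in 2005 and has no recorded event is censored on January 1, 2015, giving roughly ten years of follow-up.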

Covariable assessment

Home-administered interviews were used to assess participants’ age and sex. Weight and height were measured while participants were standing without heavy outer garments or shoes. Information on weight and height was used to calculate participants' BMI as weight divided by height squared (kg/m²). Educational level (primary, lower, intermediate, and higher) and alcohol consumption (g/day) were assessed during the home interviews. Blood pressure (BP) was measured twice in a sitting position on the right arm using a random-zero sphygmomanometer, and the average of the 2 measurements was used. Hypertension was defined as a systolic BP ≥ 140 mm Hg or diastolic BP ≥ 90 mm Hg or the use of BP-lowering drugs prescribed for hypertension. Prevalent diabetes mellitus type 2 was identified according to the World Health Organization criteria: fasting glucose levels of ≥ 7.0 mmol/L, nonfasting glucose levels ≥ 11.1 mmol/L, or the use of glucose-lowering medication. Coronary heart disease was defined as a myocardial infarction, coronary artery bypass grafting, or a percutaneous coronary revascularization procedure. Stroke was defined according to the World Health Organization definition as a syndrome of rapidly developing clinical signs of focal or global disturbance of cerebral function, with symptoms lasting 24 h or longer or leading to death, with no apparent cause other than of vascular origin. Participants were screened for dementia at baseline and subsequent center visits with the Mini-Mental State Examination and the Geriatric Mental Schedule organic level [63]. Those with a Mini-Mental State Examination score < 26 or Geriatric Mental Schedule score > 0 underwent further investigation and informant interview, including the Cambridge Examination for Mental Disorders of the Elderly. Blood samples of participants were obtained during the visit to the research centre. Using a haematology analyser, red blood cell counts (10¹²/L) and white blood cell counts (10⁹/L) were measured in venous blood.
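Several of the covariable definitions above are threshold rules that can be written out directly. The Python sketch below covers BMI, hypertension, and type 2 diabetes using the cut-offs stated in the text; the function and parameter names are hypothetical, and missing-data handling is reduced to treating an unavailable glucose value as `None`.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m ** 2


def has_hypertension(systolic_bp, diastolic_bp, uses_bp_lowering_drugs):
    """Systolic BP >= 140 mm Hg, diastolic BP >= 90 mm Hg, or BP-lowering drugs."""
    return systolic_bp >= 140 or diastolic_bp >= 90 or uses_bp_lowering_drugs


def has_type2_diabetes(fasting_glucose=None, nonfasting_glucose=None,
                       uses_glucose_lowering_medication=False):
    """Glucose thresholds in mmol/L: fasting >= 7.0 or nonfasting >= 11.1."""
    if fasting_glucose is not None and fasting_glucose >= 7.0:
        return True
    if nonfasting_glucose is not None and nonfasting_glucose >= 11.1:
        return True
    return uses_glucose_lowering_medication
```

The blood pressure values passed to `has_hypertension` would be the average of the two measurements, as described above.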

Statistical analyses

Smoking in association with changes in plasma miRNA levels

We implemented multivariable linear regression models to explore the association between smoking (current versus never smokers [reference group]) and plasma miRNA levels (log2 CPM), adjusted for age, sex, cohort, and BMI. We used the same adjustment for the other smoking-exposure analyses throughout this manuscript, and the Bonferroni-corrected P-value threshold was set at P < 0.05/591 = 8.46 × 10⁻⁵. We also compared miRNA expression levels between current (reference group) and former smokers. Next, we assessed the potential reversibility of the smoking effect by comparing expression levels of the identified smoking-miRNAs among different cut-off groups within former smokers. We compared miRNA levels between current smokers (reference) and the three cessation time categories: i) < 5 years, ii) ≥ 5 and < 15 years, and iii) ≥ 15 years of cessation time. Finally, we investigated the association between the cumulative effect of smoking in current smokers (pack-years) as exposure and all miRNA levels as the outcome.
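The Bonferroni thresholds used in this manuscript follow from dividing the nominal significance level of 0.05 by the number of tests. The Python sketch below reproduces that arithmetic; the function names and the example P-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Bonferroni-corrected significance threshold: alpha / number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Return names whose P-value is below the Bonferroni threshold.

    `p_values` maps a test name (e.g. a miRNA) to its P-value; the number
    of tests is taken to be the number of entries.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p < threshold]


# 591 well-expressed miRNAs and 41 smoking-miRNAs, as reported in the text.
all_mirna_threshold = bonferroni_threshold(0.05, 591)     # about 8.46e-5
smoking_mirna_threshold = bonferroni_threshold(0.05, 41)  # about 1.22e-3
```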

Additionally, we explored the relationship between smoking-related miRNAs and the incidence of lung cancer. Due to the presence of competing mortality risks, we applied the competing risk Cox proportional hazards regression model to estimate hazard ratios (HRs) and 95% CIs for the association between miRNA expression and incident lung cancer. The analyses were adjusted for age, sex, cohort, smoking, chronic disease, BMI, red blood cells, white blood cells, alcohol consumption, and education. A nominal P-value threshold (P < 0.05) was considered, due to the correlation between the pre-selected smoking-miRNAs and the exploratory nature of our analysis. All analyses were performed using R software, version 4.2.3 (R Core Team, 2021).
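For readers less familiar with how HRs and 95% CIs relate to a fitted Cox model, the Python sketch below shows the standard conversion from a regression coefficient and its standard error. It does not fit the model itself and is not the authors' R code; the function name and the Wald-type interval with z = 1.96 are assumptions for illustration.

```python
import math


def hazard_ratio_ci(beta, standard_error, z=1.96):
    """Convert a Cox regression coefficient to (HR, lower, upper).

    HR = exp(beta); the Wald-type 95% CI is exp(beta -/+ z * SE).
    """
    hr = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return hr, lower, upper
```

An HR above 1 with a CI excluding 1 would indicate higher lung cancer risk per unit increase in the miRNA level, while an HR below 1 would indicate lower risk.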

In silico analyses of target genes of smoking-associated miRNAs

We used the open-source platform miRWalk [64] to obtain putative and validated miRNA target genes. We included only genes that were reported in all three commonly used miRNA prediction databases (TargetScan [39], miRDB [40], and miRTarBase [41]) embedded within the miRWalk platform [64]. We explored whether the predicted miRNA target genes were previously linked to smoking through GWAS [42], EWAS [43], and TWAS studies [61]. Using miRPathDB 2.0 [44], we explored the KEGG pathways [45] underlying the smoking-related miRNAs.
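The selection rule above amounts to a set intersection across the three databases, followed by an overlap with previously reported smoking-related genes. The Python sketch below illustrates this logic; it does not query miRWalk or any database, the function names are hypothetical, and apart from CADM1 (mentioned in the Results) the gene names in the usage example are placeholders.

```python
def consensus_targets(*database_hits):
    """Return genes reported by every database (set intersection)."""
    gene_sets = [set(hits) for hits in database_hits]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def overlap_with_smoking_genes(targets, smoking_genes):
    """Return target genes that were previously linked to smoking."""
    return set(targets) & set(smoking_genes)


# Toy example with placeholder gene names (GENE_A, GENE_B, GENE_C).
targetscan_hits = ["CADM1", "GENE_A", "GENE_B"]
mirdb_hits = ["CADM1", "GENE_A", "GENE_C"]
mirtarbase_hits = ["CADM1", "GENE_A"]

shared = consensus_targets(targetscan_hits, mirdb_hits, mirtarbase_hits)
linked = overlap_with_smoking_genes(shared, ["CADM1", "GENE_C"])
```

In this toy example only genes present in all three lists are retained, and of those only CADM1 overlaps with the illustrative smoking gene list.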