Background

Lung cancer ranks first among all cancers in terms of incidence, and it is also the most important cause of cancer death all over the world [1]. Non-small cell lung cancer (NSCLC) accounts for around 85% of all lung cancers, with 30% of NSCLC cases being classified as lung squamous cell carcinoma (LUSC) [2,3,4,5,6,7]. Many patients are diagnosed with NSCLC at an advanced phase, which is attributable for the high mortality [8]. Therefore, more effective biomarkers are urgently needed in LUSC.

MicroRNAs (miRNAs) are ∼ 22-nt long endogenous RNAs that play significant roles in various cellular processes [9]. Previous studies have shown that miRNAs can target mRNAs involved in most of the developmental processes and are thus associating with many diseases [10]. Moreover, miRNAs have also been found to be involved in cancer [11]. Studies have found that miR-198-5p plays a vital role in many human cancers, including lung cancer [12]. A previous study [12] investigated the relationship between FGFR1 and miR-198-5p. However, the expression pattern of miR-198-5p in LUSC remains unknown. Additionally, the prospective target genes of miR-198-5p in LUSC have not yet been identified. Therefore, the relationship between miR-198-5p and lung squamous cell carcinoma as well as the underlying mechanism remains unknown.

To evaluate the clinical significance of miR-198-5p in LUSC, we examined miR-198-5p expression in LUSC tissues and carried out additional specific analyses to uncover its clinicopathological role. Furthermore, we performed big data analysis based on the Gene Expression Omnibus (GEO) and the Cancer Genome Atlas (TCGA). Subsequently, bioinformatics examinations were conducted to investigate the probable mechanism of miR-198-5p in LUSC.

Methods

Retrieval of data and publications from TCGA and GEO

The flowchart representing the main design of our study is shown in Fig. 1. We downloaded miRNA expression data from the Cancer Genome Atlas (TCGA) associated with LUSC. All data were converted to a log2 scale. We also retrieved data from the Gene Expression Omnibus (GEO) and ArrayExpress to assess the expression pattern of miR-198-5p in LUSC and corresponding non-tumor samples. The search terms were as follows: (“lung” OR “pulmonary” OR “respiratory” OR “bronchioles” OR “bronchi” OR “alveoli” OR “pneumocytes” OR “air way” [MeSH]) AND (“cancer” OR “carcinoma” OR “tumor” OR “neoplas” OR “malignan” “squamous cell carcinoma” OR “adenocarcinoma” [MeSH]) OR/AND (“MicroRNA” OR “miRNA” OR “MicroRNA” OR “Small Temporal RNA” OR “noncoding RNA” OR “ncRNA” OR “small RNA” [MeSH]). Datasets with expression levels of miR-198-5p in LUSC and corresponding non-tumor samples were included. Other types of tumor or other miRNA were excluded. The number of samples in the tumor and non-tumor groups was at least three. The expression level of miR-198-5p in the datasets was converted to a log2 scale. The number, mean, and standard deviation of miR-198-5p levels in the tumor and non-tumor groups were calculated. We also searched PubMed, Web of Science, Science Direct, Google Scholar, Ovid, LILACS, Wiley Online Library, EMBASE, Cochrane Central Register of Controlled Trials, Chong Qing VIP, CNKI, Wan Fang, and China Biology Medicine disc; however, no publications regarding miR-198-5p expression in LUSC were found in these databases.

Fig. 1
figure 1

Flowchart representing the main design of our study

RT-qPCR

After the formalin-fixed, paraffin-embedded (FFPE) sections were dewaxed, total RNA was achieved from these sections with the miRNeasy FFPE Kit (QIAGEN) according to manufacturer’s instruction. The concentration of RNA was measured using Nanodrop 2000. MiR-191 (CAACGGAAUCCCAAAAGCAGCU) and miR-103 (AGCAGCAUUGUACAGGGCUAUGA) were used as stably expressed control miRNAs as previously reported [13]. Applied Biosystems 7900 PCR system was used to perform real-time quantitative PCR and detect miR-198-5p expression (GGUCCAGAGGGGAGAUAGGUUC). The relative expression of miR-198-5p was calculated using the formula 2−ΔCq.

Statistical analysis

Paired sample t test and independent sample t test were performed using SPSS 23.0 to determine the association between miR-198-5p expression and various clinicopathological parameters based on real-time RT-qPCR and microarray data. P < 0.05 was regarded as being statistically significant. Receiver operating characteristic (ROC) curves were constructed using SPSS 23.0.

Concerning the meta-analysis based on all accessible data, we used Stata 14 to determine the combined expression value of miR-198-5p in tumor and non-tumor groups and its relationship with both standard mean difference (SMD) and summarized receiver operating characteristic (SROC). The fixed effect model was initially used. The random effect model was used when heterogeneity was detected. The data were considered heterogeneous when I2 > 50%. Subgroup analysis and sensitivity analysis were carried out to find out the source of heterogeneity. We tested publication bias with funnel plots. SPSS 23.0 was employed to explore the diagnostic value of miR-198-5p in LUSC based on GEO data. Then, we performed a diagnostic meta-analysis using MetaDiSc 1.4. Meta-regression and threshold effect analysis were performed to determine the source of heterogeneity. The data from TCGA were not included in the analysis because of missing data.

Differentially expressed mRNAs in LUSC based on TCGA

The expression level of each mRNA transformed into the log2 scale was evaluated using DESeq R package. We obtained 9860 differentially expressed genes in LUSC, including 6092 upregulated and 3768 downregulated genes.

Selection of putative target genes of miR-198-5p

Predictions were conducted in silico with miRWalk 2.0 (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/). Genes that were present in more than 5 of the 12 prediction online tools were selected for further analysis. The selected genes were cross-referenced with the differentially expressed genes in TCGA. The overlapping genes were considered the putative targets of miR-198-5p in LUSC.

Bioinformatics analyses

Gene Ontology (GO) annotation via DAVID (https://david.ncifcrf.gov/) was performed, including biological processes (BP), cellular components (CC), and molecular functions (MF). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with the putative target genes were also analyzed in DAVID. The results of GO annotation and KEGG pathway analysis were visualized using BiNGO and EnrichmentMap plugins in Cytoscape version 3.5.0. We used STRING (https://string-db.org/) to build interaction maps of the proteins encoded by the putative target genes.

Validation of the putative target genes in the most significant KEGG pathway based on TCGA data

We selected the genes in the most significant KEGG pathway “non-small cell lung cancer” for further analysis. The differences in the expression levels of E2F2, E2F3, TGFA, PRKCG, CDK6, EGF, and CDK4 between LUSC and non-tumor tissues were analyzed based on TCGA data.

Results

GEO data mining to determine the expression and diagnostic value of miR-198-5p

Considering the meta-analysis of miR-198-5p expression-based GEO data, eight datasets were included in our study. The scatter plots based on the GEO datasets are shown in Fig. 2. Forest plots using both fixed effect model (Fig. 3a) and random effect model (Fig. 3b) represented the expression level of miR-198-5p in LUSC. The combined effect sizes were − 0.30 (95% CI − 0.54, − 0.06) and − 0.39 (95% CI − 0.83, 0.05) based on the fixed and random effect models, respectively. Subgroup analysis showed that there was no heterogeneity among the studies from Asia (I2 = 0.0%) (Fig. 3c). The corresponding funnel plot is shown in Fig. 4a (P > 0.05). Sensitivity analysis (Fig. 4b) showed that the dataset GSE40738 might be a source of heterogeneity. The forest plot after the removal of GSE40738 is shown in Fig. 4c. The adjusted combined SMD value was − 0.56 (95% CI − 0.86, − 0.27) with I2 = 33.1%. The receiver operating characteristic curves based on the included datasets from GEO database are shown in Fig. 5. The pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and diagnostic odds ratio were 0.47 (95% CI 0.40, 0.54), 0.76 (95% CI 0.69, 0.82), 2.19 (95% CI 1.58, 3.05), 0.57 (95% CI 0.38, 0.84), and 4.64 (95% CI 2.04, 10.56), respectively (Fig. 6). The area under the curve (AUC) based on the summarized receiver operating characteristic (SROC) curve was 0.7749 (Q* = 0.7143) (Fig. 7). We did not find a threshold effect of miR-198-5p in the study (P = 0.058). Only study region was determined to be a covariant in the meta-regression, and thus it was likely not a source of heterogeneity (P = 0.0550) (Table 1).

Fig. 2
figure 2

Scatterplots based on the included GEO datasets. a GSE14936. b GSE16025. c GSE19945. d GSE25508. e GSE40738. f GSE47525. g GSE51853. h GSE74190

Fig. 3
figure 3

Continuous variable meta-analysis based on GEO datasets. a Forest plot based on fixed effect model b Forest plot based on random effect model c Subgroup analysis by region

Fig. 4
figure 4

Continuous variable meta-analysis based on GEO datasets. a Funnel plot. b Sensitivity analysis. c Forest plot after the elimination of GSE40738

Fig. 5
figure 5

Receiver operating characteristic (ROC) curves based on GEO datasets. a GSE14936. b GSE16025. c GSE19945. d GSE25508. e GSE40738. f GSE47525. g GSE51853. h GSE74190

Fig. 6
figure 6

Pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic score, and odds ratio obtained using MetaDisc 1.4 based on GEO datasets

Fig. 7
figure 7

Summarized receiver operating characteristic (SROC) curve of GEO datasets

Table 1 Meta-regression based on GEO data

Clinical value of miR-198-5p in LUSC assessed using RT-qPCR

Using RT-qPCR, the expression of miR-198-5p in the LUSC tissues was (4.3826 ± 1.7660) compared with that in the non-tumor tissues (4.4522 ± 1.8263, P = 0.885) (Fig. 8a, b). The other clinicopathological features of the LUSC case are shown in Table 2. Notably, the expression level of miR-198-5p in patients with early TNM stage (I-II) was (5.4400 ± 1.5277) compared to (3.5690 ± 1.5228) in patients with advanced TNM stage (III-IV) (P = 0.008) (Fig. 8c, d).

Fig. 8
figure 8

Relationship between miR-198-5p expression and TNM stage in lung squamous cell carcinoma (LUSC) and diagnostic value of miR-198-5p expression. Scatterplots (a LUSC and adjacent non-cancerous tissues and c TNM stage). Receiver operating characteristic (ROC) curves (b LUSC and adjacent non-cancerous tissues and d TNM stage)

Table 2 Relationship between the expression of miR-198-5p and clinicopathological features in LUSC patients

Meta-analysis of miR-198-5p expression based on PCR and GEO data

The combined SMD values based on GEO and PCR data were − 0.26 (95% CI − 0.48, − 0.04) (Fig. 9a) and − 0.34 (95% CI − 0.71, 0.04) (Fig. 9b) using the fixed and random effect models, respectively. In subgroup analysis, no heterogeneity among the studies from Asia was observed (I2 = 0.0%) (Fig. 9c). The corresponding funnel plot is shown in Fig. 10a (P > 0.05). Once again, study region and the dataset GSE40738 were identified as sources of heterogeneity (Fig. 10b). After eliminating GSE40738 from the analysis, the SMD changed to − 0.46 (95% CI − 0.72, − 0.20) (Fig. 10c). The pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and diagnostic odds ratio were 0.52 (95% CI 0.45, 0.58), 0.70 (95% CI 0.63, 0.76), 1.97 (95% CI 1.25, 3.13), 0.56 (95% CI 0.38, 0.82), and 4.23 (95% CI 2.07, 8.64), respectively (Fig. 11). The area under the curve (AUC) based on the summarized receiver operating characteristic (SROC) curve was 0.7351 (Q* = 0.6812) (Fig. 12). We also observed threshold effect of miR-198-5p in this study (P = 0.013). In this case, study region was not accountable for heterogeneity (P = 0.2107) (Table 3).

Fig. 9
figure 9

Continuous variable meta-analysis based on GEO datasets and in-house RT-qPCR results. a: Forest plot based on fixed effect model, (b): forest plot based on random effect model, (c): subgroup analysis by region

Fig. 10
figure 10

Continuous variable meta-analysis based on GEO datasets and in-house RT-qPCR results. a Funnel plot. b Sensitivity analysis. c Forest plot after the elimination of GSE40738

Fig. 11
figure 11

Pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic score, and odds ratio obtained using MetaDisc 1.4 based on GEO datasets and in-house RT-qPCR results

Fig. 12
figure 12

Summarized receiver operating characteristic (SROC) curve of GEO datasets and in-house RT-qPCR results

Table 3 Meta-regression based on GEO and in-house RT-qPCR data

Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation, and protein-protein interaction (PPI) network

In total, 542 genes were considered putative targets of miR-198-5p in LUSC (Fig. 13). Based on Gene Ontology (GO) analysis, the putative target genes of miR-198-5p were predominantly related to epidermis development with respect to biological processes (P = 4.84E−04) (Fig. 14, Table 4), cell junction in the case of cellular components (P = 7.60E−05) (Fig. 15, Table 4), and protein dimerization activity regarding molecular functions (P = 8.03E−05) (Fig. 16, Table 4). In terms of the KEGG pathway, the putative target genes of miR-198-5p were associated with non-small cell lung cancer, pathways in cancer and pancreatic cancer (Fig. 17, Table 5). The protein-protein interaction (PPI) network is shown in Fig. 18. The seven chosen genes in the KEGG pathway “non-small cell lung cancer” were all significantly upregulated in the LUSC samples compared to the non-tumor samples based on TCGA data (Fig. 19, Fig. 20).

Fig. 13
figure 13

Venn diagram based on the 12 predicting tools and the differentially expressed genes (DEGs) in TCGA

Fig. 14
figure 14

Biological processes (BP) were analyzed using the BiNGO plugin in Cytoscape

Table 4 The most enriched 10 Gene Ontology (GO) term in biological process (BP) analysis, cellular component (CC), and molecular functions (MF)
Fig. 15
figure 15

Cellular components (CC) were analyzed using the BiNGO plugin in Cytoscape

Fig. 16
figure 16

Molecular functions (MF) were analyzed using the BiNGO plugin in Cytoscape

Fig. 17
figure 17

Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was performed using the EnrichmentMap plugin in Cytoscape

Table 5 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation of the putative target genes of miR-198-5p
Fig. 18
figure 18

Protein-protein interaction (PPI) networks with 537 nodes and 848 edges were constructed using STRING. The PPI enrichment P value was 1.7E−14. Disconnected nodes were hidden in the network

Fig. 19
figure 19

Scatterplots of the seven chosen genes from the Cancer Genome Atlas (TCGA). a CDK4. b CDK6. c E2F2. d E2F3. e EGF. f PRKCG. g TGFA

Fig. 20
figure 20

Receiver operating characteristic (ROC) curves of the seven chosen genes from the Cancer Genome Atlas (TCGA). a CDK4. b CDK6. c E2F2. d E2F3. e EGF. f PRKCG. g TGFA

Discussion

MiR-198-5p was clearly under-expressed in LUSC tissues in comparison with non-cancer lung tissues. The cases with LUSC in Asia expressed lower levels of miR-198-5p than did the healthy controls; however, the expression pattern in other regions was unclear. Our RT-qPCR indicated that the expression of miR-198-5p might be related to the tumor TNM stage, which suggests that miR-198-5p likely plays a role in tumor growth, lymph node metastasis, or distant metastasis. However, the downregulation of miR-198-5p was not obvious in our in-house RT-qPCR analysis. The diagnostic validation and meta-analysis based on GEO and RT-qPCR data indicated that miR-198-5p might be a biomarker of LUSC, but the sensitivity and specificity were not sufficiently high. Similarly, study region may be a source of heterogeneity, which suggested that the diagnostic screening might be suitable for Asian populations but not for others. Apart from LUSC, several studies have explored the expression pattern and mechanism of miR-198-5p in other diseases. MiR-198-5p has been reported to be upregulated in multiple myeloma [14], chronic pancreatitis or pancreatic ductal adenocarcinoma [15], Parkinson’s disease [16], esophageal cancer [17], preeclampsia [18], pancreatic adenocarcinoma, ampullary adenocarcinoma [19], lupus nephritis [20], retinoblastoma [21], anencephaly [22], and squamous cell carcinoma of tongue [23]. On the other hand, low expression of miR-198-5p has been found in prostate cancer [24], breast cancer [25], glioblastoma [26, 27], hepatocellular carcinoma [28], especially hepatitis C virus-associated hepatocellular carcinoma [29, 30], osteosarcoma [31], gastric cancer [32], colorectal cancer [33], pancreatic cancer [34], and respiratory syncytial virus (RSV) infection [35]. The overexpression of miR-198-5p has also been documented in CD8+ T cells in renal cell carcinoma [36]. In prostate cancer, a recent study indicated that miR-198-5p is targeted by the long noncoding RNA SChLAP1, leading to the activation of the MAPK1 pathway, thereby promoting cancer cell proliferation and metastasis [24]. Another study suggested that miR-198-5p may be involved in prostate cancer [37]. In hepatocellular carcinoma, miR-198-5p has been shown to target the HGF/c-MET pathway [38]. Several studies have revealed that the expression of miR-198-5p is greatly related to lymph node metastasis or distant metastasis in different malignant diseases, such as breast cancer [25], osteosarcoma [31], gastric cancer [32], and colorectal cancer [33]. Some studies have also shown that miR-198-5p is closely related to cell proliferation, apoptosis, and migration [12, 14, 39, 40]. The relationship between miR-198-5p and cancer prognosis is controversial [15, 17, 27, 32,33,34]. Thus, we investigated whether miR-198-5p plays an important role in biological processes in various diseases, both malignant and benign.

In the case of lung adenocarcinoma, two reports have verified that miR-198-5p is under-expressed [12, 41]. However, studies on the characteristic of miR-198-5p in LUSC are lacking. One study assessed the diagnostic significance of miR-198-5p in lung adenocarcinoma, with sensitivity = 71.1%, specificity = 95.2%, and AUC = 0.887 (95% CI 0.801, 0.945) [41]. Our study highlighted the diagnostic value of miR-198-5p in LUSC. Yang et al. showed that miR-198-5p was capable to suppress proliferation and promote apoptosis in lung cancer cells by targeting FGFR1 [12], and Wu et al. showed that miR-198-5p promotes apoptosis, represses cell proliferation, and leads to cell cycle arrest in lung adenocarcinoma cells by directly targeting SHMT1 [39]. Our study showed that expression pattern and diagnostic value of miR-198-5p varied according to the race of the patient population, which should be further validated in larger samples.

Although many studies use prediction tools to determine miRNA target genes, the inadequate number of available prediction tools can lead to unreliable data. We used 12 online prediction tools based on miRWalk 2.0, and this method had not been previously utilized in LUSC. The predicted genes were cross-referenced with the differentially expressed genes in TCGA, which further enhanced the specificity and accuracy of our investigation. Because miR-198-5p is downregulated in LUSC, we chose the upregulated genes from TCGA. Via bioinformatics analyses, the putative target genes of miR-198-5p were most significantly enriched in the KEGG pathways of non-small cell lung cancer, pathways in cancer, pancreatic cancer, glioma, and ECM-receptor interactions. For the GO biological processes, the putative target genes of miR-198-5p were involved in epidermis development, negative regulation of neuron apoptotic processes, positive regulation of cell proliferation, the canonical Wnt signaling pathway, and nervous system development, which indicated that the putative target genes might regulated epidermis proliferation, cell apoptosis, or carcinoma of nervous tissues. In addition, for the GO cellular components, the putative target genes of miR-198-5p were enriched in cell junction, transcription factor complex, synaptic vesicle, cell surface proteins, and synapses, which are related to migration, metastasis, and intercellular exchange of molecules. In terms of the GO molecular functions, the putative target genes of miR-198-5p were associated with protein dimerization activity, calcium ion binding, hydrolase activity on carbon-nitrogen (but not peptide) bonds, frizzled binding, and RNA polymerase II transcription co-activator activity. To verify the accuracy of our analysis, we selected several genes and determined its expression based on TCGA. The genes included the pathway non-small cell lung cancer were E2F2, E2F3, TGFA, PRKCG, CDK6, EGF, and CDK4, which were all expressed at significantly higher levels in LUSC tissues in comparison to that in the non-cancer group. Thus, these genes are probably the targets of miR-198-5p. The putative target genes of miR-198 should be validated further in the future.

Conclusions

MiR-198-5p might be downregulated in LUSC, especially in Asia region. The putative target genes of miR-198-5p were closely related to tumorigenesis and progress in LUSC.