Background

Thyroid carcinoma is the most common endocrine malignancy worldwide, and its incidence has increased in recent decades. Histologically, thyroid cancer is mainly classified into three types: follicular thyroid cancer, papillary thyroid cancer (PTC), and anaplastic thyroid cancer [1]. The first two are well differentiated, whereas the last is undifferentiated, so the pathological characteristics of these types differ substantially. PTC is the most common type, accounting for about 80% of all thyroid cancers [2].

As a well-differentiated thyroid neoplasm, PTC usually has a favorable prognosis, especially in young patients [3]. Nevertheless, a proportion of PTC patients still have an unfavorable prognosis. Reports have shown that 50% of PTC cases are diagnosed with cervical lymph node metastasis (LNM), suggesting that LNM is a frequent event in PTC patients and may be the leading cause of poor prognosis in these patients [4, 5]. However, the molecular mechanisms underlying LNM remain poorly understood.

Reports have implicated a number of genes in the LNM of PTC. For example, over-expression of IDH2 [6], VEGF, and Survivin [7] may be significantly correlated with neck LNM in PTC patients. NECTIN4 is also over-expressed in PTC tissues and may regulate the expression of epithelial-mesenchymal transition-related genes through activation of the PI3K/AKT pathway; up-regulated NECTIN4 is therefore linked to LNM [8]. Similarly, TC-1 protein expression showed significant associations with LNM in PTC patients [9]. Nevertheless, each of these studies focused on a single gene or pathway, neglecting the fact that cancer development involves variations in multiple genes, or the dysregulation and activation of multiple signaling pathways [10].

Mining high-throughput data is a good strategy for comprehensively identifying the genes that might be critical for cancer progression. With the development of microarray and sequencing technologies, a large amount of data has been generated and stored in public databases. Mining these unorganized data can help identify genes that are important in the development of PTC and explore their roles in different biological processes.

To address this issue, we aimed to screen possible candidate genes and assess their possible roles in the development of LNM in PTC patients. The present study included two phases: a discovery phase and a validation phase. In the discovery phase, we screened for possible key genes that play critical roles in PTC LNM by using bioinformatics approaches. In the validation phase, the roles of the screened genes were further assessed in a PTC cohort from The Cancer Genome Atlas (TCGA) [11] to show their relations with clinical features and to evaluate their prognostic value in PTC patients. The protein expression of these genes was then assessed by an immunohistochemical (IHC) assay.

Materials and methods

Data screening

A dataset, GSE60542, was retrieved from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The study type was ‘Expression profiling by array’; it was confined to human samples and conducted on the Affymetrix Human Genome U133 Plus 2.0 Array. The dataset, deposited by Tarabichi et al. [12], contained 19 PTC samples from patients with LNM and 14 samples from those without LNM.

The dataset was analyzed using GEO2R, a web tool based on the limma package of R [13]. The samples were divided into two groups according to N stage, and the two groups were compared. The results were downloaded, and genes that met the following cut-off criteria were identified as differentially expressed genes (DEGs): p < 0.05 and |log fold-change| > 1.0.
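The screening rule above can be illustrated with a minimal sketch. It assumes each row of the downloaded result table has been reduced to a gene symbol, a log fold-change (taken here to be log2, N1 versus N0), and an unadjusted p value; the `GeneResult` class, the `screen_degs` function, and the example values are hypothetical and are not taken from GSE60542.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    symbol: str      # gene symbol (placeholder names below)
    log_fc: float    # log fold-change, assumed log2, N1 versus N0
    p_value: float   # unadjusted p value


def screen_degs(rows, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return (up, down) gene lists meeting p < p_cutoff and |logFC| > lfc_cutoff."""
    up, down = [], []
    for row in rows:
        if row.p_value < p_cutoff and abs(row.log_fc) > lfc_cutoff:
            if row.log_fc > 0:
                up.append(row.symbol)
            else:
                down.append(row.symbol)
    return up, down


# Made-up values for illustration only.
example = [
    GeneResult("GENE_A", 1.4, 0.003),   # passes both cut-offs, up-regulated
    GeneResult("GENE_B", -1.2, 0.020),  # passes both cut-offs, down-regulated
    GeneResult("GENE_C", 0.6, 0.001),   # fold-change too small
    GeneResult("GENE_D", -2.0, 0.210),  # p value too large
]
up, down = screen_degs(example)
print("up:", up, "down:", down)
```

Both conditions must hold for a gene to be retained, so a large fold-change with a non-significant p value is excluded, as is a significant but small change.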

Assessment of the DEGs in UALCAN and Oncomine database

To screen for key genes that might influence the prognosis of thyroid cancer, a cohort from the TCGA database was used for validation with the UALCAN tool [14]. UALCAN is a web portal that allows researchers to carry out multiple analyses of TCGA gene expression data; it contains Level 3 RNA-seq data and relevant clinical information from a total of 31 carcinoma types. The prognostic value of each hub gene was evaluated individually. Genes whose expression levels affected the prognosis of thyroid cancer were regarded as the key genes.

In addition to UALCAN, the mRNA expression levels of the genes were assessed with the Oncomine tool [15]. Oncomine is a platform that provides researchers with powerful analytical functions for computing gene expression characteristics, clusters, and gene set modules.

The expression levels of the candidate genes were evaluated in both Oncomine and UALCAN. The mRNA levels of each gene in PTC and normal samples were compared.

Data mining in TCGA database

To explore the roles of the key genes in PTC cases, we mined the TCGA database. Standardized expression levels of the key genes in a TCGA cohort and the relevant clinical data of PTC patients were obtained from TCGA and the cBio Cancer Genomics Portal [16]. Samples with gene expression data, clinical characteristics, and prognosis data were retained. For each gene, samples were divided into “high” and “low” groups according to the median expression level.
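The median-based grouping can be sketched as follows. The sample identifiers and expression values are invented, the function name `split_by_median` is hypothetical, and assigning samples exactly at the median to the "low" group is an assumption, since the text does not state how ties were handled.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the cohort median.

    expression: dict mapping sample ID -> expression level of one gene.
    Samples equal to the median are labelled 'low' (an assumption).
    """
    cutoff = median(expression.values())
    return {
        sample: ("high" if level > cutoff else "low")
        for sample, level in expression.items()
    }


# Invented values for illustration only.
levels = {"S1": 2.0, "S2": 5.0, "S3": 9.0, "S4": 11.0}
groups = split_by_median(levels)
print(groups)
```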

Functional annotation and gene set enrichment analysis (GSEA)

To learn the possible biological functions of the key genes, they were submitted to the Genecards tool [17] for evaluation. The biological process category of GO (gene ontology) and KEGG pathway enrichment analysis were examined.

To explore the possible molecular mechanisms underlying the key genes, GSEA was also conducted to identify the significant pathways in which the genes were enriched. The TCGA data were ordered in a ranked list according to differential expression between the phenotypes. For each key gene, samples were divided into high and low categories according to the median expression value to define the phenotype. The gene set c2.cp.kegg.v6.2.symbols.gmt (version 6.2) was chosen as the reference set. A false discovery rate (FDR) < 0.05 was used as the cut-off.
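For readers unfamiliar with GSEA, the core statistic computed on the ranked list is a running-sum enrichment score. The sketch below illustrates only that statistic, using the commonly described weighted form; it is not the GSEA software itself and omits the permutation testing, normalization, and FDR estimation on which the cut-off above depends. The function name, gene names, and ranking scores are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked:   list of (gene, score) pairs sorted from most positively to most
              negatively associated with the phenotype.
    gene_set: set of gene names.
    weight:   exponent applied to |score| for genes in the set.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hit_weights = [abs(s) ** weight if g in gene_set else 0.0 for g, s in ranked]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")

    running, best = 0.0, 0.0
    for (gene, _), w in zip(ranked, hit_weights):
        if gene in gene_set:
            running += w / total_hit   # step up at a gene in the set
        else:
            running -= 1.0 / n_miss    # step down at a gene outside the set
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list for illustration only.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
print(enrichment_score(ranked, {"A", "B"}))  # set concentrated at the top
print(enrichment_score(ranked, {"D", "E"}))  # set concentrated at the bottom
```

A set whose members cluster at the top of the list yields a positive score, and one whose members cluster at the bottom yields a negative score.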

PTC specimens

PTC tissue microarray (HThy-Pap120CS-01) was purchased from Shanghai Outdo Biotech Co., Ltd. A total of 58 PTC samples and 58 paired adjacent non-cancer ones were included. Moreover, four cases of normal thyroid tissues were also included. All patients were pathologically diagnosed as PTC. No other treatment was performed before the operation. All samples were obtained after informed consent from the patients.

IHC staining and evaluation

The protein expression of the key genes was detected using the two-step IHC method described in our previous study [18]. In brief, the slides were deparaffinized, rehydrated, and treated with 3% hydrogen peroxide for 20 min to block endogenous peroxidase. The sections were washed with distilled water, soaked in phosphate-buffered saline (PBS) for 5 min, and then incubated overnight at 4 °C with a goat polyclonal primary antibody (Abcam). DAB solution was used for color development, and the sections were counterstained with hematoxylin. IHC staining was performed according to the manufacturer’s instructions.

The results of IHC staining were evaluated with a composite scoring method. Two pathologists independently evaluated the results without knowledge of the clinical parameters of the cases. Staining intensity was scored from 0 to 3 points, where 0 was negative, 1 was weak, 2 was medium, and 3 was strong. The percentage of positive cells was scored from 0 to 4, including 1 (0–25%), 2 (26–50%), 3 (51–75%), and 4 (76–100%). The proportion and intensity scores were then multiplied to give a total score ranging from 0 to 12.
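The composite score can be written out as a short sketch. The text does not define which percentage corresponds to a proportion score of 0; the code assumes that 0 means no positive cells, and the function names are hypothetical.

```python
def proportion_score(percent_positive):
    """Map the percentage of positive cells (0-100) to a score of 0-4."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive == 0:
        return 0  # assumption: score 0 means no positive cells
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4


def ihc_total_score(intensity, percent_positive):
    """Total score = intensity score (0-3) x proportion score (0-4), range 0-12."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity * proportion_score(percent_positive)


print(ihc_total_score(3, 80))  # strong staining in 80% of cells
print(ihc_total_score(2, 40))  # medium staining in 40% of cells
```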

Statistical analysis

For continuous variables, analysis of variance (ANOVA), the t test, or the Wilcoxon rank-sum test was used to analyze differences between groups, according to the type of data. The chi-square test was used to compare frequencies between groups. The Kaplan–Meier method was used to estimate overall survival curves, and the log-rank test was used to compare survival between groups. Multivariate Cox regression analysis was carried out to assess the effects of confounding factors on survival. These analyses were performed using MedCalc software (15.2.2; Mariakerke, Belgium). p < 0.05 was considered statistically significant.
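As an illustration of the chi-square comparison of frequencies, the sketch below computes a Pearson chi-square statistic for a 2 × 2 table. The counts are the sex-specific figures reported in the Discussion (high IGFBP3 in 75 of 123 men and 144 of 331 women); the low-expression counts are derived by subtraction. No continuity correction is applied, which is an assumption, because the text does not state which variant MedCalc used, so the value may differ slightly from the one underlying Table 2.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Table layout:
                 high   low
        group 1    a     b
        group 2    c     d
    Returns (chi-square statistic, p value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Men: 75 high, 123 - 75 = 48 low; women: 144 high, 331 - 144 = 187 low.
chi2, p_value = chi_square_2x2(75, 48, 144, 187)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```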

Results

Key genes screening from microarray-based datasets

The dataset GSE60542 comprises 19 PTC samples from patients with LNM, and 14 PTC samples from those without LNM. The comparison of the two groups generated 5 up-regulated and 21 down-regulated DEGs (Table 1, and Fig. 1A).

Table 1 The most significant up-regulated and down-regulated DEGs in GSE60542 (Top ten, PTC N1 versus PTC N0)
Fig. 1

A Heatmap of the DEGs screened from GSE60542. The horizontal axis above stands for the name of sample; the left vertical axis stands for the names of DEGs. Red stands for up-regulated genes, while green stands for down-regulated genes. B Survival curves of the representative DEGs evaluated in the UALCAN tool. a Down-regulated genes. b Up-regulated genes

After these DEGs were submitted to the UALCAN tool for evaluation, the data showed that the expression of only one gene, IGFBP3, may influence the overall survival time of thyroid carcinoma patients (Fig. 1B). Thus, IGFBP3 was selected for further analysis.

Expression of IGFBP3 between thyroid cancer and normal tissues

To compare IGFBP3 mRNA expression levels between thyroid cancer and normal thyroid tissues, data in the Oncomine and TCGA databases were analyzed.

As shown in Fig. 2A, IGFBP3 mRNA was significantly over-expressed in thyroid cancer tissues relative to normal controls.

Fig. 2

A Expression of IGFBP3 mRNA in thyroid cancer and controls. a Oncomine database, p < 0.05 cancer vs control; b UALCAN database, p < 0.05 cancer vs control. B Survival curves of PTC patients with high and low expression of IGFBP3. a Disease-free survival; p > 0.05; b Overall survival; p < 0.05. C GSEA showed that high IGFBP3 expression was positively correlated with several metabolism-related and cancer-related pathways

Relationship of Clinicopathological factors with IGFBP3 mRNA expression

The mRNA expression data of IGFBP3 in a PTC cohort and relevant clinical data were obtained from TCGA database. Table 2 lists the characteristics of the involved PTC cases.

Table 2 Relationship between IGFBP3 expression and clinicopathological factors

IGFBP3 expression levels were divided into high and low groups on the basis of the median level. The relationship between IGFBP3 mRNA level and clinicopathological characteristics is shown in Table 2. The results revealed that high IGFBP3 expression may be associated with male sex, advanced clinical stage, high T stage, and LNM. No associations were observed with age, race, histological type, or distant metastasis.

Association of IGFBP3 mRNA expression with the prognosis of PTC

The prognostic value of IGFBP3 in PTC was evaluated. As shown in Fig. 2B, the log-rank test indicated that PTC cases with high IGFBP3 expression had a shorter overall survival time than those with low expression (p < 0.05). In contrast, IGFBP3 seemed to exert little influence on disease-free survival in PTC patients (p > 0.05). Moreover, multivariate Cox regression analysis did not identify IGFBP3 expression as an independent prognostic factor for PTC (data not shown).

GO and pathway enrichment analysis of IGFBP3

To comprehensively explore the functions of IGFBP3, GO and KEGG pathway enrichment analyses were conducted.

Table 3 lists the top 5 GO terms and the top 5 pathways. GO analysis suggested that IGFBP3 is related to regulation of cell growth, osteoblast differentiation, negative regulation of protein phosphorylation, protein phosphorylation, and apoptotic process.

Table 3 GO analysis and pathway enrichment analysis of IGFBP3 (top 5)

Pathway analysis suggested that IGFBP3 might have a relationship with gene expression, TP53 regulates transcription of cell death genes, glucose/energy metabolism, cellular senescence, and DNA Damage Response.

To examine the potential roles of IGFBP3 in the development of thyroid cancer more specifically, GSEA was applied to identify the pathways in which it was enriched. The results showed that high IGFBP3 expression was mainly associated with two categories of pathways: metabolism-related and cancer progression-related signaling pathways. As shown in Fig. 2C, the former mainly included arachidonic acid metabolism, alpha linolenic acid metabolism, ether lipid metabolism, galactose metabolism, glycerophospholipid metabolism, and linoleic acid metabolism; the latter mainly included the VEGF signaling pathway, P53 signaling pathway, Notch signaling pathway, GnRH signaling pathway, Fc epsilon RI signaling pathway, and cell cycle.

Protein expression of IGFBP3 determined by IHC

The IHC results showed that specific staining was observed in the cytoplasm of both cancer and normal cells (Fig. 3A-b). The data also confirmed that IGFBP3 scores in PTC tissues were higher than those in para-carcinoma tissues (p < 0.05), as shown in Fig. 3B-a.

Fig. 3

A IGFBP3 expression in PTC tissues and adjacent normal tissues. a IGFBP3 protein expression in a tissue chip (paired PTC tissues and adjacent normal tissues) measured with IHC (× 10); b Representative examples of IGFBP3 expression in PTC tissues and adjacent normal tissues (× 200); c stands for cancer, while p stands for para-cancer. B The results from the tissue chip. a IGFBP3 expression was higher in PTC tissues than in the normal tissues (p < 0.05). b–d High IGFBP3 expression was related to advanced age (b), advanced clinical stage (c), and the presence of LNM (d) (p < 0.05 for each comparison)

A total of 58 PTC samples were present on the slide (Fig. 3A-a). The characteristics of these cases are presented in Table 4; no case presented distant metastasis. Survival information was not available for the chip, but factors such as clinical stage, LNM, and age could be assessed.

Table 4 Patient characteristics of the tissue microarray

Interestingly, the data indicated that a higher IGFBP3 expression score was correlated with advanced clinical stage, LNM, and advanced age (p < 0.05 for each) (Fig. 3B).

Discussion

To find the key genes in PTC LNM, we conducted a study comprising a discovery phase and a validation phase. In the discovery phase, we screened several genes that might be involved in the LNM of PTC through bioinformatics analyses, and found that IGFBP3 may be a key candidate gene whose over-expression might be associated with poor prognosis in PTC patients. In the validation phase, we first assessed the expression of IGFBP3 in a PTC cohort from the TCGA database, and found that high IGFBP3 expression may be related to sex, advanced clinical stage, high T stage, and LNM, as well as shorter overall survival time. IGFBP3 was predicted to be enriched in many metabolic and cell signaling pathways. IGFBP3 protein expression was then tested by IHC in a tissue microarray containing PTC samples. The data confirmed that IGFBP3 was up-regulated in PTC tissues relative to para-carcinoma tissues. A higher IGFBP3 expression score was associated with advanced clinical stage, LNM, and advanced age.

IGFBP3, insulin-like growth factor binding protein 3, belongs to a family of high-affinity binding proteins that regulate the function of insulin-like growth factor. It is found in various organs and tissues and plays multiple roles in biological processes such as cell proliferation, senescence, apoptosis, and epithelial-mesenchymal transition [19]. Thus, IGFBP3 has been implicated in tumor development. Evidence indicates that IGFBP3 can play different roles in different carcinomas. For instance, over-expression of IGFBP3 is associated with esophageal cancer promotion and predicts poor prognosis [20]. Likewise, high expression of IGFBP3 in glioma may be associated with tumor histology and poor prognosis [21]. In contrast, IGFBP3 acts as a tumor suppressor in a number of cancers. In lung cancer cells, IGFBP3 over-expression might induce apoptosis and increase the sensitivity of cells to cisplatin [22]. In liver cancer cells, IGFBP3 expression was lower than in normal cells, and IGFBP3 up-regulation may lead to apoptosis and reduced colony formation [23]. Hence, IGFBP3 plays multifunctional roles in various cancers. For PTC, both bioinformatics analysis and validation experiments showed that IGFBP3 was over-expressed in cancer tissues compared to normal tissues, suggesting that IGFBP3 may act as an oncogene in PTC development. Moreover, the data also indicate that IGFBP3 expression might exert some influence on the overall survival time of PTC patients.

Evidence has shown that IGFBP3 may exert its biological effects in a sex-specific manner in diseases such as obesity [24] and osteoporosis [25]. In the present study, the rate of high IGFBP3 expression in men (60.98%, 75/123) was markedly higher than that in women (43.50%, 144/331), indicating that IGFBP3 might also play a sex-specific role in PTC development. Reports have shown that estrogen may affect the bioactivity of IGF1 and the expression level of IGFBP3 [26], and that both estrogen and androgen treatment can regulate the expression of IGF pathway genes, including IGFBP3 [27]. These findings may help explain the unbalanced expression of IGFBP3 in PTC tissues between male and female cases.

In the discovery phase, IGFBP3 was selected as a candidate gene that may be associated with LNM of PTC. In the validation phase, the data based on a TCGA cohort indicated that high IGFBP3 mRNA expression may be correlated with advanced clinical stage, high T stage, and the presence of LNM. The data from a tissue chip then confirmed that high IGFBP3 protein expression may be associated with LNM. Together, these results consistently support a role for IGFBP3 in the process of LNM in PTC patients. However, little is known about the underlying molecular mechanisms. One report showed that IGFBP3 may promote LNM in oral cancer through activation of ERK signaling [28], but few studies on this issue in thyroid cancer could be found. Hence, future laboratory experiments are needed to explore the mechanisms of IGFBP3 in the process of LNM.

To comprehensively explore the possible functions of IGFBP3 in PTC, we conducted analyses using the Genecards and GSEA tools. Both GO and KEGG enrichment analyses suggested that IGFBP3 may be enriched in a variety of pathways involving multiple biological processes, whereas GSEA suggested that IGFBP3 might be mainly enriched in several cancer progression-related and metabolism-related pathways. Reports have shown that the VEGF signaling pathway [29], P53 signaling pathway [30], Notch signaling pathway [31], and GnRH signaling pathway [32] may be involved in cancer development. As for the metabolism-related pathways, evidence has shown that they may also be related to cancer aggressiveness. For instance, alpha linolenic acid metabolism was associated with aggressive prostate carcinoma [33], and arachidonic acid metabolism might be involved in hepatocellular cancer progression [34]. Therefore, the pathways identified by GSEA should be considered in future investigations of the mechanisms underlying PTC progression.

Several limitations should be noted. First, in the key gene screening process, ‘P values’ rather than ‘adjusted P values’ were used to screen the DEGs. When ‘adjusted P values’ were considered, few genes met the selection criteria; thus, we had to relax the inclusion criteria and use ‘P values’ instead. This less rigorous approach might lower the credibility of the results, although the subsequent validation steps may, to some extent, strengthen them. Second, only data from the TCGA database and a tissue chip were used in the validation process. Future experiments using cell lines or animal models are needed to further investigate the molecular mechanisms of IGFBP3 in PTC development. Third, an IHC assay on paraffin tissue sections was used to validate the protein expression of IGFBP3 in PTC tissues. More comprehensive approaches, such as an agnostic quantitative proteomics approach on fresh frozen tissue sections, may help further identify suitable protein candidates [35]. However, a commercial tissue chip containing fresh frozen tissue sections could not be obtained from the company. Therefore, this approach might be used in future studies when fresh frozen tissue sections are available.

Conclusion

In summary, the results of this study suggest that over-expression of IGFBP3 might be associated with LNM in PTC patients. In addition, it may affect the prognosis of PTC and may become a potential target for cancer treatment.