Background

Metastasis is the leading cause of death in almost all types of malignancy [1]. Multiple tumor and host factors contribute to the formation and progression of distant secondary tumors [1, 2], and most mechanistic studies to date have focused on the metastatic potential of tumor cells. The metastasis of single cancer cells is believed to begin when the cells gain the ability to migrate and invade. Cancer cells can gain motility in several ways, including epithelial-mesenchymal transition (EMT) and fusion with highly mobile bone marrow-derived cells [3, 4]. In metastases formed by clusters of tumor cells, EMT may not be necessary [5]; however, the layer of endothelial cells enveloping the entire tumor cluster/embolus seems critical for the survival of these clusters [6].

The ability to identify cancer patients at high risk of metastasis is essential in the era of precision medicine. In addition to combinations of clinicopathologic parameters, known in some circumstances as clinical prognostic classifiers, molecular profiling based on high-throughput technologies is expected to allow more accurate and robust prediction of metastatic potential in patients. How to effectively analyze the big data generated by high-throughput screening is an emerging issue for many bioinformaticians. We hypothesize that, with optimal weighting of the impact of each individual gene, a collection of key pro-metastatic genes could be used to build a prognostic tool that identifies the metastatic potential of a specific tumor and reveals novel signaling pathways underlying metastasis.

Main text

Intensified investigation of cancer metastasis in recent years has identified over 200 pro-metastatic genes. In this review, we aim to identify a group of key pro-metastatic genes with in vivo functional evidence and reasonable clinical relevance for application to big data analyses.

Figure 1 summarizes the analytic procedure of this review. First, we selected 285 genes from the literature by searching PubMed, based on the following criteria: (1) author-provided evidence that the gene promotes migration and/or invasion of cancer cells; (2) author-provided evidence that the gene promotes metastasis in vivo in animal models; (3) when a gene had been reported as pro-metastatic in several articles, all articles reporting the link were reviewed, and the most convincing studies are listed as the key references in Table 1. Second, we applied survival analyses as validation tests using the publicly available TCGA datasets, with P < 0.05 as the significance threshold. For analyses of clear cell renal cell carcinoma (ccRCC), the mRNA expression data of 72 non-cancerous kidney tissues and 539 tumors [clear cell kidney carcinoma (KIRC) in the TCGA database] were downloaded. For analyses of hepatocellular carcinoma (HCC), the mRNA expression data of 50 non-cancerous liver tissues and 374 tumors [liver hepatocellular carcinoma (LIHC) in the TCGA database] were used. Normalization was performed using the DESeq method (Version 1.26.0). For each individual gene, the median expression level was used as the cut-off value to separate patients into high and low expression groups. Genes were excluded if their elevated expression was significantly associated with better patient prognosis in any patient cohort. Finally, 150 genes passed the tests and are listed in Table 1. Among them, 79 genes have significant prognostic value in the ccRCC patient cohort, 35 genes have significant prognostic value in the HCC cohort, and 23 genes have significant prognostic value in both cohorts.
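The validation step described above (median split, log-rank comparison, exclusion of genes linked to better prognosis) can be sketched in plain Python. This is only an illustrative sketch, not the code used for this review: the function names `split_by_median`, `logrank_test`, and `passes_filter` are hypothetical, the toy patient data are invented rather than taken from TCGA, and two points are assumptions on our part: "high expression" is taken to mean strictly above the median, and a gene is taken to pass when it is significantly associated with worse survival in at least one cohort and with better survival in none.

```python
import math
from statistics import median


def split_by_median(expression):
    """Return booleans: True where expression is strictly above the cohort median."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


def logrank_test(times, events, is_high):
    """Two-group log-rank test.

    times   -- follow-up time per patient
    events  -- 1 if death was observed, 0 if the patient was censored
    is_high -- True for patients in the high-expression group
    Returns (chi_square, p_value, observed_minus_expected) where the last
    value is computed for the high-expression group.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if is_high[i])
        deaths = [i for i in at_risk if times[i] == t and events[i] == 1]
        d = len(deaths)
        d_high = sum(1 for i in deaths if is_high[i])
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0, observed_minus_expected
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value, observed_minus_expected


def passes_filter(cohort_results, alpha=0.05):
    """cohort_results: list of (p_value, observed_minus_expected) per cohort.

    Assumed rule: reject the gene if high expression is significantly
    protective in any cohort; otherwise keep it if high expression is
    significantly harmful in at least one cohort.
    """
    significant = [(p, oe) for p, oe in cohort_results if p < alpha]
    if any(oe < 0 for _, oe in significant):
        return False
    return any(oe > 0 for _, oe in significant)


# Invented toy cohort: eight patients, one gene.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
times = [60, 55, 50, 48, 12, 10, 8, 5]   # months of follow-up
events = [0, 0, 1, 0, 1, 1, 1, 1]        # 1 = death observed

is_high = split_by_median(expression)
chi_square, p_value, o_minus_e = logrank_test(times, events, is_high)
print(f"chi2={chi_square:.2f}, p={p_value:.4f}, O-E(high)={o_minus_e:.2f}")
print("gene kept:", passes_filter([(p_value, o_minus_e)]))
```

The sign of the observed-minus-expected death count in the high-expression group gives the direction of the association, which the exclusion rule needs in addition to the P value. In practice an established survival-analysis package would normally be preferred over a hand-written test.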

Fig. 1

A schematic illustration of the study design and findings

Table 1 The list of 150 pro-metastatic genes with clinical relevance and key references

Although different tumor types are believed to rely on different molecular mechanisms for metastasis, our analyses identified 23 common pro-metastatic genes that are associated with poor prognosis in both cancer types. Among them, we are most interested in 11 genes that are not only statistically significant in terms of prognostic impact but also associated with clearly separated overall survival curves in both cohorts, suggesting that these genes have profound biological impacts on tumor progression. For the other 12 genes, although their prognostic impact was significant in log-rank tests in both cohorts, the survival curves of the high and low expression groups crossed at some time points. The 11 most interesting genes are BIRC5 (Survivin), CXCL1, CXCL8 (IL8), E2F1, ETV4, EZH2, MMP1, MMP9, MYB, PTTG1, and YBX1. Figure 2 shows the survival curves of patients with ccRCC or HCC, stratified by the expression of each of these 11 genes. Our findings suggest that different tumor types may share some common metastatic mechanisms, which strengthens the rationale for applying the list of 150 pro-metastatic genes to big data analyses. Interestingly, 4 of these 11 genes (CXCL1, CXCL8, MMP1, and MMP9) encode secreted proteins, which are ideal pharmaceutical targets for blocking cancer metastasis.
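The distinction drawn above between separated and crossing survival curves can be illustrated with a small Kaplan-Meier sketch. This is a crude heuristic under stated assumptions, not the procedure used for this review: the helper names `kaplan_meier`, `survival_at`, and `curves_cross` are hypothetical, the follow-up data are invented, and a "crossing" is simply taken to mean that the difference between the two estimated curves changes sign at some observed event time.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] for one patient group."""
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Step-function lookup of the estimated survival probability at time t."""
    probability = 1.0
    for event_time, value in curve:
        if event_time > t:
            break
        probability = value
    return probability


def curves_cross(curve_a, curve_b, tolerance=1e-12):
    """True if S_a(t) - S_b(t) takes both signs over the observed event times."""
    signs = set()
    for t in sorted({t for t, _ in curve_a} | {t for t, _ in curve_b}):
        diff = survival_at(curve_a, t) - survival_at(curve_b, t)
        if abs(diff) > tolerance:
            signs.add(diff > 0)
    return len(signs) == 2


# Invented example 1: the low-expression group survives longer throughout.
low = kaplan_meier([60, 55, 50, 48], [0, 0, 1, 0])
high = kaplan_meier([12, 10, 8, 5], [1, 1, 1, 1])
print("separated curves cross:", curves_cross(low, high))

# Invented example 2: group A does worse early but better later.
group_a = kaplan_meier([2, 10, 11, 12], [1, 1, 1, 1])
group_b = kaplan_meier([4, 5, 6, 20], [1, 1, 1, 1])
print("second pair cross:", curves_cross(group_a, group_b))
```

A sign change late in follow-up, when few patients remain at risk, may reflect noise rather than a real reversal, so a check of this kind would normally be combined with visual inspection of the curves.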

Fig. 2

The survival curves of two cohorts of cancer patients, stratified by the mRNA expression levels of 11 genes. The data were retrieved from The Cancer Genome Atlas (TCGA) database. The survival curves were plotted using the Kaplan–Meier method and compared using the log-rank test. For all 11 genes presented in this figure, elevated gene expression is significantly associated with shorter overall patient survival (P < 0.05) in both tumor types. ccRCC clear cell renal cell carcinoma, HCC hepatocellular carcinoma

Although not covered in this review article, emerging data on the regulatory roles of non-coding RNA in metastasis have linked different pro-metastatic genes into signaling cascades [79]. Further investigation into the roles of non-coding RNA in metastasis is warranted.

Conclusions

In summary, we present here a collection of 150 important pro-metastatic genes for big data analyses. We expect more key molecules to be identified, validated, and added to the list in the near future, thereby accelerating efforts to prevent and treat cancer metastasis.