Key pathways and genes controlling the development and progression of clear cell renal cell carcinoma (ccRCC) based on gene set enrichment analysis

Background Clear cell renal cell carcinoma (ccRCC) is one of the most common types of kidney cancer in adults; however, its causes are not completely understood. This study was designed to identify the key pathways and genes associated with the occurrence or development of ccRCC and to elucidate its pathogenesis at the gene and pathway level, in order to provide further theoretical evidence and potential therapeutic targets for ccRCC. Methods Gene set enrichment analysis (GSEA) and meta-analysis were used to screen the critical pathways and genes that may affect the occurrence and progression of ccRCC at the transcriptional level. The pathways corresponding to significant genes were obtained with the online tool DAVID (http://david.abcc.ncifcrf.gov/). Results Thirty-seven consistent pathways, together with key genes in these pathways, related to ccRCC were obtained by combining GSEA and meta-analysis. These pathways were mainly involved in metabolism, organismal systems, cellular processes and environmental information processing. Conclusion The gene pathways that we identified could provide insight into the development of ccRCC. Further studies are needed to determine the biological function of the positive genes.


Introduction
Renal cell carcinoma (RCC) is one of the most common genitourinary malignancies, accounting for about 3 % of all cancers worldwide [1]. Clear cell renal cell carcinoma (ccRCC), also called conventional RCC, is the most common histological type of renal cell carcinoma and represents 75-80 % of RCC. The male/female ratio is approximately 2:1 [2]. Initial treatment is most commonly radical or partial nephrectomy, which remains the mainstay of curative treatment [3]. Where the tumor is confined to the renal parenchyma, the 5-year survival rate is 60-70 %, but this falls considerably once metastases have developed. The disease is relatively resistant to radiation therapy and chemotherapy, although some cases respond to immunotherapy.
After the completion of the Human Genome Project, advances in microarray technology led to global gene expression profiling of ccRCC [4][5][6]. Microarray technology profiles the expression levels of thousands of genes simultaneously, providing a snapshot of transcript levels in the cells or tissues being studied; it is a powerful tool for studying ccRCC. Such microarray data are publicly available from the Gene Expression Omnibus (GEO) database at NCBI [7]. However, the large amounts of data acquired must be reduced or 'translated' to a smaller set of genes that represent meaningful biological differences between control and test systems and that can be validated in an experimental or clinical setting. Analyzing such high-dimensional microarray datasets to identify the molecular pathways and key genes deregulated in ccRCC is a challenge. To address this problem, Subramanian et al. described gene set enrichment analysis (GSEA), which has been recognized as a breakthrough in gene set and functional pathway analysis [8]. GSEA allows us to search for the key genes and pathways associated with the occurrence and development of disease by analyzing diverse experimental datasets. However, differences in platform, sample size and normalization all affect the results to some degree. Meta-analysis of microarray data can help to overcome the resulting problems of poor reproducibility and reliability [9]. In our study, after standardized microarray preprocessing of all the expression datasets, GSEA and meta-analysis were used to detect the mixing pathways and key genes, which can provide a theoretical basis for further understanding of the biological mechanism of ccRCC.

Data collection
We searched GEO (http://www.ncbi.nlm.nih.gov/geo/) for relevant gene expression datasets using "clear cell renal cell carcinoma" as the keywords, limiting the study type to expression profiling by array and the species to human. The search returned 2,762 datasets related to ccRCC. Studies that met all of the following criteria were included: (1) the data were genome-wide RNA expression data; (2) the complete raw or normalized microarray data were available; (3) the data provided a comparison of renal tissue between ccRCC patients and normal controls; (4) the dataset contained more than three samples; (5) the raw data were provided as CEL files. In the end, three gene expression datasets met the selection criteria (Table 1).

Gene set enrichment analysis
GSEA was performed with the Category package (version 2.10.1). General statistical analysis and computing were carried out in the R statistical programming language [10]. For each Affymetrix raw dataset, the Robust Multichip Averaging (RMA) algorithm [11] in the affy Bioconductor package [12] was used to calculate background-adjusted, normalized, log2-transformed probe set intensities. Only genes that mapped to an explicit KEGG pathway were selected for further GSEA and meta-analysis [13]. We performed pathway analysis of each dataset independently. Variability was measured by the interquartile range (IQR), and genes with an IQR below 0.5 were removed. If one gene was targeted by multiple probe sets, we retained the probe set with the largest variability. Genes in each pathway were tested with Student's t test, and each pathway's p value was obtained from a permutation test with 1,000 permutations. Pathways with a p value of no more than 0.05 were considered significant.
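The filtering and permutation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Category package's actual procedure: the function names, the dictionary-based data layout, the pooled-variance t statistic and the pathway score (sum of per-gene t statistics divided by the square root of the number of genes) are all choices made for this sketch.

```python
import math
import random
from statistics import mean, quantiles, variance


def iqr(values):
    """Interquartile range of one probe set's log2 intensities."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_and_collapse(expr, probe_to_gene, cutoff=0.5):
    """Drop probe sets with IQR < cutoff and keep, per gene, the one with the largest IQR.

    expr: dict probe_id -> list of log2 intensities (one per sample)
    probe_to_gene: dict probe_id -> gene symbol (unmapped probes are skipped)
    Returns dict gene -> (probe_id, values).
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        spread = iqr(values)
        if spread < cutoff:
            continue
        if gene not in best or spread > best[gene][0]:
            best[gene] = (spread, probe, values)
    return {gene: (probe, values) for gene, (_, probe, values) in best.items()}


def t_statistic(x, y):
    """Two-sample Student's t statistic with pooled variance (non-constant data assumed)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def pathway_score(rows, labels):
    """Assumed pathway statistic: sum of per-gene t values over sqrt(number of genes)."""
    total = 0.0
    for values in rows:
        tumor = [v for v, lab in zip(values, labels) if lab == 1]
        normal = [v for v, lab in zip(values, labels) if lab == 0]
        total += t_statistic(tumor, normal)
    return total / math.sqrt(len(rows))


def permutation_p_value(rows, labels, n_perm=1000, seed=0):
    """Two-sided p value obtained by shuffling the sample labels n_perm times."""
    rng = random.Random(seed)
    observed = abs(pathway_score(rows, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pathway_score(rows, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Here `rows` holds the expression vectors of the genes in one pathway and `labels` marks tumor (1) and normal (0) samples; a pathway would be kept when `permutation_p_value(rows, labels)` is no more than 0.05.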

Meta-analysis
To obtain the differentially expressed genes from the remaining genes of each dataset above, meta-analysis was carried out in SAS 9.13. The following formula was applied to calculate the chi-square value of each gene [14]:
$$\chi^2 = -2 \sum_{i=1}^{K} \log_e P_i$$
where $P_i$ is the p value of the gene in dataset $i$ and $K$ is the number of datasets.
The p value of each gene was then calculated from this statistic, and genes with p < 0.05 were retained. The significant genes were used to obtain KEGG pathways from DAVID Bioinformatics Resources 6.7 (http://david.abcc.ncifcrf.gov/).
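As an illustration of how per-dataset p values can be combined into one chi-square value and one p value per gene, the following Python sketch assumes the formula is Fisher's combined probability test, in which the statistic follows a chi-square distribution with 2K degrees of freedom when the K p values are independent. The function name is hypothetical, and the study itself used SAS rather than Python.

```python
import math


def fisher_combined(p_values):
    """Return (chi_square, combined_p) for K independent p values, each in (0, 1]."""
    k = len(p_values)
    chi_square = -2.0 * sum(math.log(p) for p in p_values)
    half = chi_square / 2.0
    # Upper tail of a chi-square distribution with 2K degrees of freedom.
    # For even degrees of freedom it has the closed form
    # exp(-x/2) * sum_{j=0}^{K-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return chi_square, math.exp(-half) * total
```

A gene would be retained when the second returned value is below 0.05. With a single dataset (K = 1) the combined p value reduces to the input p value, which is a quick sanity check on the formula.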

GSEA analysis
According to the inclusion criteria, three datasets were finally obtained. The tissues used for total RNA extraction were matched pairs of clear cell renal cell carcinoma and adjacent normal tissue. The Fuhrman grade of the ccRCC samples was no more than three. Because the samples used for genomic profiling were matched pairs, the influence of multiple factors on GSEA and meta-analysis was reduced, which supports the reliability of the conclusions. The three included datasets contained 40 ccRCC cases and 40 controls. GSEA was applied separately to each dataset to find the significantly changed genes and the significant common pathways. After GSEA, 8,506 significantly changed genes were screened out from the three gene expression microarray datasets. Overlap existed between the up-regulated and down-regulated pathways. There were fourteen mixing pathways including 206 up-regulated and 253 down-regulated pathways from the three datasets. Detailed information about the analysis results is shown in Table 2.

Meta-analysis
To further examine the results above, meta-analysis was used to detect differentially expressed genes between the two experimental groups. The p value for each gene was obtained from an unpaired t test. A total of 1,150 significant genes were detected (p < 0.05). The Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) was then used to annotate these genes. We imported the official gene symbols of the 1,150 genes into the gene functional classification tool of DAVID. To identify biologically relevant molecular networks of these genes, we used KEGG (http://www.genome.jp/kegg/), a bioinformatics pathway analysis resource with a comprehensive knowledge base. KEGG identified 1,038 of these genes, and 48 KEGG pathways were detected in total. More details are shown in Table 3.
The results of GSEA and meta-analysis
To find the intersecting pathways, the significant common pathways from GSEA were then compared with those from the meta-analysis. In the end, 37 consistent pathways and the significant genes (p < 0.05) in these pathways were obtained. These pathways mainly concerned metabolism, organismal systems, cellular processes, environmental information processing, and human diseases. The details are shown in Table 4.
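The comparison of the two pathway lists amounts to a set intersection, which can be sketched as follows. The sketch assumes that a "common" GSEA pathway is one that is significant in every dataset; the function name and the example inputs are hypothetical and are not the study's results.

```python
def consistent_pathways(gsea_by_dataset, meta_pathways):
    """Pathways significant in every GSEA dataset and also in the meta-analysis list."""
    common = set.intersection(*(set(pathways) for pathways in gsea_by_dataset))
    return sorted(common & set(meta_pathways))


# Toy input only: pathway names are real KEGG names, but the memberships are made up.
toy_gsea = [
    ["Focal adhesion", "Citrate cycle (TCA cycle)", "Cell cycle"],
    ["Focal adhesion", "Citrate cycle (TCA cycle)"],
    ["Focal adhesion", "Citrate cycle (TCA cycle)", "ECM-receptor interaction"],
]
toy_meta = ["Focal adhesion", "ECM-receptor interaction"]
toy_result = consistent_pathways(toy_gsea, toy_meta)
```

Sorting the result only makes the output order reproducible; it does not rank the pathways.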

Discussion
Clear cell renal cell carcinoma is one of the most common types of kidney cancer in adults; however, its causes are not completely understood. Selecting differentially expressed genes and consistent pathways helps us to explore the underlying molecular mechanisms, thereby providing insight into biological function. Single-gene, marker-based approaches can fail to detect transcriptional programs that are distributed across an entire network of genes yet are subtle at the level of individual genes [15]. Genome-wide microarrays can locate gene families and pathways that show a consistent alteration in a disease state. Pathway analysis is a valid way to reduce major deviations between datasets, and it can identify common genes and pathways of interest by combining differentially expressed genes from different datasets.
Several related studies have been published. Tun et al. [16] combined gene expression profiling of early-stage ccRCC with comprehensive bioinformatics analyses to reveal the significant pathways and transcription factors that act in the development of ccRCC. Maruschke et al. [17] used microarray expression analysis to determine 16 gene sets that distinguish the expression profiles of grade 1 and grade 3 tumor tissues, based on analysis with the MSigDB database. Both studies analyzed a single dataset, whereas analyses of multiple microarray datasets on the development of ccRCC have been rare. This study used three datasets, combining a novel GSEA based on KEGG pathways with a meta-analysis approach, to identify the common significant genes and genetic pathways (p < 0.05) associated with ccRCC. Our findings suggest that most genes and pathways involved in ccRCC are the same according to their functional classification. Below, we discuss several differentially expressed pathways and genes among the crossing pathways, grouped by functional classification, that suggest a role for these pathways and genes in ccRCC.

Metabolism pathways
The metabolism pathways in our study were predominantly focused on carbohydrate metabolism, lipid metabolism, energy metabolism, amino acid metabolism, metabolism of other amino acids, and metabolism of cofactors and vitamins.
Table 4 Common crossing significant pathways and genes between the results of GSEA and meta-analysis
ccRCC is increasingly being recognized as a metabolic disease. Numerous studies have shown a significant association between body mass index, obesity and the development of kidney cancer [18]. In a case-control study from Iowa, diets richest in animal and saturated fats, oleic acid and cholesterol were associated with statistically significant increases in RCC (1.9-2.6-fold, depending on the factor) [19]. These metabolic abnormalities provide protection for the tumor but may also provide a source of vulnerability and therapeutic opportunity. The citrate cycle (TCA cycle) identified in this study belongs to carbohydrate metabolism; it is part of a metabolic pathway coupled to mitochondrial oxidative phosphorylation that converts nutrients to energy in aerobic cells. Recently, heterozygous germline mutations in fumarate hydratase (FH) or succinate dehydrogenase (SDH) of the TCA cycle have been shown to predispose individuals to tumors [20]. SDHB/C/D are the key genes in the citrate cycle and oxidative phosphorylation pathways in our results. SDH is one of the seven known kidney cancer genes involved in pathways that respond to metabolic stress and/or nutrient stimulation [21]. Targeting the fundamental metabolic abnormalities in kidney cancer provides a unique opportunity for the development of more effective forms of therapy for this disease. Recently, early-onset renal tumors have been found to develop in individuals with germline SDHB mutations [22,23]. In preclinical models, increased succinate has been shown to inhibit HIF prolyl hydroxylase and affect HIF stability [24].
HIF can increase the expression of its downstream target genes, including vascular endothelial growth factor (VEGF), glucose transporter 1 (GLUT-1) and glycolytic enzymes, thereby promoting blood vessel formation and cellular energy metabolism; its excessive expression may play an important role in the development and progression of malignant tumors [25,26].

Cellular processes and cell communication
The focal adhesion pathway (Fig. 1) in our results belongs to cellular processes and cell communication. In cell biology, focal adhesions (cell-matrix adhesions or FAs) are specific types of large macromolecular assemblies through which both mechanical force and regulatory signals are transmitted. Cell-matrix adhesions play essential roles in important biological processes including cell motility, cell proliferation, cell differentiation, regulation of gene expression and cell survival [27]. Tumor epithelial and endothelial cells require attachment to the extracellular matrix (ECM) for survival; on loss of adhesion, they undergo anoikis [28,29]. Quinazoline-based drugs trigger anoikis in renal cancer cells by targeting focal adhesion survival signaling. This potent antitumor action against human RCC suggests a novel quinazoline-based therapy targeting renal cancer [30]. VEGFA/B/C (marked by red stars in Fig. 1) are significant genes in the focal adhesion pathway. The role of VEGF in particular has been explored as a key factor in the pathogenesis of RCC. VEGF functions to increase vascular permeability, induce endothelial cell proliferation and migration, and promote endothelial cell survival [31]. Furthermore, VEGF receptor expression has been observed in RCC cells, suggesting that VEGF may also serve as an autocrine stimulus in RCC [32]. The high VEGF expression in RCC is the direct result of inactivation of the von Hippel-Lindau tumor suppressor gene (VHL). Data suggest that VHL inactivation occurs in the majority of ccRCC [33]. Therapeutic targeting of VEGF in RCC has a strong biological rationale. Substantial clinical activity has been reported in clinical trials with VEGF-targeting agents [34,35]. Further investigation is needed to use these agents optimally for maximal clinical benefit.

Environmental information processing
The extracellular matrix (ECM)-receptor interaction and cell adhesion molecule (CAM) pathways fall into this classification. Both concern signaling molecules and interaction. The ECM consists of a complex mixture of structural and functional macromolecules and serves an important role in tissue and organ morphogenesis and in the maintenance of cell and tissue structure and function [36]. There is a close connection between ECM-receptor interaction and the focal adhesion pathway, which is also a significant pathway. At the cell-extracellular matrix contact points, specialized structures termed focal adhesions are formed, where bundles of actin filaments are anchored to transmembrane receptors of the integrin family through a multi-molecular complex of junctional plaque proteins [37]. There is increasing evidence that certain integrins associate with receptor tyrosine kinases (RTKs) to activate signaling pathways that are necessary for tumor invasion and metastasis [38]. Zhou et al. [39] found that multiple canonical cancer-associated signaling pathways, including focal adhesion, cell cycle and ECM-receptor interaction, were significantly more likely to be disrupted in ccRCC than expected by chance. This is consistent with the results of our study. CD47 is a key gene in the ECM-receptor interaction pathway and is involved in the increase in intracellular calcium concentration that occurs upon cell adhesion to the extracellular matrix. As other investigators have found, CD47 overexpression may be associated with ferric nitrilotriacetate-induced renal cortical tubular damage and regeneration that lead to a polycystic state, and with tumor progression and metastasis of the induced RCCs [40].

Other pathways and genes
Human diseases and organismal systems are the two remaining classifications associated with ccRCC. Pathways such as type I diabetes mellitus, epithelial cell signaling in Helicobacter pylori infection, bladder cancer and systemic lupus erythematosus all belong to the classification of human diseases. They mainly concern endocrine and metabolic diseases, neurodegenerative diseases, infectious diseases, cancers, immune diseases and cardiovascular diseases. Some of the diseases above involve the endocrine or immune system, which fall under the classification of organismal systems. Most of the genes in these pathways were also enriched in the pathways discussed above. HLA-DQB1, which appears in 8 pathways across the human diseases, organismal systems and environmental information processing classifications, is an important gene for ccRCC, as has been widely reported in the literature. Patients with RCC whose tumors did not express HLA-DQA1 or HLA-DQB1 molecules demonstrated poor clinical response [41]. EGFR and VEGFA/B/C expression in the human disease pathways plays an important regulatory role in tumor angiogenesis, invasion and metastasis. Cancer treatment based on targeting EGFR/VEGF is an active area of drug research [42].

Conclusion
The pathogenesis of ccRCC is quite complicated. Gene set enrichment analysis combined with meta-analysis is an effective way to identify differentially expressed genes and to deduce their underlying molecular pathways. The significant genes and pathways were mainly focused on metabolism, cellular processes and cell communication, environmental information processing, human diseases and organismal systems, and they may have some connection with ccRCC. Furthermore, some of the results were corroborated by published literature, as described in the Discussion. The conclusion is relatively reliable and can be used to guide further study. Further experiments are needed to verify specific links between these results and ccRCC.
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.