Exploring the Molecular Mechanism and Biomarkers of Liver Cancer Based on Gene Expression Microarray

Liver cancer is one of the most common cancers worldwide, with high morbidity and mortality. Despite many studies, its molecular mechanism is not fully understood, and further research is needed to improve the prognosis of liver cancer. First, differentially expressed genes (DEGs) between six Mdr2-knockout (Mdr2-KO) mutant mouse samples (3-month-old and 12-month-old) and six control mouse samples were identified. Then, the enriched GO terms and KEGG pathways of those DEGs were obtained using the Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/). Finally, a protein-protein interaction (PPI) network of those DEGs was constructed using the STRING database (http://www.string-db.org/) and visualized with Cytoscape, and genes with high degree were selected. Several novel biomarkers that might play important roles in liver cancer were identified through analysis of the gene expression microarray data in GEO. In addition, previously studied genes such as Tyrobp and Ctss, and pathways such as Pathways in cancer and ECM-receptor interaction, were further confirmed in this study. Through the bioinformatics analysis of the gene microarray in GEO, we found some novel biomarkers of liver cancer and further confirmed some known biomarkers.


Introduction
Liver cancer is one of the most common malignancies. It has high morbidity and mortality, especially in sub-Saharan Africa and eastern Asia. The incidence of liver cancer has at least doubled in the past 15 years [1]. However, the molecular mechanism of liver cancer is still largely unknown. For these reasons, an increasing number of studies on liver cancer have been conducted in recent years, and different molecular mechanisms and various biomarkers related to liver cancer have been identified. Using qRT-PCR and Western blotting, JianXin et al. [2] inferred that GOLPH3, which is expressed at higher levels in liver cancer patients than in the normal population at both the gene and protein levels, is a new biomarker for liver cancer. By analyzing the methylation profiles of 59 liver cancer patients, Mah et al. [3] found that the inflammation-related NFkB pathway plays an important role in liver cancer. Despite this large body of work, the molecular mechanism of liver cancer has not been fully elucidated. Hence, further research at the molecular level, such as studies of genes or proteins, is still needed to uncover new molecular mechanisms or biomarkers and thereby improve the prognosis, diagnosis and treatment of liver cancer.
Mdr2-knockout (Mdr2-KO) mice lack the liver-specific P-glycoprotein responsible for phosphatidylcholine transport across the canalicular membrane, which may result in dysfunctional phospholipid secretion [4]. Signs of inflammation are accompanied by an increase in plasma transaminase levels and followed by enhanced connective tissue storage and fibrosis progression. As a consequence of chronic inflammation and progressing fibrosis, Mdr2-knockout mice may develop liver cancer between the ages of 12 and 15 months [5].
With the rapid growth of microarray technology and its application in cancer research, many genes related to cancers (including liver cancer) have been identified. For example, Yang et al. [6] found through cDNA microarray analysis that Gα12 is an important therapeutic target for liver cancer. Xu et al. [7] verified, using microarray and RT-PCR technology, the role of CXCL5 in liver cancer migration and invasion.
In this study, by analyzing a gene expression microarray data set of liver cancer from the GEO database, we further confirmed the molecular mechanisms and some biomarkers of liver cancer that had been investigated previously. Moreover, genes that had not been studied before but might be important in liver cancer were also identified. Most of the enriched GO terms and KEGG pathways of those genes were related to liver cancer, especially cell cycle, immune response, inflammatory response, pathways in cancer, MAPK signaling pathway and cell adhesion molecules. In conclusion, our findings can improve the understanding of liver cancer and provide potential therapeutic targets for further studies.

Gene Expression Microarray Data
In this study, the gene expression microarray data set GSE4612 was downloaded from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). GSE4612 [8] is a gene expression profile data set that includes six Mdr2-knockout (Mdr2-KO) mutant mouse samples (3-month-old and 12-month-old) and six control mouse samples (3-month-old and 12-month-old). The platform of this microarray data set is GPL339 [MOE430A] Affymetrix Mouse Expression 430A Array.

Preprocessing of the Microarray Data
Unwanted noise in the raw microarray data was filtered out in the preprocessing stage. Normalization and background correction of the raw data were conducted with the affy [9] package in R. Moreover, when multiple probes corresponded to one gene symbol, they were summarized by taking the average expression value of those probes as the expression value of the gene. The microarray contained a total of 22,690 probes before preprocessing, which corresponded to 13,687 unique gene symbols after preprocessing.
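The probe summarization step described above can be sketched as follows. This is a minimal illustration in plain Python rather than the R workflow used in the study; the function name collapse_probes, the probe IDs and the expression values are hypothetical, and dropping probes without a gene symbol is an assumption that the text does not state.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expr, probe_to_symbol):
    """Average the expression of all probes that map to the same gene symbol.

    probe_expr: dict probe_id -> list of expression values (one per sample)
    probe_to_symbol: dict probe_id -> gene symbol
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        symbol = probe_to_symbol.get(probe_id)
        if symbol:  # assumption: probes without a gene symbol are dropped
            grouped[symbol].append(values)
    # zip(*rows) walks the samples column by column across the probes of one gene
    return {
        symbol: [mean(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy data: probe IDs and values are made up for illustration
probe_expr = {
    "p1": [8.0, 9.0],
    "p2": [6.0, 7.0],
    "p3": [5.0, 5.5],
    "p4": [1.0, 2.0],
}
probe_to_symbol = {"p1": "Ctss", "p2": "Ctss", "p3": "Vim"}
gene_expr = collapse_probes(probe_expr, probe_to_symbol)
```

In this toy input, the two probes mapped to Ctss are averaged sample by sample, and the unannotated probe p4 does not appear in the result.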

Get the Differentially Expressed Genes
After preprocessing, the critical step was to identify the differentially expressed genes (DEGs) between the case samples and the control samples. The tool used in this study was the limma [10] package in R. A t-test was conducted on the gene expression values between case samples and control samples, and genes with P value < 0.05 and |log2(fold change)| > 1 were selected. Using these criteria, DEGs between the case and control samples were first selected separately for the 3-month-old mice and the 12-month-old mice, and then the genes shared by the two DEG lists were selected. A heatmap of the overlapping DEGs was generated with the gplots package in R to visualize their expression values across samples.
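The selection criteria and the overlap step can be illustrated with a short sketch. It assumes that a log2 fold change and a P value per gene are already available (for example, exported from a limma result table) and does not reproduce the statistical test itself. The function names and all numbers are hypothetical; the three direction labels mirror the up-regulated, down-regulated and contradictory groups reported in the Discussion.

```python
def select_degs(stats, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return genes with P value < p_cutoff and |log2 fold change| > lfc_cutoff.

    stats: dict gene -> (log2_fold_change, p_value)
    """
    return {
        gene
        for gene, (lfc, p) in stats.items()
        if p < p_cutoff and abs(lfc) > lfc_cutoff
    }


def direction(lfc_a, lfc_b):
    """Label a shared DEG by the sign of its fold change in the two age groups."""
    if lfc_a > 0 and lfc_b > 0:
        return "up"
    if lfc_a < 0 and lfc_b < 0:
        return "down"
    return "contradictory"


# Made-up statistics for illustration only
stats_3m = {"Ctss": (2.1, 0.001), "Vim": (1.4, 0.01),
            "Cdk1": (0.6, 0.02), "Tyrobp": (1.8, 0.2)}
stats_12m = {"Ctss": (1.7, 0.003), "Vim": (-1.2, 0.04),
             "Cdk1": (1.5, 0.01), "Tyrobp": (2.0, 0.01)}

degs_3m = select_degs(stats_3m)
degs_12m = select_degs(stats_12m)
overlap = degs_3m & degs_12m  # genes that pass both filters
trend = {g: direction(stats_3m[g][0], stats_12m[g][0]) for g in overlap}
```

Both thresholds are strict inequalities, matching the criteria in the text, so a gene with |log2(fold change)| exactly equal to 1 is not selected.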

GO Enrichment and KEGG Pathway Analysis of the DEGs
After the DEGs were obtained, GO enrichment and KEGG pathway analyses were conducted using DAVID (Database for Annotation, Visualization and Integrated Discovery, http://david.abcc.ncifcrf.gov/). DAVID can be used for functional annotation of a gene list, gene functional classification or gene ID conversion. The functional annotation module was used in this study. First, we submitted the DEG list to the database and selected Mus musculus as the species. Then, the GO terms and KEGG pathways with a P value smaller than 0.05 and at least five genes were selected as the enriched functions of the DEGs.
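To make the two selection criteria concrete, the sketch below applies them to a plain hypergeometric over-representation test. DAVID computes its own statistic, so the test used here is a stand-in and not DAVID's exact procedure. The term names and counts are made up, and using the 13,687 gene symbols on the array as the background is an assumption.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from a background of N genes,
    K of which are annotated with the term."""
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def enriched_terms(term_counts, n_degs, n_background, p_cutoff=0.05, min_genes=5):
    """Keep terms with at least min_genes DEGs and P value < p_cutoff.

    term_counts: dict term -> (DEGs annotated with the term,
                               background genes annotated with the term)
    """
    result = {}
    for term, (k, K) in term_counts.items():
        if k < min_genes:
            continue  # fewer than five genes: excluded regardless of P value
        p = hypergeom_pvalue(k, n_degs, K, n_background)
        if p < p_cutoff:
            result[term] = p
    return result


# Made-up counts for illustration only
term_counts = {
    "cell adhesion": (25, 300),  # far more DEGs than expected by chance
    "broad term": (10, 400),     # about as many as expected by chance
    "tiny term": (3, 4),         # fails the five-gene minimum
}
enriched = enriched_terms(term_counts, n_degs=380, n_background=13687)
```

Only the first toy term passes both filters: the second has roughly the number of DEGs expected by chance, and the third is removed by the five-gene minimum before any P value is computed.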

Construct the PPI Network of DEGs
To further investigate the molecular mechanism of liver cancer, a PPI network of the DEGs was constructed using the STRING database (http://www.string-db.org/). STRING infers interactions between genes by analyzing genomic data from different sources, such as high-throughput experiments, coexpression data and previously reported data. It also has a unique scoring framework that combines the scores from the different sources into an integrated score representing the confidence of each interaction. Here, we selected the gene-gene interactions with integrated scores greater than 0.4 (the default threshold in the STRING database) to construct the PPI network, and Cytoscape [11] was used for visualization. To select core genes (the genes more likely to be involved in liver cancer) from the PPI network, we analyzed the topological structure of the network and obtained the degree of each gene (the number of genes that directly interact with it). Genes with a degree greater than 10 were selected as the core genes of the network.
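The score filter and the degree-based selection of core genes can be sketched as below. The sketch assumes an interaction list of (gene, gene, score) tuples with scores on a 0 to 1 scale; if an exported file stores scores on a different scale, they would need rescaling first. The function names and the toy interactions are hypothetical, and the toy call lowers the degree cutoff to 2 only because the example network is tiny.

```python
from collections import defaultdict


def build_network(interactions, min_score=0.4):
    """Keep interactions whose integrated score exceeds min_score.

    Returns an undirected network as dict gene -> set of neighbouring genes.
    """
    adjacency = defaultdict(set)
    for gene_a, gene_b, score in interactions:
        if score > min_score and gene_a != gene_b:  # ignore self-interactions
            adjacency[gene_a].add(gene_b)
            adjacency[gene_b].add(gene_a)
    return adjacency


def core_genes(adjacency, min_degree=10):
    """Return genes whose degree (number of direct neighbours) exceeds min_degree."""
    return {
        gene: len(neighbours)
        for gene, neighbours in adjacency.items()
        if len(neighbours) > min_degree
    }


# Made-up interactions for illustration only
interactions = [
    ("Ctss", "Tyrobp", 0.90),
    ("Ctss", "Vim", 0.70),
    ("Ctss", "Cdk1", 0.50),
    ("Vim", "Cdk1", 0.30),    # below the score threshold, dropped
    ("Tyrobp", "Vim", 0.45),
]
network = build_network(interactions)
hubs = core_genes(network, min_degree=2)  # the study used a cutoff of 10
```

Storing neighbours in sets means that a duplicated interaction between the same pair of genes is counted once, so the degree matches the definition given in the text.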

Differentially Expressed Genes (DEGs)
There were 1898 DEGs in the 3-month-old mice and 864 DEGs in the 12-month-old mice between the case samples and control samples. A total of 380 overlapping DEGs were identified between those two DEG lists. The heatmap (Fig. 1) showed that the gene expression of the Mdr2-knockout samples was distinct from that of the control samples and that the gene expression of the 3-month-old samples was distinct from that of the 12-month-old samples, indicating clear differences among these groups.

Enriched GO Terms and KEGG Pathways of DEGs
In this study, a total of 128 enriched GO terms and 23 KEGG pathways were obtained. The top 10 enriched GO terms of the DEGs, ranked by P value, are shown in Table 1. Table 1 indicates that the main enriched GO terms were cellular biological processes, such as cell adhesion, regulation of cell growth and regulation of cell cycle. In addition, some enriched GO terms were related to immune response and inflammatory response. The enriched KEGG pathways of the DEGs are shown in Table 2. A few enriched KEGG pathways were directly related to cancer, such as Pathways in cancer, Small cell lung cancer and Bladder cancer. Other pathways, such as Toll-like receptor signaling, ECM-receptor interaction and MAPK signaling pathway, might also influence the progression of cancer through certain biological processes. The KEGG pathways and their corresponding gene numbers are shown in Fig. 2.

PPI Network of the DEGs and Core Genes in the PPI Network
The PPI network (Fig. 3) contained 244 nodes and 1053 edges. The nodes represented the DEGs and the edges represented the interactions between the DEGs. Genes with a higher degree, which were the core genes in the PPI network, might be more closely related to liver cancer. The core genes and their corresponding degrees are shown in Table 3. Among those core genes, Ctss and Tyrobp had the highest degree, and there were 28 genes whose degree was greater than 20.

Discussion
Although researchers have made considerable efforts to uncover the mechanisms of liver cancer, the genetic alterations associated with its progression have not yet been fully elucidated. In this study, we conducted genome-wide gene expression analysis with a high-throughput method to identify the DEGs in liver cancer compared with normal liver tissues. A total of 380 overlapping DEGs from the two groups of the original data set (3-month-old group and 12-month-old group) were identified, including 289 up-regulated genes, 66 down-regulated genes and 25 genes with contradictory expression trends between the two groups. GO analyses revealed that the significant ontology categories included immune response, cell adhesion and inflammatory response. Immune effector process, nuclear division, cell division, mitotic cell cycle and positive regulation of cellular component organization were clearly overrepresented in the up-regulated genes according to the functional enrichment analysis. In the immune response, for example, TLR2 could enhance ovarian cancer stem cell self-renewal and eventually promote tumor repair and recurrence [12]. ICAM-1 is a transmembrane glycoprotein of the immunoglobulin superfamily that participates in oral cancer progression and induces macrophage/SCC-cell adhesion [13]. Ciftci et al. [14] indicated that the serum TGFB1 level might be elevated in breast cancer patients and had a favorable prognostic value. CDH1, involved in cell adhesion, encodes the adhesion protein E-cadherin, which plays a central part in epithelial morphogenesis [15]. CCL5 belongs to the CC-chemokine family and plays a pivotal role in the invasion and metastasis of human cancer cells. Huang et al. reported that CCL5 stimulation could increase lung cancer migration [16].
The DEGs were then used in KEGG pathway analyses, and 23 pathways were identified, such as Cell adhesion molecules, Toll-like receptor signaling, ECM-receptor interaction and MAPK signaling pathway. Previous studies have reported that most of these pathways are involved in cancer progression. The immune system plays a critical role in the defense of the body, and its dysfunction might result in cancer. Stimulation of various Toll-like receptors induces specific patterns of gene expression, which results in the activation of innate immunity and the development of antigen-specific acquired immunity [17]. Moreover, MAPK signaling molecules participate in the amplification and specificity of the transmitted signals that finally activate a number of regulatory molecules in the cytoplasm and the nucleus to initiate cellular processes such as proliferation, differentiation and development [18].
Furthermore, the topological structure analysis of the PPI network suggested that Ctss, Tyrobp, Vim and Cdk1 were the top 4 core genes, which might be potential therapeutic targets for future research. Cathepsin S (Ctss), a key enzyme in major histocompatibility complex class II (MHC-II)-mediated antigen presentation, might be involved in the malignant progression of lung cancer [19]. CD47-positive liver cancer cells preferentially secreted cathepsin S (CTSS), which regulated liver tumor-initiating cells through the CTSS/protease-activated receptor 2 (PAR2) loop [20]. Shabo indicated that Tyrobp (DAP12) in breast cancer was associated with an advanced tumor grade and higher rates of skeletal and liver metastases [21,22]. Costa reported that Vim could be combined with GDF15 and TMEFF2 to predict bladder cancer [23].
Overall, using a microarray data set from the GEO database, a range of DEGs between liver cancer and normal tissues were obtained. These genes might be functionally relevant to the pathogenesis of liver cancer. Functional analysis revealed that mitotic cell cycle, proteinaceous extracellular matrix and MAPK signaling pathway were among the significant terms for liver cancer. These results could provide a valuable basis for further investigation of liver cancer. Further experiments are still needed to confirm the potential functions of these genes.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.