Background

Autosomal dominant polycystic kidney disease (ADPKD) is one of the most common monogenic disorders. It is a multisystem disease characterized by the growth of multiple cysts in the kidneys; liver cysts and cerebral aneurysms are also major clinical findings [1]. ADPKD is genetically heterogeneous and results from mutations in at least two genes, polycystic kidney disease 1 (PKD1) and PKD2 [2]. These genes encode the transmembrane proteins polycystin-1 (PC-1) and polycystin-2 (PC-2), which form a functional complex [3]. Like other proteins affected in polycystic kidney diseases, this complex localizes to the primary cilia of epithelial and endothelial cells [4]. PC-1 is known as a cell surface receptor and PC-2 is a cation channel; both play a critical role in controlling signaling pathways related to proliferation, apoptosis, and cell polarity through the regulation of Ca2+ homeostasis [5]. Despite numerous studies of polycystin function, the roles of these proteins remain poorly understood. Systems biology approaches, which take a holistic view of the molecular mechanisms of disease, have the potential to overcome this limitation. By interpreting high-throughput omics data comprehensively, these approaches make it possible to describe the behavior of networks and to suggest new therapeutic strategies. Song X et al. deposited an array dataset that compares the transcription profiles of samples from PKD1 patients with normal tissue and analyzed it by gene set enrichment analysis (GSEA) [6]. Here, we re-analyzed this dataset and constructed large-scale protein interaction networks. To identify the central genes related to the disease phenotype at each step, network and clustering analyses were carried out. 
These analyses revealed key genes, such as EDN1, EGFR, ARF6, FOXO1, and ITGB5, that are involved in disease progression. Enrichment analysis identified pathways associated with cyst size, from the early to the late steps. Moreover, to assess the regulatory mechanisms of the differentially expressed (DE) genes, the microRNAs (miRNAs) and transcription factors (TFs) enriched among the DE genes were predicted.

Methods

Microarray data and DE genes screening

The microarray dataset with accession number “GSE7869” was retrieved from the Gene Expression Omnibus (GEO) database. The quality of the transcriptomic dataset was assessed by principal component analysis (PCA) using the prcomp function and the ggplot2 package of R [7]. Using GEO2R, a web tool of GEO, groups were compared to detect genes that are differentially expressed as cysts grow. Samples of normal tissue (n = 3), minimally cystic tissue (n = 5), small cysts (n = 5), and large cysts (n = 3) were compared pairwise in the order of disease progression (normal vs. minimally cystic tissue, minimally cystic tissue vs. small cysts, and small cysts vs. large cysts) using Student’s t-test. The Benjamini–Hochberg false discovery rate (FDR) was used for p-value correction. Genes with an adjusted p-value less than 0.05 were declared differentially expressed.
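The Benjamini–Hochberg correction and the 0.05 cut-off described above can be illustrated with a minimal Python sketch. This is not the GEO2R implementation; the function names (benjamini_hochberg, select_de_genes) and the toy p-values are hypothetical and serve only to show how adjusted p-values are derived and thresholded.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(gene_p_values, alpha=0.05):
    """Keep genes whose adjusted p-value is below alpha."""
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    return {g: a for g, a in zip(genes, adjusted) if a < alpha}


# Toy p-values for hypothetical genes; these are NOT values from GSE7869.
toy_p_values = {
    "gene_a": 0.001,
    "gene_b": 0.008,
    "gene_c": 0.039,
    "gene_d": 0.041,
    "gene_e": 0.60,
}
de_genes = select_de_genes(toy_p_values)
print(sorted(de_genes))
```

In this toy example, gene_c and gene_d have raw p-values below 0.05 but are not retained after adjustment, which is the intended effect of controlling the FDR across many tests.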

Protein–protein interaction network construction

The protein–protein interaction (PPI) networks were built from the DE genes using the CluePedia plugin version 1.5.2 [8] of Cytoscape version 3.7.1 [9]. Interactions were retrieved from the STRING database with a confidence cutoff of 0.80 [10]. Network topology was investigated using the NetworkAnalyzer tool of Cytoscape [11]. The “Molecular Complex Detection” (MCODE) plugin of Cytoscape was used with default settings to detect modules, which are highly connected sub-networks [12].
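As a rough illustration of the steps above, the following Python sketch filters scored interactions at a 0.80 confidence cutoff and computes degree, closeness, and betweenness centrality on the resulting undirected graph. It is not the CluePedia or NetworkAnalyzer code: the function names, the protein labels P1 to P5, and the scores are hypothetical, betweenness is computed with Brandes' algorithm and left unnormalized, and closeness is taken as the reciprocal of the mean shortest-path distance to reachable nodes.

```python
from collections import defaultdict, deque


def build_graph(scored_edges, cutoff=0.80):
    """Keep interactions whose confidence score reaches the cutoff."""
    graph = defaultdict(set)
    for a, b, score in scored_edges:
        if score >= cutoff and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree(graph):
    """Number of direct interaction partners per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness(graph):
    """Reciprocal of the mean shortest-path distance to reachable nodes."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness (Brandes' algorithm, undirected graph)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        stack, queue = [], deque([source])
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}  # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected path is counted from both endpoints, so halve the total.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical scored interactions; P5 is dropped because its only edge is below 0.80.
toy_edges = [("P1", "P2", 0.95), ("P2", "P3", 0.90), ("P2", "P4", 0.85), ("P1", "P5", 0.50)]
toy_graph = build_graph(toy_edges)
print(degree(toy_graph), closeness(toy_graph), betweenness(toy_graph))
```

In the toy graph, P2 is the hub: it has the highest degree, lies on every shortest path between the other proteins, and is closest to all of them, which is the kind of node such an analysis would flag as central.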

Pathway enrichment analysis

Functional analysis of the genes clustered by MCODE was performed with the Cytoscape ClueGO plugin version 2.5.2 [13]. Pathways were retrieved from the Reactome [14] and KEGG (Kyoto Encyclopedia of Genes and Genomes) [15] databases. The Bonferroni step-down method was applied for p-value correction, and signaling pathways with an adjusted p-value ≤ 0.05 were considered significant.
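The logic of this kind of pathway enrichment can be sketched in Python as follows. The sketch assumes a one-sided (over-representation) hypergeometric test, which the text does not specify, and implements the Bonferroni step-down correction as it is commonly defined (Holm's method). It is not the ClueGO implementation; the function names, gene labels, pathway names, and background size are all hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def holm_step_down(p_values):
    """Bonferroni step-down (Holm) adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    return adjusted


def enrich(cluster_genes, pathways, background_size, alpha=0.05):
    """Return pathways over-represented in the cluster at adjusted p <= alpha."""
    cluster = set(cluster_genes)
    names = list(pathways)
    raw = [
        hypergeom_upper_tail(
            len(cluster & pathways[name]), len(cluster), len(pathways[name]), background_size
        )
        for name in names
    ]
    adjusted = holm_step_down(raw)
    return {name: adj for name, adj in zip(names, adjusted) if adj <= alpha}


# Hypothetical cluster and pathway gene sets; not taken from Reactome or KEGG.
toy_cluster = {"g1", "g2", "g3", "g4", "g5"}
toy_pathways = {
    "pathway_x": {"g1", "g2", "g3", "g4", "g90"},
    "pathway_y": {"g5"} | {f"h{i}" for i in range(49)},
}
significant = enrich(toy_cluster, toy_pathways, background_size=1000)
print(sorted(significant))
```

Here pathway_x shares four of its five genes with the cluster and is retained, whereas pathway_y shares a single gene out of fifty and is not.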

miRNA and TF enrichment analysis

The microRNAs (miRNAs) and transcription factors (TFs), key regulators of genes, were predicted with the Enrichr web server [16]. The TargetScan microRNA 2017 and ChEA 2016 libraries were used for miRNA and TF enrichment analysis, respectively. An adjusted p-value less than 0.05 was used as the significance threshold. The miRNAs with the largest numbers of targeted genes were selected.
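The selection rule described above (keep significant terms, then prefer regulators that target more DE genes) can be sketched in Python as below. The record type EnrichedTerm, the function top_regulators, and the toy entries are hypothetical; they do not reproduce the Enrichr output format or any result of this study, and the tie-break on adjusted p-value is an assumption.

```python
from typing import NamedTuple, Tuple


class EnrichedTerm(NamedTuple):
    """One enriched miRNA or TF term (hypothetical record layout)."""
    name: str
    adjusted_p: float
    overlap_genes: Tuple[str, ...]


def top_regulators(terms, alpha=0.05, top_n=10):
    """Keep terms with adjusted p < alpha, ranked by number of targeted genes."""
    significant = [t for t in terms if t.adjusted_p < alpha]
    # More targeted genes first; ties are broken by the smaller adjusted p-value.
    significant.sort(key=lambda t: (-len(t.overlap_genes), t.adjusted_p))
    return significant[:top_n]


# Placeholder terms and genes, for illustration only.
toy_terms = [
    EnrichedTerm("regulator_1", 0.001, ("g1", "g2")),
    EnrichedTerm("regulator_2", 0.020, ("g1", "g2", "g3", "g4")),
    EnrichedTerm("regulator_3", 0.200, ("g1", "g2", "g3", "g4", "g5")),
    EnrichedTerm("regulator_4", 0.010, ("g3", "g5")),
]
ranked = top_regulators(toy_terms, top_n=2)
print([t.name for t in ranked])
```

Note that regulator_3 targets the most genes in this toy input but is excluded because its adjusted p-value does not pass the threshold.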

Results

Differentially expressed genes were identified by microarray data analysis

The microarray dataset “GSE7869” includes renal cysts of different sizes: small cysts (SC) less than 1 mm, medium cysts between 10 and 25 mm, and large cysts (LC) greater than 50 mm. Minimally cystic tissues (MCT), obtained from healthy parts of the renal cortex of PKD1 patients, were considered heterozygous samples. In the quality assessment step, all samples except the medium cysts segregated according to their state (normal tissue, minimally cystic tissue, small cyst, and large cyst), which indicates acceptable quality of the dataset (Fig. 1). Using the GEO2R tool, we obtained 512, 7024, and 655 genes that were significantly differentially expressed between normal vs. MCT samples, MCT vs. SC samples, and SC vs. LC samples, respectively (Additional file 1). Interestingly, these sets of DE genes share few genes (Fig. 2a).
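The overlap between the three DE gene sets can be computed with plain set operations. The Python sketch below assigns each gene to the exact combination of comparisons in which it is differentially expressed, which corresponds to the regions of a three-set Venn diagram. The function name overlap_regions and the toy gene lists are hypothetical and are not the lists in Additional file 1.

```python
def overlap_regions(gene_sets):
    """Map each combination of comparisons to the genes found in exactly those comparisons."""
    regions = {}
    all_genes = set().union(*gene_sets.values())
    for gene in all_genes:
        # The key lists, in input order, every comparison that contains the gene.
        key = tuple(name for name in gene_sets if gene in gene_sets[name])
        regions.setdefault(key, set()).add(gene)
    return regions


# Toy DE gene sets with placeholder gene labels.
toy_sets = {
    "normal_vs_MCT": {"g1", "g2", "g3"},
    "MCT_vs_SC": {"g3", "g4", "g5", "g6"},
    "SC_vs_LC": {"g3", "g6", "g7"},
}
for combination, genes in sorted(overlap_regions(toy_sets).items()):
    print(combination, sorted(genes))
```

A gene that appears only under a single-comparison key is specific to that step, whereas a gene under the key containing all three comparisons is dysregulated throughout progression.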

Fig. 1
The quality of the microarray dataset is satisfactory. Principal component analysis of the GSE7869 dataset showed that the samples were separated appropriately

Fig. 2
Overlap of differentially expressed genes and protein–protein interaction networks. a Overlap of the differentially expressed genes among the comparisons. b–d Protein–protein interaction networks built from the differentially expressed genes (b normal vs. MCT, c MCT vs. SC, d SC vs. LC)

Protein–protein interaction networks were constructed

PPI networks were constructed from the DE genes. Links between genes were selected based on activation, binding, post-translational modification, and inhibition interactions. The PPI networks show that the small cyst growth phase is an important and complex step in disease progression: the networks for normal vs. MCT, MCT vs. SC, and SC vs. LC contain 81, 2737, and 155 nodes (genes), respectively (Fig. 2b–d). The MCODE application identified protein clusters in the networks. These protein complexes and modules are highly interconnected subnetworks that contain the most influential genes. Network topology was measured using graph theory concepts such as degree, betweenness, and closeness centrality. The seed gene with the highest centrality in the early stage (normal vs. MCT) is EDN1. Seed genes such as EGFR, ARF6, WWTR1, SMURF2, TGFB2, and HSD17B8 are critical in the comparison of MCT with SC. FOXO1, EDN1, and ITGB5 emerge as central genes in the late stage (SC vs. LC). Some of these genes, including EGFR and EDN1, have been linked to ADPKD in previous experimental studies [17, 18], and the others are candidates for future studies. The genes are listed in Table 1.

Table 1 Top clustered genes in the PPI networks. The seed genes with the highest density in the PPI networks are shown

Pathway enrichment analysis was performed

Functional analysis was carried out on the genes detected by MCODE. Pathway enrichment analysis of 18, 1318, and 66 genes yielded 7, 113, and 39 pathways, respectively (Fig. 3). Interestingly, the enriched terms are informative and related to the phenotype of each step, such as collecting duct acid secretion in the early step. The analysis detected critical pathways and functions, including EGF, Wnt, MAPK, HIF, P53, CFTR, AMPK, PDGF, NFκB, IGF1, and MET signaling, oxidative phosphorylation, energy metabolism, cell–cell and cell–matrix interaction, and signaling by interleukins, which were previously shown to be associated with ADPKD in experimental studies [19,20,21]. The remaining pathways could be considered for further study and validation.

Fig. 3
Pathway enrichment analysis of clustered genes. Functional analysis showed interconnected and informative pathways that are mainly associated with renal cystic growth (a normal vs. MCT, b MCT vs. SC, and c SC vs. LC). The significance of each pathway is indicated by the color code. Node size reflects the number of mapped genes in each pathway

miRNAs and TFs enriched with DE genes were determined

The miRNAs and TFs that act as important regulators of the DE genes were predicted. HNF4A, ESR1, and RXR were identified as TFs in the initial step (normal vs. MCT). The TFs that were significant in the small and large cyst growth steps are shown in Table 2. The top miRNAs enriched with DE genes in each phase are shown in Fig. 4. Previous studies reported the association of ADPKD with some of these TFs, e.g. HNF4A, STAT3, VDR, TP53, and HIF1A [6, 20, 22, 23]. The roles of the miR-17 family and miR-192 in cyst enlargement have also been identified [24, 25]. The other miRNAs and TFs merit investigation in experimental studies.

Table 2 Transcription factor enrichment analysis
Fig. 4
miRNA enrichment analysis results. The top predicted miRNAs are shown. An adjusted p-value less than 0.05 was used as the significance cut-off

Discussion

ADPKD is caused by mutations in the PKD1 or PKD2 genes [2], and the protein products of these genes, polycystin-1 and polycystin-2, act as a mechanosensor on the surface of epithelial and endothelial cells [4]. Loss or gain of function of these proteins leads to dysregulation of pathways related to proliferation, apoptosis, and cell polarity [5]. Although many studies have addressed the functions of the polycystins, numerous ambiguities remain about the molecular mechanisms of disease progression. Given the importance of time series analysis of diseases [26], the purpose of this study was the computational analysis of the expression profiles of renal cysts compared by cyst size. The bioinformatics methods applied in this study showed that 512, 7024, and 655 DE genes were dysregulated in the successive steps (normal vs. MCT, MCT vs. SC, and SC vs. LC, respectively). The PPI networks showed that nodes and their interactions became more complex as the disease progressed to small cyst growth. Topology and clustering analyses of the networks were used to reveal candidate genes with high centrality as potential therapeutic targets. Nodes (genes) with a high degree have many connections and are important for the network. Betweenness centrality is based on the number of shortest paths that pass through a node; nodes with high betweenness act as shortcuts in the network. Closeness centrality measures how near a node is to all other nodes in the network [27]. Modules are high-density regions of the network and identify functional genes [12]. The roles of some of these genes in ADPKD have been well documented. EDN1, a vasoconstrictor, may promote tumorigenesis, and recent studies have documented that an increase in serum endothelin levels is associated with the renal pathogenesis of ADPKD. Polymorphisms of EDN1 can also influence the age of onset of end-stage renal disease in ADPKD [18, 28]. EGFR promotes cell growth, proliferation, and survival and has important functions in the progression of ADPKD [17]. 
Other genes proposed as candidates for future studies are ARF6, SMURF2, WWTR1, CACNB2, and FOXO1. ARF6 is a member of the RAS superfamily that regulates signaling pathways related to actin remodeling, such as the Wnt pathway, a central pathway in ADPKD [26]. SMURF2 controls cell migration through the BMP and TGFβ signaling pathways [29]. WWTR1 acts as a transcriptional coactivator downstream of the Hippo signaling pathway, which plays a major role in the control of organ size [30]. Ablation of CACNB2 leads to disruption of calcium homeostasis and could have a critical role in the initiation and progression of the disease. Previous studies showed that mutation of PKD1 leads to increased glycolysis in ADPKD kidneys. FOXO1 plays a major role in glucose metabolism through insulin signaling and may consequently be involved in ADPKD pathogenesis [31, 32]. ITGB5 contributes to cell adhesion and is known as a biomarker in kidney disease [33]. The mechanisms of newly introduced crucial genes such as PPIE remain to be identified in experimental studies. We identified TFs such as HNF4A, STAT3, VDR, TP53, and HIF1A that are associated with ADPKD [22, 23]. In addition, other TFs, such as CLOCK, are described in ADPKD pathogenesis for the first time in this study. Since CLOCK is involved in kidney function, confirming its role in ADPKD could yield interesting results [34]. Functional analysis showed that the pathways correlate with the disease phenotype in each step and include pathways involved in cell proliferation, apoptosis, and inflammation. The roles of some of these pathways in ADPKD pathogenesis have already been determined [19, 20].

Conclusions

Here, using computational tools, we generated a systematic view of ADPKD to explore the molecular mechanisms of a monogenic disease comprehensively. The methods employed in this study may also be applied to other monogenic disorders to identify novel therapeutic targets. Our results also underline the need for holistic maps of monogenic diseases, in addition to complex diseases.