1 Background

Polycystic ovary syndrome (PCOS) is a metabolic and reproductive disorder that affects between 4 and 18% of reproductive-age women [1]; in the general population, its prevalence is estimated at around 21.27% [2]. Insulin resistance, hormonal imbalances, and metabolic disorders are common features of PCOS; they increase the risk of type 2 diabetes mellitus (T2DM), cardiovascular disease (CVD), and infertility [3] and impair quality of life [4]. According to the Rotterdam and Androgen Excess Society criteria, the prevalence of PCOS in India is 22.5% and 10.7%, respectively, and mild PCOS was the typical phenotype, observed in about 52.6% of women [5]. Although its exact cause is unknown, PCOS is a multifactorial disorder with a genetic component: approximately 20–40% of first-degree female relatives of women with PCOS develop PCOS themselves, a considerably higher rate than in the general population [6]. However, up to 70% of women with PCOS remain undiagnosed [7]. Indeed, optimal diagnosis is often hampered by the similarity of PCOS to many other pathologies, including Cushing's syndrome, obesity, congenital adrenal hyperplasia, and ovarian and adrenal neoplasms [8].

Androgen hypersecretion, known as hyperandrogenism, is the second most common feature of PCOS and affects anywhere from 17 to 83% of women [9]. Numerous studies have revealed potential genes, proteins, and metabolites implicated in the pathogenesis of PCOS using methods such as genomics, transcriptomics, metabolomics, and bioinformatics [10,11,12,13]. However, the heterogeneity and complexity of PCOS make conventional approaches, such as candidate-gene studies that focus on a single gene or pathway, inadequate for understanding its molecular basis. A systems biology approach, which combines experimental and computational biology to understand complex biological systems, can instead investigate the many interacting genes and gene products that contribute to PCOS. Computational systems biology analyzes and integrates the data produced by experimental methods and available in databases and publications. In PCOS, one of the first studies to use such a computational approach was published in 2009, when researchers built a protein network from seven transcriptomic datasets to better understand the disease mechanism [14, 15]. More recent findings indicate that the genes APCO3, ADCY2, C3AR1, HRH2, GRIA1, MLNR and TAAR2 play a crucial role in the formation and progression of PCOS, and that microarray data can be used to identify new biomarkers and therapeutic targets for PCOS [16].

In this study, differentially expressed genes (DEGs) were investigated using Gene Expression Omnibus (GEO) data and bioinformatics analysis tools. Functional and pathway enrichment analyses of the DEGs were then performed. We also built a gene interaction network for the DEGs and identified the major signaling pathways and genes associated with them. Because interactions between genes and signaling networks play a vital role in the development and progression of PCOS, this network was used to further investigate the relationship between genes and signaling networks in PCOS. Overall, this systematic analysis may provide insights into the molecular basis of the increased risk of T2DM in women with PCOS.

2 Methods

2.1 Microarray data

GEO (http://www.ncbi.nlm.nih.gov/geo) is a public functional genomics repository for high-throughput gene expression data, including chip and microarray data [17]. One gene expression dataset, GSE8157, was retrieved from GEO, and the probes were converted to the corresponding gene symbols using the platform's annotation information. The 43 samples in GSE8157 include samples labeled muscle PCOS pioglitazone, muscle PCOS after pioglitazone, muscle PCOS control, and muscle PCOS case.
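The probe-to-gene conversion can be illustrated with a minimal sketch. It assumes the expression matrix and the platform annotation have already been loaded into plain dictionaries; the probe IDs, gene names, and the rule of averaging multiple probes per gene are hypothetical choices for illustration, not necessarily what the study applied.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(expression, annotation):
    """Collapse probe-level expression rows to gene-symbol rows.

    expression: dict probe_id -> list of per-sample expression values
    annotation: dict probe_id -> gene symbol ("" when the probe is unannotated)

    Assumed rules (hypothetical, not taken from the study):
    probes without a symbol are dropped, and when several probes map to the
    same gene their values are averaged sample by sample.
    """
    rows_by_gene = defaultdict(list)
    for probe_id, values in expression.items():
        symbol = annotation.get(probe_id, "").strip()
        if not symbol:
            continue  # no gene symbol, so the probe cannot be mapped
        rows_by_gene[symbol].append(values)
    # zip(*rows) walks the samples column by column
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


# Toy example: two probes for one gene, one unannotated probe.
expression = {"p1": [1.0, 3.0], "p2": [3.0, 5.0], "p3": [2.0, 2.0], "p4": [9.0, 9.0]}
annotation = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B", "p4": ""}
print(collapse_probes(expression, annotation))
# {'GENE_A': [2.0, 4.0], 'GENE_B': [2.0, 2.0]}
```

Averaging is only one option; keeping the probe with the highest mean expression per gene is another common convention.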

2.2 Identifying DEGs

DEGs were identified using GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/), an R-based web application available in the GEO database [18]. Genes with |logFC| ≥ 1.0 (logFC being the log2 fold change) and p < 0.05 in a t test were considered DEGs.
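The selection rule above can be expressed as a short filter. This sketch assumes the logFC and p-values have already been computed (GEO2R obtains them with limma's moderated t test, which is not reproduced here); the gene names and values are hypothetical.

```python
def classify_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and down-regulated DEGs.

    results: dict gene -> (logFC, p_value), e.g. columns parsed from a
    GEO2R results table, where logFC is the log2 fold change (case vs control).
    A gene counts as a DEG when |logFC| >= lfc_cutoff and p < p_cutoff.
    """
    up, down = [], []
    for gene, (logfc, p_value) in results.items():
        if p_value < p_cutoff and abs(logfc) >= lfc_cutoff:
            # with a positive cut-off, logFC cannot be 0 here
            (up if logfc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Hypothetical genes and values, for illustration only.
results = {
    "GENE_A": (1.5, 0.01),    # up-regulated DEG
    "GENE_B": (-2.0, 0.001),  # down-regulated DEG
    "GENE_C": (0.4, 0.001),   # significant but |logFC| too small
    "GENE_D": (-1.2, 0.20),   # large change but not significant
}
up, down = classify_degs(results)
print(len(up), len(down))  # 1 1
```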

2.3 Gene ontology (GO) enrichment and KEGG pathway analysis of the DEGs

The DAVID tool was used for comprehensive analysis and visualization of functionally enriched gene sets [19]. Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were combined to build a functionally organized GO/pathway term network. Enrichment analysis of the protein/gene list used an enrichment statistical test, Bonferroni step-down correction for multiple testing, and a significance threshold of p ≤ 0.05.
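To make the statistics concrete, the sketch below computes a one-sided hypergeometric enrichment p-value for a single term and applies the Bonferroni step-down (Holm) correction across terms. It is an illustration under stated assumptions: DAVID itself reports the EASE score, a modified Fisher exact test, so its p-values will differ somewhat, and the counts used here are toy numbers.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for a hypergeometric X.

    M: genes in the background, n: background genes annotated to the term,
    N: genes in the query (DEG) list, k: query genes annotated to the term.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def holm_bonferroni(p_values):
    """Bonferroni step-down (Holm) adjustment, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, value)  # adjusted p-values never decrease
        adjusted[i] = running_max
    return adjusted


# Toy numbers: 20 background genes, 5 in the term, 4 DEGs, 3 of them in the term.
p = hypergeom_sf(3, 20, 5, 4)
print(round(p, 4))  # 0.032
print(holm_bonferroni([0.01, 0.04, 0.03]))
```

The step-down form is less conservative than the plain Bonferroni correction because only the smallest p-value is multiplied by the full number of tests.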

2.4 Construction of gene interaction network

NetworkAnalyst (http://www.networkanalyst.ca/), a multifunctional online platform, was used to analyze the DEGs and to construct and visualize the signaling network using interaction data from SIGNOR 2.0 (SIGnaling Network Open Resource). It was also used for path and module analyses and for protein–drug interactions.

Next, a gene interaction network consisting of the seed genes and their direct neighbors was constructed. Cytoscape 3.3.0 [20] was used to visualize these gene interaction networks, and node attributes were calculated using default parameters. An extended network was constructed from the DEGs, and the giant network and sub-networks were derived from it. The aim was to explore, at the systems level, the mechanism linking PCOS to T2DM. For the topological analysis, nodes of the giant network with high betweenness centrality (BC) values were considered together with other network-theoretic metrics.
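A minimal sketch of the two steps just described: keep the interactions that touch a seed gene (seed genes plus direct neighbors), then extract the largest connected component as the giant network. It assumes the interactions are available as an undirected edge list and that the first-order network keeps only edges involving at least one seed; NetworkAnalyst's exact rules may differ. Node names are hypothetical.

```python
from collections import defaultdict, deque


def first_order_network(edges, seeds):
    """Keep every interaction with at least one seed gene as an endpoint."""
    seeds = set(seeds)
    return [(a, b) for a, b in edges if a in seeds or b in seeds]


def largest_component(edges):
    """Node set of the largest connected component (the 'giant' network)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # breadth-first search collects one component
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical seeds S1-S3 and neighbors N1-N6 (toy edges, not real interactions).
edges = [("S1", "S2"), ("S1", "N1"), ("N1", "N2"), ("S2", "N3"), ("S3", "N4"), ("N5", "N6")]
network = first_order_network(edges, {"S1", "S2", "S3"})
print(sorted(largest_component(network)))  # ['N1', 'N3', 'S1', 'S2']
```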

2.5 Identification of hub proteins

CytoHubba is a Cytoscape plugin used to find the hub genes of a gene interaction network and to generate sub-networks from the giant network. It provides 11 topological analysis methods, of which we used five (degree, radiality centrality, closeness centrality, BC and bottleneck) to identify hub or bottleneck genes in the network [21].
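Two of these rankings, closeness and radiality, can be computed from breadth-first-search distances. The sketch below uses the classic closeness, (n − 1) divided by the sum of shortest-path distances, and radiality as the mean of (diameter + 1 − distance) over all other nodes. These are common textbook definitions assumed here for illustration; CytoHubba's own formulas may differ in detail, and the graph must be connected with at least two nodes.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_and_radiality(adj):
    """Assumes a connected, undirected graph {node: set(neighbors)} with n >= 2."""
    all_dist = {v: bfs_distances(adj, v) for v in adj}
    n = len(adj)
    diameter = max(max(d.values()) for d in all_dist.values())
    closeness, radiality = {}, {}
    for v, d in all_dist.items():
        others = [x for w, x in d.items() if w != v]
        closeness[v] = (n - 1) / sum(others)
        radiality[v] = sum(diameter + 1 - x for x in others) / (n - 1)
    return closeness, radiality


# Toy path graph A - B - C: the middle node scores highest on both measures.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
closeness, radiality = closeness_and_radiality(adj)
print(closeness["B"], radiality["B"])  # 1.0 2.0
```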

2.6 Centrality analysis of protein interaction network

Centrality measures, namely connectivity degree (D), BC, and closeness centrality (CC), were used to evaluate the nodes of the giant network and sub-networks; degree and BC are fundamental measures in network theory [22, 23]. A node with high BC lies on many of the shortest paths between other nodes, so it strongly influences the flow through the network and is a candidate bottleneck. Other parameters, such as the average clustering coefficient, mean shortest path length, neighborhood connectivity distribution, and diameter, were also used to characterize the network [22]. The average degree is the mean of the degrees of all nodes in a network. The NetworkAnalyzer plugin of Cytoscape [24] was used to compute the node parameters and network measures.
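BC can be computed with Brandes' algorithm, sketched below for an unweighted, undirected network. The normalization by the number of node pairs that exclude the node, (n − 1)(n − 2)/2, is an assumption meant to mirror how NetworkAnalyzer reports BC on a 0 to 1 scale; check the plugin documentation before comparing values directly.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbors)}.

    Returns BC normalised by (n - 1)(n - 2) / 2 (assumed normalisation).
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # forward BFS: count shortest paths (sigma) and record predecessors
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # backward pass: accumulate pair dependencies
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # every unordered pair was counted twice (once from each endpoint)
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: value * scale for v, value in bc.items()}


# Toy star network: the centre lies on every shortest path between leaves.
star = {"C": {"A", "B", "D"}, "A": {"C"}, "B": {"C"}, "D": {"C"}}
print(betweenness(star)["C"])
```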

2.7 Subnetwork with all the shortest paths

Some pairs of candidate genes are not directly connected, even within the giant network, which calls for the construction of sub-networks. Here, a sub-network of the genes linking PCOS with T2DM was constructed. All shortest paths between every pair of directly or indirectly connected candidate genes were computed using NetworkAnalyzer.
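The shortest-path sub-network can be sketched as follows: for every pair of candidate genes, enumerate all shortest paths with a breadth-first search that keeps every predecessor, then take the union of the nodes on those paths. This is an illustrative reimplementation on a toy graph with hypothetical node names, not NetworkAnalyzer's code.

```python
from collections import deque
from itertools import combinations


def all_shortest_paths(adj, source, target):
    """Enumerate every shortest path between source and target (unweighted)."""
    dist, preds = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = [v]
                queue.append(w)
            elif dist[w] == dist[v] + 1:
                preds[w].append(v)  # another equally short way to reach w
    if target not in dist:
        return []  # the pair lies in different components

    def backtrack(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in backtrack(p)]

    return backtrack(target)


def shortest_path_subnetwork(adj, candidates):
    """Union of the nodes on all shortest paths between every candidate pair."""
    nodes = set()
    for a, b in combinations(candidates, 2):
        for path in all_shortest_paths(adj, a, b):
            nodes.update(path)
    return nodes


# Toy square A-B-D-C-A with a pendant node E attached to D.
adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C", "E"}, "E": {"D"}}
print(sorted(shortest_path_subnetwork(adj, ["A", "D"])))  # ['A', 'B', 'C', 'D']
```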

2.8 Backbone network based on high BC values

The backbone network consists of the proteins with high BC values and the connections between them. In this study, the top 5% of nodes of the giant network ranked by BC were used as the cut-off [25, 26]. Nodes with high BC lie on most of the shortest paths in the network, so these backbone nodes are expected to control and monitor communication among all other nodes.
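A minimal sketch of the backbone selection, assuming BC values are already available per node and that the 5% cut-off is rounded up to a whole number of nodes (an assumption). With 318 nodes, 5% is 15.9, so rounding up gives 16 nodes, the same number listed in Table 1. Node names in the example are hypothetical.

```python
import math


def backbone(bc, edges, fraction=0.05):
    """Select the top `fraction` of nodes by BC and keep the edges among them.

    bc: dict node -> betweenness centrality
    edges: iterable of (node, node) interactions
    Ties in BC are broken by the input order of `bc`.
    """
    k = max(1, math.ceil(fraction * len(bc)))  # assumed rounding: round up
    top = set(sorted(bc, key=bc.get, reverse=True)[:k])
    kept = [(a, b) for a, b in edges if a in top and b in top]
    return top, kept


# Hypothetical network of 318 nodes whose BC simply increases with the index.
bc = {f"g{i}": float(i) for i in range(318)}
top, kept = backbone(bc, [("g317", "g316"), ("g317", "g0")])
print(len(top), kept)  # 16 [('g317', 'g316')]
```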

3 Results

3.1 Identification of DEGs

GEO2R was used to identify DEGs in the dataset, and these DEGs pointed to AMPK signaling, a pathway linking PCOS to diabetes. The inclusion criteria were p < 0.05 and |logFC| ≥ 1.0 (Fig. 1). A total of 339 DEGs were identified in GSE8157; of these, 18 were up-regulated and 321 were down-regulated (Fig. 2). These DEGs constituted the extended network, from which the giant network, composed of 318 nodes connected by 340 edges, was extracted.

Fig. 1

Volcano plot of all DEGs; screening criteria: p < 0.05 and |logFC| ≥ 1. Red and blue indicate up-regulated and down-regulated DEGs, respectively. FC, fold change; DEGs, differentially expressed genes

Fig. 2

Heatmap of the DEGs detected by comparing control and case samples of women with PCOS, generated with BART (Bioinformatics Array Research Tool). Each row represents a gene and each column represents a sample

3.2 GO functional enrichment and pathway analysis

The DAVID tool was used to perform GO functional enrichment analysis on both up- and down-regulated DEGs. Histograms of the top GO terms for the down-regulated genes in DAVID (standard cut-off p < 0.05) are shown in Fig. 3. In the biological process (BP) category, the up-regulated DEGs were predominantly enriched in regulation of cell shape, cytoskeleton organization, and actin cytoskeleton organization. The down-regulated genes were mainly involved in biological processes (Fig. 3A and Additional file 1: Table S1) such as negative regulation of transcription (DNA-templated); multicellular organism development; cellular response to DNA damage stimulus; cell surface receptor signaling pathway; axon guidance; glucose homeostasis; smoothened signaling pathway; negative regulation of cell growth; intracellular receptor signaling pathway; and phospholipase C-activating G-protein coupled receptor signaling pathway. In the cellular component (CC) category, the up-regulated DEGs were mainly enriched in the ruffle, whereas the down-regulated DEGs (Fig. 3B) were related to adherens junction, chromatin, stress fiber, Golgi stack, integral component of plasma membrane, and cell projection (Additional file 1: Table S1).

Fig. 3

Top GO functional enrichment and KEGG pathway results for the down-regulated DEGs: (A) biological process, (B) cellular component, (C) molecular function, and (D) KEGG pathways

In the molecular function (MF) analysis, the up-regulated DEGs were primarily enriched in poly(A) RNA binding, that is, binding to a sequence of adenylyl residues in an RNA molecule, such as the poly(A) tail at the 3' end of eukaryotic mRNA. The down-regulated DEGs were associated with molecular functions (Fig. 3C and Additional file 1: Table S1) such as RNA polymerase II transcription factor activity, ligand-activated sequence-specific DNA binding; p53 binding; acrosin binding; RNA polymerase II core promoter proximal region sequence-specific DNA binding; transferase activity, transferring glycosyl groups; GKAP/Homer scaffold activity; transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding; protein binding; steroid hormone receptor activity; and actin filament binding.

3.3 Enrichment analysis of KEGG pathways

No KEGG pathway was significantly enriched among the up-regulated DEGs. In contrast, the down-regulated DEGs were enriched in the AMPK signaling pathway; alanine, aspartate and glutamate metabolism; neuroactive ligand-receptor interaction; butanoate metabolism; and the adipocytokine signaling pathway. The results of the KEGG pathway enrichment analysis of the down-regulated genes (DAVID bar diagrams) are shown in Fig. 3D.

3.4 Construction of gene interaction network

The giant gene interaction network of the DEGs generated by NetworkAnalyst consists of 318 nodes and 340 edges (Fig. 4). The centrality parameters of each node in the giant network, including degree (D), BC, and CC, are listed in Additional file 1: Table S2. The largest degree and the highest BC in the giant network were 74 and 0.565, respectively. As is typical of biological networks, the giant network contains a small number of highly connected nodes, while most nodes have relatively few connections.
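As a worked check on the sparsity of this network, the average degree defined in the Methods follows directly from the reported counts (N = 318 nodes, E = 340 undirected edges). The density ρ is added here only as an illustration; it is not a value reported in the analysis.

```latex
\langle k \rangle = \frac{2E}{N} = \frac{2 \times 340}{318} \approx 2.14,
\qquad
\rho = \frac{2E}{N(N-1)} = \frac{680}{318 \times 317} \approx 0.0067
```

An average degree of about 2 against a maximum degree of 74 is consistent with a network dominated by a few hubs.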

Fig. 4

Extended gene interaction network of the DEGs in women with PCOS at risk of T2DM. Seed proteins are highlighted in orange and the nodes connected to them in green

3.5 Selection of hub genes

CytoHubba was used to predict the hub proteins: the top 10 hub proteins were selected with each of its five ranking methods (Additional file 1: Table S3). Two central genes were then identified by overlapping these top-ten lists (Fig. 5). AR (androgen receptor) and STK11 (serine/threonine kinase 11) were thus selected as hub genes based on all five ranking methods.
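The consensus step can be sketched as an intersection of top-k sets, one per ranking method. The scores below are toy values with hypothetical node names and k = 2 to keep the example small; the analysis used k = 10 and the five CytoHubba methods.

```python
def top_k(scores, k=10):
    """Names of the k highest-scoring nodes (ties broken by input order)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def consensus_hubs(rankings, k=10):
    """Intersect the top-k node sets produced by several ranking methods.

    rankings: dict method name -> dict node -> score
    """
    sets = [top_k(scores, k) for scores in rankings.values()]
    return set.intersection(*sets)


# Toy scores for two methods: only node A is in both top-2 lists.
rankings = {
    "degree": {"A": 5, "B": 4, "C": 1},
    "betweenness": {"A": 0.9, "B": 0.1, "C": 0.5},
}
print(consensus_hubs(rankings, k=2))  # {'A'}
```

A strict intersection favors genes that are central under every criterion; a vote count across methods would be a looser alternative.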

Fig. 5

Overlap of the top ten genes ranked by the betweenness, degree, closeness, bottleneck and radiality methods of CytoHubba, which identified two hub genes

3.6 Key nodes of the backbone network

The backbone of the signaling network was constructed from the 16 nodes with high BC values (Table 1). AR lies at the center of the backbone network with a high BC value and therefore controls the information flow through it (Fig. 6). AR has 15 first neighbors: STK11, DYRK1A, TWIST2, TP73, MYCN, THRA, NFYA, NR4A3, PDX1, FST, SMARCA4, SIRT5, NRF1, REST, and RAD51. AR plays a crucial role in the origins of PCOS. Identifying and confirming the sites of AR-mediated actions and the molecular mechanisms involved in PCOS development is critical to providing the knowledge required for future innovative, mechanism-based interventions for PCOS treatment.

Table 1 List of genes in the backbone network
Fig. 6

Topology of the backbone network constructed from the 16 nodes with high BC values; node sizes are proportional to their BC values

4 Discussion

Genomic analysis, the study of the structure, function, and expression of genes in an organism, is one of the methodologies used to explain the molecular basis of PCOS and its consequences. We used published gene expression profile data and bioinformatics analysis tools to investigate DEGs in PCOS skeletal muscle. This systematic analysis may help clarify the molecular complications of PCOS associated with diabetes. We complemented it by mapping knowledge of PCOS-related disorders onto the PCOS pathway network to establish mechanistic interactions between PCOS and other diseases.

The etiopathology of PCOS is not fully understood despite extensive research, and to date no effective systemic or targeted therapy exists. A large amount of data from transcriptomic and genome-wide association studies of PCOS patients is publicly available. These data can be used to comprehensively understand the pathophysiology of PCOS and the elevated risks it confers on patients. Integrated GEO analysis and systems biology approaches can be applied to gene expression data generated by microarray or RNA-seq experiments. In the present study, the R-based GEO2R was used to analyze differentially expressed genes. GEO2R uses the GEOquery and limma R packages from the Bioconductor project to compare the original submitter-supplied processed data tables. The gene expression profile of GSE8157 was analyzed using a variety of bioinformatics methods. We explored DEGs in the skeletal muscle of women with PCOS that may underlie a common metabolic abnormality leading to an increased risk of T2DM. We identified a total of 339 DEGs between PCOS cases and controls, of which 18 were up-regulated and 321 were down-regulated. A series of bioinformatics tools was then used to predict the key genes and molecular pathways linking PCOS to the risk of T2DM.

The GO analysis showed that the up-regulated genes mainly participate in biological processes such as regulation of cell shape (GO:0008360; genes: FGD6, BRWD1, FMNL3, MYH9), cytoskeleton organization (GO:0007010; genes: FGD6, BRWD1, FMNL3) and actin cytoskeleton organization (GO:0030036; genes: FGD6, FMNL3). The enriched cellular component (CC) was the ruffle (GO:0001726; genes: FGD6, MYH9), and the enriched molecular function (MF) was poly(A) RNA binding (GO:0003723; genes: IMP3, PSIP1, MAGOHB, MYH9). In contrast, the down-regulated DEGs were mainly enriched in biological processes such as glucose homeostasis, negative regulation of cell growth, intracellular receptor signaling pathway, and phospholipase C-activating G-protein coupled receptor signaling pathway.

Recent studies have focused primarily on the expression, quantification, and genetic polymorphisms of PCOS-related genes and have built a considerable case that PCOS is linked to diabetes and infertility; however, few studies have provided direct evidence. Anovulation, hyperandrogenism, and insulin resistance are all features of PCOS. Hyperinsulinemia has been linked to an increased risk of cardiovascular disease and the progression of T2DM. T2DM manifests as hyperglycemia caused by insulin resistance, which impairs glucose uptake and utilization. Insulin resistance can occur in the liver, skeletal muscle, and adipose tissue. Skeletal muscle, in particular, loses its metabolic flexibility, making it difficult to switch between glucose and fatty acid use [27]. Recent evidence suggests that impaired fatty acid oxidation contributes to insulin resistance in muscle [28]. A global agreement has been reached to halt the rise in diabetes and obesity by 2025. Diabetes affects approximately 422 million people worldwide, most of whom live in low- and middle-income countries, and is directly responsible for 1.6 million deaths per year; over the last few decades, both the number of cases and the incidence of diabetes have steadily increased (who.int).

The KEGG pathway analysis also indicated that the DEGs were mainly enriched in the AMPK signaling pathway; alanine, aspartate and glutamate metabolism; neuroactive ligand-receptor interaction; butanoate metabolism; and the adipocytokine signaling pathway. Although AMPK is commonly regarded as an energy sensor, recent research has identified fructose-1,6-bisphosphate as a metabolite that regulates AMPK [29]. In T2DM, AMPK activation in response to exercise is highly beneficial [30]. For the same reason, therapeutic agents that resolve insulin resistance have attracted considerable attention. The thiazolidinediones (TZDs) and metformin are the two main classes of insulin-sensitizing agents developed so far, and both act by activating AMPK, thus bypassing insulin signaling [31]. AMPK regulates the gluconeogenic enzymes glucose-6-phosphatase (G-6-Pase) and phosphoenolpyruvate carboxykinase (PEPCK), thereby influencing gluconeogenesis and alleviating diabetes. AMPK can also ameliorate insulin resistance (IR) by regulating glucose transporter 4 and free fatty acids [32]. Understanding the entire AMPK signal transduction pathway in skeletal muscle may lead to major pharmacological advances in the management and treatment of T2DM. Increased adipocytokine levels have also been found to play an essential role in PCOS pathophysiology [33].

The gene interaction network of the DEGs was used to explore the underlying biochemical processes and interaction pathways related to insulin resistance in women with PCOS, which may lead to a risk of T2DM. The backbone network was constructed from the genes AR, STK11, PDX1, MYCN, DYRK1A, FST, NFYA, RAD51, SIRT5, REST, THRA, NRF1, TWIST2, NR4A3, TP73, and SMARCA4. The top 10 hub genes in the network were then identified with each of the five ranking methods of CytoHubba, and the overlap of the results from all five methods yielded two central genes, AR and STK11. Several studies have demonstrated that AR is a target for preventing androgen-related metabolic disorders such as T2DM. AR is less important for maintaining energy homeostasis in females, but pathologically elevated androgen concentrations lead to metabolic dysfunction [34]. The main clinical hallmark of PCOS is hyperandrogenism [35], and clinical evidence indicates that the ovary is the primary source of androgens in women with PCOS [36]. Gao et al. [37] reported that AR was differentially expressed in PCOS, especially in particular PCOS subtypes. It has also been hypothesized that variants in STK11 are associated with metabolic risk in women with PCOS [38, 39]. Similarly, a single nucleotide polymorphism (SNP) in the STK11 gene has been suggested to be associated with metformin efficacy in treated patients with PCOS [40]. Another study, a prospective randomized trial, demonstrated that a polymorphism in the STK11 gene is associated with a low ovulatory response to treatment with metformin alone [41].

In summary, the current analysis of the GEO dataset revealed major metabolic processes and pathways in women with PCOS that may lead to a risk of T2DM. Most of the identified DEGs were down-regulated, and these were enriched in various biological processes and pathways. Overall, this systematic analysis provides insights into PCOS pathogenesis at the molecular level and helps identify potential candidate genes for the development of metabolic disorders in women with PCOS. The hub genes and pathways may therefore be potential therapeutic targets for PCOS. Nevertheless, this study has limitations, such as the limited number of control samples in the dataset, and these limitations and alternative explanations should be addressed in future research. Furthermore, the roles of the hub genes were inferred computationally and need to be verified by experimental studies that also confirm the underlying mechanisms. The next stage of this study involves in vivo or clinical studies to verify the in silico results.

5 Conclusions

The current results suggest that the pathogenesis of PCOS is linked to the risk of developing T2DM through common pathological pathways in women. The two hub genes identified from the gene interaction network, AR and STK11, are linked to clinical hallmarks of PCOS and T2DM, respectively. These hub genes are involved in pathogenesis via the AMPK and adipocytokine signaling pathways. Based on these results, the molecular mechanisms underlying the risk of diabetes in women with PCOS can be investigated further. Additional in vivo or clinical research is needed to validate the identified genes as potential diagnostic markers or therapeutic targets.