Introduction

Liver fibrosis is a prevalent condition in chronic liver disease and poses a significant global health challenge [1]. It contributes to approximately 2 million deaths annually worldwide, making it a substantial burden on public health [2]. Its causes include viral infections (such as hepatitis B and hepatitis C), autoimmune and genetic diseases, alcoholic steatohepatitis, and non-alcoholic steatohepatitis. Notably, liver fibrosis plays a critical role in the progression of liver disease toward liver cirrhosis, currently the 11th most common cause of death globally, and hepatocellular carcinoma [2].

Despite numerous research efforts, developing effective and safe anti-fibrotic agents for liver fibrosis remains a challenge [3, 4]. Liver transplantation has traditionally been the most effective treatment for end-stage liver disease, but it has several limitations [5]. An alternative strategy for reducing liver fibrosis-related mortality is to halt the progression of fibrogenesis [6]. Achieving this requires a comprehensive understanding of the underlying molecular mechanisms and of the biomarkers that reflect them, which may open new avenues for managing and treating liver fibrosis [7].

In this study, we used a bioinformatic approach to identify genetically driven biomarkers associated with liver fibrosis. Integrating multiple databases, we first retrieved genes related to liver fibrosis from the DisGeNET database. We then performed functional enrichment analysis and protein–protein interaction (PPI) network analysis to characterize the interactions and cooperative behavior of these genes during the development of liver fibrosis. Finally, we applied the maximal clique centrality (MCC) algorithm of the CytoHubba plugin in Cytoscape to identify key genes. The study workflow is depicted in Fig. 1. CytoHubba has previously been used to identify potential biomarkers in various diseases [8, 9], which made it a suitable tool for this investigation. By combining these analyses, we aimed to identify novel biomarkers that could aid in the diagnosis, management, and treatment of liver fibrosis.

Fig. 1

Workflow of the bioinformatic analysis

Methods

Identification of liver fibrosis-associated genes

DisGeNET (http://www.disgenet.org), a comprehensive database of gene–disease associations [10], was used to retrieve genes associated with liver fibrosis (accessed on May 16, 2023). Genes were selected based on their association with the disease/trait term “Liver fibrosis.” DisGeNET integrates information from multiple sources, such as the scientific literature and public databases, making it a valuable resource for investigating the genetic factors underlying diseases, including liver fibrosis. Restricting the query to this disease/trait term focused the gene set on genes reported to be implicated in the development and progression of liver fibrosis.

Gene ontology enrichment analysis

Gene ontology (GO) enrichment analysis was conducted with the 2019 version of the WEB-based GEne SeT AnaLysis Toolkit (WebGestalt; accessed on May 16, 2023) [10]. The GO database contains annotations describing the characteristics and putative functions of genes and gene products across different organisms. We used it to identify GO terms over-represented among the liver fibrosis-associated genes and thereby gain a better understanding of their biological relevance. GO terms are organized into three major categories: biological process (BP), cellular component (CC), and molecular function (MF). The BP category describes larger biological programs accomplished by multiple molecular activities; the CC category describes the subcellular structures, locations, and macromolecular complexes in which gene products are active; and the MF category describes the molecular-level activities of individual gene products, such as binding or catalysis [10]. Terms were considered significantly enriched at a q-value (false discovery rate, FDR) below 0.05, which limits the expected proportion of false positives among the reported terms to 5%. Through this analysis, we aimed to clarify the functional implications of the identified genes and the molecular mechanisms underlying liver fibrosis.
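WebGestalt supports over-representation analysis (ORA), in which each GO term is tested for whether it contains more input genes than expected by chance. The text does not state which WebGestalt method or reference (background) gene set was used, so the following is only a minimal sketch assuming ORA with a one-sided hypergeometric test. The function names and the counts in the usage block are hypothetical and are not results of this study.

```python
from math import comb


def ora_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for a single GO term.

    N: genes in the reference (background) set
    K: background genes annotated to the term
    n: genes in the input list (e.g. the liver fibrosis genes)
    k: input genes annotated to the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent gene counts")
    # Sum the probabilities of observing k, k+1, ..., min(K, n) annotated genes.
    # math.comb returns 0 when the lower index exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def enrichment_ratio(N: int, K: int, n: int, k: int) -> float:
    """Observed annotated input genes divided by the number expected by chance."""
    if K == 0 or n == 0:
        raise ValueError("expected count is zero")
    expected = n * K / N
    return k / expected


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    N, K, n, k = 20000, 500, 105, 20
    print(f"ratio={enrichment_ratio(N, K, n, k):.2f} p={ora_p_value(N, K, n, k):.3g}")
```

Exact integer arithmetic via `math.comb` avoids underflow in the intermediate terms. Each term's p-value would then be adjusted for multiple testing before applying the q < 0.05 cutoff.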

KEGG pathway enrichment analysis

KEGG pathway enrichment analysis was performed with WebGestalt 2019 (accessed on May 16, 2023) [10] to identify pathways significantly over-represented among the candidate genes selected for liver fibrosis. The KEGG database is a valuable resource for understanding biological pathways and their involvement in various diseases and biological processes [11]. This analysis therefore provides insights into the molecular mechanisms that may underlie the development and progression of liver fibrosis. For each pathway, the q-value is the FDR-adjusted p-value, that is, the smallest false discovery rate at which that pathway's enrichment would be called significant. Pathways with q-values below 0.05 are reported as significantly enriched. These pathways point to biological processes and signaling pathways that may be altered in liver fibrosis and may help identify potential therapeutic targets or biomarkers.
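The q-values above are multiple-testing-adjusted p-values. The text does not name the adjustment method, so this sketch assumes the Benjamini–Hochberg (BH) procedure, a common choice. It converts per-pathway p-values into BH q-values and keeps pathways with q < 0.05. The pathway names and p-values in the usage block are placeholders, not data from this study.

```python
from typing import Dict, List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini–Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value down so q-values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def significant_pathways(p_by_pathway: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Pathways with BH q-value below alpha, most significant first."""
    names = list(p_by_pathway)
    qs = benjamini_hochberg([p_by_pathway[n] for n in names])
    hits = sorted((q, n) for n, q in zip(names, qs) if q < alpha)
    return [n for _, n in hits]


if __name__ == "__main__":
    # Placeholder pathways and p-values for illustration only.
    demo = {"pathway_1": 0.0004, "pathway_2": 0.012, "pathway_3": 0.2, "pathway_4": 0.03}
    print(significant_pathways(demo))
```

Ties in q-value are broken alphabetically by pathway name so that the output order is deterministic.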

Discovery of biomarker genes for liver fibrosis

Next, we performed a protein–protein interaction (PPI) analysis of the candidate genes and proteins using the STRING database (https://string-db.org/; accessed on May 16, 2023) with its default settings. STRING is a comprehensive resource that catalogs known and predicted protein–protein interactions, covering both physical and functional associations [12]. The PPI network was visualized and analyzed in Cytoscape version 3.7.2 (Bethesda, MD, USA), a powerful tool for visualizing and interpreting complex biological networks [13]. To identify important modules within the network, we used the Cytoscape plugin Molecular Complex Detection (MCODE), which scores nodes and applies parameter thresholds to find densely connected regions. The MCODE parameters were k-core = 2, degree cutoff = 2, node score cutoff = 0.2, and maximum depth = 100 [14]. The resulting modules may represent functional units or protein complexes relevant to liver fibrosis. In addition, we used the CytoHubba plugin to analyze the topology of the PPI network and identify hub genes. CytoHubba offers eleven topological analysis methods: degree (Deg), edge percolated component (EPC), maximum neighborhood component (MNC), density of maximum neighborhood component (DMNC), maximal clique centrality (MCC), and six centralities based on shortest paths (Bottleneck, EcCentricity, Closeness, Radiality, Betweenness, and Stress). We used the MCC method, which has been reported to predict essential proteins in the yeast PPI network more precisely than the other methods [15]. The top-ranked genes by MCC score were considered potential biomarker genes, as they may play critical roles in liver fibrosis.
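The CytoHubba authors define MCC(v) as the sum of (|C| - 1)! over all maximal cliques C that contain node v. The sketch below computes this score on an undirected edge list by enumerating maximal cliques with the Bron–Kerbosch algorithm with pivoting. It is an illustrative reimplementation under that definition, not CytoHubba's own code. The function names and the toy edge list are hypothetical; a real run would use the STRING-derived interactions.

```python
from math import factorial
from typing import Dict, Hashable, Iterable, Iterator, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets from an edge list (self-loops ignored)."""
    adj: Graph = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj: Graph) -> Iterator[frozenset]:
    """Bron–Kerbosch with pivoting; yields every maximal clique exactly once."""
    def expand(r: Set, p: Set, x: Set) -> Iterator[frozenset]:
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the node adjacent to the most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    if adj:  # an empty graph has no cliques to report
        yield from expand(set(), set(adj), set())


def mcc_scores(adj: Graph) -> Dict[Hashable, int]:
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores


def top_k(scores: Dict[Hashable, int], k: int = 10) -> List[Hashable]:
    """Nodes ranked by descending score; ties broken by node name."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


if __name__ == "__main__":
    # Hypothetical toy network, not the STRING network from this study.
    toy = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
    s = mcc_scores(toy)
    print(s, top_k(s, 2))
```

Maximal-clique enumeration is exponential in the worst case. For sparse networks of about a hundred proteins, such as the one analyzed here, it is usually fast.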
Through these network analyses, we aimed to uncover key interactions and hub genes within the PPI network associated with liver fibrosis. Biomarker genes identified in this way may improve our understanding of the molecular mechanisms underlying liver fibrosis and serve as candidates for novel diagnostic or therapeutic targets.

Results

Identification of liver fibrosis-associated genes

To identify genes implicated in liver fibrosis, we queried the DisGeNET database (http://www.disgenet.org; accessed on May 16, 2023), which returned a total of 105 genes associated with liver fibrosis (Table 1). To ensure the relevance and reliability of the retrieved associations, we retained only genes with a DisGeNET score higher than 0.3. This criterion prioritizes genes with stronger supporting evidence for an association with liver fibrosis within DisGeNET. It therefore focuses the analysis on genes that are more likely to play significant roles in the development and progression of liver fibrosis and that could serve as targets for further investigation and therapeutic development.
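The score filter described above can be sketched as follows, assuming the DisGeNET gene–disease associations have been exported to a tab-separated file. The column names `gene_symbol` and `score` are hypothetical placeholders, since real DisGeNET exports may use different headers. The toy rows are illustrative only. The threshold is applied as a strict inequality to match "higher than 0.3".

```python
import csv
import io
from typing import Dict, List


def filter_genes_by_score(tsv_text: str, threshold: float = 0.3,
                          gene_col: str = "gene_symbol",
                          score_col: str = "score") -> List[str]:
    """Return genes whose association score is strictly greater than threshold.

    A gene listed more than once keeps its highest score; the output is sorted
    by descending score, then alphabetically.
    """
    best: Dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        score = float(row[score_col])
        if score > threshold:
            gene = row[gene_col]
            best[gene] = max(score, best.get(gene, score))
    return sorted(best, key=lambda g: (-best[g], g))


if __name__ == "__main__":
    # Hypothetical export; column names and values are placeholders.
    toy = "gene_symbol\tscore\nGENE_A\t0.5\nGENE_B\t0.3\nGENE_C\t0.31\nGENE_A\t0.2\n"
    print(filter_genes_by_score(toy))
```

In practice the same function would be applied to the file downloaded from DisGeNET after adjusting the two column-name arguments.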

Table 1 Liver fibrosis-associated genes

Gene ontology enrichment analysis of liver fibrosis-associated genes

GO enrichment analysis was carried out with the WebGestalt 2019 online tools to examine the biological characteristics of the identified genes across three major categories: BP, CC, and MF. The BP analysis revealed a total of 1134 significantly enriched terms, with the top-ranked terms including “response to endogenous stimulus,” “response to oxygen-containing compound,” and “response to nitrogen compound.” The CC analysis identified 34 significantly enriched terms, including “extracellular matrix,” “collagen-containing extracellular matrix,” and “endoplasmic reticulum lumen.” The MF analysis revealed 50 significantly enriched terms, such as “signaling receptor binding,” “growth factor binding,” and “extracellular matrix structural constituent.” Fig. 2 summarizes the 10 most significant terms in each category (BP, CC, and MF), highlighting the key biological processes, cellular components, and molecular functions associated with the identified genes. The complete GO enrichment results are provided in Supplementary Table 1.

Fig. 2

Gene ontology enrichment analysis of liver fibrosis-associated genes using WebGestalt 2019. (A) Top 10 enriched biological processes (BP); (B) top 10 enriched cellular components (CC); (C) top 10 enriched molecular functions (MF)

KEGG pathway enrichment analysis of liver fibrosis-associated genes

To explore the pathways in which the identified gene candidates may be involved, we performed KEGG pathway enrichment analysis using WebGestalt 2019. At a q-value threshold of 0.05, 25 pathways were significantly enriched. The top-ranked pathways included “AGE-RAGE signaling pathway in diabetic complications,” “Complement and coagulation cascades,” and “PI3K-Akt signaling pathway” (Fig. 3). These findings suggest that these pathways may be involved in liver fibrosis. The complete KEGG pathway enrichment results are provided in Supplementary Table 2.

Fig. 3

Top 10 enriched KEGG pathways of liver fibrosis-associated genes

Identification of potential biomarkers among liver fibrosis-associated genes

We used the STRING database to construct a protein–protein interaction (PPI) network from the 105 genes associated with liver fibrosis. We then applied the Cytoscape plugins MCODE and CytoHubba to extract potential biomarker genes from this network. MCODE partitioned the PPI network into four distinct subclusters (Fig. 4), which represent gene modules that may contain candidate biomarkers. CytoHubba was used to select hub genes, that is, highly connected nodes that may play crucial roles in the network, with the importance of each node assessed by the MCC method. The 10 top-ranked hub genes by MCC score (TGFB1, MMP2, CTNNB1, FGF2, IL6, LOX, CTGF, SMAD3, ALB, and VEGFA) were taken as the top 10 potential biomarkers (Fig. 5).
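MCODE's k-core setting (set to 2 in this study) is described as filtering out candidate clusters that do not contain a maximally interconnected sub-cluster of at least degree k, that is, a k-core in which every node has at least k neighbors inside the subgraph. The sketch below illustrates only this k-core step, computed by repeatedly peeling away nodes with fewer than k neighbors. It does not reproduce MCODE's vertex weighting or cluster expansion. The function names and toy network are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_core(adj: Graph, k: int) -> Set[Hashable]:
    """Nodes of the k-core: repeatedly remove nodes with fewer than k neighbors."""
    alive = {v: set(nbrs) for v, nbrs in adj.items()}  # copy; input is untouched
    queue = [v for v, nbrs in alive.items() if len(nbrs) < k]
    while queue:
        v = queue.pop()
        if v not in alive:  # already peeled earlier
            continue
        for u in alive.pop(v):
            if u in alive:
                alive[u].discard(v)
                if len(alive[u]) < k:
                    queue.append(u)
    return set(alive)


def passes_k_core_filter(cluster_adj: Graph, k: int = 2) -> bool:
    """Keep a candidate cluster only if it contains a non-empty k-core."""
    return bool(k_core(cluster_adj, k))


if __name__ == "__main__":
    # Hypothetical toy network: a triangle with a pendant node, plus a lone edge.
    toy = from_edges([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F")])
    print(sorted(k_core(toy, 2)))
```

With k = 2, tree-like fragments such as single edges or chains are discarded, while clusters containing at least one cycle survive the filter.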

Fig. 4

Protein–protein interaction (PPI) network of liver fibrosis-associated genes constructed with the STRING database and Cytoscape software. The network consisted of 95 nodes (genes) and 631 edges (interactions between genes). The MCODE plugin in Cytoscape detected four distinct modules within the network: Cluster 1 (score = 19.636), Cluster 2 (score = 5.333), Cluster 3 (score = 3), and Cluster 4 (score = 3). MCODE scores reflect both the density and the size of each module.

Fig. 5

Top 10 hub genes ranked by the CytoHubba MCC algorithm. Higher degree values are represented by colors ranging from pink to yellow. The higher the score, the higher the rank of the biomarker.

Discussion

With advances in genomics research, bioinformatic approaches have become a promising means of proposing biomarkers for various diseases. In this study, we used such an approach to identify potential genetically driven biomarkers for liver fibrosis, which, to our knowledge, is the first of its kind in liver fibrosis research. Using bioinformatic tools, we explored the molecular mechanisms associated with the disease and identified promising biomarker candidates. DisGeNET served as the platform for exploring genes and variants associated with human diseases, including liver fibrosis [16]. We combined it with GO enrichment, KEGG pathway enrichment, and PPI analyses to conduct a comprehensive bioinformatic analysis.

Liver fibrosis is characterized by the excessive accumulation of extracellular matrix (ECM) in the subendothelial space (space of Disse). The ECM is degraded by matrix metalloproteinases (MMPs), whereas tissue inhibitors of metalloproteinases (TIMPs) inhibit MMP activity, thereby preventing ECM breakdown and favoring its accumulation [17]. Under normal physiological conditions, MMPs and TIMPs are balanced to maintain ECM homeostasis; in liver fibrosis, this balance is disrupted [18]. Hepatic stellate cells (HSCs) are the primary source of ECM in the liver. In the healthy liver, HSCs store vitamin A, but liver injury triggers their activation [19], a crucial event in the early stages of liver fibrosis. Activated HSCs produce type I collagen (Col-1), a major constituent of the ECM. This study identified ten potential biomarkers for liver fibrosis: TGFB1, MMP2, CTNNB1, FGF2, IL6, LOX, CTGF, SMAD3, ALB, and VEGFA. Among these, TGF-β1 and MMP-2 stood out because they received the highest scores from the CytoHubba MCC algorithm, suggesting their potential as useful biomarkers for liver fibrosis [20].

Transforming growth factor beta (TGF-β) plays a central role in the development of tissue fibrosis, particularly through Smad signaling, which leads to the activation of myofibroblasts and subsequent extracellular matrix transformation (ECMT) [21]. TGF-β1, known for its profibrogenic and immunosuppressive properties, is released during liver injury by Kupffer cells, sinusoidal endothelial cells, and other inflammatory cells. It acts as a master profibrogenic cytokine, activating HSCs via the TGF-β/Smad3 signaling pathway [22]. Beyond promoting HSC activation, TGF-β1 also influences the expression of MMPs and TIMPs: TGF-β1/Smad signaling promotes ECM deposition by enhancing TIMP1 expression and inhibiting MMP2 expression. Moreover, inhibition of TGF-β1 has been shown to suppress HSC activation both in vivo and in vitro [23]. Serum TGF-β1 levels have also been associated with the severity of inflammation and the stage of liver fibrosis. TGF-β1 is therefore a promising serum biomarker for the progression of liver inflammation and fibrosis, particularly in chronic HCV infection [23,24,25].

MMPs are enzymes that degrade ECM components, thereby maintaining ECM integrity and composition, and they also participate in ECM-mediated signaling. MMP-2, also known as gelatinase A, is expressed in the liver predominantly by non-parenchymal cells, particularly HSCs and Kupffer cells (KCs). It is one of the most extensively studied enzymes in liver fibrosis and contributes to vascular homeostasis in the liver. Studies have demonstrated a correlation between MMP-2 expression and fibrosis progression irrespective of the underlying etiology, suggesting a profibrogenic role. Elevated MMP-2 expression has been associated with various liver conditions, including chronic hepatitis, liver fibrosis, alcoholic cirrhosis, ischemia–reperfusion injury (IRI), biliary atresia (BA), and hepatocellular carcinoma (HCC) [26]. MMP-2 activity has also shown potential as a serum marker of disease severity in alcoholic liver disease [27].

Using DisGeNET and bioinformatic approaches, we identified potential genetically driven biomarkers for liver fibrosis. The dysregulation of MMPs and TIMPs, together with the roles of TGF-β1 and MMP-2, underscores the contribution of these molecules to the pathophysiology of liver fibrosis. TGF-β1 activates HSCs and promotes ECM deposition through the TGF-β/Smad3 signaling pathway, whereas MMP-2 participates in ECM remodeling. These findings offer insights into the genetic mechanisms that may underlie liver fibrosis and highlight candidates for future clinical investigation and early detection of the disease.

Nevertheless, this study has limitations that require careful consideration. The results are based on the information available in DisGeNET at the time of access, and future updates or new data may change these findings. Moreover, the analysis is primarily exploratory, and functional studies are needed to validate the results.

Conclusions

In conclusion, this study identified potential genetically driven biomarkers for liver fibrosis through bioinformatic analyses. The hub genes TGFB1, MMP2, CTNNB1, FGF2, IL6, LOX, CTGF, SMAD3, ALB, and VEGFA may provide insights into the progression of liver fibrosis. In particular, TGF-β1 and MMP-2 stand out as promising biomarker candidates, given their high scores in the CytoHubba MCC algorithm. However, further research is needed to fully elucidate the regulatory mechanisms and clinical implications of TGF-β1 and MMP-2 in liver fibrosis, and future studies should investigate and validate these genes as clinical indicators of the disease.