Introduction

Heart failure (HF) arises from several causes, including systolic and/or diastolic dysfunction, cardiac pumping dysfunction, and end-stage cardiovascular disease, and seriously threatens human health [1]. In recent years, with advancements in social medicine and population aging, the incidence of cardiac dysfunction in patients with various cardiovascular diseases has gradually increased, and the incidence of HF has continued to rise, posing a serious threat to public health [2]. According to epidemiological statistics, the global prevalence of HF is 1–2%, rising to over 10% among individuals above the age of 70 years [3]. Between 2013 and 2016, an estimated 6.2 million American adults over the age of 20 years had HF [4]. According to the latest survey report, the lifetime risk of HF increases after 45 years of age and ranges from 20% to 45%, depending on race and ethnicity [5]. Therefore, the prevention and treatment of HF have become a top priority for medical professionals.

Drug therapy is the primary treatment modality for patients with HF. A new era of neuroendocrine inhibitor treatment for HF began in 1987, when the CONSENSUS trial confirmed that treatment with angiotensin-converting enzyme inhibitors reduced the mortality rate in patients with HF by 27% [6]. Subsequently, other potent drugs, such as angiotensin receptor neprilysin inhibitors (ARNI) [7] and sodium-glucose cotransporter 2 inhibitors (SGLT2i) [8, 9], have been developed to reduce mortality in patients with HF. With advances in clinical management, targeted therapies are being implemented in the cardiovascular field and are expected to represent a significant breakthrough in the treatment of patients with HF [10]. Various factors can contribute to myocardial injury and subsequent aseptic inflammation of the myocardium [11]. If a pathogenic infection is present, it leads to inflammatory damage to the myocardium [12], activates inflammatory factors, and finally results in myocardial fibrosis [13]. Inflammatory reactions are involved in both the onset and progression of HF. Most researchers believe that hemodynamic disorders, tissue injury, and gastrointestinal mucosal ischemia in patients with HF can directly or indirectly activate the immune system in vivo, increasing circulating inflammatory cytokine levels. These cytokines activate target cells by interacting with specific receptors on the cell membrane, thereby triggering a systemic inflammatory response [14, 15]. Therefore, finding effective diagnostic and therapeutic strategies, elucidating the molecular mechanisms of pathogenesis, and inhibiting the damage caused by inflammatory reactions are crucial for slowing the progression of HF.

Biomarkers are biological molecules found primarily in the blood, other bodily fluids, or tissues, and usually consist of DNA, RNA, microRNA, epigenetic modifications, or antibodies. They offer high sensitivity, specificity, and positive diagnostic value for diseases [16, 17]. HF is a complex pathophysiological process involving multiple factors [18]; however, it can be predicted using a single gene [19]. Bioinformatics is a high-throughput technique that can be used to screen multiple databases to identify potential pathological biomarkers for various diseases [20]. In recent years, the development and application of DNA microarrays and next-generation sequencing technologies have enabled the simultaneous analysis of thousands of genes in different disease samples. Therefore, the use of biomarkers for diagnosis, prognosis, and personalized medical services has increased. Numerous studies have been conducted on biomarkers for the diagnosis of HF. B-type natriuretic peptide (BNP) was one of the earliest biomarkers used to diagnose acute HF [21]. The plasma concentration, stability, and diagnostic value of NT-proANP and NT-proBNP are higher in patients with chronic HF [22, 23]. Studies have shown that the combined determination of adiponectin and NT-proBNP is more accurate than that of NT-proBNP alone [24]. Other diagnostic biomarkers, such as miR-302b-3p [25], soluble ST2 [26], and Gal-3 [27], have recently been identified.

Since the beginning of the 21st century, artificial intelligence (AI) has progressively permeated all aspects of daily life, particularly the medical field [28]. With continued exploration of its potential, AI-based clinical research is expected to bring about a paradigm shift in medical practice, significantly improving survival rates for many diseases, including cancer [29]. Currently, the diagnosis of HF is primarily based on the clinical signs and symptoms of patients, with echocardiography and chest radiography serving as the most common auxiliary tests. However, these examinations are not accurate during the intermediate and late phases of the disease and lack clinical specificity and sensitivity. Therefore, developing a reliable diagnostic approach to reduce mortality and improve the prognosis of patients with HF is critical. In this study, we first identified the characteristic abnormal genes associated with HF using machine learning, and then constructed and validated a prediction model using an artificial neural network.

Methods

Data acquisition

We accessed publicly available datasets from the GEO website (https://www.ncbi.nlm.nih.gov/geo/) to construct HF and NFD (non-failure donor) cohorts. The Train group included 16 cases without HF, 86 cases with dilated cardiomyopathy (CMP), and 108 cases with ischemic heart disease from the GSE5406 dataset [30], as well as 13 cases with dilated cardiomyopathy and 15 cases without HF from the GSE3586 dataset [31]. To verify the reliability and stability of the artificial neural network model more thoroughly, we included 96 cases with ischemic heart disease, 84 cases with dilated cardiomyopathy, and 139 cases without HF from the GSE57345 dataset [32] in the Test group.

Analysis of differential gene expressions and functional enrichment in HF

The R package “limma” was used to identify differentially expressed genes (DEGs) in HF, and “pheatmap” was used to plot heatmaps and volcano plots. Genes with |logFC| > 0.6 (|fold change| ≥ 1.5) and a false discovery rate (FDR) < 0.05 were considered statistically significant. Subsequently, we performed GO and KEGG enrichment analyses [33] using the R packages “org.Hs.eg.db” (3.14.0), “clusterProfiler” (4.0), “ggplot2” (3.3.5), and “enrichplot”. For further biological insight into the DEGs, we performed Reactome, WikipathwayCancer, and Metascape analyses using the WebGestalt 2019 website (http://www.webgestalt.org/) and Metascape (version 3.5; http://metascape.org/), while protein-protein interaction networks were obtained from the STRING website (version 11.5; https://cn.string-db.org/).
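
The thresholding step described above can be illustrated with a minimal Python sketch. It is not the limma workflow itself: it assumes that per-gene log2 fold changes and raw p-values are already available, and that the FDR is obtained with the Benjamini-Hochberg procedure (an assumption; the adjustment method is not stated above). The function names and the example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, p_values, lfc_cut=0.6, fdr_cut=0.05):
    """Split genes into up- and down-regulated DEGs.

    A gene is kept when |logFC| > lfc_cut and its adjusted p-value < fdr_cut.
    """
    fdr = benjamini_hochberg(p_values)
    up, down = [], []
    for gene, lfc, q in zip(genes, log_fc, fdr):
        if q < fdr_cut and abs(lfc) > lfc_cut:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical input: four genes with made-up statistics.
genes = ["G1", "G2", "G3", "G4"]
log_fc = [1.2, -0.9, 0.3, 0.8]
p_values = [0.001, 0.004, 0.0001, 0.5]
up_genes, down_genes = select_degs(genes, log_fc, p_values)
```

In this toy input, G3 is dropped because its fold change is too small and G4 because its adjusted p-value is too large. A log2 fold change of 0.6 corresponds to a fold change of 2^0.6, roughly 1.5.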

Identification of disease-specific genes and construction of an artificial neural network

We then conducted a random-forest analysis using the “randomForest” package (version 4.6) and filtered the DEGs to identify the nodes with the lowest cross-validation errors. The parameter settings were seed = 123456 and ntree = 500. Genes identified by both of the above approaches were designated HF-specific genes. Disease signature genes were visualized using the “limma” and “pheatmap” packages (version 3.5.3), and the samples were clustered according to their expression. To eliminate batch effects between cohorts, we scored the DEGs based on their expression relative to the median value: an up-regulated gene was assigned a score of 1 if its expression was greater than the median value and 0 otherwise, whereas a down-regulated gene was scored in the opposite way. We constructed an artificial neural network to diagnose HF using the gene scores. The neural network consisted of three layers: an input layer, a hidden layer, and an output layer. The R package used in this step was “NeuralNetTools”, and the seed was set to 12345678.
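
The median-based scoring rule can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: it assumes that the median is taken across all samples of one cohort for each gene, and that values equal to the median are treated as "not greater than" the median. The function name `score_gene` and the example values are hypothetical.

```python
from statistics import median


def score_gene(values, direction):
    """Convert one gene's expression values into 0/1 scores.

    direction is "up" for an up-regulated gene and "down" for a
    down-regulated gene. Up-regulated genes score 1 when expression is
    above the median; down-regulated genes follow the opposite pattern.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    m = median(values)
    above = [v > m for v in values]
    if direction == "up":
        return [1 if a else 0 for a in above]
    return [0 if a else 1 for a in above]


# Hypothetical expression values of one gene across four samples.
expression = [1.0, 2.0, 3.0, 4.0]
up_scores = score_gene(expression, "up")
down_scores = score_gene(expression, "down")
```

Because each score depends only on the rank of a sample relative to its own cohort's median, the scores are unaffected by cohort-wide shifts or rescaling of expression, which is why such a rule can reduce batch effects.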

Evaluation of the artificial neural network model

The same approach was used to test and validate the gene cohort and to evaluate the diagnostic accuracy of the HF model. To evaluate the efficiency of the artificial neural network model, we plotted ROC curves for the two cohorts using the “pROC” package (1.15.3). In the ROC curve, the horizontal axis denotes the false positive rate (1 − specificity), and the vertical axis denotes the true positive rate (sensitivity). The area under the curve (AUC), which represents the accuracy of the model, was our primary focus.
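
For readers unfamiliar with how an ROC curve and its AUC are derived from model outputs, the following Python sketch shows one standard construction (threshold sweep followed by trapezoidal integration). It is not the pROC implementation; the function names and the example labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels are 1 (HF) or 0 (no HF); scores are model outputs where a
    higher value means "more likely HF". The threshold is swept from
    the highest score to the lowest.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(roc_points(labels, scores))
```

The AUC equals the probability that a randomly chosen HF sample receives a higher score than a randomly chosen non-HF sample; in the toy data, 8 of the 9 possible HF/non-HF pairs are ordered correctly.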

The immunological milieu of HF

The CIBERSORT algorithm (https://cibersort.stanford.edu/runcibersort.php) was used to quantify the infiltration of 22 immune cell types, and the results were filtered using a p-value < 0.05. The analysis was performed using the R packages “e1071” and “preprocessCore” and the “CIBERSORT.R” script. Based on these results, we calculated the correlations between immune cell types. The “corrplot” package (version 0.92) was used to visualize the immune cell contents and their correlations. Finally, we compared the distribution of immune cells between cases with and without HF.
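
The correlation step can be illustrated as follows. This sketch assumes the Pearson coefficient (the type of correlation coefficient is not stated above) and takes the estimated cell fractions as given; it does not reproduce CIBERSORT or corrplot. The function names and the example fractions are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(fractions):
    """Pairwise correlations between cell types.

    fractions maps a cell-type name to its estimated fraction in each
    sample (same sample order for every cell type).
    """
    names = list(fractions)
    return {a: {b: pearson(fractions[a], fractions[b]) for b in names}
            for a in names}


# Hypothetical fractions of two cell types in three samples.
fractions = {
    "NK cells activated": [0.10, 0.20, 0.30],
    "T cells regulatory": [0.30, 0.20, 0.10],
}
matrix = correlation_matrix(fractions)
```

A positive coefficient means that two cell types tend to be abundant in the same samples, and a negative coefficient means that one tends to be abundant where the other is scarce.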

Results

Identification of DEGs and functional enrichment analysis

After setting the thresholds |logFC| > 0.6 and FDR < 0.05, differential expression analysis of the GEO dataset revealed 153 differentially expressed genes, of which 81 were down-regulated and 72 were up-regulated (Fig. 1A and B, Additional File Table 1). GO enrichment analysis showed that the 81 down-regulated genes were primarily associated with the positive regulation of vascular development, angiogenesis, neutrophil activation, L-amino acid transport, and neutrophil-mediated immunity (Fig. 2A). The 72 up-regulated genes were primarily involved in muscle system processes, extracellular matrix organization, extracellular structure organization, muscle contraction, and cell-substrate adhesion (Fig. 2B). KEGG enrichment analysis indicated that the 81 down-regulated genes were mainly associated with the PI3K-AKT signaling pathway, MAPK signaling pathway, cytokine-cytokine receptor interaction, calcium signaling pathway, HIF-1 signaling pathway, chemokine signaling pathway, focal adhesion, JAK-STAT signaling pathway, AGE-RAGE signaling pathway in diabetic complications, Th17 cell differentiation, and biosynthesis of amino acids (Fig. 2C). The 72 up-regulated genes were mainly involved in the calcium signaling pathway, cGMP-PKG signaling pathway, AGE-RAGE signaling pathway in diabetic complications, Th17 cell differentiation, Th1 and Th2 cell differentiation, peroxisome, valine, leucine and isoleucine degradation, and renin secretion pathways (Fig. 2D).

Fig. 1

Genome-wide identification of differentially expressed genes of HF. (A) The heatmap of DEGs in Train group. (B) The volcano plots of DEGs.

Fig. 2

Functional enrichment analyses of DEGs for HF. (A) Chord diagram of GO terms for the 81 down-regulated genes in HF. (B) Chord diagram of GO terms for the 72 up-regulated genes in HF. (C) Chord diagram of KEGG enrichment pathways for the 81 down-regulated genes in HF. (D) Chord diagram of KEGG enrichment pathways for the 72 up-regulated genes in HF.

Prediction of the function and disease spectrum of HF-related factors

Metascape analysis provided an overview of the network diagram (Fig. 3A). The nodes in this network represent functions or pathways. The higher the similarity between two nodes, the more genes are shared between the two functions or pathways. Figure 3B lists the HF-related factors, such as regulation of cell adhesion and vascular development, while in Fig. 3C, HF-related diseases, such as idiopathic pulmonary arterial hypertension and myocardial ischemia, are predicted based on disease prevalence. Additionally, Fig. 3D screens HF-related transcription factors, including HIF1A, SP1, EGR1, and CTCF, from an epigenetic perspective. Figure 3E depicts the function of these genes within specific cells.

Fig. 3

Prediction of the function and disease spectrum of factors related to HF. (A) Metascape analysis provides an overview of the network diagram. (B) Factors associated with HF. (C) HF-related diseases predicted based on disease prevalence. (D) HF-related transcription factors screened from an epigenetic perspective. (E) Function of the genes within specific cells.

Network analysis of protein-protein interactions

Protein-protein interaction (PPI) enrichment of the HF-related genes was analyzed using the Metascape algorithm. In the network diagram, nodes represent genes or proteins, and nodes with the same color represent genes or proteins with related functions. A connection between two nodes indicates a protein-protein interaction between the two genes (Fig. 4A), and Fig. 4B shows the correlation between functionally different genes or proteins.

Fig. 4

Metascape analysis of protein-protein interactions. (A) Pathway and process enrichment analysis of HF. (B) Sub-module analysis of protein-protein interactions.

Selection of disease-specific genes and prediction model for the HF

Figure 5A illustrates the random forest algorithm, with the x- and y-axes representing the number of trees and the cross-validation error, respectively. The black line indicates the error values for all samples. During cross-validation, we identified the point with the minimum error, that is, the lowest point on the black line, and determined the number of trees corresponding to it. In Fig. 5B, the y-axis represents the gene name and the x-axis represents the importance score of the gene; the higher the score, the more important the gene. Genes with scores higher than 4 were selected for subsequent analysis. The heatmap (Fig. 5C) showed the aggregation of genes, indicating the pathogenic nature of the genes detected in the random forest trees. Figure 5D depicts the construction of a neural network model based on gene scores, where the input layer comprising genes for multiple diseases was linked to the hidden layer displaying disease-related genes according to their obtained scores and weights. The hidden layer contained five nodes. Based on these five nodes and their respective weights, we obtained the output layer, which was the attribute of the sample. The accuracy of the model was further evaluated by constructing ROC curves. The accuracies of the Train and Test groups were 0.993 and 0.995, respectively. Figure 5E and F show that the areas under the ROC curves were 0.996 and 0.863, respectively. Both AUC values were greater than 0.75, indicating that our diagnostic model was accurate, reliable, and robust to changes in the cohort. The precision, recall, and F1 score of the Train group were 0.957, 0.963, and 0.945, respectively. The precision, recall, and F1 score of the Test group were 0.893, 0.826, and 0.842, respectively.
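
As a reminder of how precision, recall, F1 score, and accuracy relate to one another, the following Python sketch computes them from the four cells of a binary confusion matrix. The counts used are hypothetical and are not taken from this study, and whether the values reported above are per-class or averaged over classes is not stated.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 score, and accuracy for one positive class.

    tp/fp/fn/tn are the true positive, false positive, false negative,
    and true negative counts, with HF treated as the positive class.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts for 20 samples.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

For a single class, the F1 score always lies between precision and recall. When metrics are averaged over classes, the averaged F1 need not equal the harmonic mean of the averaged precision and recall.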

Fig. 5

Identification of characteristic genes of HF by machine learning and construction of a diagnostic signature with an artificial neural network. (A) Construction of the random forest. (B) Identification of HF signature genes based on importance scores. (C) Heatmap of HF characteristic genes. (D) Schematic view of the artificial neural network. (E) ROC curve showing the diagnostic performance of the artificial neural network for HF in the Train group (GEO). (F) ROC curve for the Test group (GSE57345).

The immune microenvironment of HF

The histogram in Fig. 6A displays the presence of 22 distinct immune cell types. We assessed their correlations by determining the infiltration of immune cells. The results are shown in Fig. 6B, with numbers representing the correlation coefficient, red indicating a positive correlation, and blue indicating a negative correlation. The highest positive correlation coefficient, between activated dendritic cells and activated NK cells, was 0.41, and the strongest negative correlation coefficient, between activated NK cells and regulatory T cells, was −0.52. The immune cell fractions were then compared between the groups, and the results revealed significant differences in naïve B cells, plasma cells, naïve CD4 T cells, activated memory CD4 T cells, regulatory T cells, γδ T cells, activated NK cells, monocytes, M2 macrophages, activated DCs, and resting mast cells (P < 0.05, Fig. 6C).

Fig. 6

The immune microenvironment of HF. (A) Histogram of 22 kinds of immune cells in HF patients and normal controls. (B) The correlation between various immune cells of HF patients. (C) Violin chart of differences of individual immune cells

Discussion

The mechanism of HF is complex and has not yet been fully elucidated. Research suggests that abnormal vascular microcirculatory metabolism [34], aberrant expression of multiple inflammatory markers [35, 36], immune responses [37], and abnormal expression of metabolic proteins are closely associated with heart failure [38]. HF is the leading cause of death from cardiovascular disease. Various RNAs and genes are involved in regulating the cellular activities of vascular smooth muscle cells, thereby affecting cardiovascular disease [39, 40]. For instance, the inflammatory cytokines IL-6 and TNF-α could be the main targets of miR-296a, and their expression was abnormal in the peripheral blood mononuclear cells of patients with coronary artery disease [40]. Despite significant advances in the treatment of HF in recent years, the 5-year survival rate remains approximately 50% [41], and the prognosis is still poor. Therefore, early diagnosis and treatment are crucial for reducing the incidence of and mortality from HF. In this study, an HF model consisting of 16 characteristic genes was constructed using machine learning and artificial intelligence based on high-throughput sequencing data from public databases. The model demonstrated high sensitivity and specificity for screening, which may support the prevention of HF.

HF transcription factors involved in genome regulation have been proposed as putative epigenetic mechanisms. Zhao and colleagues [42] demonstrated that MIAT silencing reduces the incidence of HF by activating the PI3K/Akt signaling pathway. The present study found that the PI3K-Akt signaling pathway is down-regulated in HF. Consistent with the findings of this study, previous research has demonstrated that the JAK/STAT signaling pathway mediates the inflammatory response, left ventricular remodeling, and myocardial ischemia-reperfusion injury via the downregulation of genes enriched in the JAK-STAT signaling pathway [43, 44]. Furthermore, the down-regulated genes were enriched in GO terms and KEGG pathways including the HIF-1 signaling pathway, Th17 cell differentiation, organic anion transport, and neutrophil activation. Studies [45] have revealed that CaMK II oxidative activity is significantly increased in patients with HF, thereby activating the calcium signaling pathway, which is consistent with the results of this study. Moreover, the up-regulated genes were enriched in the cGMP-PKG signaling pathway, AGE-RAGE signaling pathway in diabetic complications, muscle system processes, and extracellular matrix organization. Th17 cell differentiation was enriched in both up-regulated and down-regulated genes. Research [46,47,48] has highlighted that Th17 cells can produce IL-17 and IL-22, which are key effector cytokines. IL-17 is an effective inducer of matrix metalloproteinase-1 (MMP-1) in human cardiac fibroblasts, which may have potential implications in cardiac fibrosis, remodeling, and heart failure through various pathways.

We identified disease-specific genes for HF using a random forest algorithm to facilitate the integration of neural network models. The artificial neural network method, which has been extensively applied to cancer diagnosis and treatment models, was used to construct a diagnostic model of HF [49]. The prediction model of rectal cancer-related microsatellite instability (MSI) established by Stanford University [50] successfully predicted MSI from HE-stained whole-slide images (WSI). Moreover, the DeepLabV3+ semantic segmentation model exhibits good feature extraction and semantic image segmentation abilities. ResNet50, a classical image-classification model, has been widely used for target classification and other fields [51, 52]. Artificial neural network models have been applied to lung cancer [53] and breast cancer [54]. To our knowledge, this is the first time that an artificial neural network approach has been used to develop a heart failure disease model. Our diagnostic signature comprised 16 genes (ECM2, LUM, ISLR, ASPN, PTN, SFRP4, GLT8D2, FRZB, FCN3, TEAD4, NPTX2, LAD1, ALOX5AP, RNASE2, IL1RL1, CD163). Currently, no studies have examined the direct correlation between the ECM2, GLT8D2, NPTX2, and LAD1 genes and HF; however, evidence suggests a potential association with HF, and these genes may become biomarkers for HF diagnosis in the future [18, 55,56,57]. For instance, ECM2 was related to immune processes and could serve as a target for immunotherapy for glioma [58, 59]. Consistently, the present study observed that HF-related genes, including ECM2 and CD163, were associated with immune cells. CD163, a receptor for tumor necrosis factor-like weak inducer of apoptosis (TWEAK), may serve as a novel marker of HF. Studies have demonstrated its anti-inflammatory, antioxidant, and cardiovascular-protective effects [60,61,62]. Furthermore, serum TWEAK levels are significantly higher in patients with HF than in healthy individuals [63]. IL1RL1 could be induced by cardiomyocyte stretch and might reflect inflammation and hemodynamic stress in HF [64]. HIF-α is a key factor mediating the relationship between obesity and HF by affecting fibrosis and inflammation in adipose tissue [65]. This demonstrates the effectiveness of gene screening in this study and the significance of these genes in the diagnosis, treatment, and prognosis of various diseases.

The pathological basis of HF is ventricular remodeling, specifically myocardial hypertrophy and fibrosis, which results from hemodynamic overload [66]. The JAK/STAT signaling pathway [44], reactive oxygen species (ROS) generation [67], calcium overload [68], Th17 cells, the PI3K/AKT signaling pathway [69], and the MAPK signaling pathway have all been implicated in the pathophysiology of HF. These findings are consistent with the results of the GO/KEGG enrichment analysis. In addition, cell adhesion molecules are involved in the process of HF. A previous study noted that focal adhesion kinase-related pathways may be inhibited in metformin-treated vascular smooth muscle cells, thereby retarding the progression of vessel stenosis [70]. Heart failure is frequently associated with immune activation and inflammatory responses. As important inflammatory mediators, chemokines exert chemotactic effects on various target cells, including vascular endothelial cells, which can contribute to the development of HF [71]. Studies have suggested interactions between myocardial cells and the microvascular system. Persistent pathological overload leads to cardiac maladaptation and remodeling, resulting in HF. At the same time, cellular senescence affects cardiac regeneration and recovery in patients with ischemic heart disease. Studying the differential expression of metabolic proteins in patients with HF can enable a better understanding of the occurrence of HF, particularly the crucial role of angiogenesis factors [38]. Echocardiography is the most commonly used method for diagnosing heart failure. Its disadvantage is that it relies on patients already having organic lesions, whereas the artificial neural network makes the diagnosis from scores calculated for various factors. In this study, the neural network proved highly reliable in the diagnosis of HF, and to our knowledge this is the first time this approach has been used in this context.

In addition to establishing early diagnostic models, we investigated the immune microenvironment of HF. Studies have shown that the pathological mechanisms underlying heart failure include inflammatory immune responses and inflammatory cell infiltration [37]. This study found a significant increase in dendritic cells (DCs) in cases of heart failure. DCs express MHC II, making them a unique cell group that presents antigens to T cells. DCs also secrete numerous growth factors and cytokines to modulate immune responses and inflammation [72]. Studies on myocardial infarction suggest that the eventual outcome of DC activity depends on the subset of DCs involved and the type of effector cells that are subsequently recruited. Ideally, the activation of conventional dendritic cells should increase the activity of tolerant dendritic cells to rapidly reduce inflammation [73]. NK cells are positively correlated with DC proliferation. Activation of NK cells depends on the balance between activating and inhibitory signals from target cells [73]. In acute myocardial infarction, studies have shown that NK cells promote dendritic cell differentiation by releasing cytokines, thus forming a positive feedback pathway and influencing ventricular remodeling [74]. The primary pathology of HF is ventricular remodeling. Although there is no direct evidence of an association between NK cells and HF, it can be confirmed that NK cells are increased in the pathophysiology of HF, which is consistent with the findings of this study. In contrast, investigations into cardiovascular diseases have revealed that the number of regulatory T cells is reduced. Regulatory T cells are a subset of CD4+ T cells with unique immunoregulatory abilities that maintain immune homeostasis in the body, primarily through cell contact and the release of inhibitory cytokines (such as IL-10 and TGF-β1) [75]. Consistent with the results of this study, there may be a negative correlation between NK cells and regulatory T cells. However, this study had some limitations. First, this was a retrospective analysis using datasets retrieved from public databases. Moreover, we verified only the diagnostic performance of the model for HF; its value for treatment and prognosis requires further investigation. Additionally, experimental and clinical studies are necessary to validate the results of this study and to assess their implications for the treatment and prognosis of HF.

Conclusions

We developed an accurate HF diagnostic model using machine learning and an artificial neural network. Despite disparities between patient cohorts, this signature remained effective and can be used for personalized disease prediction and precision medicine. In addition, immune regulation plays a crucial role in the progression of HF, and our results have potential implications for the use of immunotherapies to treat patients with later-stage HF. Substantive validation of these findings requires large-scale prospective clinical trials and experimental studies.