Introduction

Early-onset sepsis (EOS) in newborns is a multiorgan system dysfunction, diagnosed when pathogenic microorganisms are isolated from peripheral blood or cerebrospinal fluid within 7 days of birth, that can lead to severe neonatal morbidity and mortality [1]. Acute kidney injury (AKI) is one of the most common conditions accompanying neonatal EOS [2]. In practice, many newborns with suspected EOS are given intravenous broad-spectrum antibiotics for several days, although the incidence of neonatal EOS is less than 6 per 10,000; this treatment may interfere with early breastfeeding, lead to bacterial dysbiosis, and increase the risk of diseases such as type 1 diabetes, asthma, and necrotizing enterocolitis (NEC) [3,4,5,6]. Early antibiotic treatment of very low birth weight preterm infants with suspected sepsis without blood cultures is associated with an increased risk of subsequent morbidity and mortality [7, 8]. In addition, no studies to date have shown that infants with EOS could benefit from antibiotic therapy. Because routine laboratory tests such as complete blood count, erythrocyte sedimentation rate (ESR), procalcitonin (PCT), and C-reactive protein (CRP) have limited power to distinguish neonatal EOS from suspected EOS, antibiotic use may be prolonged in uninfected infants [9,10,11]. Hence, discovering specific biomarkers of neonatal EOS and reducing unnecessary antibiotic exposure in neonates with suspected sepsis are important goals for clinical neonatal care. In the current study, we analyzed gene expression profiles of neonatal EOS patients from public databases to develop a genetic model for predicting sepsis, which could provide insight into the early molecular changes and biological mechanisms of neonatal EOS.

Materials and methods

Acquisition of the GSE25504 cohort and identification of differentially expressed genes (DEGs)

Whole-blood mRNA expression profiles of neonatal EOS patients in the GSE25504 dataset were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/); the dataset includes microarray data from four platforms (GPL570, GPL6947, GPL13667, and GPL15158). The limma R package was then used to identify DEGs between the sepsis and control groups on each of the four platforms, with cut-off criteria of p value < 0.05 and |logFC| > 0.5. The DEGs that overlapped among the four platforms were subjected to gene set enrichment analysis (GSEA) in the Metascape database [12]. Results were considered statistically significant when the p value was less than 0.05.
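The filtering and overlap step can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: it assumes that per-platform tables of (gene, logFC, p value) have already been produced, applies the two cut-offs stated above, and keeps the genes that pass on every platform. The requirement that the direction of change agree across platforms, the function names, and the placeholder gene labels and values are illustrative assumptions rather than details taken from the study.

```python
def select_degs(rows, p_cutoff=0.05, lfc_cutoff=0.5):
    """Return {gene: 'up' | 'down'} for rows passing both cut-offs.

    rows is an iterable of (gene, log_fc, p_value) tuples.
    """
    degs = {}
    for gene, log_fc, p_value in rows:
        if p_value < p_cutoff and abs(log_fc) > lfc_cutoff:
            degs[gene] = "up" if log_fc > 0 else "down"
    return degs


def overlapping_degs(per_platform):
    """Keep genes that are DEGs on every platform.

    Assumption for this sketch: the direction of change must also agree.
    """
    tables = list(per_platform.values())
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    first = tables[0]
    return {
        gene: first[gene]
        for gene in sorted(common)
        if all(table[gene] == first[gene] for table in tables)
    }


# Hypothetical placeholder values, not taken from GSE25504.
platform_1 = [("GENE_A", 1.2, 0.001), ("GENE_B", -0.8, 0.01),
              ("GENE_C", 0.3, 0.001), ("GENE_D", 0.9, 0.2)]
platform_2 = [("GENE_A", 0.7, 0.02), ("GENE_B", 0.9, 0.03),
              ("GENE_C", 0.6, 0.01)]

common_degs = overlapping_degs({
    "platform_1": select_degs(platform_1),
    "platform_2": select_degs(platform_2),
})
print(common_degs)
```

In this toy input, GENE_B passes the cut-offs on both platforms but changes in opposite directions, so only GENE_A is retained.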

Construction of the least absolute shrinkage and selection operator (LASSO) model

Based on these overlapping DEGs, a binomial LASSO model was constructed using the glmnet package on the GPL6947 platform. LASSO fits a more parsimonious model by adding a penalty function that shrinks the regression coefficients and sets some of them exactly to zero. It therefore retains the advantages of subset selection and is a biased estimation method suited to data with multicollinearity. When a generalized linear model is built, LASSO regression selects which variables enter the model, which improves performance and helps to avoid overfitting. The generalized linear models covered here include those with one-dimensional continuous, multidimensional continuous, nonnegative count, binary discrete, and multivariate discrete dependent variables. The complexity of the LASSO model is controlled by λ: a larger λ penalizes models with more variables more strongly, so that a model with fewer variables is finally obtained. Once λmin was determined, the score of each sample was calculated as the sum of the mRNA expression levels weighted by their LASSO regression coefficients (β): Score = (βmRNA1 × mRNA1) + (βmRNA2 × mRNA2) + … + (βmRNAn × mRNAn). Receiver operating characteristic (ROC) curve analysis was used to assess the model’s ability to discriminate between EOS and normal infants.
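The effect of λ on the coefficients can be illustrated with the soft-thresholding operator, which gives the LASSO estimate in the special case of a linear model with standardized, uncorrelated predictors and is also the basic update used inside coordinate-descent solvers such as glmnet. The sketch below is only an illustration of that shrinkage behaviour, not a reimplementation of the binomial model fitted in the study; the gene labels, the unpenalized coefficients, and the λ values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam and return 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def surviving(unpenalized, lam):
    """Names of the variables whose coefficient is still nonzero at this lam."""
    return [name for name, beta in unpenalized.items()
            if soft_threshold(beta, lam) != 0.0]


# Hypothetical unpenalized coefficients, for illustration only.
unpenalized = {"g1": 1.5, "g2": -0.6, "g3": 0.2}

for lam in (0.1, 0.5, 1.0):
    print(lam, surviving(unpenalized, lam))
```

As λ increases from 0.1 to 1.0, the number of retained variables in this toy example falls from three to one, which is the behaviour described above.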

Clinical specimens and quantitative real-time PCR (qRT-PCR) analysis

From 1 January 2022 to 30 June 2022, 99 normal and 60 EOS infants from Children’s Hospital Affiliated to Zhengzhou University were included in this single-center retrospective case–control study. EOS infants presented with significant clinical signs and symptoms of sepsis (respiratory distress, anemia, fever, decreased absolute neutrophil count, or increased C-reactive protein), and the diagnosis was confirmed by a positive blood culture. Peripheral blood mononuclear cells (PBMCs) were isolated from 1 mL of whole blood collected from each infant by density gradient separation on Ficoll-Paque. After total RNA was extracted from the PBMCs, qRT-PCR was used to detect the mRNA levels of the genes in the model [13]. Finally, the relative mRNA expression levels were normalized to β-actin. Primer sequences are shown in Table 1.

Table 1 The sequences of the qRT-PCR primers used in this study

Identification of miRNAs targeting genes in our diagnostic models

miRNA target prediction analysis was performed with four public online databases, TargetScan [14], mirDIP [15], miRWalk [16], and miRmap [17], to identify miRNAs that target the genes in our diagnostic model by binding to their 3′ UTRs. Subsequently, the miRNAs predicted concordantly by all four databases were used to construct a miRNA-mRNA network in Cytoscape [18].
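The concordance filter amounts to a set intersection per gene, which can be sketched as follows. The sketch assumes that the predictions of each database have already been exported as a mapping from gene to predicted miRNAs; it does not query TargetScan, mirDIP, miRWalk, or miRmap, and the gene and miRNA labels are hypothetical placeholders. The resulting (miRNA, gene) pairs correspond to the edges of a network that could then be imported into Cytoscape.

```python
def concordant_edges(predictions):
    """Return sorted (miRNA, gene) pairs predicted by every database.

    predictions maps database name -> {gene: set of predicted miRNAs}.
    """
    databases = list(predictions.values())
    genes = set(databases[0])
    for db in databases[1:]:
        genes &= set(db)
    edges = []
    for gene in sorted(genes):
        shared = set.intersection(*(set(db[gene]) for db in databases))
        edges.extend((mirna, gene) for mirna in sorted(shared))
    return edges


# Hypothetical placeholder predictions, for illustration only.
predictions = {
    "TargetScan": {"GENE_A": {"miR_1", "miR_2", "miR_3"}, "GENE_B": {"miR_4"}},
    "mirDIP": {"GENE_A": {"miR_1", "miR_2"}, "GENE_B": {"miR_4", "miR_5"}},
    "miRWalk": {"GENE_A": {"miR_1", "miR_2", "miR_9"}, "GENE_B": {"miR_5"}},
    "miRmap": {"GENE_A": {"miR_1", "miR_2"}, "GENE_B": {"miR_4"}},
}

edges = concordant_edges(predictions)
print(edges)
```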

Statistical analysis

Categorical data were compared with the Pearson chi-square test or Fisher exact test whenever appropriate, and quantitative variables were analyzed using the independent-sample t-test. The clinical characteristics of 159 infants are shown in Table 2. Results were considered statistically significant when the p value was < 0.05.
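For a 2 × 2 comparison of a categorical variable between two groups, the Pearson chi-square test can be sketched as below. This is a minimal illustration without continuity correction; it uses the fact that, for one degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(√(χ²/2)). The counts are hypothetical and are not taken from Table 2, and the statistical software used in the study is not stated in the text.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square statistic and p value for a 2 x 2 table (1 df).

    No continuity correction is applied in this sketch.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows are the two groups, columns the two categories.
counts = [[30, 20], [25, 25]]
chi2, p = pearson_chi_square_2x2(counts)
print(round(chi2, 4), round(p, 4))
```

When any expected count is small, Fisher's exact test is the usual alternative, as noted above.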

Table 2 Clinical characteristics of the infants involved in the study

Results

Identification of overlapped DEGs

As shown in Fig. 1A, DEGs on the four platforms were screened with cut-off criteria of p value < 0.05 and |logFC| > 0.5; 20 upregulated genes and 8 downregulated genes were identified as common DEGs (Fig. 1B), which are listed in Fig. 1C. We also constructed a protein–protein interaction (PPI) network based on these DEGs (Fig. 1D). The PPI network contained 28 nodes and 11 edges. The local clustering coefficient was 0.332, and the PPI enrichment p value was 4.78e-05. We found that CD247, CD3G, CD8B, and LCK might be the core genes. Finally, GSEA showed that these overlapping DEGs are mainly associated with pathways related to infection, neutrophil degranulation, and the cellular immune response (Fig. 1E).

Fig. 1

Identification of the common DEGs on the four platforms in the GSE25504 dataset. A Volcano plots of DEGs on the four platforms. B The 28 overlapping DEGs that were altered in EOS samples. C Names of the 28 overlapping DEGs. D PPI network construction. E Functional enrichment analysis

Identification of critical genes involved in neonatal EOS

According to the binomial LASSO analysis, when λmin was equal to 0.02353066, eight genes (CST7, CD3G, CD8B, CD247, SIRPG, GPR84, MAL, and ANKRD22) were identified that most accurately predicted EOS (Fig. 2A). Score = (1.54278655 × CST7) − (0.02890854 × CD247) − (0.78324710 × CD3G) − (0.95299570 × CD8B) − (0.88133909 × SIRPG) + (0.33541335 × GPR84) + (0.11719602 × ANKRD22) − (0.55729858 × MAL). As shown in Fig. 2B, infants with EOS had significantly higher scores than normal infants on the four platforms. Moreover, this eight-gene signature showed good diagnostic power, with AUCs of 1, 1, 0.905, and 0.923 on the four platforms, respectively (Fig. 2C).
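The score formula can be written as a short Python function. The coefficients are those reported above; the function name, the optional restriction to a subset of genes, and the expression values in the example are illustrative assumptions. The formula as reported has no intercept term, and the expression values passed in are assumed to be on the same scale as the data on which the model was fitted.

```python
# LASSO coefficients as reported in the score formula.
COEFFICIENTS = {
    "CST7": 1.54278655,
    "CD247": -0.02890854,
    "CD3G": -0.78324710,
    "CD8B": -0.95299570,
    "SIRPG": -0.88133909,
    "GPR84": 0.33541335,
    "ANKRD22": 0.11719602,
    "MAL": -0.55729858,
}


def eos_score(expression, genes=None):
    """Sum of coefficient x expression over the model genes.

    expression maps gene -> expression level. genes optionally restricts
    the sum to a subset of the model genes (an assumption of this sketch).
    """
    selected = COEFFICIENTS if genes is None else {g: COEFFICIENTS[g] for g in genes}
    return sum(beta * expression[gene] for gene, beta in selected.items())


# Hypothetical sample in which every gene has expression level 1.0.
sample = {gene: 1.0 for gene in COEFFICIENTS}

full_score = eos_score(sample)
subset_score = eos_score(sample, genes=["CST7", "CD3G", "CD247", "ANKRD22"])
print(round(full_score, 8), round(subset_score, 8))
```

With every expression level set to 1.0, the score reduces to the sum of the coefficients, which makes the signs easy to check: CST7, GPR84, and ANKRD22 raise the score, and the remaining five genes lower it.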

Fig. 2

Construction and ROC analysis of the eight-gene diagnostic model in the GSE25504 dataset. A Parameter selection in the binomial LASSO model by 10-fold cross-validation. B Difference analysis of scores between normal and EOS infants. C ROC analysis of the eight-gene diagnostic model

Verification of the diagnostic model in a clinical cohort

To verify the capability of the model in diagnosing neonatal EOS in actual clinical practice, qRT-PCR analysis was performed in a clinical cohort. There were no significant differences in gender, age, or NEU% levels between the EOS and normal infant groups (Table 2). The types of bacteria that infected the EOS infants are shown in Fig. 3. EOS infants had higher levels of PCT and CRP and lower levels of Hb than infants in the normal group. All genes in the model except CD8B, SIRPG, GPR84, and MAL were differentially expressed in the peripheral blood of EOS infants and normal infants (Fig. 4A). The four genes that were not significantly different were excluded when infant scores were calculated with the same formula. ROC analysis revealed that this diagnostic model performed well in our clinical cohort (Fig. 4B). In addition, the model had better diagnostic performance than conventional inflammatory indicators such as C-reactive protein (CRP), hemoglobin (Hb), neutrophil percentage (NEU%), and procalcitonin (PCT).
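The comparison of the model with conventional indicators rests on the area under the ROC curve (AUC), which equals the probability that a randomly chosen EOS infant has a higher value than a randomly chosen normal infant, with ties counted as one half. A minimal Python sketch of that calculation follows; the values are hypothetical and are not taken from the clinical cohort, and the software used for the ROC analysis in the study is not stated in the text.

```python
def auc(case_values, control_values):
    """AUC as the fraction of (case, control) pairs in which the case is higher.

    Ties contribute 0.5, which matches the trapezoidal area under the ROC curve.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical values, for illustration only.
model_cases, model_controls = [2.1, 1.8, 1.5], [0.9, 1.6, 0.4]
marker_cases, marker_controls = [12.0, 5.0, 8.0], [6.0, 5.0, 3.0]

model_auc = auc(model_cases, model_controls)
marker_auc = auc(marker_cases, marker_controls)
print(round(model_auc, 3), round(marker_auc, 3))
```

For an indicator that is lower in EOS infants, such as Hb, the values can be negated before the function is called so that higher still means more likely EOS.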

Fig. 3

Types of bacteria that infected EOS infants

Fig. 4

Verification of the eight-gene diagnostic model in the clinical cohort. A Difference analysis of the expression of the eight genes between normal and EOS infants. B ROC analysis of the diagnostic model and other conventional inflammatory indicators in the clinical cohort. ns, not significant; **p < 0.01; ***p < 0.001

Construction of the miRNA-mRNA network

Based on the results of the miRNA target prediction analysis with the four public online databases, as shown in Fig. 5A, a total of 5 miRNAs targeting the 3′ UTR of ANKRD22, 7 miRNAs targeting the 3′ UTR of CD3G, 2 miRNAs targeting the 3′ UTR of CST7, and 9 miRNAs targeting the 3′ UTR of CD247 were identified. Then, a miRNA-mRNA network consisting of the genes in our diagnostic model and their potential target miRNAs was constructed (Fig. 5B).

Fig. 5

Construction of the miRNA-mRNA network. A miRNA target prediction analysis with the four public online databases. B The miRNA-mRNA network consisting of the genes in our diagnostic model and their potential target miRNAs

Discussion

Despite breakthroughs in prenatal care and antibiotic prophylaxis, neonatal EOS remains the third leading cause of neonatal death worldwide, owing to the lack of a reliable way to identify infected infants [19]. Blood culture remains the gold standard for the diagnosis of EOS; however, its sensitivity is low in neonates, and diagnosis is delayed [20]. As a result, many newborns with suspected EOS, especially premature infants, are routinely treated with broad-spectrum intravenous antibiotics for several days, although empirical antibiotic administration may have a negative impact on the growth and development of newborns [9]. Hence, to minimize unnecessary antibiotic exposure in newborns with suspected EOS, clinical neonatal management urgently needs a more reliable method of diagnosing EOS that utilizes neonatal blood or tissue.

In the current bioinformatics study, DEGs between the sepsis and control groups were identified as potential diagnostic biomarkers on the four platforms of the GSE25504 dataset. We then selected the 28 common DEGs to construct a binomial LASSO model. Finally, a diagnostic model consisting of eight genes was built and showed good diagnostic power on the four platforms. To verify the capability of the model in diagnosing neonatal EOS in actual clinical practice, qRT-PCR analysis was performed in a clinical cohort that consisted of peripheral blood samples from 99 normal and 60 EOS infants. All genes in the model except CD8B, SIRPG, GPR84, and MAL were differentially expressed in the peripheral blood of EOS infants and normal infants. After the infant scores were calculated with the same formula based on the four DEGs (CST7, CD3G, CD247, and ANKRD22), ROC analysis revealed that this diagnostic model performed well in our clinical cohort. In addition, the model had better diagnostic performance than conventional inflammatory indicators such as CRP, Hb, NEU%, and PCT. Together, these results indicate that the diagnostic model constructed in our study could separate EOS infants from normal infants.

Most of the four genes in our diagnostic model have been associated with sepsis to some degree. Neutrophil-specific CST7 was significantly upregulated in the whole blood of patients with sepsis, and its encoded protein, cystatin F, is involved in regulating the cytotoxicity of natural killer (NK) cells within the tumor microenvironment [21, 22]. CD3G is an upregulated T cell-related gene that has been reported to be inversely correlated with the Sequential Organ Failure Assessment (SOFA) score and mortality in sepsis [23]. Many studies have reported that CD247 is involved in human and murine sepsis, where it may act as a key gene in the occurrence and development of the disease [24,25,26,27,28,29]. Ankyrin repeat domain 22 (ANKRD22), which encodes a specific mitochondrial protein, has been demonstrated to be involved in the progression of multiple tumors, including colorectal cancer [30], breast cancer [31], pancreatic cancer [32], prostate cancer [33], and non-small-cell lung cancer [34]; however, there are few relevant studies on sepsis, and further investigation is required.

Compared with a previous study in which miRNAs obtained from umbilical cord plasma or umbilical cord tissue distinguished neonatal EOS from normal infants [35], this four-gene diagnostic model has a better discriminatory ability. In addition, umbilical cord plasma or tissue may no longer be readily available when the newborn presents with signs of sepsis, which makes our model more practical. Our study has some limitations. The individual and geographic variability of EOS infants may affect the performance of this model. In addition, the small sample size of our clinical cohort limits the validation of the model, and future multicenter randomized controlled studies are needed to evaluate it. Finally, our study did not include blood-culture-negative infants with EOS. Because blood-culture-positive and blood-culture-negative infants may have different peripheral blood transcriptome changes, future work should collect samples from blood-culture-negative infants with EOS and measure the expression of the four genes in their peripheral blood to determine whether the four-gene signature can identify septic infants with negative blood cultures.

Conclusions

In summary, through bioinformatics analysis we constructed a four-gene diagnostic model that can accurately identify neonatal EOS with bacterial infection and that could be used in the future as an ancillary test for its diagnosis.