Identification and validation of a novel four-gene diagnostic model for neonatal early-onset sepsis with bacterial infection

Neonatal early-onset sepsis (EOS) is the third leading cause of neonatal death worldwide. The current study aimed to discover reliable biomarkers for the diagnosis of neonatal EOS through transcriptomic analysis of publicly available datasets. Whole blood mRNA expression profiles of neonatal EOS patients in the GSE25504 dataset were downloaded and analyzed. A binomial LASSO model was constructed to select the genes that most accurately predicted neonatal EOS. Then, ROC curves were generated to assess the performance of the predictive features in differentiating between neonatal EOS and normal infants. Finally, a miRNA-mRNA network was established to explore the potential biological mechanisms of the genes within the model. Four genes (CST7, CD3G, CD247, and ANKRD22) that most accurately predicted neonatal EOS were identified and subsequently used to construct a diagnostic model. ROC analysis revealed that this diagnostic model performed well in differentiating between neonatal EOS and normal infants in both the GSE25504 dataset and our clinical cohort. A miRNA-mRNA network consisting of the four genes and their potential target miRNAs was also constructed. Through bioinformatics analysis, we constructed a four-gene diagnostic model that can accurately distinguish neonatal EOS in newborns with bacterial infection and that could be used as an auxiliary test for diagnosing neonatal EOS with bacterial infection in the future.
Conclusion: In the current study, we analyzed gene expression profiles of neonatal EOS patients from public databases to develop a genetic model for predicting sepsis, which could provide insight into early molecular changes and biological mechanisms of neonatal EOS.
What is Known:
• Infants with suspected EOS usually receive empiric antibiotic therapy directly after birth.
• Empirical antibiotic treatment is often halted only when blood cultures remain negative after 48 to 72 hours, which is a considerable delay. Additionally, because of concern about inadequate clinical prediction of sepsis and the limited sensitivity of blood cultures, the duration of antibiotic therapy for these infants is typically extended.
What is New:
• We established a four-gene diagnostic model of neonatal EOS with bacterial infection by bioinformatics analysis. The model has better diagnostic performance than conventional inflammatory indicators such as CRP, Hb, NEU%, and PCT.


Introduction
Early-onset sepsis (EOS) in newborns is a multiorgan system dysfunction, defined by the isolation of pathogenic microbial strains from peripheral blood or cerebrospinal fluid within 7 days of birth, that can lead to severe neonatal morbidity and mortality [1]. Acute kidney injury (AKI) is one of the most common conditions presenting with neonatal EOS [2]. In practice, many newborns with suspected EOS are given intravenous broad-spectrum antibiotics for several days, which may interfere with early breastfeeding, lead to bacterial dysbiosis, and increase the risk of morbidity for many diseases, such as type 1 diabetes, asthma, and necrotizing enterocolitis (NEC), although the incidence of neonatal EOS is less than 6 per 1,0000 [3][4][5][6]. Early antibiotic treatment of very low birth weight preterm infants with suspected sepsis without blood cultures is associated with an increased risk of subsequent morbidity and mortality [7,8]. In addition, no studies to date have shown that infants with EOS could benefit from antibiotic therapy. Routine laboratory tests such as complete blood count, erythrocyte sedimentation rate (ESR), procalcitonin (PCT), and C-reactive protein (CRP) have limited power to distinguish neonatal EOS from suspected EOS, which may prolong the use of antibiotics in uninfected infants [9][10][11]. Hence, efforts to discover specific biomarkers of neonatal EOS and to reduce unnecessary antibiotic exposure in neonates with suspected sepsis are exceptionally important for clinical neonatal care. In the current study, we analyzed gene expression profiles of neonatal EOS patients from public databases to develop a genetic model for predicting sepsis, which could provide insight into early molecular changes and biological mechanisms of neonatal EOS.

Obtainment of the GSE25504 cohort and identification of differentially expressed genes (DEGs)
Whole blood mRNA expression profiles of neonatal EOS patients in the GSE25504 dataset were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/); the dataset includes microarray data from four platforms (GPL570, GPL6947, GPL13667, and GPL15158). Then, the limma R package was used to identify DEGs between the sepsis and control groups on each of the four platforms, with cut-off criteria of p value < 0.05 and |logFC| > 0.5. The DEGs that overlapped among the four platforms were subjected to gene set enrichment analysis (GSEA) in the Metascape database [12]. Results were considered statistically significant when the p value was less than 0.05.
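The filtering and overlap step can be illustrated with a minimal Python sketch. It is not the authors' limma workflow: it assumes the per-platform statistics (logFC and p value) have already been computed, it additionally assumes that an overlapping DEG must change in the same direction on every platform, and the gene names and numbers are made-up placeholders rather than GSE25504 values.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical container: gene symbol -> (logFC, p value) for one platform.
PlatformResult = Dict[str, Tuple[float, float]]


def split_degs(result: PlatformResult, p_cutoff: float = 0.05,
               lfc_cutoff: float = 0.5) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) genes passing both cut-offs."""
    up, down = set(), set()
    for gene, (log_fc, p_value) in result.items():
        if p_value < p_cutoff and abs(log_fc) > lfc_cutoff:
            (up if log_fc > 0 else down).add(gene)
    return up, down


def overlapping_degs(results: List[PlatformResult]) -> Tuple[Set[str], Set[str]]:
    """Keep genes that are DEGs in the same direction on every platform."""
    if not results:
        return set(), set()
    ups, downs = zip(*(split_degs(r) for r in results))
    return set.intersection(*ups), set.intersection(*downs)


# Toy values for two made-up platforms (placeholder genes, not GSE25504 data).
platform_1 = {"GENE_A": (1.2, 0.001), "GENE_B": (-0.9, 0.010),
              "GENE_C": (0.1, 0.800), "GENE_D": (-0.6, 0.040)}
platform_2 = {"GENE_A": (0.8, 0.020), "GENE_B": (-0.7, 0.030),
              "GENE_C": (0.6, 0.200), "GENE_D": (-0.4, 0.010)}

common_up, common_down = overlapping_degs([platform_1, platform_2])
print(sorted(common_up), sorted(common_down))
```

In this toy input, GENE_D passes the p value cut-off on both platforms but fails |logFC| > 0.5 on the second, so it is excluded from the overlap.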

Construction of the least absolute shrinkage and selection operator (LASSO) model
Based on these overlapping DEGs, a binomial LASSO model was constructed using the glmnet package on the GPL6947 platform. LASSO adds a penalty function to the regression model that shrinks some coefficients and sets others exactly to zero. It therefore retains the advantage of subset shrinkage and is a biased estimator suited to data with multicollinearity. A characteristic of LASSO regression is that it selects variables while fitting a generalized linear model, which yields better performance parameters and helps to avoid overfitting. Here, generalized linear models cover one-dimensional continuous, multidimensional continuous, nonnegative count, binary discrete, and multivariate discrete dependent variables. The complexity of LASSO is controlled by λ: a larger λ penalizes linear models with more variables more strongly, so that a good model with fewer variables is finally obtained. Once λ min was determined, the score of each sample was calculated as the sum of the mRNA expression levels weighted by their LASSO regression coefficients (β): Score = (βmRNA1 × mRNA1) + (βmRNA2 × mRNA2) + … + (βmRNAn × mRNAn). Receiver operating characteristic (ROC) curve analysis was then used to assess the model's ability to discriminate between EOS and normal infants.
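As an illustration of the score formula and the ROC assessment, the following Python sketch computes the weighted sum for one sample and the area under the ROC curve from two groups of scores. The coefficients, expression values, and scores are invented placeholders, not the fitted glmnet coefficients; the intercept is omitted because the formula above does not include one; and the AUC is obtained through its rank-based (Mann-Whitney) interpretation, which is an assumption about implementation rather than the procedure used in the study.

```python
from typing import Dict, Sequence


def lasso_score(expression: Dict[str, float],
                coefficients: Dict[str, float]) -> float:
    """Score = sum over genes of (beta_gene * expression_gene)."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def roc_auc(case_scores: Sequence[float],
            control_scores: Sequence[float]) -> float:
    """AUC as the probability that a case scores higher than a control.

    Ties count as one half, which matches the trapezoidal ROC area.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Placeholder coefficients and expression values (hypothetical, for illustration).
betas = {"GENE_A": 0.5, "GENE_B": -0.25}
sample = {"GENE_A": 4.0, "GENE_B": 2.0}
score = lasso_score(sample, betas)  # 0.5 * 4.0 + (-0.25) * 2.0

# Placeholder scores for three cases and three controls.
auc = roc_auc([1.5, 0.9, 0.4], [0.2, 0.4, -0.1])
print(score, round(auc, 3))
```

An AUC of 1 means every case outscores every control, while 0.5 corresponds to chance-level discrimination.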

Clinical specimens and quantitative real-time PCR (qRT-PCR) analysis
From 1 Jan 2022 to 30 Jun 2022, 99 normal and 60 EOS infants from Children's Hospital Affiliated to Zhengzhou University were included in this single-center retrospective case-control study. EOS infants presented with significant clinical signs and symptoms of sepsis (respiratory distress, anemia, fever, decreased absolute neutrophil count, or increased C-reactive protein), and the diagnosis was confirmed by positive blood culture. Peripheral blood mononuclear cells (PBMCs) were isolated from 1 mL of whole blood collected from each infant by Ficoll-Paque density gradient separation. After total RNA was extracted from the PBMCs, qRT-PCR was used to measure the mRNA levels of the genes in the model [13]. The relative mRNA expression levels were normalized to β-ACTIN. Primer sequences are shown in Table 1.
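The text does not state how relative expression was computed from the qRT-PCR readout. A common choice is the comparative Ct method, sketched below in Python under that assumption, with roughly 100% amplification efficiency also assumed; the Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of a target gene relative to a reference gene: 2^-(delta Ct)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def fold_change(delta_ct_case: float, delta_ct_control: float) -> float:
    """Fold change of cases versus controls: 2^-(delta delta Ct)."""
    return 2.0 ** (-(delta_ct_case - delta_ct_control))


# Hypothetical Ct values: target gene at cycle 24, reference gene at cycle 18.
rel = relative_expression(ct_target=24.0, ct_reference=18.0)

# Hypothetical group means of delta Ct: 5.0 in cases, 6.0 in controls.
fc = fold_change(delta_ct_case=5.0, delta_ct_control=6.0)
print(rel, fc)
```

A lower delta Ct means higher expression, so a case group whose delta Ct is one cycle lower than that of controls corresponds to a twofold increase.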

Identification of miRNAs targeting genes in our diagnostic models
miRNA target prediction analysis was performed to identify miRNAs that target the genes in our diagnostic model by binding to their 3′ UTRs, using four online public databases: TargetScan [14], mirDIP [15], miRWalk [16], and miRmap [17]. Subsequently, the target miRNAs predicted concordantly by all four databases were used to construct a miRNA-mRNA network in Cytoscape [18].
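The consensus step amounts to a set intersection per gene. The Python sketch below assumes the predictions from each database have already been exported and parsed into sets; the gene and miRNA identifiers are placeholders, and the resulting two-column edge list is only one possible way to prepare input for a network tool.

```python
from typing import Dict, List, Set, Tuple

# predictions[database][gene] -> set of miRNAs predicted to target that gene.
Predictions = Dict[str, Dict[str, Set[str]]]


def consensus_edges(predictions: Predictions) -> List[Tuple[str, str]]:
    """Return sorted (miRNA, gene) edges predicted by every database."""
    if not predictions:
        return []
    genes = set().union(*(db.keys() for db in predictions.values()))
    edges = []
    for gene in genes:
        # A gene missing from a database contributes an empty set,
        # so it cannot yield any consensus miRNA.
        per_db = [db.get(gene, set()) for db in predictions.values()]
        for mirna in set.intersection(*per_db):
            edges.append((mirna, gene))
    return sorted(edges)


# Placeholder predictions (identifiers are hypothetical, not real database output).
toy_predictions = {
    "TargetScan": {"GENE_A": {"mir_x1", "mir_x2", "mir_x3"}, "GENE_B": {"mir_x4"}},
    "mirDIP": {"GENE_A": {"mir_x1", "mir_x2"}, "GENE_B": {"mir_x4", "mir_x5"}},
    "miRWalk": {"GENE_A": {"mir_x1", "mir_x2", "mir_x3"}, "GENE_B": {"mir_x4"}},
    "miRmap": {"GENE_A": {"mir_x1", "mir_x2"}, "GENE_B": {"mir_x5"}},
}

edges = consensus_edges(toy_predictions)
print(edges)
```

Requiring agreement across all databases is deliberately strict: in the toy input, GENE_B ends up with no edge because no single miRNA is predicted by all four sources.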

Statistical analysis
Categorical data were compared with the Pearson chi-square test or Fisher exact test whenever appropriate, and quantitative variables were analyzed using the independent-sample t-test. The clinical characteristics of 159 infants are shown in Table 2. Results were considered statistically significant when the p value was < 0.05.
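For a 2 × 2 table, the Pearson chi-square statistic and its p value can be computed directly, as in the Python sketch below. It assumes no continuity correction and uses the closed form for one degree of freedom; the counts are hypothetical and are not taken from Table 2. When expected counts are small, Fisher's exact test would be the appropriate alternative, as noted above.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows are two groups, columns are two categories.
chi2, p_value = chi_square_2x2(10, 20, 20, 10)
print(round(chi2, 3), round(p_value, 4))
```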

Identification of overlapped DEGs
As shown in Fig. 1A, DEGs on the four platforms were screened with cut-off criteria of p value < 0.05 and |logFC| > 0.5; 20 upregulated genes and 8 downregulated genes were identified as common DEGs (Fig. 1B), which are shown in Fig. 1C. We also constructed a protein-protein interaction (PPI) network based on these DEGs (Fig. 1D). The PPI network contained 28 nodes and 11 edges, with a local clustering coefficient of 0.332 and a PPI enrichment p value of 4.78e-05. We found that CD247, CD3G, CD8B, and LCK might be the core genes. Finally, GSEA showed that these overlapping DEGs are mainly associated with pathways related to infection, neutrophil degranulation, and cellular immune response (Fig. 1E).

Identification of critical genes involved in neonatal EOS
According to the binomial LASSO analysis, when λ min was equal to 0.02353066, eight genes (CST7, CD3G, CD8B, CD247, SIRPG, GPR84, MAL, and ANKRD22) were identified that most accurately predicted EOS (Fig. 2A). As shown in Fig. 2B, infants with EOS had significantly higher scores than normal infants on the four platforms. Moreover, this eight-gene signature showed good diagnostic power, with AUCs of 1, 1, 0.905, and 0.923 on the four platforms, respectively (Fig. 2C).

Verification of the diagnostic model in a clinical cohort
To verify the capability of the model to diagnose neonatal EOS in actual clinical practice, qRT-PCR analysis was performed in a clinical cohort. There were no differences in gender, age, or NEU% levels between the EOS and normal infant groups (Table 2). The types of bacteria that the EOS infants were infected with are shown in Fig. 3. EOS infants had higher levels of PCT and CRP and lower levels of Hb than infants in the normal group. All genes in the model except CD8B, SIRPG, GPR84, and MAL were differentially expressed in the peripheral blood of EOS infants and normal infants (Fig. 4A). The four genes that were not significantly different were excluded when infant scores were calculated using the same formula. ROC analysis revealed that this diagnostic model performed well in our clinical cohort (Fig. 4B). In addition, when compared with conventional inflammatory indicators such as C-reactive protein (CRP), hemoglobin (Hb), neutrophil percentage (NEU%), and procalcitonin (PCT), the model had better diagnostic performance.

Table 1 Primer sequences

| Gene | Forward primer | Reverse primer |
| --- | --- | --- |
| CST7 | GTGTGAAGCCAGGATTTCCTAA | TGTCGTTCGTGCAGTTGTTGA |
| CD3G | TGGCCCAGTCAATCAAAGGAA | CAAGTCAGAAGTACCGAACCATC |
| CD247 | GGCACAGTTGCCGATTACAGA | CTGCTGAACTTCACTCTCAGG |
| CD8B | AGACCCCTGCATACATAAAGGT | CGCTGTCTCAGCCAGTAGAT |
| SIRPG | CCCGGCATCATCCCTTACTG | TTCCAGGGGACGTAGATGGG |
| ANKRD22 | AGGGCATGTGAGAATCGTTTC | GTAGCATTCGTACAAGAGCCTC |
| MAL | ACCGCTGCCCTCTTTTACC | GAAGCCGTCTTGCATCGTGAT |
| GPR84 | GTGCTGGGCTATCGTTATGTT | GAATCGGGTACGGAGCTTGG |
| β-ACTIN | CGTGGGCCGCCCTAGGCACCA | TTGGCTTAGGGTTCAGGGGGG |

Construction of the miRNA-mRNA network
Based on the results of miRNA target prediction analysis in the four online public databases, as shown in Fig. 5A, a total of 5 miRNAs targeting the 3′ UTR of ANKRD22, 7 miRNAs targeting the 3′ UTR of CD3G, 2 miRNAs targeting the 3′ UTR of CST7, and 9 miRNAs targeting the 3′ UTR of CD247 were identified. Then, a miRNA-mRNA network consisting of the genes in our diagnostic model and their potential target miRNAs was constructed (Fig. 5B).

Discussion
Despite breakthroughs in prenatal care and antibiotic prophylaxis, neonatal EOS remains the third leading cause of neonatal death worldwide, owing to the lack of a reliable way to identify infected infants [19]. Blood culture remains the current gold standard for the diagnosis of EOS; however, its sensitivity is low in neonates, and diagnosis is delayed [20]. As a result, many newborns with suspected EOS, especially premature infants, are routinely treated with broad-spectrum intravenous antibiotics for several days, although empirical antibiotic administration may have a negative impact on the growth and development of newborns [9]. Hence, to minimize unnecessary antibiotic exposure in newborns with suspected EOS, there is an urgent need in clinical neonatal management for a more reliable method of diagnosing EOS that uses neonatal blood or tissue. In our current bioinformatics analysis, DEGs between the sepsis and control groups were identified as potential diagnostic biomarkers on the four platforms of the GSE25504 dataset. Then, we selected the 28 common DEGs to construct a binomial LASSO model. Finally, a diagnostic model consisting of eight genes was built and showed good diagnostic power on the four platforms. To verify the capability of the model to diagnose neonatal EOS in actual clinical practice, qRT-PCR analysis was performed in a clinical cohort that consisted of peripheral blood samples from 99 normal and 60 EOS infants. All genes in the model except CD8B, SIRPG, GPR84, and MAL were differentially expressed in the peripheral blood of EOS infants and normal infants. After the infants' scores were calculated with the same formula based on the four DEGs (CST7, CD3G, CD247, and ANKRD22), ROC analysis revealed that this diagnostic model performed well in our clinical cohort. In addition, when compared with conventional inflammatory indicators such as CRP, Hb, NEU%, and PCT, the model had better diagnostic performance. Together, these results indicate that the diagnostic model constructed in our study can separate EOS infants from normal infants.
Most of the four genes in our diagnostic model have previously been linked to sepsis. Neutrophil-specific CST7 was significantly upregulated in the whole blood of patients with sepsis, and its encoded protein, cystatin F, is involved in regulating the cytotoxicity of natural killer (NK) cells within the tumor microenvironment [21,22]. CD3G is an upregulated T cell-related gene and has been reported to be inversely correlated with the Sequential Organ Failure Assessment (SOFA) score and mortality in sepsis [23]. Many studies have reported that CD247 is involved in human and murine sepsis, where it may act as a key gene in the occurrence and development of the disease [24][25][26][27][28][29]. Homo sapiens ankyrin repeat domain 22 (ANKRD22), which encodes a specific mitochondrial protein, has been shown to be involved in the progression of multiple tumors, including colorectal cancer [30], breast cancer [31], pancreatic cancer [32], prostate cancer [33], and non-small-cell lung cancer [34]; however, there are few relevant studies on sepsis, and further investigation is required.
Compared with a previous study, in which miRNAs obtained from umbilical cord plasma or umbilical cord tissue could distinguish neonatal EOS from normal infants [35], this four-gene diagnostic model has a better discriminatory ability. In addition, umbilical cord plasma or tissue may no longer be readily available when the newborn presents with signs of sepsis, which makes our model more practical. Our study has several limitations. First, the individual and geographic variability of EOS infants may affect the performance of this model. Second, the small sample size of our clinical cohort limits the validation of the model, and future multicenter randomized controlled studies are needed to evaluate it. Finally, our study did not include blood-culture-negative infants with EOS. Because blood-culture-positive and blood-culture-negative infants may differ in their peripheral blood transcriptomes, the expression of the four genes needs to be measured in the peripheral blood of blood-culture-negative infants with EOS to determine whether the four-gene signature can also identify septic infants with negative blood cultures.

Conclusions
In summary, using bioinformatics analysis, we constructed a four-gene diagnostic model that can accurately identify neonatal EOS with bacterial infection and that could be used as an ancillary test for the diagnosis of neonatal EOS with bacterial infection in the future.