Abstract
COVID-19 is associated with heterogeneous outcomes. Early identification of patients at risk of severe progression is essential to manage them properly and improve their outcome. Biomarkers reflecting an increased inflammatory response, as well as individual features including advanced age, male sex, and pre-existing comorbidities, are risk factors for severe COVID-19. Yet, these features show limited accuracy for outcome prediction. We aimed to evaluate the prognostic value of the whole blood transcriptome at an early stage of the disease. The blood transcriptome of patients with mild pneumonia was profiled. Patients who subsequently developed severe COVID-19 were compared to those with a favourable outcome, and a molecular predictor based on gene expression was built. Unsupervised classification discriminated patients who would later develop COVID-19-related severe pneumonia. The corresponding gene expression signature reflected the immune response to the viral infection, dominated by a prominent type I interferon response, with IFI27 among the most over-expressed genes. A 48-gene transcriptome signature predicting the risk of severe COVID-19 was built on a training cohort, then validated on an external independent cohort, showing an accuracy of 81% for predicting severe outcome. These results identify an early transcriptome signature of severe COVID-19 pneumonia, with possible relevance for improving COVID-19 patient management.
Introduction
Since its outbreak in late 2019, the situation of the coronavirus disease 2019 (COVID-19) pandemic has considerably improved, with a decrease in detected cases and mortality worldwide (Coronavirus Disease (COVID-19) Situation Reports n. d.). However, patient management can still be challenging because the clinical course can be highly heterogeneous, with a wide spectrum of biological responses and clinical manifestations, ranging from mild respiratory symptoms to lung injury and pneumonia and, in more severe and critical cases, multiple organ failure and death (Osuchowski et al. 2021).
From an immunological point of view, severe COVID-19 and poor prognosis are associated with an exaggerated immune response, characterized by a “cytokine storm” with high levels of IL-6 and IL-10 among others, and by dramatic changes in blood cell sub-populations: decreased lymphocytes, T-cell subsets, eosinophils and platelets, and increased neutrophils and neutrophil-to-lymphocyte ratio (Chen et al. 2020; Carissimo et al. 2020; Martín-Sánchez et al. 2021; Hadjadj et al. 2020; Mann et al. 2020; Laing et al. 2020; Ahern et al. 2022). Longitudinal studies of the leukocyte transcriptome of COVID-19 patients have described a molecular profile characterized by a robust over-representation of interferon-related gene expression, a marked decrease in the transcription of genes contributing to general protein synthesis and bioenergetic metabolism, and dysregulated expression of genes associated with coagulation, platelet function, complement activation and TNF/IL-6 signalling (Ahern et al. 2022; Gill et al. 2020; Daamen et al. 2022; Yan et al. 2021; McClain et al. 2021).
Early identification of patients at risk of progressing to severe disease is essential for patient management and for the administration of specific, individualized therapies, in order to improve patient outcome and optimize the allocation of healthcare resources. Several laboratory biomarkers, including parameters related to an increased inflammatory response (such as lymphopenia, neutrophilia and raised C-reactive protein), have been associated with COVID-19 severity, hospitalization, intensive care unit admission and mortality (Zakeri et al. 2021). Moreover, a higher risk of severe progression and mortality is related to individual factors, including older age, male sex and diverse comorbidities such as overweight and obesity, chronic lung disease, immunodepression, diabetes, hypertension and malignancy (Li et al. 2021). Attempts have been made to build clinical scores estimating an individual hospitalized patient’s risk of developing critical illness, yet with limited accuracy (Lombardi et al. 2021; Liang et al. 2020). It remains unclear whether other markers, such as molecular classifiers built on blood-based features, can improve the early prediction of this risk. Longitudinal studies of the blood transcriptome and proteome have identified strong signatures associated with COVID-19 severity (Buturovic et al. 2022; Ng et al. 2021; Kwan et al. 2021; Lee et al. 2021), but whether these markers can be used for early prognostic prediction is not known.
Here, we performed an ancillary cross-sectional study within the prospective longitudinal COVIDeF cohort, focusing on the time of first clinical evaluation of patients with mild COVID-19 pneumonia referred to Assistance Publique-Hôpitaux de Paris hospitals (Paris, France) during the first pandemic wave. We analysed the whole blood transcriptome to identify an early signature predicting severe progression of the disease. A control cohort of patients with non-COVID-19 pneumonia was included to assess the specificity of the COVID-19 prognostic signature.
Results
Cohort presentation
A total of 159 samples, collected from patients enrolled during the first COVID-19 outbreak (April-June 2020) and presenting mild pneumonia at the moment of first clinical evaluation, were analysed. They included 115 patients with COVID-19 diagnosis and 44 patients with pneumonia not related to COVID-19. Among the 115 patients diagnosed with COVID-19, 11 evolved towards severe pneumonia, 51 towards intermediate pneumonia and 53 towards mild pneumonia (Supplementary Table S1).
In line with established risk factors (Li et al. 2021; Bergantini et al. 2021), the risk of evolution towards severe pneumonia was associated with age, diabetes, temperature, C-reactive protein (CRP), procalcitonin, fibrinogen, neutrophil and lymphocyte counts at inclusion, oxygen saturation and CT scan findings (Table 1).
Unsupervised transcriptome-based classification of samples
Unsupervised principal component analysis (PCA) of the whole transcriptome dataset discriminated patients with COVID-19 from controls (first principal component, PC1), and patients who would later develop severe COVID-19 pneumonia from those with a mild or intermediate evolution (second principal component, PC2; Fig. 1a). Over-representation analysis of the top genes contributing most to PC1 showed an enrichment in pathways mainly related to neutrophil activation, while the top 100 genes contributing most to PC2 were enriched in pathways related to the immune response to viral infection, including complement activation, regulation of the humoral immune response, response to type I interferon, and regulation of viral genome replication (Supplementary Table S2). Consistently, the gene contributing most to PC1 was CD177, a marker of neutrophil activation. For PC2, the genes contributing most were IFI27, involved in the type I interferon response, and OTOF, both over-expressed in COVID-19 patients and associated with the severity of evolution (Fig. 1b).
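Ranking genes by their contribution to a principal component can be illustrated with a small, dependency-free sketch. It assumes a log2-CPM matrix with samples as rows and genes as columns, computes the PC1 loadings by power iteration on the gene covariance matrix, and takes a gene's contribution as 100 × its squared unit loading (one common definition). The toy matrix and gene names are hypothetical, and the exact PCA implementation and contribution metric used in the study are not specified here.

```python
import math
import random

def center_columns(x):
    """Subtract each gene's mean across samples (rows = samples, columns = genes)."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in x]

def covariance(xc):
    """Gene-by-gene sample covariance matrix of a column-centred matrix."""
    n, p = len(xc), len(xc[0])
    return [[sum(xc[i][a] * xc[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]

def first_pc_loadings(cov, iters=500, seed=0):
    """Power iteration for the leading eigenvector (unit-length PC1 loadings)."""
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(len(cov))]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(len(v))) for a in range(len(cov))]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v

def top_contributing_genes(x, names, k):
    """Rank genes by % contribution to PC1, taken here as 100 * loading**2."""
    loadings = first_pc_loadings(covariance(center_columns(x)))
    contrib = [100.0 * l * l for l in loadings]
    ranked = sorted(zip(names, contrib), key=lambda t: t[1], reverse=True)
    return ranked[:k]

# Hypothetical toy log2-CPM matrix: 4 samples x 3 genes.
names = ["gene_A", "gene_B", "gene_C"]
x = [[8.0, 5.1, 3.0],
     [2.0, 5.0, 3.1],
     [9.0, 4.9, 2.9],
     [1.0, 5.0, 3.0]]
print(top_contributing_genes(x, names, k=2))
```

In this toy matrix gene_A carries almost all the variance, so it dominates PC1; in a real analysis, the top-ranked genes of a component are the list passed to over-representation analysis.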
Fig. 1. Global blood transcriptome discriminates patients according to the type of pneumonia and the evolution of COVID-19 pneumonia. a) Sample projections on the first two principal components (PC1, PC2) of an unsupervised PCA performed on the whole dataset (n = 16,001 genes, n = 159 samples). The center of each group is indicated by the larger circles. b) Boxplots of CD177, IFI27 and OTOF gene expression in the different groups. *Student’s t-test p-value < 0.05; **Student’s t-test p-value < 0.001; ***Student’s t-test p-value < 10⁻⁶
Considering the variability potentially due to blood cell composition, we inferred a score for each blood cell subtype in each sample (Supplementary Table S3), then evaluated the impact of cell composition on sample classification. Overall, compared to controls, COVID-19 samples showed lower neutrophil scores and higher lymphocyte and lymphocyte-subtype scores (Table 1, Supplementary Table S4). Indeed, the neutrophil proportion was inversely correlated with the global lymphocyte proportion and, in particular, with the CD8+ T-cell and resting memory CD4+ T-cell subtypes (Supplementary Figure S1a). However, blood cell composition alone could not properly discriminate COVID-19 patients according to pneumonia evolution (Supplementary Figure S1b).
Blood early transcriptome signature of COVID-19 pneumonia
By comparing COVID-19 samples (n = 115) with controls (n = 44), after adjustment for age and blood cell composition, we identified 68 differentially expressed genes (Benjamini-Hochberg adjusted p-value < 0.05 and |logFC| > 1.5; Supplementary Table S5), most of them over-expressed in COVID-19 (n = 52/68). Gene ontology analysis of the genes over-expressed in COVID-19 samples showed an enrichment in virus response pathways, mainly involving type I interferon signalling (Fig. 2a and b; Supplementary Figure S2; Supplementary Table S6).
Fig. 2. Differentially expressed genes in early COVID-19 pneumonia. a) Volcano plot of the top differentially expressed genes in COVID-19 (n = 115) versus controls (n = 44). b) Dot plot of the 10 most enriched GO pathways among the genes over-expressed in COVID-19 samples versus controls
Blood early transcriptome signature of future severe COVID-19 pneumonia
In patients with early COVID-19 pneumonia, blood samples from those who evolved towards severe pneumonia (n = 11) were compared with those from patients who remained mild (n = 53). After adjustment for age and blood cell composition, we identified 345 differentially expressed genes (Benjamini-Hochberg adjusted p-value < 0.05 and |logFC| > 1.5; Supplementary Table S7). Gene set enrichment analysis (GSEA) showed that the enriched pathways mainly reflected the response to viral infection, involving the type I interferon response (Fig. 3a and b; Supplementary Figure S3; Supplementary Table S8).
Fig. 3. Differentially expressed genes in patients with early COVID-19 pneumonia evolving towards severity. a) Volcano plot of the differentially expressed genes in patients with severe (n = 11) versus mild (n = 53) evolution. b) Dot plot of the top 10 activated and top 10 suppressed pathways enriched among the differentially expressed genes in patients with future severe COVID-19 pneumonia. c) Venn diagram of the differentially expressed genes in patients with severe COVID-19 pneumonia in our study (red) and in the studies of Wang et al. and Jackson et al. (orange) (Jackson et al. 2022; Wang et al. 2022). d) Venn diagram of the differentially expressed genes in patients with severe COVID-19 pneumonia in our study (red) and in patients with severe influenza infection in the studies of Zerbib et al. and Dunning et al. (green) (Zerbib et al. 2020; Dunning et al. 2018)
We then tested how similar our early transcriptome signature of future severe COVID-19 pneumonia was to the longitudinal signature of evolution from mild to severe COVID-19 pneumonia. To this end, we analysed the overlap between our signature and published signatures reflecting the longitudinal evolution of the blood transcriptome towards severe COVID-19 pneumonia (Jackson et al. 2022; Wang et al. 2022). Similarities were observed, including increased expression of CD177, IFI27 and OTOF (Fig. 3c; Supplementary Table S9). Of note, CD177 was also shared between our early signature and the genes differentially expressed during evolution towards severe influenza (Fig. 3d; Supplementary Table S10), another viral infection, underlining the importance of neutrophil induction beyond interferon activation in viral infections (Zerbib et al. 2020; Dunning et al. 2018).
Early prediction of severe forms of COVID-19
To select a limited set of genes predicting COVID-19 severity, we trained an Elastic Net-penalized linear model on the sub-cohort of patients with mild COVID-19 pneumonia at inclusion and a severe or mild evolution of the disease (n = 11 and n = 53, respectively), starting from the 2,500 most variable genes. Forty-eight genes were selected (Supplementary Table S11), properly discriminating severe from mild evolution of COVID-19 pneumonia in the training cohort. Of note, patients with an intermediate outcome, who were not used for training, were scattered between patients with mild and severe outcomes and were referred to as a grey zone. Using receiver operating characteristic (ROC) curve analysis, optimal thresholds (-0.02 and 7.69) were identified on the first component of the 48-gene principal component analysis projection (Supplementary Figures S4-S6). Using an independent validation cohort of 77 patients (28 with severe, 23 with intermediate and 26 with mild outcome), we confirmed the classification performance (Fig. 4). Sensitivity, specificity and accuracy for predicting severe outcome were 0.64, 0.91 and 0.81, respectively (Supplementary Table S12). In a multivariate model combining the 48-gene predictor, age, sex and blood cell composition, the 48-gene predictor remained a highly significant predictor of severe versus mild outcome (logistic regression p-value < 0.001; Table 2). Of note, this signature did not discriminate COVID-19 patients from controls (data not shown).
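How the two PC1 thresholds could split patients into three zones, and how sensitivity, specificity and accuracy for severe outcome follow from such a rule, can be sketched as follows. This is a minimal sketch under stated assumptions: higher PC1 values are assumed to indicate a higher risk of severe outcome, comparisons are assumed inclusive, and every sample below the upper threshold (grey zone included) is counted as "not severe". None of these conventions is spelled out in the text, and the scores are hypothetical, not study data.

```python
LOWER, UPPER = -0.02, 7.69  # thresholds reported on PC1 of the 48-gene PCA projection

def classify(score, lower=LOWER, upper=UPPER):
    """Assumed three-zone rule: high PC1 -> severe, low PC1 -> mild, otherwise grey zone."""
    if score >= upper:
        return "severe"
    if score <= lower:
        return "mild"
    return "grey zone"

def severe_vs_rest(scores, is_severe, upper=UPPER):
    """Sensitivity, specificity and accuracy of the rule 'score >= upper' for severe outcome."""
    tp = fp = tn = fn = 0
    for score, severe in zip(scores, is_severe):
        predicted = score >= upper
        if predicted and severe:
            tp += 1
        elif predicted:
            fp += 1
        elif severe:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(scores)

# Hypothetical toy scores, not study data.
scores = [9.1, 8.0, 3.0, -1.5, -3.2, 10.4]
is_severe = [True, True, True, False, False, False]
print([classify(s) for s in scores])
print(severe_vs_rest(scores, is_severe))
```

Keeping a grey zone trades coverage for reliability: patients falling between the thresholds are not forced into a class, which matches the description of intermediate-outcome patients above.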
Fig. 4. Classification of samples based on the 48 selected genes, discriminating COVID-19 patients according to pneumonia evolution. Sample projection on the first two principal components (PC1, PC2) of an unsupervised PCA performed using the 48 genes selected by Elastic Net regression on the training cohort. Faint circles represent samples from the training cohort, on which gene selection was optimized. Bright squares represent samples from the external independent validation cohort
Post hoc analyses showed positive correlations between the 48-gene predictor and the following biochemical variables at admission: C-reactive protein (r = 0.65, p = 2.603e-07), procalcitonin (r = 0.70, p = 2.073e-06) and fibrinogen (r = 0.62, p = 4.364e-05). Of note, diabetes, a well-established risk factor for progression to severe COVID-19 pneumonia, was only weakly correlated with the 48-gene predictor (r = 0.37, p = 0.0256).
Ability of the 48-gene predictor of severe outcome to monitor longitudinal evolution towards severe COVID-19 pneumonia
To explore the longitudinal performance of our transcriptomic signature, we computed the 48-gene predictor in a published cohort (Wang et al. 2022; Supplementary Fig. 7). Overall, the 48-gene predictor values correlated positively with COVID-19 severity assessed by the WHO severity level (r = 0.53, p = 6.945e-16; Supplementary Fig. 8). In patients with WHO severity levels 2 to 4, the 48-gene predictor decreased over time, whereas in patients with severity levels 6 to 9, its longitudinal evolution was more variable (Fig. 5). Finally, among the 8 patients whose COVID-19 severity level changed during follow-up, the 48-gene predictor, using the 7.69 threshold, broadly discriminated the patients whose pneumonia worsened (Fig. 5).
Fig. 5. Longitudinal ability of the 48-gene predictor to monitor the evolution of COVID-19 pneumonia towards severity. The red and blue curves represent the mean value of the 48-gene predictor over time in 13 patients with WHO COVID-19 severity levels 6–9 and 14 patients with levels 2–4, respectively. For the 8 patients whose severity level changed during follow-up, individual values are shown (broken lines), with colours reflecting the severity level at each time point. The dashed horizontal lines indicate the 48-gene predictor thresholds established on the training cohort
Discussion
In this study focusing on patients with early-stage COVID-19 pneumonia, we identified a blood transcriptome signature predicting the risk of progression to severe pneumonia. This signature could help improve patient management by guiding specific surveillance and treatment. It reflects a differential inflammatory profile between patients with severe and mild outcome, involving the humoral immune response, complement activation and the interferon signalling pathway. It includes the overexpression of inflammation markers previously reported in patients with severe COVID-19 pneumonia, such as IFI27 (Shojaei et al. 2023) or CD177 (Jackson et al. 2022; Wang et al. 2022; Lévy et al. 2019; An et al. 2021; An et al. 2022). This signature shows an expression gradient from mild to intermediate and severe forms. Thus, the inflammatory signature observed in patients with severe COVID-19 pneumonia appears to be present early in the course of the disease and to reflect the risk of developing a severe outcome.
Based on this observation, we designed a predictor with an optimal selection of transcriptome biomarkers able to classify patients according to the evolution of their COVID-19 pneumonia. This predictor was validated on an independent cohort in which patients were evaluated longitudinally. However, with an accuracy of 0.81, the prediction of evolution towards severe COVID-19 pneumonia is not perfect. This relates to the limited sensitivity in the validation cohort, where some patients with severe outcome were grouped with patients with intermediate outcome. Nevertheless, considering the importance of both identifying patients at risk of severe pneumonia early and optimizing healthcare resources, it is worth noting that, in the validation cohort, none of the patients with a severe outcome were predicted as mild and only one patient with an actual mild outcome was predicted as severe. In addition, the clinical criteria used to classify COVID-19 patients in the validation cohort and the time of sampling during the course of the disease may have affected the accuracy estimate in that cohort. Another limitation is the broad distribution of patients evolving towards intermediate forms of pneumonia, which overlap with patients evolving towards mild pneumonia (mainly in the training cohort) and with those evolving towards severe pneumonia (mainly in the validation cohort). Although important, the proper evaluation of intermediate patients is difficult for two reasons. First, no clear clinical definition is available, and the diagnostic criteria and thresholds used to define intermediate COVID-19 pneumonia vary (Jackson et al. 2022; Wang et al. 2022). Including these intermediate patients would have increased the size of the training cohort, but also the risk of misclassification.
Limiting inclusion to the “mild” and “severe” classes ensured well-defined labels for training the predictor. Second, the cohort is not large enough to assess whether statistically significant thresholds of the 48-gene predictor exist for patients with intermediate COVID-19 pneumonia. Nevertheless, in our study, these intermediate patients clearly fall between patients with mild and severe outcomes along a continuum, and they probably represent patients who warrant close clinical surveillance, which globally supports the validity of the transcriptome-based prediction. This grey zone may also reflect biological variability, and thus a limit to the ability of this technique to predict outcome. Another potential issue is the risk of overfitting, given the limited number of patients and the high number of features. This risk was mitigated by the cross-validation strategy in the training cohort and by validation in two independent cohorts. Finally, a post hoc analysis showed several significant associations between the 48-gene predictor and clinical and biochemical variables. However, the prognostic value of these associated variables could not be tested in the final multivariate model because of the limited cohort size and the limited data available in the independent validation cohort.
This prognostic signature appears specific to COVID-19 patients compared to controls (patients with non-COVID-19 pneumonia). Compared to controls, COVID-19 patients present a transcriptome signature reflecting virus response pathways, mainly involving the type I interferon response. Type I interferon activation is well established in longitudinal series when COVID-19 progresses towards severe pneumonia, with several reported markers including IFI27, SIGLEC1, OAS1/2, IFI44, IFI44L and ISG15 (Shaath et al. 2020; Krämer et al. 2021; Masood et al. 2021; Khorramdelazad et al. 2022; Xu et al. 2022). Another potentially relevant gene identified here is OTOF, associated with inflammation and described as a type I IFN-induced effector (Roberson et al. 2022; Ding et al. 2022). Our results show that type I interferon activation occurs early in the course of COVID-19 pneumonia. This signature could contribute to the diagnosis of SARS-CoV-2 infection.
In addition, beyond COVID-19 pneumonia, the extent to which this type I interferon signature is systematically present in SARS-CoV-2-infected patients remains to be established, especially in patients with mild or asymptomatic forms of the disease.
This study follows several publications showing the involvement of type I interferon in the evolution towards severe COVID-19 pneumonia. However, in contrast to the vast majority of previous works, this study focuses on early-stage patients, when COVID-19 pneumonia is still mild. The clinical characterization is extensive and the follow-up well documented, enabling a proper classification of outcome. This original design was required to demonstrate the existence of a signature predicting outcome. Of note, inclusion in the training cohort was restricted to the first outbreak wave. Whether these signatures hold for newer SARS-CoV-2 variants remains to be established. In addition, the relative proportion of severe pneumonia decreased in subsequent waves, in the context of the progressive immunisation of the population through vaccination and prior COVID-19 infections. However, severe pneumonia still occurs, and disease complications remain challenging to predict at the individual level. The early transcriptome signature proposed here may help improve this challenging detection.
In conclusion, the whole blood transcriptome can predict the outcome of COVID-19 pneumonia at an early stage. This discrimination mainly relies on type I interferon activation, along with other immune alterations, which are already present at an early stage of the disease in patients who later develop severe pneumonia.
Methods
Patients and samples
A total of 159 patients presenting with early-stage pneumonia at first clinical evaluation in hospital were recruited prospectively between April and June 2020 in Assistance Publique-Hôpitaux de Paris hospitals (Paris, France), as part of the multicentre longitudinal COVIDeF cohort (NCT04352348). Early-stage COVID-19 pneumonia was defined as confirmed SARS-CoV-2 infection requiring supplemental oxygen at admission (but < 6 L/min), or characterized by oxygen saturation < 95% or by the presence of one or more morphological criteria of pneumonia (CT scan or chest X-ray). COVID-19 was diagnosed in 115 patients, based on a positive SARS-CoV-2 PCR and/or serology test (n = 79), and/or on typical clinical symptoms with CT findings (n = 36). Whole blood samples for transcriptome analysis were collected at first clinical evaluation, and only patients with early-stage pneumonia were included. Disease evolution was evaluated over the 14 days following inclusion, and patients were classified as having mild (n = 53), intermediate (n = 51) or severe (n = 11) pneumonia according to the onset and severity of COVID-19 complications, including hospitalization duration, need for oxygen supply, mechanical ventilation or extracorporeal oxygenation, and death. Specifically: i) patients who did not need oxygen supply, whether hospitalized or not, were classified as having mild progression of COVID-19 pneumonia; ii) patients who were hospitalized with standard oxygen supply were classified as having intermediate progression; iii) patients who were hospitalized and received mechanical ventilation or extracorporeal oxygenation, or who died, were classified as having severe progression. Cases of pneumonia of other etiologies (n = 44), used as controls, corresponded to patients with a negative PCR who were not diagnosed with COVID-19 by the physician based on CT findings.
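The three outcome rules above can be written as a small decision function, which makes their precedence and their gaps explicit. This is a sketch: the boolean inputs are hypothetical field names, severe events are assumed to imply hospitalization, and a patient given oxygen without hospitalization is not covered by the stated rules, so the function returns None for that case.

```python
from typing import Optional

def classify_outcome(hospitalized: bool, oxygen: bool, mechanical_ventilation: bool,
                     ecmo: bool, died: bool) -> Optional[str]:
    """Map events over the 14 days after inclusion to the outcome classes in the Methods."""
    # iii) mechanical ventilation, extracorporeal oxygenation or death -> severe
    if mechanical_ventilation or ecmo or died:
        return "severe"
    # ii) hospitalized with standard oxygen supply -> intermediate
    if hospitalized and oxygen:
        return "intermediate"
    # i) no oxygen supply, hospitalized or not -> mild
    if not oxygen:
        return "mild"
    # Oxygen without hospitalization is not covered by the stated rules.
    return None

print(classify_outcome(True, True, False, False, False))  # intermediate
```

Checking the severe criteria first ensures that a ventilated patient, who necessarily also received oxygen, is never labelled intermediate.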
RNA collection and extraction
Whole blood samples were collected into PAXgene tubes (PreAnalytiX, Hombrechtikon, Switzerland) following the manufacturer’s instructions. Total RNA was extracted on a QIAcube extractor following the manufacturer’s instructions (Qiagen, Hilden, Germany) at the CRB platform (Saint Antoine hospital, Paris). RNA quantification and quality control were performed on a 2100 Bioanalyzer System (Agilent Technologies, Inc., Santa Clara, CA, US). All samples passed the integrity quality control (RIN > 7).
Transcriptome data generation
Nucleic acid quantification and quality control were performed by capillary electrophoresis on a Fragment Analyzer (Agilent Technologies, Inc.). Starting from 100 ng of total RNA, poly(A) mRNAs were selected using oligo(dT) magnetic beads (NEBNext® Poly(A) mRNA Magnetic Isolation Module, New England Biolabs, Ipswich, MA, US), fragmented to approximately 300 bp and converted to directional (strand-specific) cDNA (NEBNext® Ultra™ II RNA First Strand Synthesis Module & Directional RNA Second Strand Synthesis Module, New England Biolabs). Size selection and purification were performed using magnetic beads (Sera-Mag magnetic beads, GE Healthcare, Chicago, IL, US), and libraries were prepared (NEBNext® Ultra™ II End repair/A-tailing Module & Ligation Module, New England Biolabs), amplified by PCR (KAPA HiFi HotStart ReadyMix, Roche, Basel, Switzerland), quantified by qPCR (NEBNext® Custom 2X Library Quant Kit Master Mix, New England Biolabs; QuantStudio 6 Flex Real-Time PCR System, Life Technologies, Carlsbad, CA, US), and their size profile was analysed by capillary electrophoresis on a Fragment Analyzer (Agilent Technologies, Inc.). Paired-end sequencing (2 × 100 cycles) was performed using sequencing-by-synthesis technology on a NovaSeq 6000 S2 flow cell (Illumina, San Diego, CA, US). Transcripts were quantified with Salmon (Patro et al. 2017) (v.1.4.0) against the GENCODE transcriptome reference (Frankish et al. 2021) (release 33, GRCh38.p13).
Bioinformatics analyses
Quality control was performed on the raw count matrix, and all samples passed. Counts of transcripts corresponding to the same gene were aggregated, and only genes with a count sum > 0 across all samples were retained. Globin genes were also discarded, as previously described (Harrington et al. 2020).
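The gene-level preprocessing described above (summing transcript counts per gene, dropping genes with no counts, removing globin genes) can be sketched in plain Python. The transcript-to-gene mapping and toy counts are hypothetical; the globin symbols listed are illustrative only, since the exact list removed by the authors is not given; and "count sum > 0 across all samples" is interpreted here as a non-zero total over samples.

```python
# Illustrative globin symbols only; the exact list removed by the authors is not given.
GLOBIN_GENES = {"HBA1", "HBA2", "HBB", "HBD", "HBG1", "HBG2"}

def aggregate_to_genes(tx_counts, tx2gene):
    """Sum transcript-level counts (one list of per-sample counts per transcript) by gene."""
    gene_counts = {}
    for tx, counts in tx_counts.items():
        gene = tx2gene[tx]
        if gene in gene_counts:
            gene_counts[gene] = [a + b for a, b in zip(gene_counts[gene], counts)]
        else:
            gene_counts[gene] = list(counts)
    return gene_counts

def filter_genes(gene_counts, excluded=GLOBIN_GENES):
    """Keep genes with a non-zero count sum across samples, minus excluded (globin) genes."""
    return {g: c for g, c in gene_counts.items() if sum(c) > 0 and g not in excluded}

# Hypothetical toy input: 4 transcripts x 2 samples.
tx_counts = {"t1": [1, 2], "t2": [3, 0], "t3": [0, 0], "t4": [5, 5]}
tx2gene = {"t1": "GENE1", "t2": "GENE1", "t3": "GENE2", "t4": "HBB"}
print(filter_genes(aggregate_to_genes(tx_counts, tx2gene)))  # {'GENE1': [4, 2]}
```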
Counts were normalized with DESeq2 (Love et al. 2014) (v.1.24.0). Blood cell composition was inferred from gene counts using the online CIBERSORTx tool (Stanford University 2022) (Newman et al. 2019) with the following parameters: B-mode batch correction, quantile normalization disabled, absolute mode, and n = 500 permutations. For each cell type, CIBERSORTx generates a score reflecting its abundance in the mixture. Given the high correlation between the proportion of each blood cell type and the neutrophil proportion (Supplementary Figure S1a), the neutrophil proportion was chosen as a single proxy for blood cell composition.
The edgeR package (Robinson et al. 2010) (v.3.26.8) was used to read and pre-process the data before analysis: raw counts were converted to counts per million (CPM), and lowly expressed genes were removed using CPM > 1 in at least 3 samples as the cut-off, yielding a final dataset of n = 16,001 genes and n = 159 samples. Normalization was then performed using the trimmed mean of M-values (TMM) method (Robinson and Oshlack 2010), as implemented in edgeR. The same packages and methods were used to process and normalize the data from the validation cohort (Ahern et al. 2022) and the additional longitudinal cohort (Wang et al. 2022). Global data structure was assessed by unsupervised principal component analysis (PCA) on the log2-CPM values of the normalized data. PCA was chosen because its axes can be interpreted to decipher the biological meaning of the observed variability. Over-representation analysis of the genes contributing most to each PCA component was performed using the clusterProfiler package (Wu et al. 2021) (v.3.12.0). Of note, edgeR normalization did not significantly modify normalized expression levels compared to the data submitted to CIBERSORTx (gene expression correlation r = 0.999, p-value < 2.2e-16).
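The CPM conversion and the "CPM > 1 in at least 3 samples" filter can be sketched as follows. This is a simplified sketch, not edgeR: library sizes are raw column sums with no TMM scaling factors, and the log transform uses a plain offset rather than edgeR's prior-count scheme. The gene names and counts are hypothetical.

```python
import math

def cpm(gene_counts):
    """Counts per million using raw library sizes (TMM scaling factors not applied here)."""
    n = len(next(iter(gene_counts.values())))
    lib_sizes = [sum(c[j] for c in gene_counts.values()) for j in range(n)]
    return {g: [1e6 * c[j] / lib_sizes[j] for j in range(n)] for g, c in gene_counts.items()}

def filter_low_expression(cpm_values, min_cpm=1.0, min_samples=3):
    """Keep genes with CPM > min_cpm in at least min_samples samples."""
    return {g: v for g, v in cpm_values.items() if sum(x > min_cpm for x in v) >= min_samples}

def log2_cpm(cpm_values, offset=1.0):
    """log2(CPM + offset); a simple offset, not edgeR's exact prior-count scheme."""
    return {g: [math.log2(x + offset) for x in v] for g, v in cpm_values.items()}

# Hypothetical counts: 3 genes x 4 samples, each library totalling 1,000,000 reads.
counts = {"g1": [500000] * 4, "g2": [499999] * 4, "g3": [1] * 4}
kept = filter_low_expression(cpm(counts))
print(sorted(kept))  # ['g1', 'g2']; g3 has CPM = 1.0, which is not > 1
```

Filtering on CPM rather than raw counts makes the cut-off comparable across samples with different sequencing depths.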
To account for the heteroscedasticity of count data, normalized data were transformed using the voom function (Liu et al. 2015) implemented in the limma package. Differential expression analysis was performed by linear modelling with the limma package (Ritchie et al. 2015) (v. 3.40.6), including the estimated neutrophil proportion and age as covariates in the design matrix. Differentially expressed genes were selected using a Benjamini-Hochberg adjusted p-value < 0.05 and |logFC| > 1.5 as cut-offs. Gene set enrichment analysis (GSEA) of differentially expressed genes was performed using the clusterProfiler package.
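The Benjamini-Hochberg adjustment and the two-criterion gene selection can be illustrated with a short sketch. The adjustment follows the standard step-up procedure (the same computation as R's `p.adjust(method = "BH")`); the fold-change cut-off is applied to the absolute logFC, which is assumed here because both over- and under-expressed genes are reported. Gene names, fold changes and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original order, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_de_genes(genes, logfc, pvalues, alpha=0.05, min_abs_logfc=1.5):
    """Genes with BH-adjusted p < alpha and |logFC| > min_abs_logfc."""
    adj = benjamini_hochberg(pvalues)
    return [g for g, lfc, q in zip(genes, logfc, adj)
            if q < alpha and abs(lfc) > min_abs_logfc]

# Hypothetical example values.
genes = ["g1", "g2", "g3", "g4"]
logfc = [2.0, -1.8, 1.0, -0.5]
pvalues = [0.01, 0.04, 0.03, 0.005]
print(select_de_genes(genes, logfc, pvalues))  # ['g1', 'g2']
```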
To predict COVID-19 severity from the transcriptome, gene selection was performed on the sub-cohort of severe and mild cases by fitting an Elastic Net regularized regression (α = 0.5) on the 2,500 most variable genes, with tenfold cross-validation, using the glmnet package (Friedman et al. 2010) (v. 4.1-1). The predictive model, combining 48 discriminating genes, was assessed on an independent validation cohort drawn from the whole blood RNA-seq dataset recently published by Ahern et al. (2022). Seventy-seven samples were selected (26 mild, 23 intermediate and 28 severe COVID-19 pneumonia, based on classification criteria similar to those used in our cohort), using the following criteria: confirmed COVID-19 diagnosis and, for patients with multiple sampling time points, the last available sample.
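For reference, the penalized objective minimized by glmnet can be written out explicitly. The sketch below assumes the binomial (logistic) family, with y_i = 1 for severe and 0 for mild outcome and x_i the expression vector of the 2,500 most variable genes; the family and the rule used to pick λ from cross-validation (e.g. minimum error or one-standard-error) are not stated in the text.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i(\beta_0 + x_i^\top\beta)
  - \log\big(1 + e^{\beta_0 + x_i^\top\beta}\big) \Big]
  + \lambda\Big[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\Big]

\alpha = 0.5:\quad \lambda\Big[\tfrac{1}{4}\lVert\beta\rVert_2^2 + \tfrac{1}{2}\lVert\beta\rVert_1\Big],
\qquad \text{selected genes} = \{\, j : \hat{\beta}_j \neq 0 \,\}
```

The L1 term drives most coefficients exactly to zero, which is what yields a compact gene list, while the L2 term keeps groups of correlated genes (common in interferon-driven expression) from being arbitrarily reduced to a single member.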
Another cohort (Wang et al. 2022) was used to explore the ability of the 48-gene predictor to monitor the longitudinal evolution of patients. The 48-gene predictor was computed by projecting these samples onto the 48-gene PCA using the weights (loadings) estimated on the training cohort.
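Projecting new samples with weights fixed on the training cohort keeps the sign and scale of PC1 unchanged, so the thresholds derived on the training cohort remain applicable; re-running PCA on a new cohort could flip or rescale the axis. The sketch below assumes mean-centring with training-cohort means and no per-gene scaling (whether the authors scaled genes is not stated); the two-gene data and loadings are hypothetical, whereas the real predictor uses 48 genes.

```python
def column_means(train_rows):
    """Per-gene mean over training samples (rows = samples, columns = genes)."""
    n = len(train_rows)
    return [sum(r[j] for r in train_rows) / n for j in range(len(train_rows[0]))]

def project_pc1(sample, train_means, pc1_loadings):
    """PC1 coordinate of a new sample using the training cohort's centring and loadings."""
    return sum(w * (x - m) for x, m, w in zip(sample, train_means, pc1_loadings))

# Hypothetical 2-gene example (the real predictor uses 48 genes).
train = [[4.0, 1.0], [6.0, 3.0]]
means = column_means(train)   # [5.0, 2.0]
loadings = [0.8, 0.6]         # hypothetical unit-length PC1 loadings
print(project_pc1([7.0, 3.0], means, loadings))  # approximately 2.2
```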
Statistical analyses
Correlations between quantitative variables were assessed using Pearson’s test. Quantitative and qualitative variables were compared between groups using the Kruskal–Wallis test and Fisher’s exact test, respectively. ROC curve analysis was performed, and the optimal thresholds predictive of COVID-19 outcome were defined using Youden’s J index. A multivariate analysis was performed on the combined samples of the training and external independent cohorts, using a logistic regression model with 4 variables: the 48-gene predictor (the PC1 coordinate of the 48-gene principal component analysis projection), the neutrophil score, and the patients’ age and sex. All p-values were two-sided and adjusted for multiple comparisons using the Benjamini–Hochberg method. Significance was set at an adjusted p-value < 0.05. All analyses were performed in the R software environment.
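Youden's J index selects the threshold that maximizes sensitivity + specificity − 1 along the ROC curve. A minimal sketch, assuming higher scores indicate the positive class, a "score ≥ t" decision rule, and observed scores as the candidate thresholds; the scores and labels are hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing sensitivity + specificity - 1 for 'score >= t'."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and not y)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical scores and outcomes (True = positive class).
scores = [0.1, 0.4, 0.35, 0.8, 0.9]
labels = [False, False, True, True, True]
print(youden_threshold(scores, labels))
```

J weights false positives and false negatives equally; a different cost balance (for example, favouring sensitivity for triage) would shift the chosen threshold.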
Data availability
The dataset generated and analysed during the current study is available in the EMBL-EBI BioStudies repository (reference number: S-BSST1135; https://www.ebi.ac.uk/biostudies/studies/S-BSST1135?key=62d4dc30-e0d4-4f4b-891f-58c5777a0cd3).
References
Ahern DJ et al (2022) A blood atlas of COVID-19 defines hallmarks of disease severity and specificity. Cell 185:916–938.e58
An S et al (2021) Genome-Wide Profiling Reveals Alternative Polyadenylation of Innate Immune-Related mRNA in Patients With COVID-19. Front Immunol 12:756288
An S et al (2022) Systematic analysis of clinical relevance and molecular characterization of m6A in COVID-19 patients. Genes Dis 9:1170–1173
Bergantini L et al (2021) Prognostic bioindicators in severe COVID-19 patients. Cytokine 141:155455
Buturovic L et al (2022) A 6-mRNA host response classifier in whole blood predicts outcomes in COVID-19 and other acute viral infections. Sci Rep 12:889
Carissimo G et al (2020) Whole blood immunophenotyping uncovers immature neutrophil-to-VD2 T-cell ratio as an early marker for severe COVID-19. Nat Commun 11:5243
Chen R et al (2020) Longitudinal hematologic and immunologic variations associated with the progression of COVID-19 patients in China. J Allergy Clin Immunol 146:89–100
Coronavirus Disease (COVID-19) Situation Reports (n. d.) https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
Daamen AR et al (2022) COVID-19 patients exhibit unique transcriptional signatures indicative of disease severity. Front Immunol 13:989556
Ding H et al (2022) Membrane Protein OTOF Is a Type I Interferon-Induced Entry Inhibitor of HIV-1 in Macrophages. mBio 13:e0173822
Dunning J et al (2018) Progression of whole-blood transcriptional signatures from interferon-induced to neutrophil-associated patterns in severe influenza. Nat Immunol 19:625–635
Frankish A et al (2021) GENCODE 2021. Nucleic Acids Res 49:D916–D923
Friedman J, Hastie T, Tibshirani R (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 33:1–22
Gill SE et al (2020) Transcriptional profiling of leukocytes in critically ill COVID19 patients: implications for interferon response and coagulation. Intensive Care Med Exp 8:75
Hadjadj J et al (2020) Impaired type I interferon activity and inflammatory responses in severe COVID-19 patients. Science. https://doi.org/10.1126/science.abc6027
Harrington CA et al (2020) RNA-Seq of human whole blood: evaluation of globin RNA depletion on Ribo-Zero library method. Sci Rep 10:6271
Jackson H et al (2022) Characterisation of the blood RNA host response underpinning severity in COVID-19 patients. Sci Rep 12:12216
Khorramdelazad H et al (2022) Type-I interferons in the immunopathogenesis and treatment of Coronavirus disease 2019. Eur J Pharmacol 927:175051
Krämer B et al (2021) Early IFN-α signatures and persistent dysfunction are distinguishing features of NK cells in severe COVID-19. Immunity 54:2650–2669.e14
Kwan PKW et al (2021) A blood RNA transcriptome signature for COVID-19. BMC Med Genomics 14:155
Laing AG et al (2020) A dynamic COVID-19 immune signature includes associations with poor prognosis. Nat Med 26:1623–1635
Lee J-S et al (2021) Longitudinal proteomic profiling provides insights into host response and proteome dynamics in COVID-19 progression. Proteomics 21:e2000278
Lévy Y et al (2021) CD177, a specific marker of neutrophil activation, is associated with coronavirus disease 2019 severity and death. iScience 24:102711
Li J et al (2021) Epidemiology of COVID-19: A systematic review and meta-analysis of clinical characteristics, risk factors, and outcomes. J Med Virol 93:1449–1458
Liang W et al (2020) Development and Validation of a Clinical Risk Score to Predict the Occurrence of Critical Illness in Hospitalized Patients With COVID-19. JAMA Intern Med 180:1081–1089
Liu R et al (2015) Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Res 43:e97
Lombardi Y et al (2021) External validation of prognostic scores for COVID-19: a multicenter cohort study of patients hospitalized in Greater Paris University Hospitals. Intensive Care Med 47:1426–1439
Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550
Mann ER et al (2020) Longitudinal immune profiling reveals key myeloid signatures associated with COVID-19. Sci Immunol 5:eabd6197
Martín-Sánchez E et al (2021) Immunological Biomarkers of Fatal COVID-19: A Study of 868 Patients. Front Immunol 12:659018
Masood KI et al (2021) Upregulated type I interferon responses in asymptomatic COVID-19 infection are associated with improved clinical outcome. Sci Rep 11:22958
McClain MT et al (2021) Dysregulated transcriptional responses to SARS-CoV-2 in the periphery. Nat Commun 12:1079
Newman AM et al (2019) Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 37:773–782
Ng DL et al (2021) A diagnostic host response biosignature for COVID-19 from RNA profiling of nasal swabs and blood. Sci Adv 7:eabe5984
Osuchowski MF et al (2021) The COVID-19 puzzle: deciphering pathophysiology and phenotypes of a new disease entity. Lancet Respir Med 9:622–642
Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14:417–419
Ritchie ME et al (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43:e47
Roberson EDO et al (2022) Transcriptomes of peripheral blood mononuclear cells from juvenile dermatomyositis patients show elevated inflammation even when clinically inactive. Sci Rep 12:275
Robinson MD, Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11:R25
Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139–140
Shaath H, Vishnubalaji R, Elkord E, Alajez NM (2020) Single-Cell Transcriptome Analysis Highlights a Role for Neutrophils and Inflammatory Macrophages in the Pathogenesis of Severe COVID-19. Cells 9:2374
Shojaei M et al (2023) IFI27 transcription is an early predictor for COVID-19 outcomes, a multi-cohort observational study. Front Immunol 13:1060438
Wang Y et al (2022) Blood transcriptome responses in patients correlate with severity of COVID-19 disease. Front Immunol 13:1043219
Wu T et al (2021) clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation 2:100141
Xu G et al (2022) The Transient IFN Response and the Delay of Adaptive Immunity Feature the Severity of COVID-19. Front Immunol 12:816745
Yan Q et al (2021) Longitudinal Peripheral Blood Transcriptional Analysis Reveals Molecular Signatures of Disease Progression in COVID-19 Patients. J Immunol 206:2146–2159
Zakeri R et al (2021) Biological responses to COVID-19: Insights from physiological and blood biomarker profiles. Curr Res Transl Med 69:103276
Zerbib Y et al (2020) Pathway mapping of leukocyte transcriptome in influenza patients reveals distinct pathogenic mechanisms associated with progression to severe infection. BMC Med Genomics 13:28
Acknowledgements
We thank the COVIDeF investigators for contributing to the data collection. We thank Guillaume Meurice, Alban Lermine and Michel Vidaud from the SeqOIA genomic platform for generating the transcriptome data and supporting this project. We thank Dr. Arianna Fiorentino from the plateforme de Recherche Clinique of Saint Antoine Hospital (APHP). We thank Mireille Toy-Miou, Linda Gimeno and Krsytel Torelino from the Unité de Recherche Clinique of Pitié-Salpêtrière Hospital (APHP) for handling the clinical data collection.
Funding
This project has received funding from the Agence Nationale de la Recherche Flash-Covid-19 program (reference: COVIDOMIC, ANR-20-COVI-000; to G.A.), the Fondation AP-HP pour la recherche, the Fondation de France and the Programme Hospitalier de Recherche Clinique (PHRC COVID-19–20-0048, Ministère de la Santé).
Author information
Contributions
R.A., G.A.: conceptualization; N.C., F.T., P.H., G.G., V.P., A.B., H.G., P.M. and the COVIDeF group: clinical data and sample collection; F.T., T.S., G.A.: project administration and ethical aspects management; R.A., T.S.: sample handling and genomic data generation; R.A., A.J., D.dM., M.F.B., G.A.: bioinformatics and statistical analyses; R.A., M.F.B., G.A.: original draft preparation. All authors: manuscript review and editing.
Ethics declarations
Ethics approval
The COVIDeF cohort study is registered at ClinicalTrials.gov (NCT04352348; date of trial registration: 20/04/2020). Ethics approval was obtained from the Comité de Protection des Personnes Île-de-France XI (ID RCB: 2020-A00754-35). The sponsor is the Direction de la Recherche Clinique de l’Assistance Publique des Hôpitaux de Paris (DRCI de l’APHP).
All methods were performed in accordance with the relevant guidelines and regulations.
Consent to participate
Signed informed consent for molecular analysis of blood samples and for access to clinical data was obtained from all patients, and the study was approved by the Ethics Committee.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
10142_2024_1359_MOESM1_ESM.tiff
Supplementary file1 (Supplementary Figure S1) (TIFF 3068 KB) Blood cell composition inferred from transcriptome. a) Correlation plot between the inferred proportions of different blood cell subtypes. b) Blood cell composition on its own poorly discriminates patients depending on COVID-19 pneumonia evolution. Sample projections based on the combination of the first two principal components (PC1, PC2) of unsupervised PCA performed on the inferred proportions of blood cell subtypes (i = 20 blood cell types, n = 159 samples). The center of each group is indicated by the larger circles.
10142_2024_1359_MOESM2_ESM.tiff
Supplementary file2 (Supplementary Figure S2) (TIFF 3012 KB) Whole blood early transcriptome signature of COVID-19 pneumonia. Unsupervised clustering of samples using the 68 differentially expressed genes in the comparison COVID-19 pneumonia versus controls, after adjustment on age and blood cell composition
10142_2024_1359_MOESM3_ESM.tiff
Supplementary file3 (Supplementary Figure S3) (TIFF 2586 KB) Whole blood early transcriptome signature of patients evolving towards severe COVID-19 pneumonia. Unsupervised clustering of samples using the 100 most significant differentially expressed genes in the comparison of patients with evolution towards severe COVID-19 pneumonia versus those evolving towards mild COVID-19 pneumonia, after adjustment on age and blood cell composition
10142_2024_1359_MOESM4_ESM.tiff
Supplementary file4 (Supplementary Figure S4) (TIFF 82 KB) ROC curve analysis and optimal thresholds for COVID-19 pneumonia outcome discrimination between patients with mild and intermediate outcome on the training cohort. The optimal threshold is -0.02 (AUC = 0.80)
10142_2024_1359_MOESM5_ESM.tiff
Supplementary file5 (Supplementary Figure S5) (TIFF 80 KB) ROC curve analysis and optimal thresholds for COVID-19 pneumonia outcome discrimination between patients with intermediate and severe outcome on the training cohort. The optimal threshold is 7.69 (AUC = 0.98)
10142_2024_1359_MOESM6_ESM.tiff
Supplementary file6 (Supplementary Figure S6) (TIFF 196 KB) Discrimination of samples based on the 48 selected genes discriminating COVID-19 pneumonia evolution. Sample projection based on the first two principal components (PC1, PC2) of unsupervised PCA performed using the 48 genes selected by Elastic Net regression on the training cohort. Faint circles show the COVID-19 pneumonia samples with mild and severe evolution from the training cohort (n = 64), on which gene selection was optimized. Bright squares show the COVID-19 pneumonia samples with intermediate evolution. The dashed red lines indicate the optimal thresholds for COVID-19 pneumonia outcome discrimination (determined on the training cohort).
10142_2024_1359_MOESM7_ESM.tiff
Supplementary file7 (Supplementary Figure S7) (TIFF 1189 KB) PCA projection of 203 additional samples (bright squares), based on the 48-genes selection using the PCA weights established on the training cohort (faint circles)
10142_2024_1359_MOESM8_ESM.tiff
Supplementary file8 (Supplementary Figure S8) (TIFF 867 KB) Correlation analysis between the 48-gene predictor and COVID-19 WHO severity levels in the Wang et al. (2022) cohort.
10142_2024_1359_MOESM10_ESM.xlsx
Supplementary file10 (XLSX 18 KB) - Supplementary Table S2: Gene ontology enrichment on the first two axes of the principal component analysis of patients’ transcriptome, including COVID-19 and non-COVID-19 pneumonia.
10142_2024_1359_MOESM11_ESM.xlsx
Supplementary file11 (XLSX 45 KB) - Supplementary Table S3: Blood cell composition at inclusion, inferred from transcriptome data.
10142_2024_1359_MOESM12_ESM.xlsx
Supplementary file12 (XLSX 11 KB) - Supplementary Table S4: Blood cell composition at inclusion: comparison between the different groups of pneumonia.
10142_2024_1359_MOESM13_ESM.xlsx
Supplementary file13 (XLSX 16 KB) - Supplementary Table S5: Differentially expressed genes in COVID-19 pneumonia versus controls (Limma analysis).
10142_2024_1359_MOESM14_ESM.xlsx
Supplementary file14 (XLSX 14 KB) - Supplementary Table S6: Gene ontology enrichment of the over-expressed genes in the COVID-19 samples.
10142_2024_1359_MOESM15_ESM.xlsx
Supplementary file15 (XLSX 36 KB) - Supplementary Table S7: Differentially expressed genes in patients with severe COVID-19 pneumonia outcome versus patients with mild COVID-19 pneumonia outcome (Limma analysis).
10142_2024_1359_MOESM16_ESM.xlsx
Supplementary file16 (XLSX 14 KB) - Supplementary Table S8: Gene set enrichment analysis of the over-expressed genes in severe COVID-19 pneumonia samples.
10142_2024_1359_MOESM17_ESM.xlsx
Supplementary file17 (XLSX 11 KB) - Supplementary Table S9: Intersection of DEGs in severe COVID-19 found in our study and in the studies of Wang et al. (2022) and Jackson et al. (2022).
10142_2024_1359_MOESM18_ESM.xlsx
Supplementary file18 (XLSX 10 KB) - Supplementary Table S10: Intersection of DEGs in severe COVID-19 found in our study and in severe influenza infection in the studies of Zerbib et al. (2020) and Dunning et al. (2018).
10142_2024_1359_MOESM19_ESM.xlsx
Supplementary file19 (XLSX 15 KB) - Supplementary Table S11: Forty-eight genes properly discriminating severe from mild evolution of COVID-19 pneumonia in the validation cohort (Elastic Net-penalized linear model).
10142_2024_1359_MOESM20_ESM.xlsx
Supplementary file20 (XLSX 10 KB) - Supplementary Table S12: Confusion matrix for the COVID-19 pneumonia outcome prediction on the validation cohort.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Armignacco, R., Carlier, N., Jouinot, A. et al. Whole blood transcriptome signature predicts severe forms of COVID-19: Results from the COVIDeF cohort study. Funct Integr Genomics 24, 107 (2024). https://doi.org/10.1007/s10142-024-01359-2
DOI: https://doi.org/10.1007/s10142-024-01359-2