Abstract
Bulk RNA sequencing (RNA-seq) of blood is typically used for gene expression analysis in biomedical research but is still rarely used in clinical practice. In this study, we propose that RNA-seq should be considered a diagnostic tool, as it offers not only insights into aberrant gene expression and splicing but also delivers additional readouts on immune cell type composition as well as B-cell and T-cell receptor (BCR/TCR) repertoires. We demonstrate that RNA-seq offers insights into a patient’s immune status via integrative analysis of RNA-seq data from patients infected with various SARS-CoV-2 variants (in total 196 samples with up to 200 million reads sequencing depth). We compare the results of computational cell-type deconvolution methods (e.g., MCP-counter, xCell, EPIC, quanTIseq) to complete blood count data, the current gold standard in clinical practice. We observe varying levels of lymphocyte depletion and significant differences in neutrophil levels between SARS-CoV-2 variants. Additionally, we identify B and T cell receptor (BCR/TCR) sequences using the tools MiXCR and TRUST4 to show that—combined with sequence alignments and BLASTp—they could be used to classify a patient's disease. Finally, we investigated the sequencing depth required for such analyses and concluded that 10 million reads per sample is sufficient. In conclusion, our study reveals that computational cell-type deconvolution and BCR/TCR methods using bulk RNA-seq analyses can supplement missing CBC data and offer insights into immune responses, disease severity, and pathogen-specific immunity, all achievable with a sequencing depth of 10 million reads per sample.
Introduction
Peripheral blood is the tissue of choice in clinical diagnostics and biomedical research due to minimally invasive sample collection. As blood perfuses all organs, it provides insights into various diseases and medical conditions1,2. In general, we can investigate active pathways and organismal responses to stimuli (e.g., a viral infection) on a transcriptomic level3,4 by using well-established sequencing techniques such as bulk RNA sequencing (RNA-seq). A blood sample contains various cell types with different expression profiles. Complete blood counts (CBCs) are routinely assessed in the clinical setting and provide specific information regarding the proportions of cells present5. Within the white blood cell compartment, the percentages of neutrophils, lymphocytes, monocytes, eosinophils, and basophils provide insight into the type and response to infection and underlying disease and/or therapy6,7,8,9,10,11,12,13,14. However, CBCs are frequently unavailable in publicly accessible datasets, limiting insights into the status of the immune system.
Employing either bulk RNA-seq or single-cell RNA-seq (scRNA-seq, i.e., profiling gene expression at the individual cell level)15 can provide a detailed description of cellular composition, all based on the expression levels of genes16,17. Performing scRNA-seq on each patient and cell type is not feasible due to the logistics and high costs of scRNA-seq18. To gain insights into the individual immune reactions to disease from RNA-seq data alone, it is essential to determine the composition of immune cells and gene expression in patient samples. CBCs, if available, provide a fundamental understanding of changes in the immune system; however, they do not resolve the finer-grained functional subgroups that often drive disease progression. Hence, computational techniques such as MCP-counter19, xCell20, EPIC21, and quanTIseq22 (see “Methods” and Suppl. Materials 1 and 2 for differences, strengths, and weaknesses of each method) deconvolve bulk RNA-seq data using signatures or gene sets of cell-type-specific genes. They give a robust estimate of the abundance of various immune cell types within and across patient samples. Recent benchmarks23 of these methods also compared their predictions with experimentally derived flow cytometry fractions and in silico generated pseudo-bulk samples, where ground-truth cell-type proportions are known. Such insights into the status of an immune system are helpful for diagnostics, prognosis, and treatment selection, with demonstrated potential in oncology24 and other diseases10,12. Deconvolution methods have proven especially effective in studying tumor microenvironments, as they enable researchers to estimate the proportions of tumor-infiltrating immune cells25,26. Here, we compare four different deconvolution approaches using bulk RNA-seq to analyze changes in the white blood cell compartment over time in individuals infected with SARS-CoV-2.
Previous works have already profiled the immune cell environment in early COVID-19 patients using scRNA-seq and detected a large decrease in T cells27.
In this study, we (1) compared computationally estimated immune cell abundances to CBC counts, the current gold standard. Moreover, we (2) investigated the immune cell abundances in patients infected with SARS-CoV-2 variants that differed in severity and tracked their progression over time, comparing them to a baseline (i.e., seronegative samples taken from individuals who reportedly were never infected with SARS-CoV-2) to elucidate differences in immune response and the return to a healthy state. Additionally, we (3) characterized the BCR and TCR profiles in infected patients. Finally, we (4) compared how the performance of these methods is influenced by sequencing depth, i.e., how many reads have been sequenced for each sample.
Methods
Datasets
We utilized publicly available data from human buffy coat white blood cells from four distinct bulk RNA-seq experiments: GSE190680 (variants: Alpha, Alpha + EK (i.e., Alpha with an additional E484K mutation in the spike protein), Gamma)28, GSE162562 (seronegative)29, GSE201530 (variant: Omicron BA.1)30, and GSE205244 (variants: Omicron BA.1 and Omicron BA.2)31 (Suppl. Table 1a). Variants had samplings of days 0–5, 6–10, 11–15, 16–30, and > 30 after hospitalization or onset of symptoms (Suppl. Table 1a,b). All samples were processed by nf-core RNA-seq v. 3.8.1 using default parameters32. All 252 samples were controlled for quality by utilizing the reports of FastQC33 and MultiQC34, and only those (196 samples in total) with sufficient quality were included in the subsequent analyses (Suppl. Table 1b, Suppl. Table 2)35,36. Samples came from different studies but were processed in the same laboratory and with the same staff to avoid technical differences37.
Immune deconvolution methodology
Cell-type deconvolution is a computational method applied to bulk RNA-seq data to estimate the abundance of cell types in a biological sample and is primarily used in the context of immune cells. In this study, we employ several tools bundled in the immunedeconv package (using the default settings established there), as it was previously shown that no single tool generally outperforms all others across all immune cell types23 (for marker genes, see Suppl. Fig. 1a,b and https://doi.org/10.6084/m9.figshare.24442423.v1). Computational cell-type deconvolution methods generally produce fractions or scores representing the abundance of specific cell types in the samples, which we use here for inter-sample comparisons between patients infected with different SARS-CoV-2 variants (see Suppl. Materials 1; refs. 19,20,21,22).
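As an illustration of how a marker-based abundance score can be derived from an expression matrix, the following minimal Python sketch averages log2-transformed TPM values over a set of marker genes. It is not the implementation of MCP-counter, xCell, EPIC, or quanTIseq; the function name, the pseudocount, the choice of marker genes, and the TPM values are assumptions made for this example only.

```python
import math


def marker_score(tpm_by_gene, marker_genes, pseudocount=1.0):
    """Mean log2(TPM + pseudocount) over the marker genes found in one sample."""
    values = [
        math.log2(tpm_by_gene[gene] + pseudocount)
        for gene in marker_genes
        if gene in tpm_by_gene
    ]
    if not values:
        raise ValueError("none of the marker genes are present in the sample")
    return sum(values) / len(values)


# Hypothetical toy expression values (TPM), not taken from the study.
samples = {
    "patient_A": {"CD19": 3.0, "MS4A1": 7.0, "CD3E": 63.0},
    "patient_B": {"CD19": 15.0, "MS4A1": 31.0, "CD3E": 15.0},
}

# Illustrative B-cell marker set; the real tools use their own signatures.
b_cell_markers = ["CD19", "MS4A1"]

scores = {name: marker_score(tpm, b_cell_markers) for name, tpm in samples.items()}
```

Scores of this kind are only comparable between samples for the same cell type, which matches the inter-sample use described above.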
BCR/TCR repertoire methodology
BCR/TCR repertoire analysis refers to the study of the diverse collection of BCRs and TCRs present within an individual's B and T cell repertoire (i.e., all unique antigen-specific receptors expressed on the surface of T cells and B cells), respectively. These receptors play a crucial role in the adaptive immune system by recognizing and binding to specific antigens derived from pathogens or abnormal cells, thus serving as biomarkers for past or current infections. Each BCR and TCR has a unique amino acid sequence, which contributes to the vast diversity and specificity of the immune response. We used two methods, MiXCR (ref. 38) and TRUST4 (ref. 39), to reconstruct B and T cell repertoires from bulk RNA-seq data (Suppl. Materials 2; refs. 40,41,42).
We used the Python package scirpy43 to analyze results from both methods. To extract only BCR/TCR sequences that differ from those found in a healthy population, we proceeded as follows. We computed a pairwise distance matrix for the input sequences to identify sequences forming clonotypes and likely targeting similar antigens. Our objective was to identify sequences targeting SARS-CoV-2 antigens, enabling us to determine which BCR and TCR sequences respond to the virus. As sequences present in seronegative samples cannot target the SARS-CoV-2 virus, we disregarded them and focused on sequences exclusive to infected patients, further refining our search for specific anti-SARS-CoV-2 receptor sequences. We used the ClustalW algorithm44 to perform multiple sequence alignment (see Suppl. Materials 3; refs. 45,46,47,48). In the final steps, we employed the protein BLAST (BLASTp) tool49,50 to annotate clusters and sequences (e.g., those surpassing the cutoff and unlinked to sequences originating from healthy samples) and to associate discovered sequences with specific viruses.
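The filtering idea described above can be sketched in plain Python. This is a simplified stand-in, not the scirpy or ClustalW workflow used in the study: it assumes Levenshtein (edit) distance as the sequence distance and a hypothetical cutoff of 2 edits. Of the example sequences, CYSTDSSGNHRGVF and CQQRSNWPPTWTF are reported in the Results; all other sequences are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion from a
                    current[j - 1] + 1,      # insertion into a
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def infection_exclusive(infected, seronegative, cutoff=2):
    """Keep sequences that are more than `cutoff` edits away from every seronegative sequence."""
    return [
        seq
        for seq in infected
        if all(edit_distance(seq, ref) > cutoff for ref in seronegative)
    ]


# Toy inputs: the second infected sequence is one substitution away from a
# seronegative sequence and is therefore discarded.
infected = ["CYSTDSSGNHRGVF", "CASSLGQGNTEAFF", "CQQRSNWPPTWTF"]
seronegative = ["CASSLGQGYTEAFF", "CASSPGTGGYEQYF"]

kept = infection_exclusive(infected, seronegative)
```

The surviving sequences would then be the candidates passed on to alignment and BLASTp annotation.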
Subsampling to a lower sequencing depth
RNA-seq always comes with a tradeoff between costs and information gain. Given the high sequencing depth (up to 200 million reads per sample) of the samples investigated here, we were interested in establishing a lower bound for obtaining robust results. To this end, we downsampled the samples to fifty, ten, seven, five, three, and one million reads using samtools51. Subsequently, we repeated the immune deconvolution analysis at all depths and the BCR/TCR analyses at fifty and ten million reads only. First, we generated TPM expression matrices from the downsampled FASTQ files using salmon52 and reran the immune deconvolution methods; we then executed MiXCR and TRUST4 on the downsampled data and performed the subsequent analyses as described above.
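Random downsampling to a fixed depth is commonly expressed as a retention fraction per sample. The sketch below computes that fraction for each target depth listed above; the helper name and the 200-million-read example sample are assumptions, and the exact samtools invocation used in the study is not reproduced here.

```python
# Target depths from the text: fifty, ten, seven, five, three, and one million reads.
TARGET_DEPTHS = [50_000_000, 10_000_000, 7_000_000, 5_000_000, 3_000_000, 1_000_000]


def keep_fraction(total_reads, target_reads):
    """Fraction of reads to retain; samples already at or below the target are kept whole."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return min(1.0, target_reads / total_reads)


# Hypothetical sample at the upper end of the reported sequencing depth.
total = 200_000_000
fractions = {target: keep_fraction(total, target) for target in TARGET_DEPTHS}
```

For this example sample, reaching 10 million reads means retaining 5% of the reads, which gives a sense of how much sequencing effort the lower depth would save.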
Results
In this study, we highlight the potential of RNA-seq data in clinical practice. Typically used for studying gene expression, RNA-seq data offers crucial insights into the status of the immune system via computational cell type deconvolution as well as the analysis of BCR and TCR sequences. While such advanced analysis techniques are increasingly widespread in oncology, we focus here on demonstrating their applicability in infectious diseases by example of SARS-CoV-2 infection. We re-analyzed data from 196 SARS-CoV-2 patients over time from the initial hospitalization through recovery28,30,31. First, we deconvolve the bulk RNA-seq data into immune cell-type fractions that changed as the patients went from the initial hospitalization through recovery. We show that the estimated values of the immune deconvolution methods approximate the CBC information. We further elucidate that with computational immune deconvolution methods, we can reveal changes between patients infected with SARS-CoV-2 variants with differing severity of disease53,54. Next, we illustrate how we can utilize BCR and TCR computational analysis to classify the patients’ cause of disease and investigate the effect of sequencing at various depths on the robustness of our results.
Approximated immune cell abundances by immune deconvolution methods are close to real complete blood count data
In Fig. 1, we observe a consistent positive correlation between the estimates of the four deconvolution methods and the CBC data for lymphocytes and neutrophils in patients with Alpha and Alpha + EK (Alpha with an additional E484K mutation in the spike protein) variant infections. Monocytes show overall lower correlation values, likely due to their lower abundance. The strength of these correlations varies as scores fluctuate in magnitude based on the cell type and method employed (Suppl. Fig. 2). The chart highlights that the EPIC method's outcomes closely align with the CBC data; EPIC stands out in particular with the highest accuracy among all methods for monocytes. While both quanTIseq and xCell yield commendable results for neutrophils and lymphocytes, MCP-counter's predictions for both cell types appear to be less reliable. Importantly, when consolidating the findings from all methodologies, the immune deconvolution results consistently align with the CBC data. Additionally, xCell, quanTIseq, and EPIC show highly correlating results in lymphocytes and neutrophils. However, neutrophils and monocytes especially appear to be harder to estimate using deconvolution. Method choice can play an important role here, as only EPIC is able to detect monocytes consistently. This reaffirms that immune deconvolution could serve as an instrument for assessing immune cell levels derived from RNA-seq data, though results do vary between methods and cell types.
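A comparison of this kind reduces to a correlation coefficient between measured and estimated values per cell type and method. The following sketch assumes a Pearson correlation (the coefficient type is not restated in this section) and uses invented neutrophil values purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical values: CBC neutrophil percentages and deconvolution scores
# for the same five samples.
cbc_neutrophil_pct = [55.0, 62.0, 70.0, 81.0, 88.0]
deconv_neutrophil_score = [0.31, 0.35, 0.42, 0.47, 0.55]

r = pearson_r(cbc_neutrophil_pct, deconv_neutrophil_score)
```

Because deconvolution scores and CBC percentages are on different scales, a correlation is a natural summary here, whereas an absolute error would only be meaningful for methods that output fractions.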
Immune deconvolution revealed differences in patients with different severity of disease progression
During the SARS-CoV-2 pandemic, different SARS-CoV-2 variants emerged (e.g., ancestral, Alpha, Alpha + EK, Gamma, Omicron BA.1, and Omicron BA.2), which differed in transmissibility and severity. Variants that emerged toward the end of the pandemic were associated with less severe disease55. The Alpha variant was reported to show increased transmissibility compared with the wild-type virus due to the N501Y mutation56. The Alpha + EK variant was reported to evade a neutralizing antibody response more efficiently due to the additional E484K mutation but was not associated with more severe disease. The Gamma variant carried both the N501Y and E484K mutations and was reported to have enhanced transmissibility with potential antibody resistance, but, again, disease severity was reported to be unchanged. The Omicron BA.1 and BA.2 variants, with numerous spike protein mutations, mediated immune escape. However, while their transmissibility increased, these variants, in general, caused less severe disease outcomes. This pattern of viral evolution has been reported previously as a virus adapts to its human hosts over time, favoring transmission over severity55.
We hypothesize that non-hospitalized patients infected with an Omicron variant might demonstrate an immune response closer to that of healthy, non-infected individuals than to that of hospitalized patients infected with earlier SARS-CoV-2 variants. To explore this hypothesis, we first compared trends in the abundance of B cells, neutrophils, CD4+ T cells, and CD8+ T cells in the SARS-CoV-2 variants and the seronegative samples using four different deconvolution tools (quanTIseq, MCP-counter, EPIC, and xCell, see “Methods”) (Figs. 2 and 3). We found that all four methods, in general, recapitulated the same trends. Across all methods and immune cell types, the immune cell fractions or scores evolved over time to more closely resemble those of seronegative samples as patients recovered. We also observed that non-hospitalized patients infected with an Omicron variant more closely resembled the seronegative individuals than the hospitalized patients infected with earlier variants did, especially at the time of initial hospitalization.
We further categorized samples into different time brackets after hospitalization or onset of symptoms (days 0–5, 6–10, 11–15, 16–30, and > 30). Over time, the projected immune cell fractions appeared to progressively align with those observed in seronegative samples, consistent with patient recovery over time (Fig. 3). Patients diagnosed with Alpha and Alpha + EK, variants associated with more severe disease, demonstrated a lengthier time until their immune cell fractions approximated those of seronegative individuals (Suppl. Fig. 3).
B cell and T cell repertoire analysis offers insights into past or current infections
In general, when an infection occurs, an 'immunological footprint' in the form of specific BCR and TCR repertoires can be identified. In this section, we investigated whether bioinformatics BCR and TCR repertoire analysis approaches (i.e., a combination of MiXCR and TRUST4) of transcriptomic data, coupled with a computational tool that associates known BCR and TCR repertoires with causes of diseases (i.e., BLASTp49,50), could be used to classify a disease cause for an admitted patient (see “Methods”).
With the computational tool MiXCR, we identified 534 unique receptor sequences, while we identified 569 sequences with TRUST4 across the variants. Of these, 492 sequences were identified by both tools, while 42 and 77 sequences were uniquely identified by MiXCR and TRUST4, respectively. This means that 81% of the sequences were found by both tools, 7% only by MiXCR, and 13% only by TRUST4 (Suppl. Fig. 4). We decided to use only the sequences identified by both tools for further analyses to ensure more reliable results. In the next step, we eliminated sequences that exhibited homology to sequences from seronegative samples, as such BCR and TCR sequences probably lack specificity for SARS-CoV-2, given that no seronegative sample should possess SARS-CoV-2-specific receptors (see “Methods”, Suppl. Fig. 5). Among the remaining sequences, we found fifteen that did not display similarity to any sequence also found in seronegative samples. A subsequent BLASTp assessment of these sequences identified anti-SARS-CoV-2 immunoglobulin hits within the top 100 matches for seven sequences (Fig. 4a, Suppl. Table 3), with most sequences stemming from samples with the BA.2 variant. The remaining eight sequences predominantly align with generic immunoglobulin sequences. Two of the seven sequences are of particular interest, as indicated by their notably low E-values, which suggest that a similar score is unlikely to be achieved by chance. The first sequence (CYSTDSSGNHRGVF), identified in a study by Graham et al.57, was among over 100 monoclonal antibodies (mAbs) characterized for their interaction with epitopes from individuals infected with SARS-CoV-2. This study also demonstrated that some of these mAbs possess the ability to neutralize SARS-CoV-2. The second noteworthy sequence (CQQRSNWPPTWTF) emerged from a study by Jennewein et al.58. In this study, 198 antibodies were identified, with fourteen being distinguished as neutralizing antibodies (nAbs) against SARS-CoV-2.
The study further explored how some of these nAbs can block the binding of ACE-2, thereby inhibiting viral entry into cells. The sequence logo derived from all fifteen sequences highlights conserved motifs at the start (S), the end (VF), and a recurring pattern (DSS) in the center. In contrast, the intervening positions exhibit significant variability, underscoring the pronounced diversity among these sequences (see Suppl. Fig. 6).
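The overlap percentages reported in this section for MiXCR and TRUST4 follow from simple set arithmetic on the three counts given in the text (534, 569, and 492 shared). The short sketch below reproduces that calculation; the function and key names are chosen for this example only.

```python
def overlap_summary(n_tool_a, n_tool_b, n_shared):
    """Split two result sets into shared and tool-specific parts, with rounded percentages of the union."""
    only_a = n_tool_a - n_shared
    only_b = n_tool_b - n_shared
    union = n_shared + only_a + only_b
    return {
        "only_a": only_a,
        "only_b": only_b,
        "union": union,
        "pct_shared": round(100 * n_shared / union),
        "pct_only_a": round(100 * only_a / union),
        "pct_only_b": round(100 * only_b / union),
    }


# Counts from the text: 534 MiXCR sequences, 569 TRUST4 sequences, 492 shared.
summary = overlap_summary(534, 569, 492)
```

The union comprises 611 distinct sequences, and the three rounded percentages (81, 7, and 13) add up to 101 only because of rounding.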
Sequencing depth analysis reveals differences in the robustness of conclusions between deconvolution and TCR/BCR results
Despite a reduction in sequencing depth, the trends observed in immune deconvolution outcomes remained consistent. Notably, there were still significant discrepancies in the levels of immune cells when comparing Alpha and Alpha + EK infections to seronegative cases with a sequencing depth of 50 million (Suppl. Fig. 7a) and with an even lower sequencing depth of 10 million (Suppl. Fig. 7b). Furthermore, temporal analysis reaffirmed these findings, indicating the recurrent trend where, across all variants, there is a convergence toward the levels observed in seronegative samples (Suppl. Figs. 3, 8a,b). Through this analysis, we demonstrated that a lower and even a very low sequencing depth of 10 million is indeed sufficient to discern the trends in immune cell levels and highlight the differential impacts of various variants on the immune system. Furthermore, when evaluating the immune deconvolution scores alongside the CBC data for patients infected with the Alpha and Alpha + EK variants, we aimed to find a lower bound of sequencing depth that still produces robust deconvolution results by downsampling reads even further to 1 million reads (Suppl. Fig. 9). Interestingly, correlation values between CBCs and deconvolution estimates remained largely stable in lymphocytes and neutrophils down to a sequencing depth of 1 million reads. quanTIseq did not detect any monocytes at lower sequencing depths, and MCP-counter and xCell show large drop-offs for correlation of monocytes at 1 million reads, down to R = 0.22, likely due to the low abundance of monocytes in our dataset (Suppl. Fig. 2).
In the repeated BCR/TCR analysis with a significantly reduced sequencing depth of 10 million reads, we identified only 95 unique BCR and TCR sequences. This is markedly fewer than in the prior analysis, but the decline is expected, as the reduced sequencing depth yields fewer sequences overall from the RNA-seq experiments. Of the 95 sequences, 18 (19%) were solely identified by MiXCR, seven (7%) exclusively by TRUST4, and 70 (74%) were detected by both tools. This indicates that the majority of the sequences were still identifiable by both tools (Suppl. Fig. 10). After eliminating sequences resembling those in seronegative samples, we pinpointed eight unique sequences. Among these, seven were matched to anti-SARS-CoV-2 immunoglobulin sequences (Suppl. Table 4, Suppl. Fig. 11).
The sequences identified by the two BCR/TCR analyses, at full and at low sequencing depth, differ. Additionally, the positions of the SARS-CoV-2-specific hits in the BLASTp results vary. At greater sequencing depth, these hits are more commonly found within the top ten. In contrast, when the sequencing depth is reduced, they tend to appear further down the list of hits, and, as a result, the findings become somewhat less substantiated.
A statistical comparison such as the one based on p-values for the immune deconvolution is not possible here, as MiXCR and TRUST4 do not generate significance values. The BLAST E-values represent the number of hits expected by chance in a database of a given size; they therefore indicate the reliability of each individual sequence match but are not suitable for comparing the significance of our results. Instead, we declare sequences detected using full sequencing depth as ground truth and compare their overlaps with sequences at lower sequencing depths (Fig. 4b,c), where sequences that are found in infected samples are considered positive cases (see Suppl. Materials 4 for details). With 50 million reads, we could only detect two sequences in infected samples that were also detected at full sequencing depth, while eight other sequences were, in fact, also present in seronegative samples at full depth, leading to a sensitivity of 0.2. With 10 million reads, no true positive cases could be detected, and the sensitivity dropped to 0.
In both analyses, we were able to find seven anti-SARS-CoV-2-related hits that appear in the first one hundred BLAST results. Notably, even though MiXCR and TRUST4 identified fewer sequences overall due to the reduced depth, the count of SARS-CoV-2 specific sequences remained consistent.
In conclusion, a sequencing depth of 10 million was adequate to detect SARS-CoV-2-related sequences, just like with greater sequencing depths. However, the latter produces more robust outcomes, as sensitivity to detect infection-related sequences drops drastically at lower depths.
Discussion
We found that the immune deconvolution tools, including quanTIseq, MCP-counter, EPIC, and xCell, generally predict similar trends in immune cell composition (B cells, neutrophils, CD4+ T cells, and CD8+ T cells) across SARS-CoV-2 samples, and that these trends reflect differences in severity and changes over time. However, we can also see large differences between individual samples. While the immune cell abundances presented in this manuscript were not validated by flow cytometry, the deconvolution methods themselves were previously evaluated using flow cytometry measurements23,59. Our computational results predict a progressive alignment of immune cell fractions with those of seronegative samples, correlating with decreased disease severity and/or individual disease progression. However, individuals infected with variants associated with severe disease courses, such as Alpha and Alpha + EK, show extended recovery timelines before reaching these levels, indicating a potential marker of disease severity. One confounder that should be considered is that SARS-CoV-2 can invade immune cells and could thereby skew the immune deconvolution results60. While computational deconvolution methods are able to robustly estimate trends in immune-cell composition, they do show a large variance in prediction accuracy at the sample level. This drawback is especially important when trying to use such methods in a personalized fashion, where prediction accuracy is not yet high enough to give precise results of immune-cell composition in individual patients. However, so-called second-generation deconvolution methods25 promise to increase prediction quality by employing scRNA-seq datasets as an additional resource in deciphering the cell-type composition of bulk RNA-seq datasets. Such tools may also reveal changes in the functional state of immune cells and thus surpass the information provided by CBC measurements.
We further introduced an approach for diagnosing infections using RNA-seq with bioinformatic analysis of BCR and TCR repertoires. We speculate that patterns of BCR and TCR repertoires could be associated with different disease settings. The current system is built on known BCR and TCR repertoires associated with diseases, which means it can only be used for identifying known infections61,62. As data on BCR and TCR repertoires from different clinical settings is deposited and available for analysis, it is possible the information can be used to improve understanding of immune response in individual patients. At present, ethical considerations of detailed genomic analysis in individual patients can limit the types of information gathered and their distribution. However, anonymized data obtained through clinical trials with informed consent may still be useful in exploring how changes in TCR and BCR repertoires evolve during disease and recovery.
Our analyses demonstrate that a reduced sequencing depth of 10 million is sufficient to identify overarching trends in immune cell levels and anti-SARS-CoV-2 specific sequences, although higher sequencing depths yield more robust outcomes. Despite lower depths resulting in findings of less significance and confidence, the overall trends and correlations with CBC data remain consistent. The BCR/TCR analyses further corroborate these findings, as even at reduced sequencing depths, SARS-CoV-2-specific sequences were still identifiable. These results affirm the feasibility of using lower sequencing depths for meaningful analyses in the study of immune responses and pathogen-specific immunity, making it more feasible in a clinical setting due to lower costs.
Since 2001, genome sequencing costs have significantly decreased from $100 million to the $1000 genome milestone, reflecting similar cost reductions in RNA sequencing63. With the impending expiration of Illumina's key patents, the RNA sequencing market could see heightened competition and further price reductions; a recent article in Science speculated that the costs could be reduced to $100 (ref. 64). This shift might be key to embedding sequencing more deeply into routine clinical practice, making it a more accessible tool for patient care and research.
As RNA-sequencing technologies advance and become cheaper, they hold promise for future clinical utility by providing a more detailed view of global gene expression profiles. For example, quantitative polymerase chain reaction (qPCR) has already been adopted in clinical settings for its high sensitivity and specificity in detecting and quantifying microbial pathogens65 or SARS-CoV-2 (refs. 66,67). To our knowledge, RNA-seq combined with immune deconvolution is not directly used in a routine clinical setting59; however, it has been employed in research settings analyzing whole blood sequencing datasets14. Notably, techniques other than RNA-seq, such as DNA methylation microarrays, could be used for immune deconvolution. However, in our opinion, RNA-seq data offers the greatest variety of readouts across gene expression, alternative splicing, and immune status.
Previous work has also identified specific immune cell subsets, including neutrophils, to be associated with more severe SARS-CoV-2 infection68. In the future, immune deconvolution and BCR/TCR analysis could potentially guide the decision-making of a physician, e.g., the immediate allocation of a newly admitted patient with potentially severe disease progression to the intensive care unit, recognizing that such a tool would require ongoing updates to maintain its utility for predictive modeling69,70. With more blood samplings after admission, we can also see whether the disease course changes, and the physician could, based on this analysis and other factors, advise that the patient be admitted to the intensive care unit71. From the experimental side, RNA-seq offers a comprehensive view of gene expression, including genes related to BCRs and TCRs. Still, it is not specifically focused on analyzing the diversity or clonality of these receptors72. In contrast, TCR-seq, designed to target T-cell receptors (and BCR-seq for B-cell receptors73), thoroughly examines the diversity and specificities within the receptor repertoires74. While RNA-seq is valuable for a broad understanding of the immune response, TCR-seq delivers more focused insights into the T-cell repertoire, crucial for studies of immune dynamics and specificities72. The choice between these techniques hinges on whether the research aims for an overall immune profile or a detailed analysis of receptor diversity and clonality72. However, since TCR-seq or BCR-seq data are rare and offer a limited readout, we consider RNA-seq data more informative. From the clinical side, we have to consider that our study relies on data collected primarily either from hospitalized elderly patients (i.e., Alpha, Alpha + EK, and Gamma) or from patients with mild disease progression (i.e., Omicron BA.1 and Omicron BA.2), potentially introducing a selection bias.
Moreover, differences in local healthcare systems, as well as individual patient factors (e.g., age and preconditions), could influence recovery timelines and should be factored into any broader applications of these findings. Additionally, the predictive methods used for immune cell fraction estimations, while robust and consistent over multiple sequencing depths, are not without their limitations and potential discrepancies. The stable performance in terms of correlation with CBCs across sequencing depths is likely because relative gene expression differences in signature genes are still present even at very low sequencing depths.
In addition to immune cell composition, analyzing immune cell receptors of B and T cells by employing tools such as MiXCR38 and TRUST439 in combination with BCR/TCR databases can provide a rapid determination of the type of a previously discovered virus or infection. However, with our proposed method, we find potential clonotypes but are not able to confirm if they come from a new virus variant. In addition, we noticed large decreases in sensitivity when shallower sequenced samples are available. Genomic data analysis from cell preparation, library generation, sequencing, and quality control is, with the current technology, not feasible in a matter of hours, as is the case for CBCs. Recent advances to introduce RNA-seq into clinical settings describe a complete workflow to finish in about 1 week75. One technology that is able to improve the precision of BCR/TCR detection is Oxford Nanopore sequencing. While not currently implemented in many studies of the transcriptome due to sequencing error limitations and PCR-induced distortions76, it promises to increase clonotype detection and tracking77.
In summary, we applied computational immune deconvolution tools to distinct SARS-CoV-2 data sets, illustrating that they can supply immune cell abundance estimates for bulk RNA-seq data that are not accompanied by CBC information. Additionally, these tools can be used to discern trends in immune cell fractions during disease recovery and to compare immune cell fractions between more and less severe SARS-CoV-2 variants. The proposed workflow, which combines BCR/TCR methods with alignments and BLASTp, could help to pinpoint the type of viral infection. Combined with expert medical judgment, new technologies, and automation, the bioinformatic strategies presented here could pave a path toward precision medicine, in which treatment plans are personalized and optimized for each individual based on individualized genetic analyses.
Data availability
Computational scripts can be found at: https://github.com/biomedbigdata/SARS-CoV-2_immunedeconv_bcrtcr. Analysis results can be downloaded as an RData object in the Supplemental Materials and on figshare: https://doi.org/10.6084/m9.figshare.24221167. The data are publicly available under the following accessions: GSE190680 (variants: Alpha, Alpha + EK, Gamma), GSE162562 (Seronegatives), GSE201530 (variant: Omicron BA.1), GSE205244 (variants: Omicron BA.1 and Omicron BA.2). The list of marker genes per method can be found at: https://doi.org/10.6084/m9.figshare.24442423.v1.
References
Morrow, J. D. et al. Hepatitis C and HIV detection by blood RNA-sequencing in cohort of smokers. Sci. Rep. 13, 1357 (2023).
Wargodsky, R. et al. RNA sequencing in COVID-19 patients identifies neutrophil activation biomarkers as a promising diagnostic platform for infections. PLoS ONE 17, e0261679 (2022).
Barton, A. J., Hill, J., Pollard, A. J. & Blohmke, C. J. Transcriptomics in human challenge models. Front. Immunol. 8, 1839 (2017).
Supplitt, S., Karpinski, P., Sasiadek, M. & Laczmanska, I. Current achievements and applications of transcriptomics in personalized cancer medicine. Int. J. Mol. Sci. 22, 1422 (2021).
Tefferi, A., Hanson, C. A. & Inwards, D. J. How to interpret and pursue an abnormal complete blood cell count in adults. Mayo Clin. Proc. 80, 923–936 (2005).
Leach, M. Interpretation of the full blood count in systemic disease—A guide for the physician. J. R. Coll. Phys. Edinb. 44, 36–41 (2014).
Kong, Y., Rastogi, D., Seoighe, C., Greally, J. M. & Suzuki, M. Insights from deconvolution of cell subtype proportions enhance the interpretation of functional genomic data. PLoS ONE 14, e0215987 (2019).
Kuksin, M. et al. Applications of single-cell and bulk RNA sequencing in onco-immunology. Eur. J. Cancer 149, 193–210 (2021).
Bracci, P. M. et al. Pre-surgery immune profiles of adult glioma patients. J. Neurooncol. 159, 103–115 (2022).
O’Connell, G. C. & Chang, J. H. C. Analysis of early stroke-induced changes in circulating leukocyte counts using transcriptomic deconvolution. Transl. Neurosci. 9, 161–166 (2018).
Qi, L. et al. Deconvolution of the gene expression profiles of valuable banked blood specimens for studying the prognostic values of altered peripheral immune cell proportions in cancer patients. PLoS ONE 9, e100934 (2014).
Akthar, M. et al. Deconvolution of whole blood transcriptomics identifies changes in immune cell composition in patients with systemic lupus erythematosus (SLE) treated with mycophenolate mofetil. Arthritis Res. Ther. 25, 111 (2023).
Monaco, G. et al. RNA-Seq signatures normalized by mRNA abundance allow absolute deconvolution of human immune cell types. Cell Rep. 26, 1627-1640.e7 (2019).
Thompson, R. C. et al. Molecular states during acute COVID-19 reveal distinct etiologies of long-term sequelae. Nat. Med. 29, 236–246 (2023).
Moreno, P. et al. Expression Atlas update: Gene and protein expression in multiple species. Nucleic Acids Res. 50, D129–D140 (2022).
Lee, H. K. et al. Analysis of immune responses in patients with CLL after heterologous COVID-19 vaccination. Blood Adv. 7, 2214–2227 (2023).
Knabl, L. et al. BNT162b2 vaccination enhances interferon-JAK-STAT-regulated antiviral programs in COVID-19 patients infected with the SARS-CoV-2 Beta variant. Commun. Med. https://doi.org/10.1038/s43856-022-00083-x (2022).
Chen, G., Ning, B. & Shi, T. Single-cell RNA-Seq technologies and related computational data analysis. Front. Genet. 10, 317 (2019).
Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218 (2016).
Aran, D., Hu, Z. & Butte, A. J. xCell: Digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017).
Racle, J., de Jonge, K., Baumgaertner, P., Speiser, D. E. & Gfeller, D. Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. Elife https://doi.org/10.7554/eLife.26476 (2017).
Finotello, F. et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med. https://doi.org/10.1186/s13073-019-0638-6 (2019).
Sturm, G. et al. Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology. Bioinformatics 35, i436–i445 (2019).
Finotello, F. & Trajanoski, Z. Quantifying tumor-infiltrating immune cells from transcriptomics data. Cancer Immunol. Immunother. 67, 1031–1040 (2018).
Merotto, L., Zopoglou, M., Zackl, C. & Finotello, F. Next-generation deconvolution of transcriptomic data to investigate the tumor microenvironment. In International Review of Cell and Molecular Biology (Academic Press, 2023).
Fridman, W. H. et al. The immune microenvironment: A major player in human cancers. Int. Arch. Allergy Immunol. 164, 13–26 (2014).
Wen, W. et al. Immune cell profiling of COVID-19 patients in the recovery stage by single-cell sequencing. Cell Discov. 6, 31 (2020).
Lee, H. K. et al. Immune transcriptome analysis of COVID-19 patients infected with SARS-CoV-2 variants carrying the E484K escape mutation identifies a distinct gene module. Sci. Rep. 12, 2784 (2022).
Lee, H. K. et al. Immune transcriptomes of highly exposed SARS-CoV-2 asymptomatic seropositive versus seronegative individuals from the Ischgl community. Sci. Rep. 11, 4243 (2021).
Lee, H. K. et al. Prior vaccination exceeds prior infection in eliciting innate and humoral immune responses in omicron infected outpatients. Front. Immunol. 13, 916686 (2022).
Lee, H. K., Knabl, L., Walter, M., Furth, P. A. & Hennighausen, L. Limited cross-variant immune response from SARS-CoV-2 Omicron BA.2 in naïve but not previously infected outpatients. iScience 25, 105369 (2022).
Ewels, P. A. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat. Biotechnol. 38, 276–278 (2020).
Babraham Bioinformatics—FastQC A Quality Control tool for High Throughput Sequence Data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048. https://doi.org/10.1093/bioinformatics/btw354 (2016).
Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).
Lancaster, I., Patel, D., Sethi, V., Connelly, W. & Namey, J. Myelodysplastic syndrome in a case of new-onset pancytopenia. Clin. Case Rep. 10, e05533 (2022).
Gustafsson, J. et al. Sources of variation in cell-type RNA-Seq profiles. PLoS ONE 15, e0239495 (2020).
Bolotin, D. A. et al. MiXCR: Software for comprehensive adaptive immunity profiling. Nat. Methods 12, 380–381 (2015).
Song, L. et al. TRUST4: Immune repertoire reconstruction from bulk and single-cell RNA-seq data. Nat. Methods 18, 627–630 (2021).
Smakaj, E. et al. Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences. Bioinformatics 36, 1731–1739 (2020).
Lefranc, M.-P. et al. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. 37, D1006–D1012 (2009).
Yu, K., Shi, J., Lu, D. & Yang, Q. Comparative analysis of CDR3 regions in paired human αβ CD8 T cells. FEBS Open Bio 9, 1450–1459 (2019).
Scirpy: A Scanpy extension for analyzing single-cell T-cell receptor sequencing data. https://doi.org/10.37473/dac/10.1101/2020.04.10.035865.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994).
Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919 (1992).
igraph – Network analysis software. https://igraph.org/.
Bodenhofer, U., Bonatesta, E., Horejš-Kainrath, C. & Hochreiter, S. msa: An R package for multiple sequence alignment. Bioinformatics https://doi.org/10.1093/bioinformatics/btv494 (2015).
Wagih, O. ggseqlogo: A versatile R package for drawing sequence logos. Bioinformatics 33, 3645–3647 (2017).
Protein BLAST: search protein databases using a protein query. https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins.
Johnson, M. et al. NCBI BLAST: A better web interface. Nucleic Acids Res. 36, W5-9 (2008).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience https://doi.org/10.1093/gigascience/giab008 (2021).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Esper, F. P. et al. Alpha to Omicron: Disease severity and clinical outcomes of major SARS-CoV-2 variants. J. Infect. Dis. 227, 344–352 (2023).
Arabi, M. et al. Severity of the Omicron SARS-CoV-2 variant compared with the previous lineages: A systematic review. J. Cell. Mol. Med. 27, 1443–1464 (2023).
Carabelli, A. M. et al. SARS-CoV-2 variant biology: Immune escape, transmission and fitness. Nat. Rev. Microbiol. 21, 162–177 (2023).
Liu, Y. et al. The N501Y spike substitution enhances SARS-CoV-2 infection and transmission. Nature 602, 294–299 (2022).
Graham, C. et al. Neutralization potency of monoclonal antibodies recognizing dominant and subdominant epitopes on SARS-CoV-2 Spike is impacted by the B.1.1.7 variant. Immunity 54, 1276-1289.e6 (2021).
Jennewein, M. F. et al. Isolation and characterization of cross-neutralizing coronavirus antibodies from COVID-19+ subjects. Cell Rep. 36, 109353 (2021).
Kalatskaya, I. et al. Revealing the immune cell subtype reconstitution profile in patients from the CLARITY study using deconvolution algorithms after cladribine tablets treatment. Sci. Rep. 13, 8067 (2023).
Pontelli, M. C. et al. SARS-CoV-2 productively infects primary human immune system cells in vitro and in COVID-19 patients. J. Mol. Cell Biol. 14, mjac021 (2022).
Zheng, B., Yang, Y., Chen, L., Wu, M. & Zhou, S. B-cell receptor repertoire sequencing: Deeper digging into the mechanisms and clinical aspects of immune-mediated diseases. iScience 25, 105002 (2022).
Pogorelyy, M. V. et al. Method for identification of condition-associated public antigen receptor sequences. Elife https://doi.org/10.7554/eLife.33050 (2018).
Hayden, E. C. Technology: The $1,000 genome (Nature Publishing Group, 2014) https://doi.org/10.1038/507294a.
Pennisi, E. Upstart DNA sequencers could be a ‘game changer’. Science 376, 1257–1258 (2022).
Kralik, P. & Ricchi, M. A basic guide to real time PCR in microbial diagnostics: Definitions, parameters, and everything. Front. Microbiol. 8, 108 (2017).
Kudo, E. et al. Detection of SARS-CoV-2 RNA by multiplex RT-qPCR. PLoS Biol. 18, e3000867 (2020).
Vogels, C. B. F. et al. Analytical sensitivity and efficiency comparisons of SARS-CoV-2 RT-qPCR primer-probe sets. Nat. Microbiol. 5, 1299–1305 (2020).
Shi, W. et al. Bioinformatics approach to identify the hub gene associated with COVID-19 and idiopathic pulmonary fibrosis. IET Syst. Biol. https://doi.org/10.1049/syb2.12080 (2023).
Vaid, A. et al. Implications of the use of artificial intelligence predictive models in health care settings: A simulation study. Ann. Intern. Med. https://doi.org/10.7326/M23-0949 (2023).
Robinson, M. L., Garibaldi, B. T. & Lindquist, M. A. When clinical prediction is steering the ship, beware the drift of its wake. Ann. Intern. Med. https://doi.org/10.7326/M23-2345 (2023).
Hajjar, L. A. et al. Intensive care management of patients with COVID-19: A practical approach. Ann. Intensive Care 11, 36 (2021).
Mazzotti, L. et al. T-cell receptor repertoire sequencing and its applications: Focus on infectious diseases and cancer. Int. J. Mol. Sci. 23, 8590 (2022).
Yaari, G. & Kleinstein, S. H. Practical guidelines for B-cell receptor repertoire sequencing analysis. Genome Med. 7, 121 (2015).
Lee, H. K. et al. mRNA vaccination in octogenarians 15 and 20 months after recovery from COVID-19 elicits robust immune and antibody responses that include Omicron. Cell Rep. 39, 110680 (2022).
Yépez, V. A. et al. Clinical implementation of RNA sequencing for Mendelian disease diagnostics. Genome Med. 14, 38 (2022).
Kebschull, J. M. & Zador, A. M. Sources of PCR-induced distortions in high-throughput sequencing data sets. Nucleic Acids Res. 43, e143 (2015).
Singh, M. et al. High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nat. Commun. 10, 3120 (2019).
Acknowledgements
The authors gratefully thank all the patients and healthy individuals who participated in this study. We would like to thank Anke Kraft and Sebastian Klein for their helpful discussions. Figures were created with BioRender.com. Parts of the figures include icons from Flaticon.com under a paid license. The text was partly rephrased using ChatGPT version 4 under a paid license.
Funding
Open Access funding enabled and organized by Projekt DEAL. This work was supported by the Technical University Munich—Institute for Advanced Study, funded by the German Excellence Initiative. This work was supported in part by the Intramural Research Programs (IRPs) of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). JB was partially funded by his VILLUM Young Investigator Grant nr.13154. Partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—422216132. This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the CompLS research and funding concept [031L0294B (NetfLID)]. This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the e:Med research and funding concept (grants 01ZX1908A/01ZX2208A and grants 01ZX1910D/01ZX2210D). This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 777111. This publication reflects only the author’s view, and the European Commission is not responsible for any use that may be made of the information it contains.
Contributions
M.H., L.L.W., A.D., H.K.L., L.K., P.A.F., L.H., and M.L. contributed to the initial design of the study. M.H. and L.L.W. conducted the data cleaning and analyses. M.H. and A.D. provided supervision of L.L.W. during the analyses. N.T. supported data cleaning, data visualization, and technical work. M.H., L.L.W., and A.D. drafted the initial manuscript. M.H., L.L.W., A.D., M.H., P.A.F., L.H., and M.L. edited the initial manuscript. All authors read and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hoffmann, M., Willruth, LL., Dietrich, A. et al. Blood transcriptomics analysis offers insights into variant-specific immune response to SARS-CoV-2. Sci Rep 14, 2808 (2024). https://doi.org/10.1038/s41598-024-53117-w