Introduction

Peripheral blood is the tissue of choice in clinical diagnostics and biomedical research due to minimally invasive sample collection. As blood perfuses all organs, it provides insights into various diseases and medical conditions1,2. In general, we can investigate active pathways and organismal responses to stimuli (e.g., a viral infection) on a transcriptomic level3,4 by using well-established sequencing techniques such as bulk RNA sequencing (RNA-seq). A blood sample contains various cell types with different expression profiles. Complete blood counts (CBCs) are routinely assessed in the clinical setting and provide specific information regarding the proportions of cells present5. Within the white blood cell compartment, the percentages of neutrophils, lymphocytes, monocytes, eosinophils, and basophils provide insight into the type and response to infection and underlying disease and/or therapy6,7,8,9,10,11,12,13,14. However, CBCs are frequently unavailable in publicly accessible datasets, limiting insights into the status of the immune system.

Employing either bulk RNA-seq or single-cell RNA-seq (scRNA-seq, i.e., profiling gene expression at the individual cell level)15 can provide a detailed description of cellular composition, all based on the expression levels of genes16,17. Performing scRNA-seq on each patient and cell type is not feasible due to the logistics and high costs of scRNA-seq18. To gain insights into the individual immune reactions to disease from RNA-seq data alone, it is essential to determine the composition of immune cells and gene expression in patient samples. CBCs, if available, provide a fundamental understanding of changes in the immune system; however, they do not specify more fine-grained segmentation into functional subgroups, which often drive disease progression. Hence, computational techniques such as MCP-counter19, xCell20, EPIC21, and quanTIseq22 (see “Methods” and Suppl. Materials 1 and 2 for differences, strengths, and weaknesses of each method) deconvolve bulk RNA-seq using signatures or gene sets of cell-type specific genes. They give a robust estimate of the abundance of various immune cell types within and across patient samples. Recent benchmarks23 of these methods also compared their predictions with experimentally derived flow cytometry fractions and in silico generated pseudo-bulk samples, where ground-truth cell-type proportions are known. Such insights into the status of an immune system are helpful for diagnostics, prognosis, and treatment selection with demonstrated potential in oncology24 and other diseases10,12. Especially in studying tumor microenvironments, deconvolution methods have proven to be effective, as they enable researchers to estimate the proportions of tumor-infiltrating immune cells25,26. Here, we compare four different deconvolution approaches using bulk RNA-seq to analyze changes in the white blood cell compartment over time in individuals infected with SARS-CoV-2. Previous works have already profiled the immune cell environment in early COVID-19 patients using scRNA-seq and detected a large decrease in T cells27.

In this study, we (1) compared computationally estimated immune cell abundances to CBC counts, the current gold standard. Moreover, we (2) investigated the immune cell abundances in patients infected with SARS-CoV-2 variants that differed in severity and tracked their progression over time, comparing them to a baseline (i.e., seronegative samples taken from individuals who reportedly were never infected with SARS-CoV-2) to elucidate differences in immune response and their progression over time toward a healthy state. Additionally, we (3) characterized the B cell receptor (BCR) and T cell receptor (TCR) profiles in infected patients. Finally, we (4) assessed how the performance of these methods is influenced by sequencing depth, i.e., how many reads have been sequenced for each sample.

Methods

Datasets

We utilized publicly available data from human buffy coat white blood cells from four distinct bulk RNA-seq experiments: GSE190680 (variants: Alpha, Alpha + EK (i.e., Alpha with an additional E484K mutation in the spike protein), Gamma)28, GSE162562 (seronegative)29, GSE201530 (variant: Omicron BA.1)30, and GSE205244 (variants: Omicron BA.1 and Omicron BA.2)31 (Suppl. Table 1a). Samples from infected patients were taken at days 0–5, 6–10, 11–15, 16–30, and > 30 after hospitalization or onset of symptoms (Suppl. Table 1a,b). All samples were processed with nf-core RNA-seq v. 3.8.1 using default parameters32. All 252 samples were quality-checked using the reports of FastQC33 and MultiQC34, and only those with sufficient quality (196 samples in total) were included in the subsequent analyses (Suppl. Table 1b, Suppl. Table 2)35,36. Samples came from different studies but were processed in the same laboratory and by the same staff to avoid technical differences37.

Immune deconvolution methodology

Cell-type deconvolution is a computational method applied to bulk RNA-seq data to estimate the abundance of cell types in a biological sample and is primarily used in the context of immune cells. In this study, we employ several tools bundled in the immunedeconv tool (using default settings established there), as it was previously shown that no single tool generally outperforms all others across all immune cell types23 (for marker genes, see Suppl. Fig. 1a,b and https://doi.org/10.6084/m9.figshare.24442423.v1). Computational cell type deconvolution methods generally produce fractions or scores representing the abundance of specific cell types in the samples, which we use here for inter-sample comparisons between patients infected with different SARS-CoV-2 variants (see Suppl. Materials 119,20,21,22).
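To make the idea of a marker-based abundance score concrete, the following minimal Python sketch averages log-transformed expression over a set of marker genes per cell type. It is an illustration in the spirit of marker-gene scoring only and is not the implementation of immunedeconv or of any of the four methods; the marker lists, the function name marker_scores, and the TPM values are assumptions chosen for the example.

```python
import math

# Illustrative marker sets; these are NOT the signatures shipped with any tool.
MARKERS = {
    "B cell": ["CD19", "MS4A1", "CD79A"],
    "T cell": ["CD3D", "CD3E"],
}


def marker_scores(tpm, markers=MARKERS):
    """Mean log2(TPM + 1) over each cell type's marker genes found in the sample."""
    scores = {}
    for cell_type, genes in markers.items():
        values = [math.log2(tpm[g] + 1.0) for g in genes if g in tpm]
        # A cell type without any measured marker gets NaN instead of a score.
        scores[cell_type] = sum(values) / len(values) if values else float("nan")
    return scores


# Toy TPM values for two hypothetical samples.
sample_a = {"CD19": 15.0, "MS4A1": 31.0, "CD79A": 63.0, "CD3D": 3.0, "CD3E": 7.0}
sample_b = {"CD19": 3.0, "MS4A1": 7.0, "CD79A": 15.0, "CD3D": 31.0, "CD3E": 63.0}

scores_a = marker_scores(sample_a)
scores_b = marker_scores(sample_b)
```

Scores of this kind are comparable between samples for the same cell type (here, sample_a scores higher for B cells and sample_b for T cells), but not between cell types within one sample, which is why they are used here only for inter-sample comparisons.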

BCR/TCR repertoire methodology

BCR/TCR repertoire analysis refers to the study of the diverse collection of BCRs and TCRs present within an individual's B and T cell repertoire (i.e., all unique antigen-specific receptors expressed on the surface of T cells and B cells), respectively. These receptors play a crucial role in the adaptive immune system by recognizing and binding to specific antigens derived from pathogens or abnormal cells, thus serving as biomarkers for past or current infections. Each BCR and TCR has a unique amino acid sequence, which contributes to the vast diversity and specificity of the immune response. We used two methods—MiXCR38 and TRUST439—to investigate bulk RNA-seq data by reconstructing B and T cell repertoires (Suppl. Materials 240,41,42).

We used the Python package scirpy43 to analyze results from both methods. To extract only BCR/TCR sequences that differ from those found in a healthy population, we proceeded as follows. We computed a pairwise distance matrix for input sequences to identify sequences forming clonotypes and likely targeting similar antigens. Our objective was to identify sequences targeting SARS-CoV-2 antigens, enabling us to determine which BCR and TCR sequences respond to the virus. As sequences present in seronegative samples cannot target the SARS-CoV-2 virus, we disregarded them and focused on sequences exclusive to infected patients, further refining our search for specific anti-SARS-CoV-2 receptor sequences. We used the ClustalW algorithm44 to perform multiple sequence alignment (see Suppl. Materials 345,46,47,48). In the final steps, we employed the protein BLAST (BLASTp) tool49,50 to annotate clusters and sequences (e.g., those surpassing the cutoff and unlinked to sequences originating from healthy samples) to associate discovered sequences with specific viruses.
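The filtering step described above can be sketched as follows. This is a simplified stand-in that uses plain edit distance and an arbitrary cutoff; it does not reproduce the distance metric or clonotype definition used by scirpy, and the function names, the cutoff max_distance, and the toy sequences are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def infected_exclusive(infected, seronegative, max_distance=2):
    """Keep infected-sample sequences farther than max_distance from every seronegative one."""
    return [
        seq for seq in infected
        if all(levenshtein(seq, healthy) > max_distance for healthy in seronegative)
    ]


# Hypothetical receptor sequences, for illustration only.
infected_seqs = ["CASSLGQGAF", "CARDYWGQGF", "CQQYNSYPLF"]
seronegative_seqs = ["CASSLGQGTF", "CQQYNSWPLF"]

candidates = infected_exclusive(infected_seqs, seronegative_seqs)
```

In this toy input, two infected-sample sequences differ from a seronegative sequence by a single residue and are therefore discarded, leaving one candidate for downstream alignment and BLASTp annotation.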

Subsampling to a lower sequencing depth

RNA-seq always comes with a tradeoff between costs and information gain. Given the high sequencing depth (up to 200 million reads per sample) of the samples investigated here, we were interested in establishing a lower bound for obtaining robust results. To this end, we downsampled the samples to fifty, ten, seven, five, three, and one million reads using samtools51. Subsequently, we repeated the immune deconvolution analysis and, at fifty and ten million reads only, the BCR/TCR analyses. First, we generated TPM expression matrices from the downsampled FASTQ files using salmon52 and reran the immune deconvolution methods; then, we executed MiXCR and TRUST4 on the downsampled data and performed the subsequent analyses as described above.
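The exact downsampling commands are not given here, so the following Python sketch shows only one plausible way to derive the sampling fraction for a target depth and to assemble a samtools call. It assumes subsampling of alignment files with `samtools view -s`, whose argument encodes the random seed in the integer part and the fraction of reads to keep in the decimal part; the helper names, the seed, and the file names are hypothetical.

```python
def subsample_argument(total_reads, target_reads, seed=42):
    """Build the SEED.FRACTION value for `samtools view -s` (seed and rounding are assumptions)."""
    if total_reads <= 0 or target_reads <= 0:
        raise ValueError("read counts must be positive")
    if target_reads >= total_reads:
        raise ValueError("target depth must be below the current depth")
    frac_text = f"{target_reads / total_reads:.6f}"  # e.g. "0.250000"
    if not frac_text.startswith("0.") or float(frac_text) == 0.0:
        raise ValueError("fraction cannot be represented with six decimals")
    return str(seed) + frac_text[1:]  # seed followed by ".250000"


def samtools_command(in_bam, out_bam, total_reads, target_reads, seed=42):
    """Assemble (but do not run) the argument list for a subsampling call."""
    arg = subsample_argument(total_reads, target_reads, seed)
    return ["samtools", "view", "-b", "-s", arg, "-o", out_bam, in_bam]


# Hypothetical sample with 200 million reads, downsampled to 50 and 10 million.
cmd_50m = samtools_command("sample.bam", "sample.50M.bam", 200_000_000, 50_000_000)
cmd_10m = samtools_command("sample.bam", "sample.10M.bam", 200_000_000, 10_000_000)
```

Because the fraction is computed per sample, samples with different starting depths end up at approximately the same target depth; the resulting read count is approximate, as sampling is random.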

Results

In this study, we highlight the potential of RNA-seq data in clinical practice. Typically used for studying gene expression, RNA-seq data offers crucial insights into the status of the immune system via computational cell type deconvolution as well as the analysis of BCR and TCR sequences. While such advanced analysis techniques are increasingly widespread in oncology, we focus here on demonstrating their applicability in infectious diseases by example of SARS-CoV-2 infection. We re-analyzed data from 196 SARS-CoV-2 patients over time from the initial hospitalization through recovery28,30,31. First, we deconvolve the bulk RNA-seq data into immune cell-type fractions that changed as the patients went from the initial hospitalization through recovery. We show that the estimated values of the immune deconvolution methods approximate the CBC information. We further elucidate that with computational immune deconvolution methods, we can reveal changes between patients infected with SARS-CoV-2 variants with differing severity of disease53,54. Next, we illustrate how we can utilize BCR and TCR computational analysis to classify the patients’ cause of disease and investigate the effect of sequencing at various depths on the robustness of our results.

Approximated immune cell abundances by immune deconvolution methods are close to real complete blood count data

In Fig. 1, we observe a consistent positive correlation between the four deconvolution methods and the CBC data for lymphocytes and neutrophils in patients infected with the Alpha and Alpha + EK (Alpha with an additional E484K mutation in the spike protein) variants. Monocytes show overall lower correlation values, likely due to their lower overall abundance. The strength of these correlations varies, as scores fluctuate in magnitude depending on the cell type and method employed (Suppl. Fig. 2). The heatmap highlights that EPIC's estimates closely align with the CBC data and that EPIC achieves the highest accuracy among all methods for monocytes. While both quanTIseq and xCell yield good results for neutrophils and lymphocytes, MCP-counter's predictions for both cell types appear to be less reliable. Importantly, when consolidating the findings from all methods, the immune deconvolution results consistently align with the CBC data. Additionally, xCell, quanTIseq, and EPIC show highly correlated results for lymphocytes and neutrophils. However, neutrophils and especially monocytes appear to be harder to estimate using deconvolution. Method choice can play an important role here, as only EPIC is able to detect monocytes consistently. This reaffirms that immune deconvolution could serve as an instrument for assessing immune cell levels from RNA-seq data, though results do vary between methods and cell types.

Figure 1

Pairwise correlation heatmap, comparing cell-type estimates of four deconvolution methods and complete blood count (CBC) values separately for lymphocytes, monocytes, and neutrophils. Pearson’s correlation coefficients are written in each box, with an indication (*) in case the correlation is significant (p-value < 0.05). Comparisons of deconvolution methods with CBCs are highlighted in bold outlines.
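The correlation coefficient reported in each cell of the heatmap can be computed as in the following sketch. It implements the standard Pearson formula in plain Python and does not compute the p-values used for the significance markers; the function name and the paired toy values are assumptions for illustration and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values: CBC lymphocyte percentages and deconvolution estimates.
cbc_lymphocytes = [10.0, 20.0, 30.0, 40.0]
estimated_fraction = [0.12, 0.18, 0.33, 0.37]

r_toy = pearson_r(cbc_lymphocytes, estimated_fraction)
```

Because Pearson's r is invariant to linear rescaling, it can compare CBC percentages with method-specific scores that live on different scales, which is what makes it suitable for comparing methods that report fractions with methods that report arbitrary scores.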

Immune deconvolution revealed differences in patients with different severity of disease progression

During the SARS-CoV-2 pandemic, different SARS-CoV-2 variants emerged (e.g., ancestral, Alpha, Alpha + EK, Gamma, Omicron BA.1, and Omicron BA.2), which differed in transmissibility and severity. Variants that emerged toward the end of the pandemic were associated with less severe disease55. The Alpha variant was reported to show increased transmissibility compared to the wild-type virus due to the N501Y mutation56. The Alpha + EK variant was reported to more efficiently evade a neutralizing antibody response due to the additional E484K mutation but was not associated with more severe disease. The Gamma variant carried both N501Y and E484K mutations and was reported to enhance transmissibility with potential antibody resistance, but, again, disease severity was reported unchanged. The Omicron BA.1 and BA.2 variants, with numerous spike protein mutations, mediated immune escape. However, while their transmissibility increased, these variants, in general, demonstrated less severe disease outcomes. This pattern of viral evolution has been reported previously as a virus adapts to its human hosts over time, favoring transmission over severity55.

We hypothesized that non-hospitalized patients infected with an Omicron variant might demonstrate an immune response closer to that of healthy, non-infected individuals than to that of hospitalized patients infected with earlier SARS-CoV-2 variants. To explore this hypothesis, we first compared trends in the abundance of B cells, Neutrophils, T cell CD4+, and T cell CD8+ in the SARS-CoV-2 variants and the seronegative samples using four different deconvolution tools (quanTIseq, MCP-counter, EPIC, and xCell, see “Methods”) (Figs. 2 and 3). We found that all four methods, in general, recapitulated the same trends. The immune cell fractions or scores across all methods and immune cell types evolved over time to more closely resemble those of seronegative samples, i.e., of healthy individuals. We also observed that non-hospitalized patients infected with an Omicron variant more closely resembled the seronegative individuals than did the hospitalized patients infected with earlier variants, especially at the time of their initial hospitalization.

Figure 2

The abundance of immune cells (given by percentage or method-specific score) detected by the immune deconvolution methods quanTIseq, MCP-counter, EPIC, and xCell over all time points combined for the immune cells B cell, Neutrophil, T cell CD4+, and T cell CD8+.

Figure 3

Cell-type fractions separated into the time brackets 0–5, 6–10, 11–15, 16–30, and > 30 days after hospitalization or onset of symptoms, as detected by the immune deconvolution methods quanTIseq, MCP-counter, EPIC, and xCell for the immune cells B cell, Neutrophil, T cell CD4+, and T cell CD8+. The Gamma variant has been removed from this analysis due to the small sample size per time bracket (Suppl. Table 1a,b).

We further categorized samples into different time brackets after hospitalization or onset of symptoms (days 0–5, 6–10, 11–15, 16–30, and > 30). Over time, the projected immune cell fractions appeared to progressively align with those observed in seronegative samples, consistent with patient recovery over time (Fig. 3). Patients diagnosed with Alpha and Alpha + EK, variants associated with more severe disease, demonstrated a lengthier time until their immune cell fractions approximated those of seronegative individuals (Suppl. Fig. 3).
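The assignment of samples to time brackets and the per-bracket summary can be sketched as follows. The bracket boundaries are those stated above; the function names, the assumption of integer days, and the toy (day, fraction) pairs are illustrative and not taken from the study data.

```python
from collections import defaultdict

# Inclusive upper bounds of the closed brackets; later days fall into ">30".
BRACKETS = [(5, "0-5"), (10, "6-10"), (15, "11-15"), (30, "16-30")]


def time_bracket(day):
    """Map a non-negative integer day after admission or symptom onset to its bracket."""
    if day < 0:
        raise ValueError("day must be non-negative")
    for upper, label in BRACKETS:
        if day <= upper:
            return label
    return ">30"


def mean_by_bracket(samples):
    """Average an estimated cell fraction per bracket; samples are (day, fraction) pairs."""
    grouped = defaultdict(list)
    for day, fraction in samples:
        grouped[time_bracket(day)].append(fraction)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# Hypothetical (day, neutrophil fraction) pairs, for illustration only.
toy_samples = [(2, 0.70), (4, 0.74), (8, 0.66), (20, 0.60), (45, 0.55)]
bracket_means = mean_by_bracket(toy_samples)
```

Brackets without any sample simply do not appear in the result, which mirrors the practical problem of sparse time brackets that led to the exclusion of the Gamma variant from the temporal analysis.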

B cell and T cell repertoire analysis offers insights into past or current infections

In general, when an infection occurs, an 'immunological footprint' in the form of specific BCR and TCR repertoires can be identified. In this section, we investigated whether bioinformatics BCR and TCR repertoire analysis approaches (i.e., a combination of MiXCR and TRUST4) of transcriptomic data, coupled with a computational tool that associates known BCR and TCR repertoires with causes of diseases (i.e., BLASTp49,50), could be used to classify a disease cause for an admitted patient (see “Methods”).

With the computational tool MiXCR, we identified 534 unique receptor sequences, while we identified 569 sequences with TRUST4 across the variants. Of these, 492 sequences were identified by both tools, while 42 and 77 sequences were uniquely identified by MiXCR and TRUST4, respectively. This means that 81% of the sequences were found by both tools, 7% only by MiXCR, and 13% only by TRUST4 (Suppl. Fig. 4). We decided to use only the sequences identified by both tools for further analyses to ensure more reliable results. In the next step, we eliminated sequences that exhibited homology to sequences from seronegative samples, as such BCR and TCR sequences probably lack specificity for SARS-CoV-2, given that no seronegative sample should possess SARS-CoV-2-specific receptors (see “Methods”, Suppl. Fig. 5). Among the remaining sequences, we identified fifteen that did not display similarity to any sequence found in seronegative samples. A subsequent BLASTp search with these sequences identified anti-SARS-CoV-2 immunoglobulin hits within the top 100 matches for seven sequences (Fig. 4a, Suppl. Table 3), with most sequences stemming from samples with the BA.2 variant. The remaining eight sequences predominantly align with generic immunoglobulin sequences. Two of the seven sequences are of particular interest because of their notably low E-values, which indicate that a similar score is unlikely to be achieved by chance. The first sequence (CYSTDSSGNHRGVF), identified in a study by Graham et al.57, was among over 100 monoclonal antibodies (mAbs) characterized for their interaction with epitopes from individuals infected with SARS-CoV-2. This study also demonstrated that some of these mAbs possess the ability to neutralize SARS-CoV-2. The second noteworthy sequence (CQQRSNWPPTWTF) emerged from a study by Jennewein et al.58. In this study, 198 antibodies were identified, with fourteen being distinguished as neutralizing antibodies (nAbs) against SARS-CoV-2. The study further explored how some of these nAbs can block the binding of ACE-2, thereby inhibiting viral entry into cells. The sequence logo derived from all fifteen sequences highlights conserved motifs at the start (S), the end (VF), and a recurring pattern (DSS) in the center. In contrast, the intervening positions exhibit significant variability, underscoring the pronounced diversity among these sequences (see Suppl. Fig. 6).
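The reported overlap percentages follow directly from the three counts given above (534 sequences for MiXCR, 569 for TRUST4, and 492 shared). The short Python sketch below reproduces this arithmetic; the function name and dictionary keys are chosen for illustration.

```python
def overlap_shares(n_a, n_b, n_both):
    """Shares of the union of two result sets: found by both, only by A, only by B."""
    only_a = n_a - n_both
    only_b = n_b - n_both
    union = only_a + only_b + n_both
    return {
        "union": union,
        "both": n_both / union,
        "only_a": only_a / union,
        "only_b": only_b / union,
    }


# Counts reported in the text: A = MiXCR, B = TRUST4.
shares = overlap_shares(534, 569, 492)
percentages = {k: round(100 * v) for k, v in shares.items() if k != "union"}
```

The union contains 611 distinct sequences, and the rounded shares are 81%, 7%, and 13%; they sum to 101% only because each value is rounded individually.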

Figure 4

(A) number of sequences, out of fifteen identified at full sequencing depth, that matched anti-SARS-CoV-2 sequences (BLAST hits) and their presence across different variants. Note: The values in the matrix do not add up to fifteen, as one sequence can be present in multiple variants. (B) number of sequences identified at full sequencing depth compared to those identified at a sequencing depth of 50 million reads. We differentiate between sequences present in seronegative samples and those absent in seronegative samples. (C) shows a similar comparison as (B) between sequences identified at full sequencing depth compared to sequences identified at a sequencing depth of 10 million reads.

Sequencing depth analysis reveals differences in the robustness of conclusions between deconvolution and TCR/BCR results

Despite a reduction in sequencing depth, the trends observed in immune deconvolution outcomes remained consistent. Notably, there were still significant discrepancies in the levels of immune cells when comparing Alpha and Alpha + EK infections to seronegative cases with a sequencing depth of 50 million (Suppl. Fig. 7a) and with an even lower sequencing depth of 10 million (Suppl. Fig. 7b). Furthermore, temporal analysis reaffirmed these findings, indicating the recurrent trend where, across all variants, there is a convergence toward the levels observed in seronegative samples (Suppl. Figs. 3, 8a,b). Through this analysis, we demonstrated that a lower and even a very low sequencing depth of 10 million is indeed sufficient to discern the trends in immune cell levels and highlight the differential impacts of various variants on the immune system. Furthermore, when evaluating the immune deconvolution scores alongside the CBC data for patients infected with the Alpha and Alpha + EK variants, we aimed to find a lower bound of sequencing depth that still produces robust deconvolution results by downsampling reads even further to 1 million reads (Suppl. Fig. 9). Interestingly, correlation values between CBCs and deconvolution estimates remained largely stable in lymphocytes and neutrophils down to a sequencing depth of 1 million reads. quanTIseq did not detect any monocytes at lower sequencing depths, and MCP-counter and xCell show large drop-offs for correlation of monocytes at 1 million reads, down to R = 0.22, likely due to the low abundance of monocytes in our dataset (Suppl. Fig. 2).

In the repeated BCR/TCR analysis with a significantly reduced sequencing depth of 10 million reads, we identified only 95 unique BCR and TCR sequences. This is markedly fewer than in the prior analysis, but the decline is expected, as the reduced sequencing depth results in fewer overall sequences from the RNA-seq experiments. Of the 95 sequences, 18 (19%) were identified solely by MiXCR, seven (7%) exclusively by TRUST4, and 70 (74%) by both tools. This indicates that the majority of the sequences were still identifiable by both tools (Suppl. Fig. 10). After eliminating sequences resembling those in seronegative samples, we pinpointed eight unique sequences. Among these, seven were matched to anti-SARS-CoV-2 immunoglobulin sequences (Suppl. Table 4, Suppl. Fig. 11).

The sequences identified by the two BCR/TCR analyses, at full and at low sequencing depth, differ from each other. Additionally, the positions of the SARS-CoV-2-specific hits in the BLAST results vary. At greater sequencing depth, these hits are more commonly found within the top ten. In contrast, when the sequencing depth is reduced, they tend to appear further down the list of hits, and, as a result, the findings become somewhat less substantiated.

A statistical comparison analogous to comparing p-values for the immune deconvolution is not possible here, as MiXCR and TRUST4 do not generate significance values. The BLAST E-values represent the number of random hits that can be expected in a database of a certain size; they are therefore not suitable for comparing the significance of our results, but only for judging the reliability of each sequence match individually. Instead, we declare sequences detected using full sequencing depth as ground truth and compare their overlaps with sequences at lower sequencing depths (Fig. 4b,c), where sequences that are found in infected samples are considered positive cases (see Suppl. Materials 4 for details). With 50 million reads, we could only detect two sequences in infected samples that were also detected at full sequencing depth, while eight other sequences were, in fact, also present in seronegative samples at full depth, leading to a sensitivity of 0.2. With 10 million reads, no true positive cases could be detected anymore, and the sensitivity dropped to 0.

In both analyses, we were able to find seven anti-SARS-CoV-2-related hits that appear in the first one hundred BLAST results. Notably, even though MiXCR and TRUST4 identified fewer sequences overall due to the reduced depth, the count of SARS-CoV-2 specific sequences remained consistent.

In conclusion, a sequencing depth of 10 million was adequate to detect SARS-CoV-2-related sequences, just like with greater sequencing depths. However, the latter produces more robust outcomes, as sensitivity to detect infection-related sequences drops drastically at lower depths.

Discussion

We found that the immune deconvolution tools, including quanTIseq, MCP-counter, EPIC, and xCell, generally predict similar trends in immune cell composition (B cells, Neutrophils, T cell CD4+, and T cell CD8+) across SARS-CoV-2 samples that reflect differences in severity and over time. However, we can also see large differences between individual samples. While the immune cell abundances presented in this manuscript were not validated by flow cytometry, the deconvolution methods themselves were previously evaluated using flow cytometry measurements23,59. Our computational results predict a progressive alignment of immune cell fractions with those of seronegative samples, correlating with decreased disease severity and/or individual disease progression. However, individuals infected with variants associated with severe disease courses, such as Alpha and Alpha + EK, show extended recovery timelines before reaching these levels, indicating a potential marker of disease severity. A confounder that should be considered in the analysis is that SARS-CoV-2 can invade immune cells and could thereby skew the immune deconvolution results60. While computational deconvolution methods are able to robustly estimate trends in immune-cell composition, they show a large variance in prediction accuracy at the sample level. This drawback is especially important when trying to use such methods in a personalized fashion, where prediction accuracy is not high enough to give precise estimates of immune-cell composition in individual patients. However, so-called second-generation deconvolution methods25 promise to increase prediction quality by employing scRNA-seq datasets as an additional resource in deciphering the cell-type composition of bulk RNA-seq datasets. Such tools may also reveal changes in the functional state of immune cells and thus surpass information provided by CBC measurements.

We further introduced an approach for diagnosing infections using RNA-seq with bioinformatic analysis of BCR and TCR repertoires. We speculate that patterns of BCR and TCR repertoires could be associated with different disease settings. The current system is built on known BCR and TCR repertoires associated with diseases, which means it can only be used for identifying known infections61,62. As data on BCR and TCR repertoires from different clinical settings is deposited and available for analysis, it is possible the information can be used to improve understanding of immune response in individual patients. At present, ethical considerations of detailed genomic analysis in individual patients can limit the types of information gathered and their distribution. However, anonymized data obtained through clinical trials with informed consent may still be useful in exploring how changes in TCR and BCR repertoires evolve during disease and recovery.

Our analyses demonstrate that a reduced sequencing depth of 10 million is sufficient to identify overarching trends in immune cell levels and anti-SARS-CoV-2 specific sequences, although higher sequencing depths yield more robust outcomes. Despite lower depths resulting in findings of less significance and confidence, the overall trends and correlations with CBC data remain consistent. The BCR/TCR analyses further corroborate these findings, as even at reduced sequencing depths, SARS-CoV-2-specific sequences were still identifiable. These results affirm the feasibility of using lower sequencing depths for meaningful analyses in the study of immune responses and pathogen-specific immunity, making it more feasible in a clinical setting due to lower costs.

Since 2001, genome sequencing costs have decreased significantly, from $100 million to the $1000 genome milestone, and RNA sequencing has seen similar cost reductions63. With the impending expiration of Illumina's key patents, the RNA sequencing market could see heightened competition and further price reductions; a recent article in Science64 speculated that costs could fall to $100. This shift might be key to embedding sequencing more deeply into routine clinical practice, making it a more accessible tool for patient care and research.

As RNA-sequencing technologies advance and become cheaper, they hold promise for future clinical utility by providing a more detailed view of global gene expression profiles. For example, quantitative polymerase chain reaction (qPCR) has already been adopted in clinical settings for its high sensitivity and specificity in detecting and quantifying microbial pathogens65 or SARS-CoV-266,67. To our knowledge, RNA-seq combined with immune deconvolution is not directly used in a routine clinical setting59; however, it has been employed in research settings analyzing whole blood sequencing datasets14. Notably, techniques other than RNA-seq, such as DNA methylation microarrays, could be used for immune deconvolution. However, in our opinion, RNA-seq data offers the greatest variety of readouts across gene expression, alternative splicing, and immune status.

Previous work has also identified specific immune cell subsets, including neutrophils, to be associated with more severe SARS-CoV-2 infection68. In the future, immune deconvolution and BCR/TCR analysis could potentially guide the decision-making of a physician, e.g., the immediate allocation of a newly admitted patient with potentially severe disease progression to the intensive care unit, recognizing that such a tool would require ongoing updates to maintain its utility for predictive modeling69,70. With more blood samplings after admission, we can also see whether the disease course changes, and the physician could, based on this analysis and other factors, advise that the patient be admitted to the intensive care unit71. From the experimental side, RNA-seq offers a comprehensive view of gene expression, including genes related to BCRs and TCRs. Still, it is not specifically focused on analyzing the diversity or clonality of these receptors72. In contrast, TCR-seq, designed to target T-cell receptors (and BCR-seq for B-cell receptors73), thoroughly examines the diversity and specificities within the receptor repertoires74. While RNA-seq is valuable for a broad understanding of the immune response, TCR-seq delivers more focused insights into the T-cell repertoire, crucial for studies of immune dynamics and specificities72. The choice between these techniques hinges on whether the research aims for an overall immune profile or a detailed analysis of receptor diversity and clonality72. However, since TCR-seq and BCR-seq data are rarely available and offer a limited readout, we consider RNA-seq data more informative. From the clinical side, we have to consider that our study relies on data collected either primarily from hospitalized elderly patients (i.e., Alpha, Alpha + EK, and Gamma) or from patients with mild disease progression (i.e., Omicron BA.1 and Omicron BA.2), potentially introducing a selection bias. Moreover, differences in local healthcare systems, as well as individual patient factors (e.g., age and preconditions), could influence recovery timelines and should be factored into any broader applications of these findings. Additionally, the predictive methods used for immune cell fraction estimations, while robust and consistent over multiple sequencing depths, are not without their limitations and potential discrepancies. The stable performance in terms of correlation with CBCs across sequencing depths is likely because relative gene expression differences in signature genes are still present even at very low sequencing depths.

In addition to immune cell composition, analyzing immune cell receptors of B and T cells by employing tools such as MiXCR38 and TRUST439 in combination with BCR/TCR databases can provide a rapid determination of the type of a previously discovered virus or infection. However, with our proposed method, we find potential clonotypes but are not able to confirm whether they come from a new virus variant. In addition, we noticed large decreases in sensitivity when only more shallowly sequenced samples are available. The complete genomic workflow, from cell preparation, library generation, and sequencing to quality control and data analysis, is, with the current technology, not feasible in a matter of hours, as is the case for CBCs. Recent efforts to introduce RNA-seq into clinical settings describe a complete workflow that finishes in about 1 week75. One technology that may improve the precision of BCR/TCR detection is Oxford Nanopore sequencing. While not currently implemented in many studies of the transcriptome due to sequencing error limitations and PCR-induced distortions76, it promises to improve clonotype detection and tracking77.

In summary, we applied computational immune deconvolution tools to distinct SARS-CoV-2 data sets, illustrating that they can be used to supplement immune cell abundance estimates for bulk RNA-seq data that is not accompanied by CBC information. Additionally, these tools can be used for discerning trends in immune cell fractions during disease recovery and for comparing differences in immune cell fractions between more and less severe SARS-CoV-2 variants. The proposed workflow, which combines BCR/TCR methods with alignments and BLASTp, could help to pinpoint the type of viral infection. Our presented bioinformatic strategies, combined with expert medical judgment, new technologies, and automation, could open a path toward precision medicine, in which treatment plans are personalized and optimized for each individual based on individualized genetic analyses.