Introduction

COVID-19 is characterized by a complex immune response, which explains some of the observed variation in patient outcomes. Some patients with a severe clinical course may develop a “COVID-19 cytokine storm” [1], though this term has been challenged because the underlying response remains poorly understood [2]. While previous publications found multiple cytokines and other immune-related proteins associated with COVID-19 outcomes [3,4,5,6,7,8], these associations were measured in small samples, assessed a limited set of proteins, or did not provide a temporal analysis of the changes in cytokines during severe versus mild disease. These limitations may have contributed to contradictory results [9,10,11].

Similarly, it remains unclear if the host immune response explains a large proportion of differences in COVID-19 outcomes between males and females. While reports previously suggested that these differences were correlated with differential levels of cytokines (e.g. IL-8 and IL-18 [12]), these also suffered from small sample sizes and limited adjustment for temporal changes. These studies also likely contained many false positive associations due to multiple comparisons, as most of the sex-related differences were not replicated in other larger cohorts [13].

One way to address some of these limitations is by using high-throughput oligonucleotide-aptamer protein measurement technology [14]. These panels reliably measure thousands of circulating blood proteins simultaneously, allowing for comprehensive measurements in larger numbers of subjects. The increase in sample size allows for better adjustment for time-dependent changes in protein levels, providing a more granular understanding of their dynamics during infection. Here, we use the SOMAscan aptamer panel [15] (SomaLogic, Boulder, USA) in two prospectively enrolled cohorts from Canada and the United States (n = 580) to measure 147 proteins associated with the immune response over the first 14 days of COVID-19. This allowed us to describe the temporal pattern of cytokines during COVID-19 disease progression.

By using large-scale protein measurement and accounting for temporal changes over the course of infection, we describe which proteins are likely associated with severe COVID-19, and which ones also underlie sex differences in outcomes.

Methods

Overview of study design

We used the SomaScan assay to measure 147 cytokines and other immune-related proteins in cases and controls in the Biobanque Québécoise de la COVID-19 [16] (BQC19) in Montreal, and in the Mount Sinai Biobank (MSB) at Icahn School of Medicine in New York City. We then combined those results using generalized additive models to identify proteins temporally associated with severe COVID-19.

Population

The BQC19 and MSB are hospital-based prospective cohorts enrolling subjects with PCR-proven SARS-CoV-2 infections, as well as individuals who presented with signs or symptoms consistent with COVID-19, but without a microbiological diagnosis of COVID-19. For this study, the BQC19 cohort was limited to subjects enrolled at the Jewish General Hospital and Centre Hospitalier de l’Université de Montréal, both university-affiliated hospitals. Demographic characteristics and clinical risk factors were obtained by medical chart review or subject interview performed by clinicians or trained research coordinators in all cohorts. Specifically, the time from symptom onset used in all analyses was recorded by trained clinical assistants or physicians based on review of medical records or interviews with patients or their relatives.

COVID-19 case/control outcome definitions

Severe COVID-19 cases were defined as subjects with a positive SARS-CoV-2 PCR test result who either died or required invasive or non-invasive mechanical respiratory support. Mechanical respiratory support was defined as any one of the following: intubation, new continuous positive airway pressure (CPAP) or bilevel positive airway pressure (BiPAP) ventilation, or high-flow nasal cannula. Controls were defined as any subjects with a positive PCR test who did not require invasive ventilation, or any subject with signs or symptoms consistent with COVID-19, but who had negative PCR tests for the virus. However, we excluded participants with severe non-COVID-19 disease (i.e. participants who required respiratory support as defined above, but not due to COVID-19). This control definition was chosen to emphasize immune responses specific to severe COVID-19, as compared to a general hospital population.

Protein measurements

We used the SOMAscan (v4) platform to measure 5284 circulating proteins from each participant, and then prioritized 147 immune-related proteins for the analysis. These proteins were selected to include all available interleukins (n = 38), CC motif chemokines (n = 23), CXC motif chemokines (n = 14), interferons (n = 17), toll-like receptors (n = 6), and immunoglobulins (n = 5) available from the SOMAscan panel, as well as 6 other proteins (G-CSF, GM-CSF, M-CSF, MIF, TNF-α, TNF-β) known to be involved in viral immune responses [17,18,19]. We also included all 38 soluble interleukin receptors measured by SOMAscan. These soluble receptors act as decoy receptors for their respective interleukins. Biologically, they bind to their interleukins in the circulation, preventing them from binding membrane-bound receptors, and having their usual biological effect. Their action may predict the effect of pharmacologic interleukin receptor blocking agents [20]. Owing to differences in the choice of aptamers in each SOMAscan panel, of the 147 proteins available in the BQC19 cohort, 15 were not available in the MSB cohort (IL-2, IL-7, IL-9, IL-34, IL-37, IL-12RB2, CCL1, CCL3, CXCL2, IFNB1, TLR2, IgD, IgE, IgG, and IgM). The full protein list in each cohort is available in Additional file 1.

To reflect acute illness, we limited this study to samples collected within 14 days of symptom onset. To better control for the effect of COVID-19 treatment on circulating protein levels, we further limited our analysis to the first measurement of circulating proteins for each subject (i.e. one sample per participant), since these samples were less likely to have been collected from individuals who had already started therapy for severe COVID-19.

Samples were obtained and processed as per the manufacturer’s instructions. Briefly, blood samples were collected in acid-citrate-dextrose tubes (to prevent coagulation) and frozen at −80 °C until analysis. Protein levels were measured in relative fluorescence units, and further normalized and calibrated by SomaLogic to remove systematic bias (e.g. batch effects). For the statistical analysis, we further standardized protein levels by subtracting their mean and dividing by their standard deviation to allow for easier interpretation and analysis.
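The standardization step described above can be illustrated with a minimal Python sketch. This is not the authors' code (their analyses were run in R); the measurement values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption, not stated in the text
    if s == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - m) / s for v in values]


# Hypothetical normalized measurements of one protein in five participants
rfu = [1200.0, 1500.0, 900.0, 2100.0, 1800.0]
z = standardize(rfu)
```

After this transformation each protein has mean 0 and standard deviation 1, so curves for different proteins can be read on a common scale.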

Statistical analysis

To find clusters of proteins that varied together, we first drew Spearman’s correlation heatmaps within cases and within controls separately. This was also done in both cohorts separately (i.e. 4 times in total). To better visualise Spearman correlation clusters, the proteins were ordered using a hierarchical clustering algorithm with the “complete linkage” method (implemented with the hclust base function in R [21], with default settings).
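As a rough illustration of this step, the following Python sketch computes Spearman correlations and orders proteins by agglomerative clustering with complete linkage. It is a simplified stand-in for R's hclust, not the authors' code: the protein names and values are hypothetical, the distance is assumed to be 1 − ρ (the text does not state which distance was used), and the leaf order may differ from the one hclust would return.

```python
def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation (assumes neither series is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def complete_linkage_order(names, dist):
    """Merge the two clusters with the smallest maximum pairwise distance
    until one cluster remains; return the resulting leaf order."""
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical standardized levels of four proteins in five subjects
levels = {
    "P1": [1, 2, 3, 4, 5],
    "P2": [2, 3, 5, 7, 11],
    "P3": [5, 4, 3, 2, 1],
    "P4": [9, 7, 6, 3, 2],
}
names = list(levels)
# Assumed distance: 1 - Spearman correlation
dist = {a: {b: 1 - spearman(levels[a], levels[b]) for b in names} for a in names}
order = complete_linkage_order(names, dist)
```

Proteins that rise and fall together end up adjacent in the returned order, which is what makes blocks of high correlation visible on a heatmap.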

Second, to adjust for time since symptom onset, which is expected to affect protein levels, we fit generalized additive models [22] (GAMs) to each protein’s levels during the first 14 days since symptom onset in cases and controls. In short, this analysis aims to model the natural history of protein levels by using measurements taken at clinical presentation from different subjects (who present at different times after symptom onset). GAMs fit a smooth spline to the levels of each immune-related protein from days 1 to 14; this approach is therefore expected to be more powerful than dichotomizing protein levels into two time periods and comparing them. The GAMs were fit using cubic regression splines, the restricted maximum likelihood (REML) method, and with up to 15 knots allowed (the model chooses the optimal number of knots). All models were also adjusted for subjects’ age and sex. The GAMs obtained in the two cohorts were de-identified and meta-analyzed (if measured in both cohorts) using the metagam package [23] (v0.2.0). The resulting meta-analyzed models were then plotted for a 65-year-old male and female (65 was chosen because it was the mean age in the BQC19 cohort).

To test whether protein levels differed between cases and controls, we used GAM ANOVA, with a model without case/control status as a predictor of protein level serving as the nested model. This yields approximate p-values for the null hypothesis of no difference between cases and controls. Similarly, we used GAM ANOVA on nested models with and without the sex variable to test for differences in cytokine levels between sexes. GAMs were fitted using the mgcv package [22] (v1.8-33). Sample code is available in Additional file 2. Finally, GAM ANOVA p-values were meta-analyzed across cohorts using the logistic method with the metap package [24] (v1.4). We considered that protein levels differed between cases and controls if the resulting p-value was below the Bonferroni-corrected threshold (alpha = 0.05/147 = 0.00034). We acknowledge that this correction is overly conservative because protein levels are correlated.
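The last two steps, combining per-cohort p-values and applying the Bonferroni threshold, can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the authors' code: it assumes the logistic method is the logit statistic compared against a Student's t distribution with 5k + 4 degrees of freedom for k studies, evaluates the tail probability by simple numerical integration, and uses hypothetical p-values.

```python
import math


def t_upper_tail(t, df, steps=20000):
    """P(T > t) for Student's t, using Simpson's rule on the density over [0, |t|].
    Adequate for illustration; not intended for extremely small tail probabilities."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1 - tail


def logit_combine(pvalues):
    """Combine independent p-values with the logit (logistic) method.
    Assumed form: t = -sum(log(p / (1 - p))) / C, referred to t with 5k + 4 df."""
    k = len(pvalues)
    c = math.sqrt(k * math.pi ** 2 * (5 * k + 2) / (3 * (5 * k + 4)))
    t = -sum(math.log(p / (1 - p)) for p in pvalues) / c
    return t_upper_tail(t, 5 * k + 4)


N_PROTEINS = 147
threshold = 0.05 / N_PROTEINS  # Bonferroni threshold, approximately 0.00034

# Hypothetical GAM ANOVA p-values for one protein in the two cohorts
combined = logit_combine([0.001, 0.002])
significant = combined < threshold
```

Because Bonferroni treats all 147 tests as independent, correlated proteins make the threshold stricter than necessary, which is the conservativeness acknowledged above.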

All analyses were done using R [21] (v4.0.3).

Table 1 Subject characteristics in the two participating cohorts. Numbers presented as count (percentage) except where otherwise noted. Hypertension information was not available for the Mount Sinai Biobank cohort.

Results

Population

Table 1 shows basic characteristics of the participants in each cohort. Mean age was similar between cases and controls in the BQC19 (67.2 vs 66.2 years), but cases were slightly older in the MSB (64.8 vs 59.2 years). In both cohorts, there were fewer females amongst cases compared to controls: 38.5% vs 55.0% in the BQC19, and 41.2% vs 44.5% in the MSB. There were more diabetic cases than controls in the BQC19 (41.8% vs 29.3%) but a similar proportion in the MSB. In both cohorts, there were more cases with chronic obstructive pulmonary disease: 17.6% vs 11.2% in the BQC19, and 11.8% vs 6.3% in the MSB. There were also more heart failure diagnoses in cases in both cohorts: 14.3% vs 11.6% in the BQC19, and 11.8% vs 8.6% in the MSB. Finally, there were fewer never-smokers amongst cases: 42.9% vs 71.1% in the BQC19, and 39.5% vs 50.8% in the MSB. These values are comparable to other reported large COVID-19 cohorts [25].

Immune-related protein levels dynamics over time

Many cytokines and related proteins showed statistically significant time-dependent differences between cases and controls (Bonferroni threshold 0.05/147 = 0.00034): 17 of the 38 interleukins, 24 of the 38 soluble interleukin receptors, 11 of the 23 CC chemokines, 6 of the 14 CXC chemokines, 8 of the 17 interferon-related proteins, and 3 of the 17 other immune-related proteins (Table 2).

Table 2 Immune-related proteins with differences between severe COVID-19 cases and controls in our meta-analysis of the BQC19 and MSB results (Bonferroni adjusted threshold 0.05/147 = 0.00034)

Hierarchical clustering and Spearman correlation delineated clear clusters of proteins that varied together over the course of infection (Fig. 1 and Additional file 3). Visual inspection of the BQC19 and MSB heatmaps reveals three large protein clusters whose members show similar changes in levels. Importantly, the cluster with the highest proportion of proteins showing an association with case and control status (cluster A; 31 out of 49 proteins) is also the one with the highest mean absolute Spearman correlation in both cohorts. Cluster A also showed a clear increase in Spearman correlations in cases relative to controls (mean increase of 0.163 across cohorts), supporting the hypothesis that immune overactivation underlies severe COVID-19 (Additional file 3). Proteins in this cluster show increasing levels in cases over the first 14 days of symptoms, whereas they remain stable in controls. In contrast, clusters B and C showed a negative correlation with cluster A, with higher but slowly tapering protein levels in controls. These clusters did not replicate as clearly between cohorts and have a lower proportion of proteins associated with case and control status.

Fig. 1

Spearman correlations for three clusters (A, B and C) of proteins in the BQC19 (left) and the MSB (right). Only correlations with p-values less than 0.05 are shown. Proteins with asterisks (***) showed statistically significant differences between cases and controls (Bonferroni threshold 0.05/147). The full Spearman correlation heatmap is available in Additional file 3

Each cluster contained a heterogeneous set of proteins (Additional file 4), with many replicating previously published findings [3,4,5,6,7], supporting the robustness of our results. Of note, many of the interleukin-10 family cytokines (IL19, IL20, IL24) or their soluble receptors (IL10RB, IL20RA, IL22RA1 and IL22RA2) were present in cluster A. Other notable proteins found in cluster A include two members of the IL1 family (IL1B and IL36B), IL11 (a member of the IL6 family; the two cytokines act on the same receptor [26]), multiple members of the IL17 family (IL17A, IL17B, IL17D, IL17F, and IL25), and IL4 and IL13, which both act on the same receptor to drive severe asthma [27]. Interestingly, clusters B and C contained some proteins related to those in cluster A which still showed significant differences between cases and controls. These included IL1A, IL4R (the soluble receptor for IL4), and IL10RA (a component of the soluble receptor for IL10) in cluster B, and IL1R1 (a soluble receptor for IL1) in cluster C. Figure 2 shows some representative proteins from each cluster and from the rest of the proteins, and all protein level plots are shown in Additional file 5.

Fig. 2

Smoothed curves for cluster-representative immune-related proteins, as a function of days since symptom onset (x-axis), and separately for severe COVID-19 cases and controls. Estimated curves are shown for a 65-year-old. Y-axis is standardized to a mean of 0 and standard deviation of 1. Full results are shown in Additional file 5. Blue: controls. Red: severe COVID-19. Asterisks (***): p < 3.4 × 10⁻⁴ for case–control difference in protein levels

Differences in cytokine levels between sexes in cases and controls over time

Of the 147 proteins, 5 showed a difference between males and females: TLR5, CXCL17, CCL26, IL1RL2, and IL3RA. However, only the last three were also associated with case and control status (Fig. 3). CCL26 was found in the previously described cluster A of highly associated proteins, while IL3RA was in cluster C. Both proteins had slightly different levels on the first day of symptoms (higher in females for CCL26, lower in females for IL3RA) but trended similarly afterwards. IL1RL2 did not cluster well with other proteins and decreased towards normal levels more rapidly in females. Of note, IL3RA is the only one of these proteins for which the corresponding gene is located on a sex chromosome (chromosome X).

Fig. 3

Smoothed protein level curves showing time-related and sex-related differences as a function of days since symptom onset (x-axis) in a 65-year-old patient (p < 3.4 × 10⁻⁴ for sex differences in cytokine levels). Y-axis is standardized to a mean of 0 and standard deviation of 1. F: female. M: male. Blue: controls. Red: severe COVID-19

Discussion

In this study, we used two large prospective cohorts and a panel measuring 147 circulating immune-related proteins and found that severe COVID-19 was associated with clear activation of many immune-related proteins, with most protein levels varying together closely over time. These results also identified three proteins whose levels differed significantly between the sexes: CCL26, IL1RL2 and IL3RA. These three sex-specific protein findings were not reported in previous studies, which may be partly explained by the fact that our panel included more proteins than other studies, but could also reflect false positive associations due to multiple testing. Hence, while most of the changes in immune-related proteins observed in severe COVID-19 are shared across sexes, it remains possible that the disparity in outcomes between sexes is mediated by differences in immune-related protein levels.

Our study’s main strengths include its large sample size with strong replication between the two independent cohorts, the large protein measurement panel, and the fact that proteins were measured at different times during infection, a feature that was explicitly modelled in our analysis. These provided a granular depiction of time-dependent immune responses to COVID-19 and may explain previously discordant reports on the association between different immune proteins and COVID-19. For example, visual inspection of the interferon level dynamics suggests that differences in measurement timing can explain previously reported differences in the direction of associations [11]. Our large sample and careful adjustments for multiple comparisons also likely avoided spurious associations.

Other studies have assessed protein levels in acute COVID-19, and we share many of the observations already made. For example, severe COVID-19 was linked to changes in IL-13 [3], IL6 and IL-1B [4], and multiple chemokines (e.g. CCL20, CCL27, CXCL10 [28]). Hence, while it is clear that immune-related protein levels are associated with outcomes, differences in methodology led to varying observations. For example, Lucas et al. [3] observed differences in IL-1B, IL-6, IL-18, and TNF-α between severe and non-severe individuals using the Eve Technologies (Calgary, Alberta, Canada) Luminex-based HD71 assay. However, using a different Luminex-based assay, Wilson et al. [29] found no such difference in these 4 cytokines between severe COVID-19 cases and non-COVID-19 sepsis controls. Different results were again obtained by Filbin et al. [30], who used the Olink (Uppsala, Sweden) multiplex antibody-oligonucleotide assay to highlight the role of IL6, IL-1RL1, and IL-1RN in severe COVID-19. As mentioned above, these differences can likely be explained by small sample sizes, insufficient control for time since symptom onset, or different choices of cases and controls. Indeed, we replicated the IL6, IL-1RL1, and IL-1RN results from Filbin et al., which is to our knowledge the largest previous proteomics study of acute COVID-19. That study used methods similar to ours but adjusted for day of hospitalization rather than onset of symptoms. Comparisons to other studies should therefore also keep these methodological differences in mind.

Less is known about the role of immune-related proteins in sex differences in COVID-19 outcomes. A previous report [12] suggested a role for IL8 and IL18, but these were not replicated in other studies [13]. Our study is the first to report differences in TLR5, CXCL17, CCL26, IL1RL2, or IL3RA levels between sexes during infection. While the mechanism by which they could influence outcomes is unclear, it is worth noting that the gene encoding IL3RA is located on the X chromosome, providing a plausible explanation for the observed difference. Further, CCL26 (also known as eotaxin-3) is known to induce eosinophil tissue infiltration [31], which could influence COVID-19 outcomes [32]. However, multiple studies have shown sex differences in cellular immune responses, COVID-19 specific antibody levels, and many inflammatory markers commonly measured in clinical practice (e.g. C-reactive protein) [12, 33]. Hence, it remains possible that another immune pathway that was not measured by our panel might be involved in the observed sex differences in outcomes. Nonetheless, our observations on TLR5, CXCL17, CCL26, IL1RL2, and IL3RA provide clear candidate proteins to explore when investigating sex differences in COVID-19 outcomes.

Nevertheless, our study still has limitations. First, while we assayed proteins in the first collected samples, it remains possible that some subjects received immunomodulatory drugs (e.g. dexamethasone) which would have affected protein levels. However, this would likely attenuate the differences between the cases and the controls, and our results would therefore be biased towards the null hypothesis. Second, given that protein time trends were obtained using multiple different subjects, unmeasured confounders could explain some of our findings. While these cannot be easily measured, it is reassuring that our results replicated across two cohorts, arguing against the presence of confounders with large effect sizes. Third, the control group, made up of participants with non-severe COVID-19 as well as participants with non-COVID-19 disease, may have biased some of our results towards the null. However, this specific choice of control arm made our results more specific for severe COVID-19, rather than critical illness in general, and we still found clear associations with many immune-related proteins. Fourth, the use of SOMAscan may make comparisons difficult with other studies using different protein measurement technologies. Despite this, SOMAscan showed great sensitivity, specificity, and reproducibility when benchmarked against mass spectrometry [34], and our conclusions are unlikely to be greatly biased by the choice of protein measuring platform. Lastly, while this is one of the largest panels of immune-related proteins studied for COVID-19, there are multiple proteins that were not measured, and we cannot assess whether these unmeasured proteins also have important effects on the outcomes.

In conclusion, using two large independent cohorts with broad protein measurements, we showed that severe COVID-19 was associated with clear time-dependent changes in multiple immune-related proteins, and that these may in part explain differences in COVID-19 outcomes between sexes.