Summary box

What is already known on this topic

  • A subset of people infected with SARS-CoV-2 goes on to experience post-COVID-19 condition, or long COVID.

  • Preliminary findings, such as case reports of post-COVID-19 immune-mediated inflammatory diseases, increased autoantibodies in COVID-19 patients, and molecular mimicry of the SARS-CoV-2 virus have given rise to the theory that long COVID may be due in part to a deranged immune response.

What this study adds

  • SARS-CoV-2 infection was associated with a 22% relative increase in the risk of developing certain immune-mediated inflammatory diseases, including type 1 diabetes mellitus, inflammatory bowel disease, and psoriasis.

  • These findings support the hypothesis that a subgroup of long COVID may be caused by immune-mediated inflammatory mechanisms.

Background

Emerging in late 2019, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), the virus causing the coronavirus disease-2019 (COVID-19) pandemic, had resulted in over 6 million deaths worldwide as of March 2023 [1, 2]. The acute presentation can range from completely asymptomatic infection to sepsis, organ failure and death [3]. The effects of COVID-19 are not limited to acute infection but also manifest as a series of post-acute sequelae commonly referred to as long COVID or post-COVID-19 condition [4, 5].

The World Health Organisation defines this as symptoms occurring in people with a history of probable or confirmed SARS-CoV-2 infection three months after the onset of COVID-19 that cannot be explained by an alternative diagnosis [5, 6]. With over a third of people with COVID-19 reporting persistent symptoms and over 1.7 million UK residents self-reporting the condition, long COVID is emerging as one of the major public health challenges of the modern era [7, 8]. Despite this, the pathogenesis behind the condition remains unclear [4, 9].

One theory is that SARS-CoV-2 infection causes an inappropriate immune response that leads to the varied symptoms of long COVID. This arose from evidence of a marked and persistent increase in autoantibodies in patients with COVID-19 compared to uninfected controls and high rates of patients hospitalised with COVID-19 being transiently positive for anti-phospholipid (aPL) antibodies [10, 11].

Some of these autoantibodies were also deemed potential risk factors for long COVID [12]. Several systematic reviews have collated case reports of patients with a history of COVID-19 who have experienced deranged immune manifestations. Tang et al. found 187 reports and Novelli et al. found 382 reports of autoimmune-like phenomena following COVID-19 [13, 14]. Among those with a history of COVID-19, one review reported thyroid dysfunction in up to 20% of patients, which is linked with B- and T-cell autoimmunity [15].

Autoimmunity may be due to the degree of homology existing between some human self-proteins and components of SARS-CoV-2, a phenomenon termed molecular mimicry [16]. Molecular mimicry combined with the immune system dysregulation that occurs during SARS-CoV-2 infection may be the mechanism driving the development of immune-mediated inflammatory diseases. Alternatively, the reaction could arise from tissue damage and the release of autoantigens as a result of SARS-CoV-2 infection.

This preliminary evidence has been derived largely from case series, case reports, small cohort studies, or systematic reviews of these study types, which are weak study designs for ascertaining causal inference. Stronger study designs are needed that include appropriate control groups and large sample sizes. Furthermore, the data were drawn largely from patients with moderate or severe COVID-19, which underrepresents the mild or asymptomatic cases that make up most SARS-CoV-2 infections and that can also go on to develop long COVID [17]. To address these limitations, we conducted a retrospective matched cohort study using data from a large primary care database to assess the incidence of immune-mediated inflammatory diseases (IMIDs) in patients with SARS-CoV-2 infection compared to matched individuals with no record of SARS-CoV-2 infection.

Methods

Study design and data source

A retrospective cohort study was undertaken using data extracted from the Clinical Practice Research Datalink (CPRD) Aurum database between the 31st of January 2020 and the 30th of June 2021. The CPRD Aurum database consists of routinely collected, pseudo-anonymised data from general practices across England [18]. The data were extracted using the data extraction for epidemiological research (DExtER) tool, which facilitates extraction based on predefined parameters [19].

Study population

Patients were eligible to enter the study if they were at least 18 years old at the study start date, had no prior history of the IMIDs included in the primary outcome (see below), had an acceptable patient flag indicating provision of good quality data, and if they were registered with an eligible general practice for at least 12 months to allow sufficient time for recording baseline information.

Exposure

All patients with a SNOMED-CT coded diagnosis of either a positive reverse transcriptase polymerase chain reaction (RT-PCR) or lateral flow antigen test for SARS-CoV-2 were included in the exposed cohort, and the date of coded diagnosis was assigned as the index date. Patients with a suspected COVID-19 diagnosis were not included, to increase the specificity of the exposure definition. For each exposed patient, up to four patients were selected who did not have a coded record of a positive RT-PCR or lateral flow antigen test, or of a suspected or confirmed diagnosis of COVID-19, and were matched on age, sex and registered general practice. These patients made up the unexposed cohort. The index date of each exposed patient was assigned to the corresponding matched unexposed patients to avoid immortal time bias [20]. Data from the COVID-19 Second Generation Surveillance System were not used for this study, as they comprised swab-testing data from Public Health England (PHE) labs and NHS hospitals, primarily for hospitalised patients and healthcare workers, rather than data from the wider population, which were required for this study.
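The matching step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' DExtER procedure: the record fields, the function name `match_unexposed`, exact matching on age in whole years, and sampling without replacement are all assumptions made for the example.

```python
import random
from collections import defaultdict
from datetime import date

def match_unexposed(exposed, pool, ratio=4, seed=0):
    """Select up to `ratio` unexposed patients per exposed patient.

    Matching key: (age, sex, practice). Exact-age matching and sampling
    without replacement are assumptions of this sketch.
    """
    rng = random.Random(seed)
    by_key = defaultdict(list)
    for patient in pool:
        by_key[(patient["age"], patient["sex"], patient["practice"])].append(patient)
    for candidates in by_key.values():
        rng.shuffle(candidates)  # random order within each matching stratum

    matched = []
    for e in exposed:
        candidates = by_key[(e["age"], e["sex"], e["practice"])]
        for _ in range(min(ratio, len(candidates))):
            u = candidates.pop()
            # The unexposed patient inherits the exposed patient's index date.
            matched.append({**u, "index_date": e["index_date"], "matched_to": e["id"]})
    return matched

# Hypothetical records, for illustration only.
exposed = [{"id": "E1", "age": 40, "sex": "F", "practice": "P1",
            "index_date": date(2020, 11, 2)}]
pool = [{"id": f"U{i}", "age": 40, "sex": "F", "practice": "P1"} for i in range(6)]
pool.append({"id": "U99", "age": 40, "sex": "M", "practice": "P1"})
controls = match_unexposed(exposed, pool)
```

Because each unexposed patient starts follow-up on the same date as the matched exposed patient, neither group accrues person-time before the other could have experienced the outcome.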

Outcomes

The primary outcome was a composite of the incidence of any of the following IMIDs: autoimmune thyroiditis, coeliac disease, inflammatory bowel disease (IBD), myasthenia gravis, pernicious anaemia, psoriasis, rheumatoid arthritis (RA), Sjögren’s syndrome, systemic lupus erythematosus (SLE), type 1 diabetes mellitus (T1DM), and vitiligo. These conditions were selected as they cover a range of different systems and constitute many of the most prevalent IMIDs in the UK. The secondary outcomes were the individual diseases included in the primary outcome, to discern which of these IMIDs, if any, had the strongest association with SARS-CoV-2 infection. SNOMED-CT code lists used for the ascertainment of each IMID, as well as the exposure codes, are given at https://github.com/Umer-Syed/COVIDAutoimmune. In line with the CPRD policy on data governance, we have not reported outcomes that had fewer than five events, owing to disclosure risk.
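The small-number rule mentioned above can be illustrated with a short sketch. The function name, the use of `None` as the suppression marker, and the example counts are hypothetical and are not taken from the study.

```python
def suppress_small_counts(event_counts, threshold=5):
    """Replace any count below `threshold` with None so it is not reported."""
    return {
        outcome: (count if count >= threshold else None)
        for outcome, count in event_counts.items()
    }

# Hypothetical counts, for illustration only (not the study's results).
example_counts = {"condition_a": 120, "condition_b": 5, "condition_c": 3}
reportable = suppress_small_counts(example_counts)
```

Here an outcome with exactly five events is still reported, and only counts strictly below five are withheld.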

Follow-up period

Participants were followed up from the index date to the end of the follow-up. The end of follow-up was defined as the earliest of any of the following: a coded diagnosis of an IMID, date of death, study end date (30 June 2021), date of practice de-registration, and date of the last practice contribution to the CPRD Aurum database.
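As a minimal sketch of this censoring rule, the end of follow-up is the earliest available date among the listed events. The function and argument names are hypothetical, and the 365.25-day year used to convert days to person-years is an assumption.

```python
from datetime import date

STUDY_END = date(2021, 6, 30)

def follow_up_end(imid_date=None, death_date=None, dereg_date=None,
                  last_contribution_date=None):
    """Earliest of the censoring or event dates; missing dates are ignored."""
    candidates = [STUDY_END, imid_date, death_date, dereg_date, last_contribution_date]
    return min(d for d in candidates if d is not None)

def person_years(index_date, end_date):
    """Follow-up time in years (365.25 days per year is an assumption)."""
    return (end_date - index_date).days / 365.25

# Hypothetical patient: IMID diagnosed before de-registration and study end.
index_date = date(2021, 1, 10)
imid_date = date(2021, 4, 1)
end = follow_up_end(imid_date=imid_date, dereg_date=date(2021, 5, 15))
had_event = end == imid_date  # event indicator for the survival analysis
years = person_years(index_date, end)
```

A patient with none of the optional dates recorded is simply censored at the study end date.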

Covariates

Age, sex, body mass index (BMI), smoking status, ethnicity, previous exposure to relevant viral infections (Epstein-Barr virus (EBV), human cytomegalovirus (CMV), human herpesvirus 6 (HHV-6), human T lymphotropic virus type 1 (HTLV-1), hepatitis C virus (HCV), influenza A virus, and parvovirus B19), and previous prescriptions of selected medications (procainamide, hydralazine, quinidine, and isoniazid) were included as potential confounders. Previous studies found these variables to be associated with at least one of the outcome IMIDs, and they were therefore adjusted for in the analysis [21,22,23,24,25,26,27,28,29,30,31,32,33].

Age was divided into the following bands: 18 to 29, 30 to 39, 40 to 49, 50 to 59, 60 to 69, and ≥ 70 years. Ethnicity was identified through SNOMED-CT codes and was classified into the following groups: white, South Asian, black, mixed ethnicity and other. BMI was divided in accordance with the WHO classification: underweight (< 18.5 kg/m²), normal weight (18.5–24.9 kg/m²), overweight (25–29.9 kg/m²) and obese (≥ 30 kg/m²) [34]. Smoking status was categorised as current smoker, ex-smoker and never smoked. A separate ‘data missing’ category was used where data were missing for ethnicity, smoking status, and BMI.
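The categorisation above can be written as two small functions. This is an illustrative sketch only: the function names are hypothetical, age is assumed to be recorded in whole years, and the BMI bands are treated as half-open intervals (for example, 24.95 kg/m² is classed as normal weight), which is an assumption about how values between the printed band limits are handled.

```python
AGE_BANDS = [(18, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

def age_band(age):
    """Map age in whole years to the study's age bands."""
    if age < 18:
        raise ValueError("study population is restricted to adults aged 18 or over")
    for lower, upper in AGE_BANDS:
        if age <= upper:
            return f"{lower} to {upper}"
    return ">= 70"

def bmi_category(bmi):
    """WHO BMI category, with a separate category for missing values."""
    if bmi is None:
        return "data missing"
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obese"
```

Keeping missing values as their own category, rather than dropping those patients, retains every patient in the adjusted models.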

Statistical analysis

Baseline characteristics of patients stratified by their exposure status were summarised using simple descriptive statistics. The number and percentage of each of the outcome events for the unexposed and exposed cohorts were reported and the crude incidence rates per 1000 person-years were calculated. Cox proportional hazards regression models were used to estimate the unadjusted and adjusted hazard ratios (HRs) with 95% confidence intervals (CI) for each of the outcomes among patients in the exposed and unexposed cohorts. P-values below 0.05 were considered statistically significant. To check the validity of the Cox models, the proportional hazards assumption was tested using Schoenfeld residuals. A p-value of < 0.05 from this test was taken to indicate that the proportional hazards assumption was not met. All analyses were conducted using Stata Version 17; the do-file is available at https://github.com/Umer-Syed/COVIDAutoimmune.
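For readers less familiar with how a Cox model coefficient becomes a hazard ratio with a confidence interval, the sketch below shows the conversion. It assumes a Wald-type interval on the log scale. The function names and input values are hypothetical, and this is not the authors' Stata code.

```python
import math

def hazard_ratio_with_ci(log_hr, se, z=1.959964):
    """Exponentiate a Cox coefficient and its Wald limits (95% by default)."""
    hr = math.exp(log_hr)
    lower = math.exp(log_hr - z * se)
    upper = math.exp(log_hr + z * se)
    return hr, lower, upper

def wald_p_value(log_hr, se):
    """Two-sided p-value for the null hypothesis HR = 1."""
    z = abs(log_hr / se)
    return math.erfc(z / math.sqrt(2))

# Hypothetical coefficient and standard error, for illustration only.
hr, lower, upper = hazard_ratio_with_ci(log_hr=0.20, se=0.05)
p = wald_p_value(log_hr=0.20, se=0.05)
```

Under this construction, a p-value below 0.05 corresponds to a 95% confidence interval that excludes 1.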

Results

Study population

We identified 458,147 patients with confirmed SARS-CoV-2 infection and matched them to 1,818,929 patients who lacked a confirmed or suspected diagnosis of COVID-19. Table 1 shows the baseline characteristics of patients in both cohorts. The mean age was 43.6 years (SD 17.1) in the exposed cohort and 42.8 years (SD 18.0) in the unexposed cohort. Both groups had slightly more females than males (54.7% versus 45.3%, respectively). A slightly larger proportion of the exposed cohort were of white and South Asian ethnicity compared to the unexposed group (64.4% versus 59.4%, and 12.2% versus 10.6%, respectively). However, the unexposed cohort had a slightly higher proportion of missing ethnicity data (21.6% versus 16.2%, respectively). The mean BMI was similar between groups but there were slightly more current smokers in the unexposed cohort (26.5% versus 22.1%, respectively). Exposure to the selected infections and medications was similar between both groups.

Table 1 Baseline characteristics of the exposed and unexposed cohorts

Primary analysis

Six hundred ninety-six (0.15%) patients in the exposed cohort developed the primary outcome compared to 2230 (0.12%) within the unexposed cohort. The median (interquartile range [IQR]) follow-up was 0.29 years (0.24–0.42) for both groups. The results of the primary analysis are reported in Table 2 and Fig. 1. The crude incidence rate (IR) per 1000 person-years was higher for the exposed cohort than the unexposed cohort (4.59 versus 3.65 per 1000 person-years, respectively). This yielded a crude hazard ratio of 1.26 (95% CI 1.16–1.37) for the composite primary outcome. When adjusted for pre-selected covariates, the HR reduced slightly to 1.22 (95% CI 1.12–1.33) but remained statistically significant. The proportional hazards assumption was met based on Schoenfeld residuals for the composite outcome. Furthermore, a matched analysis yielded a hazard ratio of 1.25 (95% CI 1.15–1.36). Characteristics of patients stratified by their primary outcome status are tabulated in Additional file 1: Supplementary Table 2.
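The reported crude figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text. The person-years are back-calculated from the rounded incidence rates and are therefore approximate, and the ratio of crude rates is shown only as a rough check, since it is not the same quantity as a hazard ratio from a Cox model.

```python
# Figures as reported in the text.
exposed_events, exposed_n, exposed_ir = 696, 458_147, 4.59
unexposed_events, unexposed_n, unexposed_ir = 2230, 1_818_929, 3.65

def percentage(events, n):
    """Percentage of a cohort that developed the outcome."""
    return 100 * events / n

def implied_person_years(events, ir_per_1000):
    """Person-years implied by an event count and a rate per 1000 person-years."""
    return 1000 * events / ir_per_1000

exposed_pct = percentage(exposed_events, exposed_n)
unexposed_pct = percentage(unexposed_events, unexposed_n)

# Ratio of the crude incidence rates (exposed versus unexposed).
rate_ratio = exposed_ir / unexposed_ir

# Approximate total and mean follow-up, back-calculated from rounded rates.
exposed_py = implied_person_years(exposed_events, exposed_ir)
unexposed_py = implied_person_years(unexposed_events, unexposed_ir)
mean_follow_up_exposed = exposed_py / exposed_n
mean_follow_up_unexposed = unexposed_py / unexposed_n
```

The percentages round to 0.15% and 0.12%, and the ratio of crude rates rounds to 1.26, in line with the reported crude hazard ratio. The implied mean follow-up of roughly a third of a year in each cohort is of the same order as the reported median of 0.29 years.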

Table 2 Incidence rates and HRs for the composite outcome
Fig. 1 Forest plot of adjusted HRs for IMIDs. aHR = adjusted hazard ratio, CI = confidence interval, IBD = inflammatory bowel disease, RA = rheumatoid arthritis, SLE = systemic lupus erythematosus, Type1DM = type 1 diabetes mellitus

Secondary analysis

Table 3 and Fig. 1 report the results for each individual IMID as separate outcomes. Of the eleven conditions, SARS-CoV-2 infection was significantly associated with an increased incidence of T1DM, IBD and psoriasis. T1DM was 56% more likely to occur in the exposed cohort compared to the unexposed cohort (aHR 1.56, 95% CI 1.09 to 2.23). IBD was 36% more likely to occur in the exposed cohort compared to the unexposed cohort (aHR 1.36, 95% CI 1.18 to 1.56). IBD was the most common IMID to be diagnosed during the study period (39.6% of all IMIDs diagnosed in the exposed cohort and 36.6% in the unexposed cohort). Psoriasis was 23% more likely to occur in the exposed cohort compared to the unexposed cohort (aHR 1.23, 95% CI 1.05 to 1.42) and was the second most commonly diagnosed IMID, representing more than 30% of all new diagnoses of IMIDs in both cohorts.

Table 3 Incidence rates and hazard ratios for individual IMIDs

Discussion

Main findings

Exposure to SARS-CoV-2 infection was associated with a 22% relative increase in the incidence of any of the eleven IMIDs considered in our study compared to a matched unexposed group during the same period. This was after adjustment for several important confounding factors and during a relatively short period of follow-up. We also found that this association was specific to an increased incidence of T1DM, inflammatory bowel disease, and psoriasis in the SARS-CoV-2 infected cohort.

Comparison with existing literature

The relatively high incidence of psoriasis in the SARS-CoV-2 infected cohort is supported by other reports from the literature which found increased cases of psoriasis, and flares of existing disease, following COVID-19 [13]. Evidence on the incidence of IBD following COVID-19 is scarcer, although ulcerative colitis has been reported to develop post-infection [13]. A systematic review on T1DM and COVID-19 noted that between 1.77 and 15.6% of newly diagnosed patients, depending on the study, had preceding COVID-19 [35].

SARS-CoV-2 may be associated with IMIDs due to several putative mechanisms that result in the release of autoantibodies following infection. All three conditions that were found to have a significantly increased incidence following SARS-CoV-2 infection in our study have at least a limited association with autoantibodies. T1DM is associated with islet cell and other autoantibodies, psoriasis is linked with anti-nuclear antibodies (ANAs), and inflammatory bowel disease has a limited association with pancreatic autoantibodies (PAB) [36,37,38]. The reason for the increased incidence of these particular conditions following SARS-CoV-2 infection is unclear, as they are not typically the IMIDs most strongly associated with the presence of autoantibodies. This requires further exploration in future mechanistic studies.

Strengths and limitations

A large sample size was included, which provided sufficient statistical power to assess for differences in the incidence of IMIDs between the exposed and unexposed cohorts over a relatively short follow-up period. This also allowed us to assess the relative incidence of eleven of the more common IMIDs across the two comparison groups. Our outcome included IMIDs, such as T1DM, that are likely to be well recorded in primary care records. The use of primary care data meant that we were able to adjust for important demographic and clinical risk factors that are known to be associated with the incidence of IMIDs. The use of data from practices across a national database also improved the generalisability of our findings.

The study had several limitations. We had missing data for ethnicity (22% missing), BMI (18%), and smoking status (7%), which we accounted for in our analyses using a missing category variable. However, these missing data could lead to biased effect estimates. We also did not have access to data on socioeconomic status but partially accounted for this by matching patients in the unexposed and exposed cohorts on general practice, which would result in patients from both groups sharing their approximate residential geography, which is associated with socioeconomic status.

There is likely to be a degree of misclassification bias between the exposed and unexposed cohorts. There was little community testing for SARS-CoV-2 infection in the first wave of the pandemic, so some members of the unexposed cohort may have been infected but not diagnosed. IMIDs may also have been underdiagnosed during the study period due to the relative inaccessibility of healthcare services during the early phase of the pandemic. It is possible that only more severely affected patients with IMIDs presented to healthcare services during this period.

The study period was restricted as data availability only covered from 31 January 2020 to 30 June 2021. This encompassed three national lockdowns where reduced healthcare appointments led to a backlog of up to 300,000 patients waiting over a year for treatment [39, 40]. Beyond this period, there was reduced availability of community testing for SARS-CoV-2 infection in the UK at a time when an increasing proportion of the population had experienced at least one episode of COVID-19, thus diminishing future comparator populations.

The short follow-up period may have diluted the effect size and power of the study, as IMIDs tend to have a clinical latency period and thus the full scope of the potential impact of SARS-CoV-2 infection is likely to have been underrepresented [41]. It also cannot be confirmed whether the true onset of these conditions preceded SARS-CoV-2 infection or the matched index dates. However, we would expect these issues to bias our estimates of disease incidence equally in both the exposed and unexposed cohorts and would therefore not anticipate them affecting the hazard ratios. There is also the possibility that patients experiencing COVID-19 accessed healthcare services more than those with no prior infection and thus had more opportunities to be diagnosed with IMIDs. Likewise, patients with underlying IMIDs may have had their symptoms exacerbated by COVID-19, leading them to seek healthcare and subsequently receive a diagnosis.

Implications for practice, policy, and research

Our findings provide epidemiological evidence that SARS-CoV-2 infection is associated with an increased risk of a range of IMIDs, including T1DM, IBD, and psoriasis. This suggests that autoimmunity may be a mechanism that accounts for some of the longer-term symptoms and health impacts in a subgroup of those with long COVID. This is of particular interest given that women are generally at increased risk of both IMIDs and long COVID, that symptoms of long COVID are diverse and often overlap with those of IMIDs, and that the symptoms of both IMIDs and long COVID characteristically follow a relapsing–remitting pattern over time [42].

Further epidemiological studies with a longer follow-up period are needed to confirm our findings and to test for relevant autoantibodies in the serum of participants to correlate with symptoms and clinical findings. These studies could also include other rarer IMIDs potentially associated with COVID-19 such as Guillain-Barré syndrome [14]. Evidence suggests that those who have been vaccinated against COVID-19 are approximately half as likely to develop symptoms lasting over 28 days than unvaccinated individuals [43]. It would be valuable to know if these differences in long COVID incidence rates are also associated with differences in the incidence of IMIDs.

Conclusions

SARS-CoV-2 infection was associated with an increased incidence of several IMIDs, including type 1 diabetes mellitus, inflammatory bowel disease, and psoriasis. This lends support to the hypothesis that the long-term effects of COVID-19, or long COVID, may in part be related to autoimmune mechanisms. Further research is needed to replicate these findings in other populations and over a longer time period, and to sample autoantibody profiles in people with long COVID and matched control groups.