Incident autoimmune diseases in association with SARS-CoV-2 infection: a matched cohort study

Objectives To investigate whether the risk of developing an incident autoimmune disease is increased in patients with prior COVID-19 disease compared to those without COVID-19, a large cohort study was conducted.
Method A cohort was selected from German routine health care data. Based on documented diagnoses, we identified individuals with polymerase chain reaction (PCR)-confirmed COVID-19 through December 31, 2020. Patients were matched 1:3 to control patients without COVID-19. Both groups were followed up until June 30, 2021. We used the four quarters preceding the index date until the end of follow-up to analyze the onset of autoimmune diseases during the post-acute period. Incidence rates (IR) per 1000 person-years were calculated for each outcome and patient group. Poisson models were deployed to estimate the incidence rate ratios (IRRs) of developing an autoimmune disease conditional on a preceding diagnosis of COVID-19.
Results In total, 641,704 patients with COVID-19 were included. Comparing the incidence rates in the COVID-19 (IR=15.05, 95% CI: 14.69–15.42) and matched control groups (IR=10.55, 95% CI: 10.25–10.86), we found a 42.63% higher likelihood of acquiring autoimmunity for patients who had suffered from COVID-19. This estimate was similar for common autoimmune diseases, such as Hashimoto thyroiditis, rheumatoid arthritis, or Sjögren syndrome. The highest IRR was observed for autoimmune diseases of the vasculitis group. Patients with a more severe course of COVID-19 were at a greater risk for incident autoimmune disease.
Conclusions SARS-CoV-2 infection is associated with an increased risk of developing new-onset autoimmune diseases after the acute phase of infection.
Key Points
• In the 3 to 15 months after acute infection, patients who had suffered from COVID-19 had a 43% (95% CI: 37–48%) higher likelihood of developing a first-onset autoimmune disease, meaning an absolute increase in incidence of 4.50 per 1000 person-years over the control group.
• COVID-19 showed the strongest association with vascular autoimmune diseases.
Supplementary Information The online version contains supplementary material available at 10.1007/s10067-023-06670-0.


Introduction
Numerous research articles have been published to date on COVID-19, the acute disease that results from infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and its chronic counterpart long/post-COVID. SARS-CoV-2 is a positive-sense single-stranded RNA (ssRNA) virus that is transmitted directly by respiratory droplets [1]. After the acute phase of infection, some people develop long-lasting symptoms, known as post-COVID. This is defined by signs and symptoms that develop during or after SARS-CoV-2 infection that are consistent with COVID-19, last more than 12 weeks, and cannot be explained by an alternative diagnosis [2]. Most studies so far have focused on symptoms that partly resolve over time [3][4][5][6][7]. Many studies examined a small selective sample of patients, and only a few studies included a control group or information on chronic health conditions preceding SARS-CoV-2 infection [8]. The current evidence is insufficient to estimate the burden of COVID-19 in a population and the resulting future challenges to health care systems. Routinely collected health data, such as claims data from health insurances, offer large sample sizes to investigate rare but potentially severe outcomes in this context.
To date, different respiratory, cardiovascular, neurological, and mental diseases as well as various symptoms in the context of long/post-COVID have been studied with routine health care data [9][10][11][12][13][14][15][16]. The group of autoimmune diseases is less discussed in the literature, although autoantibodies could be found in patients after SARS-CoV-2 infection, e.g., anti-type I IFNs, anti-IFN-α, and anti-nuclear antibodies (ANAs) [17]. So far there is limited evidence on newly manifested autoimmune diseases after an infection based on several case reports and two recent cohort studies [18][19][20][21]. Furthermore, COVID-19 itself shares some similarities with systemic autoimmune rheumatic diseases, which could be a challenge for diagnostics [22,23].
Here, we investigated the association of COVID-19 and a set of incident autoimmune diseases in a large cohort of patients enrolled in German statutory health insurances.

Study design
We conducted a matched cohort study based on routine health care data as applied in a previous study [14]. In the present analysis, we compared the rates of newly diagnosed autoimmune disease between individuals with and without documented SARS-CoV-2 infection. Persons infected with SARS-CoV-2 during 2020 and matched controls were followed until June 30, 2021, for a minimum of three and a maximum of 15 months using the date of COVID-19 onset as the index date for randomly selected match groups. Following the NICE guidelines on long COVID [24] and the clinical case definition of post-COVID-19 conditions proposed by the World Health Organization (WHO) [2], we defined the post-COVID-19 phase starting 3 months after the initial diagnosis of COVID-19. Outpatient services are documented per quarter rather than on a daily basis in the German statutory health care billing system. We therefore considered a diagnosis to have been made in the post-COVID-19 phase if it was newly documented in the second quarter after the index date or later. This operationalization ensures a time interval of at least 3 months between the date of COVID-19 diagnosis and post-COVID-19 outcome incidence.
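The quarter-based rule described above can be sketched in a few lines of Python. This is only an illustration, not the study's code: the function names are hypothetical, and it assumes that "the second quarter after the index date" means the calendar quarter two positions after the quarter containing the index date.

```python
from datetime import date


def quarter_index(d: date) -> int:
    """Running quarter counter: four consecutive integers per calendar year."""
    return d.year * 4 + (d.month - 1) // 3


def in_post_covid_phase(index_date: date, documented_in: date) -> bool:
    """True if `documented_in` lies in the second quarter after the index
    quarter or later (assumed reading of the rule in the text)."""
    return quarter_index(documented_in) - quarter_index(index_date) >= 2


# Index date at the very end of Q1 2020: Q2 2020 is skipped and Q3 2020
# qualifies, which leaves at least 3 months between index date and outcome.
print(in_post_covid_phase(date(2020, 3, 31), date(2020, 6, 30)))  # False
print(in_post_covid_phase(date(2020, 3, 31), date(2020, 7, 1)))   # True
```

Even in the least favorable case, an index date on the last day of a quarter, the first qualifying quarter starts 3 months later, which is the minimum interval the operationalization is meant to guarantee.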

Cohorts
The COVID-19 cohort included individuals with a polymerase chain reaction (PCR)-confirmed COVID-19 diagnosis (ICD-10 U07.1) in 2020. To calculate risk exposure time, we defined the onset of COVID-19 within the quarter as the index date, using the date of an outpatient PCR test or the date of admission to a hospital with a COVID-19 diagnosis. The control cohort included individuals without a documented COVID-19 diagnosis (neither ICD-10 U07.1 nor ICD-10 U07.2) between January 1, 2020, and June 30, 2021.
We excluded individuals with a COVID-19 diagnosis without laboratory virus detection (ICD-10-GM: U07.2) from both the COVID-19 group and the non-COVID-19 controls to reduce distortions due to misclassification. We further excluded individuals who were not continuously registered with the respective health insurance company between January 1, 2019 (or birth, if later) and June 30, 2021 (or death, if earlier), because relevant outcomes and preexisting health conditions may not be visible in our data. For each individual, preexisting medical conditions could be assessed for at least 12 months prior to the matching point of the COVID-19 and control cohorts. Starting from the index date, which was assigned from the COVID-19 case, matched individuals were jointly followed for a maximum of 15 months. This permitted a comparison of the two groups over the same period with respect to their risk of developing any of the predefined incident autoimmune diseases conditional on COVID-19.

Ethics and registration
The study was approved by the ethics committee of the TU Dresden (approval number: BO-EK (COVID)-482102021) and adheres to all relevant administrative and legal regulations. The study was registered at ClinicalTrials.gov (NCT number: NCT05606198).

Data
The underlying data sources were set up for the "Post-COVID-19 Monitoring in Routine Health Insurance Data" (POINTED) consortium [14] to study the long-lasting effects of the COVID-19 pandemic in Germany. The POINTED consortium is coordinated by the Center for Evidence-Based Healthcare (ZEGV) at the TU Dresden and consists of the German National Public Health Authority, the Robert Koch Institute, health research institutes, and statutory health insurances. It is funded partly by the German Federal Ministry of Health (BMG).
We used routine health care data from different German statutory health insurances: Techniker Krankenkasse, BARMER, DAK Gesundheit, IKK classic, AOK PLUS, and several company health insurance funds (InGef [25]). In total, these data cover approximately 39 million individuals, which corresponds to nearly half of the total German population. In addition to sociodemographic characteristics (age and sex) and vital status (via the date of death), we had access to comprehensive information on health care utilization in the outpatient and inpatient health care sectors. The data comprise records on diagnoses (according to the International Statistical Classification of Diseases and Related Health Problems, German Modification, ICD-10-GM), medical procedures (according to the "Operationen- und Prozedurenschluessel," OPS; German modification of the International Classification of Procedures in Medicine, ICPM), information on outpatient medical services (according to "Einheitlicher Bewertungsmassstab," EBM), and prescribed medications (according to the German Anatomical Therapeutic Chemical (ATC) Classification). Only patients with COVID-19 in the year 2020 were selected, as this ensured that the effect was not influenced by vaccinations (vaccination in Germany started on December 27, 2020).

Matching
To minimize differences between the COVID-19 and control cohorts in terms of covariates that may confound relationships between outcomes and exposure, we applied 1:3 matching with replacement for COVID-19 to non-COVID-19 patients. For each individual in the COVID-19 cohort, we selected three non-COVID-19 individuals with identical age (in years), sex, and whether or not an autoimmune disease was present before the index date. We chose exact matching on these characteristics to facilitate stratified analysis. In addition, we accounted for the presence of covariates by propensity score matching. The estimation of the propensity score was based on logistic regression including all insured individuals. Given different sets of prevalent medical conditions considered as covariates, we estimated separate regression models for children/adolescents and adults.
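The combination of exact matching and propensity score matching might look roughly like the following Python sketch. The text does not specify how the propensity score was used within exact-match strata, so nearest-neighbor selection is an assumption here, as are the `Person` record, the function name, and the example values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    age: int                 # age in years (exact-match variable)
    sex: str                 # exact-match variable
    prior_autoimmune: bool   # exact-match variable
    propensity: float        # assumed to come from a separately fitted model


def match_controls(cases, controls, ratio=3):
    """For each case, pick up to `ratio` controls from the same exact-match
    stratum with the closest propensity score. Controls stay in the pool,
    so this is matching with replacement."""
    strata = defaultdict(list)
    for c in controls:
        strata[(c.age, c.sex, c.prior_autoimmune)].append(c)

    matches = {}
    for case in cases:
        pool = strata.get((case.age, case.sex, case.prior_autoimmune), [])
        nearest = sorted(pool, key=lambda c: abs(c.propensity - case.propensity))
        matches[case.pid] = [c.pid for c in nearest[:ratio]]
    return matches
```

Because controls are never removed from the pool, the same control can appear in several match groups, which is what "with replacement" implies.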
After matching individuals with COVID-19 and controls, we excluded individuals from the match groups if they died before the beginning of the post-COVID phase, i.e., within the quarter of the COVID-19 diagnosis or the following quarter. We also excluded individuals with COVID-19 who lacked a matching partner. When analyzing specific health outcomes, we further excluded individuals from the analysis if the considered outcome was documented in two of the four quarters preceding the index date in the outpatient setting or once in the inpatient setting. To maintain balance of cohorts regarding covariates, we excluded a complete match group of COVID-19 and control cases if the outcome was preexisting for the individual with COVID-19 or for all of their matched non-COVID-19 control cases. For estimation, we weighted data from individuals in the control cohort with the inverse number of individuals remaining in the respective match group (i.e., weights between 1/3 and 1) to ensure that total weights in the control cohort added up to the number of individuals in the COVID-19 cohort.
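The weighting rule can be illustrated with a minimal Python sketch. The function name and the example match groups are hypothetical; only the rule itself (weight equals the inverse number of controls remaining in a match group) comes from the text.

```python
from fractions import Fraction


def control_weight(n_remaining: int, ratio: int = 3) -> Fraction:
    """Weight of each control in a match group with `n_remaining` controls left."""
    if not 1 <= n_remaining <= ratio:
        raise ValueError("a retained match group has between 1 and `ratio` controls")
    return Fraction(1, n_remaining)


# Hypothetical match groups: COVID-19 case -> number of controls remaining
# after outcome-specific exclusions.
remaining = {"case_a": 3, "case_b": 2, "case_c": 1}

# Each group contributes n * (1/n) = 1, so the control weights sum to the
# number of COVID-19 cases.
total_control_weight = sum(n * control_weight(n) for n in remaining.values())
print(total_control_weight == len(remaining))  # True
```

Exact fractions are used so that the sum is free of floating-point rounding.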

Health outcomes
Based on the clinical experience of the author team, we defined 64 potential outcomes from 41 autoimmune diseases studied during follow-up 3 to 15 months after documented COVID-19 infection or, for controls, after the assigned index date. Operationalization of these outcomes was based on inpatient and outpatient diagnoses according to ICD-10-GM. In 23 cases, a more specific definition of the outcome with suitable medication was chosen. For type 1 diabetes, only those cases with an insulin prescription were considered valid. A complete list of the considered outcomes and their definitions can be found in supplementary material S1.

Covariates
We used information on preexisting chronic conditions as available from 2019 health records to adjust for potential confounders in the relationship of exposure (COVID-19) and incident autoimmune diseases. The approach is similar to a previous study [14].
For each individual, we used information on preexisting health conditions in the four quarters preceding the index date. The 13 prevalent morbidities for children/adolescents and 34 prevalent morbidities for adults were selected based on published evidence and clinical expertise (supplementary material S2). In addition, we included age, sex, and the number of recorded inpatient and outpatient contacts as covariates. In line with previous studies [9], we included the severity of COVID-19 as a stratification feature and differentiated between (1) individuals with outpatient diagnoses of COVID-19, (2) individuals with a hospital visit with COVID-19, and (3) individuals with intensive care and/or mechanical ventilation with COVID-19.

Statistical analyses
The incidence rates (IRs) of autoimmune diseases per 1000 person-years were estimated. Differences in IRs between COVID-19 and non-COVID-19 patients were estimated using Poisson regression models to estimate incidence rate ratios (IRRs). As a prerequisite, we derived aggregated information on each health outcome by counting incident cases of the respective autoimmune disease within the COVID-19 and control groups. Since the number of incident cases for each outcome varied across the match groups, we assigned weights to the remaining control cases that added up to 1. The pooling of individual-level data was not possible due to data protection restrictions. The different insurance datasets were therefore analyzed separately by authorized institutes or the health care research departments within the respective health insurances. Each authorized institute calculated the required aggregate statistics and provided them to ZEGV, where regressions based on combined aggregate data were performed.
To synthesize evidence across datasets, we note that point estimates from aggregate matched data are identical to those obtained from Poisson regression based on individual-level data [26]. The characteristics of Poisson regression applied to aggregate count data allow for consistent estimation of incidence rates regardless of the distribution of the outcome at the individual level, provided that the conditional mean function is correctly specified [27]. However, variance estimates from aggregates tend to be larger, meaning that the statistical significance of the presented effects may be underestimated. Utilizing a main advantage of Poisson regression, we adjusted for differences in individual-specific times at risk (time between the index date and the end of the observation period or death) by including these times as an offset in the model. Stratified aggregation enabled us to deploy separate estimators for age, sex, prevalent autoimmune disease, and severity of COVID-19. We conducted all analyses using the statistical programming language R version 3.6.2 [28].
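To make the role of the offset concrete, the following Python sketch computes incidence rates and an incidence rate ratio from aggregate counts. It is a simplified illustration rather than the study's R code: it relies on the fact that, for a Poisson model with a single binary exposure and log person-time as offset, the maximum likelihood estimate of the IRR equals the ratio of the crude rates. The Wald interval shown assumes unweighted event counts and ignores the matching weights, and all input numbers are hypothetical.

```python
import math


def incidence_rate(events: float, person_years: float, per: float = 1000.0) -> float:
    """Incidence rate per `per` person-years."""
    return per * events / person_years


def irr_with_ci(events_exposed, py_exposed, events_control, py_control, z=1.959964):
    """IRR and Wald confidence interval from aggregate counts.

    Model: log E[events] = log(person_years) + b0 + b1 * exposed,
    so that IRR = exp(b1) = ratio of the two crude rates.
    """
    irr = (events_exposed / py_exposed) / (events_control / py_control)
    se_log = math.sqrt(1.0 / events_exposed + 1.0 / events_control)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical aggregate data for one outcome.
irr, lower, upper = irr_with_ci(150, 10_000.0, 100, 10_000.0)
print(incidence_rate(150, 10_000.0), incidence_rate(100, 10_000.0))
print(round(irr, 2), round(lower, 2), round(upper, 2))
```

Dividing by person-years is what the offset accomplishes inside the regression: groups with different times at risk are compared on the rate scale rather than on raw counts.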

Description of the study population
In 2020, 38.9 million individuals were insured in one of the participating insurance companies for at least 1 day. We excluded persons not continuously enrolled in 2019 (n=2,074,654) or between January 1st, 2020, and June 30, 2021 (n=2,051,855), those with a COVID-19 diagnosis without definite laboratory confirmation (ICD-10 U07.2) (n=3,549,324), and those with a COVID-19 diagnosis in the first two quarters of 2021 (n=569,410) from the analyses (Fig. 1). From the remaining sample, 670,301 individuals with a COVID-19 diagnosis were matched 1:3 to controls. For 29 individuals with COVID-19 (0.004%), no suitable matching partner was found. After matching, there were 28,810 individuals with COVID-19 and 20,932 control   (Fig. 1).

Incidence of a new autoimmune disease
The incidence rate (IR) of any autoimmune disease 3 to 15 months after SARS-CoV-2 infection was 15

Incidence of diagnosed autoimmune diseases in subgroups
Regarding population subgroups, the IRR for an incident autoimmune disease for persons with COVID-19 compared to those without COVID-19 did not differ significantly across age groups or between men and women. However, the absolute incidence of any first autoimmune disease per 1000 person-years was considerably higher among older than younger persons, e.g., IR=25.04 and IR=19.55, among persons with documented COVID-19 in age groups 65-79 years and above 80 years of age compared to IR=4.17, among persons with documented COVID-19 below 18 years of age. We also observed a higher absolute incidence of any newly diagnosed autoimmune disease among women with COVID-19 (IR =18.02) than among men with COVID-19 (IR =11.33) (Fig. 2). Among individuals with COVID-19, absolute incidence rates as well as IRR for the COVID-19 group compared to controls for any new autoimmune disease increased according to the severity of the acute COVID-19 disease, ranging from IR=13.96, IRR=1.38, 95% CI=1.32-1.43 among persons with outpatient treatment for COVID-19, to IR =28.39, IRR=1.75, 95% CI=1.54-1.99 among those hospitalized due to COVID-19, and IR =36.96, IRR=2.28, 95% CI=1.80-2.89 among persons requiring ICU/ventilation (Fig. 2).

Discussion
The excess risk for any newly diagnosed autoimmune disease was 4.50 per 1000 person-years in this study. The highest IRRs were found for rather uncommon autoimmune diseases of the vasculitis group. For the more common autoimmune diseases, the highest estimates were found for rheumatoid arthritis, Sjögren disease, Graves' disease, and Hashimoto thyroiditis, with an approximately 40% higher rate compared to a matched cohort without SARS-CoV-2 infection. Persons with COVID-19 but without a prior autoimmune disease had a 43% higher likelihood of developing an incident autoimmune disease than controls, while those with COVID-19 and any preexisting autoimmune disease had a 23% higher likelihood of being diagnosed with another autoimmune disease. As expected, absolute incidence rates (IR) of any autoimmune disease were higher among women compared to men, among older compared to younger individuals, and among those with compared to those without preexisting autoimmune disease. Comparing persons with and without COVID-19, the IRR increased with the severity of COVID-19 as indicated by hospitalization and particularly by ICU/ventilation treatment versus COVID-19 patients in the outpatient sector. Additionally, the IRR for a new-onset autoimmune disease comparing persons with and without COVID-19 was higher in children and adolescents than in adults. However, differences between age groups did not reach statistical significance.
Only two other cohort studies were found to investigate the onset of autoimmune diseases after a SARS-CoV-2 infection. The first investigated 11 endpoints and used medical record data from the UK. It reported a hazard ratio of 1.22 (95% CI=1.10-1.34). Owing to the short median follow-up of only 0.29 years, associations were statistically significant for only 3 of the 11 autoimmune diseases [20]. The second study used routine health care data primarily from the USA and found much stronger and more consistent associations between 14 new-onset autoimmune diseases and a SARS-CoV-2 infection. However, this previous study used less stringent case definition criteria, as one outpatient diagnosis was considered sufficient to define new-onset autoimmune disease. In addition, only individuals with newly diagnosed autoimmune diseases within 30 days after the index date were excluded. Both circumstances could lead to exaggerated effect estimates for autoimmune diseases. However, that study also found differences by ethnicity, while the current study using data from Germany reported results mostly for Caucasian individuals [21].
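The excess risk quoted above follows directly from the two incidence rates reported in the abstract. A small Python check, with hypothetical helper names, makes the arithmetic explicit. Note that the ratio of the rounded published rates gives roughly 43%, which agrees with the reported model-based 42.63% only up to rounding of the inputs.

```python
def excess_rate(ir_exposed: float, ir_control: float) -> float:
    """Absolute rate difference, in the same unit as the inputs."""
    return ir_exposed - ir_control


def relative_increase_percent(ir_exposed: float, ir_control: float) -> float:
    """Relative increase of the exposed rate over the control rate, in percent."""
    return 100.0 * (ir_exposed / ir_control - 1.0)


ir_covid = 15.05    # per 1000 person-years, COVID-19 group (from the abstract)
ir_control = 10.55  # per 1000 person-years, matched controls (from the abstract)

print(round(excess_rate(ir_covid, ir_control), 2))             # 4.5
print(round(relative_increase_percent(ir_covid, ir_control)))  # 43
```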
There are several hypotheses regarding the pathogenesis of post-COVID-19, although different mechanisms most likely underlie the complex clinical picture involving multiple organ systems. Drawing parallels to other post-infectious syndromes, possible mechanisms include persistence of the virus or viral remnants, latent virus reactivation, long-lasting tissue damage due to microclotting and chronic inflammation, and autoimmunity [29]. According to current knowledge, autoimmunity following viral infection may be triggered by mechanisms such as epitope spreading, bystander activation, molecular mimicry, and cryptic epitopes [30]. SARS-CoV-2 shares characteristics of other viruses associated with the development of autoimmunity. Acosta-Ampudia and Anaya summarized these hypotheses as follows. (1) Superantigen activity: The S protein of SARS-CoV-2 contains sequence and structure motifs similar to those of a bacterial superantigen and can bind directly to the T-cell receptor. (2) Molecular mimicry: Accumulating evidence demonstrates that the virus has structural similarity to host-derived components. (3) Neutrophil extracellular trap (NET) formation. (4) Type I interferon (IFN) response. (5) "Overt immunity," which describes the appearance of multiple autoantibodies and diverse autoimmune diseases that are significantly associated with SARS-CoV-2 [31]. These mechanisms are in line with several serological studies demonstrating the onset of IgG autoantibodies [32,33] or emergence of self-reactive B cells [34] as a response to SARS-CoV-2. Moreover, autoantibodies generated during infection are negatively correlated with SARS-CoV-2 antibodies but positively correlated with hyperinflammation markers during acute illness as well as biomarkers for certain post-acute conditions [35]. These findings highlight a potential link between autoreactivity, severity of COVID-19, and susceptibility to post-acute sequelae.
Indeed, serological studies have found persisting patterns of autoreactivity in severe COVID-19 cases even after most autoimmunological markers have subsided after the acute phase [32,34]. This suggests latent autoimmunity acquired by some patients, which may lead to de novo autoimmune diseases in the long run [22,33].
Early clinical case studies reported few cases of new-onset autoimmune diseases following COVID-19. There is growing consensus regarding the relevance of long-term studies on this matter [24,38]. We found the overall excess risk for a first autoimmune disease to be 4.50 per 1000 person-years, which is much smaller than previously proposed for other potential chronic sequelae of COVID-19. For cardiovascular diseases, the excess risk was 45.29 [11]; for mental diseases, it was 36.48 [12]; and for neurologic disorders, it was estimated to be 70.69 per 1000 people [13]. One reason for this could be that autoimmune diseases are less frequent and the detection time is much longer than that for other diseases. The much larger IRR for hospitalized patients and patients with ICU/ventilation was also reported elsewhere [9,15].

Strengths and limitations
The main strength of our analysis is its large dataset including more than 600,000 COVID-19 patients and a follow-up period of 3 to 15 months. This unselected sample from all over Germany covers both outpatient and inpatient care and thus constitutes a unique and comprehensive source of evidence. Our analysis is based on confirmed diagnoses documented by ambulatory physicians and on hospital discharge diagnoses. Accordingly, our results are not subject to possible distortions resulting from selective, incomplete, or inadequate self-reporting of symptoms but instead rely on information provided by medical professionals. To reduce confounding of the relationships between outcomes and exposure, we applied matching on relevant covariates: age, sex, previous autoimmune disease, several prevalent diseases, and utilization of outpatient and inpatient care. The robustness of the results is supported by the fact that estimates of individual outcome definitions were similar with and without additional consideration of disease-specific medication.
Due to the observational nature of our study, our results do not permit a causal interpretation. We could not exclude the possibility that our results were affected by unmeasured confounding, although we minimized differences between the COVID-19 and control cohorts by matching. Vaccination status could not be validly assessed in German claims data. Our results may also have been subject to greater symptom awareness of individuals following SARS-CoV-2 infection or detection bias that may have arisen if the health status of individuals after the onset of COVID-19 was more closely monitored and better documented by physicians. Individuals with a mild or asymptomatic course of COVID-19 were likely to be underrepresented in our study because SARS-CoV-2 infections may not have been documented [36], especially in the first months of the pandemic. The resulting selection of more severe COVID-19 cases may have led to higher incidence estimates in this cohort. In addition, individuals with undocumented SARS-CoV-2 infection may have been included in the control cohort. To the extent that post-COVID also occurred in individuals with undocumented infections, this misclassification induced an overestimation of incidence rates in the control group and, thus, a bias toward the null in estimators of incidence rate ratios.

Conclusion
In this large matched cohort study, COVID-19 was associated with an increased risk of being newly diagnosed with autoimmune disease 3-15 months after SARS-CoV-2 infection. The strength of the association with SARS-CoV-2 infection was most pronounced for autoimmune diseases in the vasculitis group. A more severe course of COVID-19 was associated with a higher likelihood of being newly diagnosed with autoimmune disease. Incident autoimmune diseases were significantly more common in the post-COVID-19 period in all age and sex groups. The autoimmunity hypothesis is supported by a body of evidence linking viral infections to the pathogenesis of autoimmune diseases as well as results from recent clinical and basic research demonstrating persisting autoantibodies and serological autoreactivity following SARS-CoV-2 infection in a subset of patients. Further epidemiologic, clinical, and basic science research is warranted to determine whether SARS-CoV-2 infection triggers the onset of autoimmune disease, to identify the underlying mechanisms and persons at risk, and to investigate effective means of prevention or early treatment.
Code availability The R code of the analysis can be made available upon request by the corresponding author.
Funding Open Access funding enabled and organized by Projekt DEAL. This study is supported by a grant from the German Federal Ministry of Health (Bundesgesundheitsministerium) under Grant Number ZMI1-2521NIK705.

Data availability
The raw data used in this study cannot be made available in the manuscript, the supplemental files, or in a public repository due to German data protection laws (Bundesdatenschutzgesetz). The aggregated data is stored on a secure drive at ZEGV.