Introduction

Rheumatoid arthritis (RA) is one of the most common autoimmune diseases, and globally, its incidence is rising [1]. Although RA has been described for thousands of years, its etiology remained elusive until Caplan et al. described the first strong risk factor in the 1950s: coal exposure [2]. Three decades later, in 1983, one of the first observations of a possible genetic association was published, showing that 22% of RA patients had a first-degree relative with RA [3]. We now know that having a first-degree relative with RA indeed increases the risk of RA nearly threefold [4, 5]. In 1987, another respiratory exposure, silica, was also found to be a risk factor for RA [6], with recent studies confirming that it increases the odds of RA more than twofold [7, 8]. That same year, tobacco smoking was first found to increase the risk of RA [9]. Multiple subsequent studies have shown that tobacco smoking increases the odds of RA over twofold, especially for individuals with longer smoking duration, as opposed to greater smoking intensity [10, 11].

Compared to this relatively slow progress in understanding RA etiology during the twentieth century, recent years have seen an explosion in the discovery of risk factors and mechanisms underlying RA etiology. In this manuscript, our goal is to synthesize the key findings and most significant trends from the past 5 years.

Respiratory Exposures

As described above, the first well-established risk factors for RA were respiratory exposures. One of the most significant trends from the past several years has been the discovery of additional respiratory triggers for RA. For example, an increasing number of studies have shown an association between asthma and development of incident RA [12], with a recent meta-analysis confirming this association [13]. The same group performed a meta-analysis of allergic rhinitis, finding a mild association with incident RA when high-quality studies alone were examined [14]. Viral infections were recently shown to increase risk of RA as well, which may be of particular relevance given the ongoing coronavirus pandemic [15]. Mycoplasma pneumonia was also recently linked to incident RA, especially in the elderly and in the two years following infection [16]. Indeed, a large study within the Epidemiological Investigation of RA (EIRA) cohort in Sweden showed all types of respiratory diseases (acute and chronic, upper and lower) to be associated with incident RA, with odds ratios of two- to threefold in nonsmokers [17].

Not only respiratory diseases but also other types of respiratory irritants have been increasingly shown to be associated with RA. A cohort study of 7600 Canadians showed an association between industrial fine particulate matter and ACPA positivity [18], whereas another cohort study in China showed a significant association between traffic-related air pollutants and RA readmissions [19]. In two large cohort studies, passive smoke [20] and fertilizer exposure [21] in childhood were also linked to RA. Indeed, among adults, inhaled respiratory exposures including not only fertilizers but also solvents and painting were associated with RA [22]. Furthermore, occupations with inhalational exposures such as bricklayers, concrete workers, and electrical workers are also associated with RA [23]. Together, these data provide further evidence for the importance of pulmonary irritation and/or inflammation in the pathogenesis of RA (Table 1).
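The risk estimates quoted in this section are mostly odds ratios. As a minimal sketch of how such an estimate is typically derived from a case-control 2x2 table, the Python snippet below computes an odds ratio with a Woolf (log-based) 95% confidence interval. The function name and the counts are hypothetical and are not taken from any of the cited studies, which generally also adjust for confounders.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the Woolf method: the standard error of log(OR) is the
    square root of the sum of reciprocal cell counts.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only: 60 of 200 RA cases and
# 30 of 200 controls report a given respiratory exposure.
or_estimate, ci_low, ci_high = odds_ratio_ci(60, 140, 30, 170)
print(f"OR = {or_estimate:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An interval that excludes 1 is what is usually meant when an exposure is reported as significantly associated with RA.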

Table 1 Updated respiratory risk factors for RA with associated risk estimates

Obesity

Another significant trend has been the increasing awareness that obesity contributes to increased risk of RA. A prior study had shown that obesity might account for over half the increase in RA in the last several decades [24]. More recent studies have now shown that elevated body mass index (BMI) not only increases risk of RA but also decreases time to RA onset [25]. In fact, among individuals with elevated BMI and at least two anti-citrullinated peptide antibodies (ACPA), odds for RA increased 23-fold compared to controls [25]. Another study also demonstrated that even being overweight (BMI>25) increased risk of RA, especially in adults less than 50 years of age [26]. Others have suggested the association between increased BMI and RA is greatest for women [27,28,29] or smokers [29].

Factors related to obesity may also increase risk of RA. A more recent meta-analysis showed that not only BMI but also increased waist circumference increased risk of RA [27]. Along similar lines, a large cohort study within the Nurses’ Health Study showed that physical activity dramatically reduced the risk of later developing RA, with BMI mediating only 14% of this effect [30•]. Although no studies have demonstrated that weight loss reduces RA risk, a recent cohort study of 114,000 individuals showed that adherence to metformin was associated with decreased risk of RA [31]. Together, these findings point towards the exciting discovery that excess weight potentiates RA development. Thus, excess weight may be another modifiable risk factor for RA besides smoking.

Diet

Prompted by the observation that some patients with RA seem to flare with certain foods, studies in the last few decades began to investigate whether dietary factors could influence risk of RA. Indeed, some early studies showed that fish [32], omega-3 fatty acids [33], and modest alcohol consumption [34] all decreased risk of RA. However, recent studies have dampened early enthusiasm that specific foods may potentiate or protect against RA, showing no association with fish [35], alcohol [36], coffee [37], meat [38], or dairy products [38]. A study within the Studies of Etiology of RA (SERA) cohort found one potential reason for discordant results. They showed omega-3 fatty acids were inversely associated with rheumatoid factor (RF) and ACPA positivity, but only in patients with the human leukocyte antigen (HLA) shared epitope alleles [39]. Thus, diet and genetics may interact to incite RA in some fashion.

While specific foods may not yield strong associations with RA, certain dietary patterns have recently been shown to have a modest association with RA. For example, a large cohort study within the Nurses’ Health Study showed that long-term dietary quality, as assessed by a scoring system for each food, was associated with reduced risk of RA in women. This association was especially strong for women less than 55 years old and for seropositive RA [40]. A subsequent study showed that an inflammatory dietary pattern was again associated with increased risk of seropositive RA in women less than 55 years of age. However, this association was partially mediated by BMI [41]. Studies in other populations have also confirmed that a Mediterranean diet reduced risk of RA in smokers [42] as well as RA disease activity [43], whereas a Western dietary pattern increased risk of RA [44]. More research is needed to determine which components of these dietary patterns mediate the association with RA, and to what degree diet may mediate the association between obesity and RA.

Microbiome

One factor that may mediate the association between dietary pattern and RA risk is oral and/or intestinal dysbiosis, which has been increasingly implicated in RA pathogenesis. An increased prevalence of periodontal disease and altered oral microbiota profile in early-onset RA has been known for several years [45]. A recent work, however, showed that this alteration occurs before RA onset, with ACPA-positive at-risk individuals having a higher prevalence of periodontitis and P. gingivalis compared to both healthy controls and patients with early RA. Alterations of the gut microbiome in patients with RA have also been demonstrated for several years [46], especially with expansion of Prevotella species [47, 48]. In a recent landmark study, Alpizar-Rodriguez and colleagues expanded these findings to a pre-clinical RA group, demonstrating alterations of the gut microbiome, particularly enrichment of Prevotella species, compared to controls [49]. Observing these microbial changes before RA onset implicates oral/intestinal dysbiosis in the etiology of RA.

An important follow-up question is when and why the microbiome changes. While diet could be one reason as discussed above, antibiotics could be another. Two recent case-control studies showed that previous antibiotic exposure increased risk for RA in a dose-response fashion, with odds ratios of two- to threefold for RA for individuals with ten or more antibiotic prescriptions before RA onset [50, 51]. Importantly, this association was not mediated by the infections themselves, as respiratory infections without antibiotics were shown not to have as strong an association [50]. Chronic diarrhea was recently shown to increase risk of RA in a large cohort study within the E3N-EPIC study, especially in smokers [52].

Together, these findings support the notion that alterations in the microbiome may play a role in RA pathogenesis. More broadly, altered immunity at a mucosal site (e.g., intestines and/or lungs), in the context of a permissive genetic background, may be important for development of RA.

Genetics

A growing theme that has begun to permeate all the above trends is the pivotal role of genetics. Historically, twin studies suggested that the liability to RA was approximately 15% genetic [53]. However, the increasing discovery of single nucleotide polymorphisms (SNPs) through genome-wide association studies (GWAS) shows that genetics likely explains more, perhaps 30–40% of RA risk [54]. New risk loci for RA continue to be discovered [55, 56], including polymorphisms for interleukin-10 [57], IL1B [58], and T cell immunoglobulin and mucin domain 3 (TIM-3) [59]. Currently, the number of SNPs associated with RA totals over 269 [60]. Another study identified 243 phosphorylation-related SNPs, or missense SNPs that affect protein phosphorylation status [61].

As accessibility of genetic data expands, its role has grown from identifying risk factors for disease to serving as a clinical and research tool. Importantly, a recently published genetic probability (“G-PROB”) tool calculates the probability of various types of inflammatory arthritis-causing diseases, improving correct diagnosis at presentation from 39 to 51% [62•]. This tool may be particularly useful for diagnosing individuals with inflammatory arthritis of unclear etiology, such as patients with seronegative RA. Genetic data has become an accessible research tool as well, for example, through Mendelian randomization studies [63]. These are observational studies that leverage the fact that SNPs are randomly assigned and always precede disease onset, thus acting similarly to a randomized controlled trial. For example, a recent Mendelian randomization study of over 850,000 Europeans confirmed that genetically predicted BMI, based on an 806-gene profile, did increase the risk of RA [64]. Finally, genetic data availability also enables creation of genetic risk scores [60]. Genetic risk scores are useful tools for performing gene-environment interaction studies, as outlined in the next section.
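The logic of Mendelian randomization described above can be illustrated with a short Python sketch. It assumes summary statistics are available for each SNP: its effect on the exposure (for example, BMI) and its effect on the outcome (log odds of RA) with a standard error. The per-SNP Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimator shown here are common choices in this kind of analysis, but they are not necessarily the method used in the cited study, and all numbers are invented for illustration.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one SNP: outcome effect per unit of exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(snps):
    """Fixed-effect inverse-variance weighted estimate across SNPs.

    snps: iterable of (beta_exposure, beta_outcome, se_outcome) tuples.
    Returns (estimate, standard error), using first-order weights
    beta_exposure**2 / se_outcome**2.
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in snps)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in snps)
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three SNPs (not real data):
# (effect on BMI, effect on log odds of RA, SE of the RA effect)
hypothetical_snps = [
    (0.10, 0.050, 0.01),
    (0.20, 0.100, 0.02),
    (0.05, 0.025, 0.01),
]

log_or_per_unit, se = ivw_estimate(hypothetical_snps)
print(f"OR per unit of exposure: {math.exp(log_or_per_unit):.2f} (SE of log OR {se:.3f})")
```

Because alleles are allocated at conception, the SNP-outcome association is less prone to confounding and reverse causation than a direct BMI-RA association, provided the SNPs influence RA only through the exposure.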

Gene-Environment Interactions

A significant trend in the last several years has been to study how the various genetic and environmental risk factors interact with each other, so-called gene-environment interactions. The first to do this in the field of RA was Padyukov et al. in 2004, who identified an interaction between smoking and HLA-DRB1 for seropositive (RF-positive) RA [65]. Klareskog et al. expanded this discovery to the interaction between smoking and HLA-DRB1 shared epitope for ACPA-positive RA, which raised the risk for RA by an impressive 21-fold compared to nonsmokers without the shared epitope [66]. Many subsequent studies have replicated this interaction between the shared epitope and smoking, with recent studies suggesting that aryl hydrocarbon receptor crosstalk [67] and/or DNA methylation of cg21325723 [68] may underlie the mechanism of this interaction. The gene-smoking interaction in RA also varies by serological subset. That is, a recent study showed the effect of smoking on risk of RA varies by rheumatoid factor (RF) and anti-cyclic citrullinated peptide (CCP) status, as well as genetic status at the shared epitope [69]. Thus, defining an individual’s serological subtype may be important from a clinical and research perspective.
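To make the idea of a gene-environment interaction concrete, the Python sketch below computes three standard measures of additive interaction from relative risks: the relative excess risk due to interaction (RERI), the attributable proportion due to interaction (AP), and the synergy index. It also reports the ratio used to assess interaction on the multiplicative scale. The function name and the input values are hypothetical and are not estimates from the studies cited above.

```python
def interaction_measures(rr_both, rr_gene_only, rr_exposure_only):
    """Interaction measures relative to the doubly unexposed group (RR = 1).

    rr_both:          relative risk with both the risk allele and the exposure
    rr_gene_only:     relative risk with the risk allele only
    rr_exposure_only: relative risk with the exposure only
    """
    # Additive scale: excess risk beyond the sum of the separate excess risks.
    reri = rr_both - rr_gene_only - rr_exposure_only + 1.0
    # Share of the risk in the doubly exposed that is due to the interaction.
    ap = reri / rr_both
    # Synergy index: observed joint excess over the expected additive excess
    # (undefined if the two separate excess risks sum to zero).
    synergy = (rr_both - 1.0) / ((rr_gene_only - 1.0) + (rr_exposure_only - 1.0))
    # Multiplicative scale: joint effect over the product of separate effects.
    multiplicative_ratio = rr_both / (rr_gene_only * rr_exposure_only)
    return {
        "RERI": reri,
        "AP": ap,
        "synergy_index": synergy,
        "multiplicative_ratio": multiplicative_ratio,
    }


# Hypothetical relative risks, for illustration only.
result = interaction_measures(rr_both=12.0, rr_gene_only=3.0, rr_exposure_only=2.0)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

An RERI or AP above 0, or a synergy index above 1, indicates that the combined effect of the two factors exceeds the sum of their separate effects.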

Only recently have researchers begun to expand investigation of gene-environment interactions beyond smoking and the shared epitope. For example, one study found an interaction between textile dust exposure and HLA-DRB1 for RA risk [70]. Combining the Epidemiological Investigation of RA and North American RA Consortium cohorts, another study found significant interactions between HLA-DRB1 SE alleles and SNPs associated with ACPA-positive RA [71]. Increasing the complexity even further, another group demonstrated a three-way interaction between alcohol, smoking, and HLA-DRB1 for RA risk [72]. Examining the interaction between other environmental exposures and genetic susceptibility to RA represents an unmet need in the field.

Mechanisms of Disease

While firm scientific evidence shows that genes, environmental exposures, and their interactions can all increase the risk of RA, a central question has become, how? Numerous mechanisms have been recently discovered that might explain some of the observed risk factors. To explain the association between respiratory disease and RA, one mechanism could be direct effects of respiratory pathogens; for example, EBV DNA was identified in synovial tissue of patients with RA [73]. To explain the association between obesity and RA, another study showed aggregates of three or more adipose tissue macrophages, called “crown-like structures,” were much more abundant in RA patients compared to controls. This finding was especially true in early RA and in patients with ACPA positivity [74]. Studies in the past year have also implicated mitochondrial dysfunction [75] and dendritic cells, especially the cDC1 subset, in initiation of inflammatory arthritis [76].

Mechanisms of Disease: Top-Down

Taking a “top-down” approach to investigate the biological step before RA, called “pre-RA,” has also shed some insights into how RA develops (Fig. 1). Pre-RA is defined as the presence of RA-specific antibodies before clinical disease onset [77]. It confers an increased risk of RA, as high levels of antibodies can increase risk of RA over fourfold [78, 79]. RF was an early RA antibody to be discovered, followed eventually by ACPA, which encompasses all citrullinated peptides (both cyclic and non-cyclic). In fact, single-cell cloning of B cells from RA patients has identified 30 monoclonal ACPAs that are citrulline multi-specific [80]. However, the definition of pre-RA continues to expand beyond RF and ACPA positivity or negativity.

Fig. 1 Current research efforts to understand the pathogenesis of RA use either a “top-down” approach, which focuses on pre-RA including anti-posttranslationally modified protein antibodies (AMPAs) and glycosylation, or alternatively a “bottom-up” approach, which focuses on functional genomics

Besides citrullination, other posttranslational modifications including carbamylation (also called homocitrullination) and acetylation may also produce antibodies involved in RA pathogenesis. The antibodies generated by these three processes are ACPAs, anti-carbamylated protein (anti-CarP) antibodies, and anti-lysine acetylated (KAc) antibodies, respectively. In turn, the collection of all these types of antibodies is now called anti-posttranslationally modified protein antibodies (AMPAs). To illustrate their importance, a study of individuals with arthralgia showed that not only RF and ACPA but also anti-CarP antibodies increased risk of later diagnosis of inflammatory arthritis [81]. Indeed, a recent basic science study identified numerous carbamylated proteins in RA joints that were recognized by anti-CarP antibodies, supporting their role in RA pathogenesis [82]. In fact, a comprehensive set of antibodies against the entire human proteome, including the citrullinome and homocitrullinome, has been shown to identify close to 92% of all RA cases, compared to only 70% with commercial anti-CCP assays alone [83]. Importantly, monoclonal ACPAs from patients were found to also react with carbamylated and acetylated antigens, demonstrating “multi-reactivity” of these antibodies [84]. Thus, the broader term “AMPAs” may be more heavily utilized in research and clinical settings moving forward.

Glycosylation of antibodies may be another form of “pre-RA.” For example, low galactosylation of IgG a median of 4 years before RA onset was associated with increased risk of RA [85]. In addition, glycosylation of IgG ACPA V-domain is also strongly associated with future development of RA, with a hazard ratio over 6 [86]. Like AMPAs, therefore, glycosylation may become a clinical biomarker for RA.

To understand how RA develops, investigators have sought to understand how pre-RA develops. From an epidemiologic standpoint, pre-RA has many of the same risk factors as RA, including female sex, increasing age, and smoking [87, 88]. Genetic predisposition to activation of adaptive immune responses to modified self-antigens may also play a role. For example, HLA-DRB1*14:02 broadens capacity for citrullination, self-peptide presentation, and T cell expansion, increasing risk for ACPA generation and ACPA-positive RA [89].

Finally, innate lymphoid cells are likely crucial for onset of pre-RA, and later RA. A study of individuals with pre-RA showed increased frequency of certain innate lymphoid cells compared to controls [90]. Another study of pre-RA patients showed that having ≥ 5 dominant B cell receptor clones was significantly associated with developing RA, and that these clones appeared in the synovial tissue, suggesting their role in RA pathogenesis [91]. Furthermore, RA autoreactive B cells were highly multi-reactive, recognizing >3000 peptides modified by either citrullination or carbamylation [92]. This finding suggests that multiple antigen encounters across space and time are involved in the etiology of RA.

Mechanisms of Disease: Bottom-Up

Compared to the “top-down” approach of understanding RA etiology from a pre-RA perspective, another approach is the “bottom-up” approach via functional genomics. Functional genomics is defined as a biological field that attempts to describe gene functions and interactions. The word genomics refers to the genome and RNA sequencing techniques, whereas the word functional refers to the dynamic aspects of gene transcription, translation, and protein interactions rather than static features like DNA. In particular, single-cell RNA sequencing is a powerful technique that sequences genetic information from individual cells to understand their function in the context of their environment. For example, a recent study found that a transcriptome signature especially involving inflammatory pathways, Wnt signaling, and type I interferon increased ability to predict RA [93].

Functional genomics studies have also shown that macrophages may play an important role in RA pathogenesis. Using single-cell RNA sequencing along with fluorescence microscopy, one study performed a spatiotemporal analysis of macrophages in RA tissues, finding that a certain population of tissue-resident macrophages (CX3CR1+) forms an internal barrier at the synovial lining that disappears in RA [94]. A subsequent study using the same technique in humans confirmed that certain macrophage types (MerTKpos) in the synovial lining correlated with RA disease activity or remission. That is, they were lost during flare and regained during remission [95].

Of all cell types, however, fibroblasts in particular seem to play a pivotal role in RA etiology based on recent evidence. In a particularly ground-breaking work from the Accelerating Medicines Partnership, Zhang et al. used single-cell RNA sequencing, mass cytometry, and flow cytometry to define the cell populations that drive RA. Out of 18 unique cell populations, they found that sublining fibroblasts and IL1B+ proinflammatory monocytes were particularly expanded in RA synovia. In fact, IL6 largely came from the sublining fibroblasts and IL1B came from the proinflammatory monocytes, implicating these cell types in RA pathogenesis [96•]. A parallel study identified two key fibroblast types: synovial sublining fibroblasts and synovial lining fibroblasts. Proliferation of the former resulted in persistent inflammatory arthritis, whereas proliferation of the latter resembled osteoarthritis [97]. A subsequent study, again using single-cell RNA sequencing, found that the sublining is expanded in RA because of NOTCH3 signaling, which drives transcriptional gradients and synovial fibroblast differentiation to mediate inflammation and pathology in RA. The authors verified these results in a mouse model, where deletion of Notch3 or blocking it with monoclonal antibodies prevented joint damage in inflammatory arthritis [98].

These findings from functional genomics studies not only help explain RA etiology but also have identified discrete targets for its treatment. In particular, fibroblast subtypes may allow for clinical stratification and personalized treatment in RA. These findings may also apply beyond RA, as inflammatory fibroblasts have been implicated in several other diseases including Sjögren’s syndrome, idiopathic pulmonary fibrosis, and ulcerative colitis.

Conclusions

In summary, research from the last few years has yielded a revolution in the understanding of RA etiology. In particular, acute and chronic respiratory exposures, obesity, diet and microbiome, genetics, and their interactions markedly increase risk of RA. A combination of these risk factors may be necessary for RA to develop, such as genetic predisposition, compounded by alterations in immunity by obesity, and then ultimately triggered by an insult at a mucosal site such as pulmonary infection or intestinal dysbiosis. Similar risk factors also increase risk of pre-RA, a pre-clinical state characterized by a growing repertoire of anti-posttranslationally modified protein antibodies as well as abnormal glycosylation. Finally, functional genomics approaches have revealed that disruption of synovial macrophages and proliferation of synovial sublining fibroblasts may play a mechanistic role in the pathogenesis of RA. Future epidemiologic, serologic, and transcriptomic studies will likely help to further refine our understanding of RA and its etiology.