Immunogenetics, Volume 63, Issue 8, pp 459–466

Genetics of rheumatoid arthritis: what have we learned?

  • Marieke Bax
  • Jurgen van Heemst
  • Tom W. J. Huizinga
  • Rene E. M. Toes
Open Access

DOI: 10.1007/s00251-011-0528-6

Cite this article as:
Bax, M., van Heemst, J., Huizinga, T.W.J. et al. Immunogenetics (2011) 63: 459. doi:10.1007/s00251-011-0528-6


Rheumatoid arthritis (RA) is a chronic autoimmune disease affecting 0.5–1% of the population worldwide. The disease has a heterogeneous character, including clinical subsets of anti-citrullinated protein antibody (ACPA)-positive and ACPA-negative disease. Although the pathogenesis of RA is poorly understood, progress has been made in identifying genetic factors that contribute to the disease. The most important genetic risk factor for RA is found in the human leukocyte antigen (HLA) locus. In particular, the HLA molecules carrying the amino acid sequence QKRAA, QRRAA, or RRRAA at positions 70–74 of the DRβ1 chain are associated with the disease. The HLA molecules carrying these “shared epitope” sequences predispose only to ACPA-positive disease. More than two decades after the discovery of HLA-DRB1 as a genetic risk factor, the second genetic risk factor for RA was identified in 2003. The introduction of new techniques, such as genome-wide association studies, has led to the identification of more than 20 additional genetic risk factors within the last 4 years, with most of these factors being located near genes implicated in immunological pathways. These findings underscore the role of the immune system in RA pathogenesis and may provide valuable insight into the specific pathways that cause RA.


Keywords: Rheumatoid arthritis, Shared epitope, HLA, DERAA, ACPA



Abbreviations

  • ACPA: Anti-citrullinated protein antibody
  • NIMA: Non-inherited maternal antigen
  • NIPA: Non-inherited paternal antigen
  • PAD: Peptidylarginine deiminase
  • PADI4: Peptidylarginine deiminase type 4
  • PTPN22: Protein tyrosine phosphatase non-receptor type 22
  • RA: Rheumatoid arthritis
  • RF: Rheumatoid factor
  • RhD: Rhesus D
  • SNP: Single-nucleotide polymorphism
  • STAT4: Signal transducer and activator of transcription 4
  • SE: Shared epitope
  • TNF: Tumor necrosis factor
  • TNFAIP3: Tumor necrosis factor, alpha-induced protein 3


Rheumatoid arthritis

Rheumatoid arthritis (RA) is characterized by chronic inflammation of synovial joints and affects 0.5–1% of the adult population worldwide (Silman and Pearson 2002). This inflammation can affect any synovial-lined diarthrodial joint (Imboden 2009), although the small joints of the hands and feet are most commonly affected. Larger joints such as the shoulder and the knee can also be involved, but this varies between individuals. When left untreated, radiographic joint damage accumulates at a constant rate over time (Wolfe and Sharp 1998), and erosion of the joint surface eventually causes joint deformity and loss of function, which is associated with significant disability and premature mortality (Majithia and Geraci 2007; Silman and Pearson 2002). The definition of RA is phenotypic and was developed on the basis of a consensus procedure by clinical experts, resulting in the 1987 and 2010 Rheumatoid Arthritis Classification Criteria (Aletaha et al. 2010; Arnett et al. 1988).

Clinical subsets of ACPA-positive and ACPA-negative RA patients

The clinical course of RA is extremely variable, showing a wide spectrum of clinical manifestations ranging from mild and self-limiting disease to rapidly progressive inflammation, joint destruction, and severe physical disability (Lee and Weinblatt 2001). The heterogeneous character of the disease results in differential responses to a range of therapies, including anti-tumor necrosis factor (anti-TNF) therapy (Plenge and Criswell 2008). A sub-classification of RA could be helpful to predict clinical outcomes of therapies. One way to sub-classify RA is on the basis of serologic factors, such as the RA-associated autoantibodies rheumatoid factor (RF) and anti-citrullinated protein antibodies (ACPA). RF is not unique to RA, whereas ACPA are highly specific for RA. ACPA are antibodies directed against citrullinated proteins. Citrullination, the conversion of arginine into citrulline by peptidylarginine deiminase (PAD), is a physiologic process that occurs under different conditions, including inflammation (Makrygiannakis et al. 2006). ACPA are predictive for RA; 75% of ACPA-positive patients who present with undifferentiated arthritis progress to RA within 3 years of follow-up (van Gaalen et al. 2004a). Furthermore, patients with ACPA-positive RA have a more aggressive clinical course than patients without ACPA (van Gaalen et al. 2004b). ACPA can also induce and aggravate arthritis in mice, suggesting that ACPA are involved in disease pathogenesis in humans (Kuhn et al. 2006; Uysal et al. 2009).

Genetics in RA

The prevalence of RA in the general population is <1%; among siblings of patients, however, the prevalence increases to 2–4% (Seldin et al. 1999). In addition, in cross-sectional twin studies, concordance rates for RA were found to be between 12.3% and 15.4% for monozygotic twins, compared to 3.5% for dizygotic twins (MacGregor et al. 2000). These sibling and twin pair studies demonstrate that genetic factors have a substantial impact on susceptibility to RA, resulting in an estimated genetic contribution to RA ranging from 50% to 60% (MacGregor et al. 2000; Seldin et al. 1999).
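The familial clustering described above can be expressed as simple ratios. The following Python sketch is an illustration only: it divides the sibling prevalence by the population prevalence (a quantity commonly called the sibling recurrence risk ratio, a term not used in the text) and the monozygotic by the dizygotic concordance rate. The function name `ratio_range` is hypothetical, and the population range of 0.5–1% is taken from the abstract rather than from this section.

```python
def ratio_range(numerator_range, denominator_range):
    """Return the (lowest, highest) ratio obtainable from two (low, high) ranges."""
    num_low, num_high = numerator_range
    den_low, den_high = denominator_range
    return num_low / den_high, num_high / den_low


# Percentages quoted in the text.
sibling_prevalence = (2.0, 4.0)      # prevalence among siblings, %
population_prevalence = (0.5, 1.0)   # prevalence in the general population, %
mz_concordance = (12.3, 15.4)        # monozygotic twin concordance, %
dz_concordance = (3.5, 3.5)          # dizygotic twin concordance, %

sibling_ratio = ratio_range(sibling_prevalence, population_prevalence)
twin_ratio = ratio_range(mz_concordance, dz_concordance)

print(f"Sibling vs population prevalence: {sibling_ratio[0]:.1f}x to {sibling_ratio[1]:.1f}x")
print(f"MZ vs DZ concordance: {twin_ratio[0]:.1f}x to {twin_ratio[1]:.1f}x")
```

These ratios only summarize the quoted percentages; they are not the 50–60% heritability estimate, which the cited studies derived with formal variance-component models.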

HLA and the shared epitope hypothesis

The most important genetic risk factor for RA is the HLA locus, which accounts for 30% to 50% of overall genetic susceptibility to RA (Bowes and Barton 2008; Imboden 2009). In 1969, it was demonstrated that in mixed lymphocyte cultures, peripheral blood lymphocytes from patients with RA were non-reactive to peripheral blood lymphocytes from other RA patients (Astorga and Williams 1969). Seven years later, in 1976, it was discovered that this non-reactivity was due to the sharing of genes in the HLA region (Stastny 1976), indicating a contribution of the HLA region to RA. Stastny noted in 1978 that 78% of Caucasian RA patients were HLA-DRw4 positive compared to 28% of healthy controls (Stastny 1978). Nearly a decade later, it was discovered that multiple RA risk alleles within the HLA-DRB1 gene share a conserved amino acid sequence (Gregersen et al. 1986a, b). This led to a new theory: the “shared epitope” (SE) hypothesis (Gregersen et al. 1987). Positions 70–74 in the third hypervariable region of the DRβ1 chain of the RA-associated HLA-DRB1 molecules all contain the conserved amino acids QKRAA, QRRAA, or RRRAA. This sequence of amino acids is called the SE, and the risk alleles carrying this sequence are widely known as SE alleles. These specific amino acid sequences were shared not only between HLA-DRB1*04 alleles but also between the RA-associated HLA-DRB1*01 alleles (Gregersen et al. 1987). The odds ratio (OR) for one SE allele is 4.37, whereas two copies of HLA-DRB1 SE alleles have an OR of 11.79 (Huizinga et al. 2005). Surprisingly, the association between SE-encoding HLA-DRB1 alleles and RA was only observed for ACPA-positive disease (Huizinga et al. 2005). Apparently, ACPA-positive and ACPA-negative RA have different genetic risk profiles (De Rycke et al. 2004; Imboden 2009; van der Helm-van Mil and Huizinga 2008), indicating that ACPA-positive and ACPA-negative RA represent two distinct forms of RA not only from a clinical and pathogenic perspective but also from a genetic perspective.
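As an illustration of how an odds ratio relates to carrier frequencies, the sketch below computes a crude odds ratio from the HLA-DRw4 percentages reported by Stastny (1978) and quoted above. It is a minimal example under stated assumptions: the function name `odds_ratio` is hypothetical, the percentages are treated as exact carrier frequencies, and no confidence interval is computed because the text gives no sample sizes. It does not reproduce the per-allele odds ratios of 4.37 and 11.79, which come from a different data set (Huizinga et al. 2005).

```python
def odds_ratio(freq_cases, freq_controls):
    """Odds ratio for carrying a marker, from carrier frequencies (0-1) in cases and controls."""
    if not (0 < freq_cases < 1 and 0 < freq_controls < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1 - freq_cases)           # odds of carriage among patients
    odds_controls = freq_controls / (1 - freq_controls)  # odds of carriage among controls
    return odds_cases / odds_controls


# HLA-DRw4 carrier frequencies as quoted in the text: 78% of patients, 28% of controls.
drw4_or = odds_ratio(0.78, 0.28)
print(f"Crude OR for HLA-DRw4 carriage: {drw4_or:.1f}")
```

With these inputs the crude odds ratio is roughly 9, which shows how strongly HLA-DRw4 was enriched among patients in that early study.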

Putative mechanisms behind the HLA SE in ACPA-positive RA

Although it has been known for more than 30 years that the HLA locus carries the most prominent genetic association for RA, the underlying mechanism by which particular HLA-DRB1 alleles predispose to the development of ACPA-positive RA is still not understood, despite progress in understanding the structure and function of HLA-DRB1 molecules. The HLA-DR molecule is a heterodimer consisting of an alpha (DRA) and a beta (DRB) chain, both anchored in the membrane of antigen-presenting cells. The function of HLA-DR molecules is to present antigenic peptides to T lymphocytes. For efficient antigen presentation to T cells, the T cell receptor recognizes residues from both the peptide and the HLA-DR molecule itself. The part of HLA-DR that binds to the peptide, the peptide-binding groove, is composed of two α-helical walls and a floor of β-pleated sheet (Brown et al. 1993). The SE is situated in the α-helix wall of the peptide-binding groove (Gregersen et al. 1987). In this position, the SE influences peptide binding to the HLA molecule as well as presentation to T cells (Fig. 1). Therefore, it has been postulated that the SE motif itself is directly involved in the pathogenesis of RA by allowing the presentation of an arthritogenic peptide to T cells (Gregersen et al. 1987; van der Helm-van Mil et al. 2007). Unfortunately, this hypothesis has not yet been confirmed by the identification of specific arthritogenic peptides that bind to SE HLA-DR molecules in RA, although it has been reported that citrullinated peptides bind to HLA-SE molecules for presentation to T cells (Feitsma et al. 2010; Hill et al. 2003). T cells that recognize citrullinated antigen presented by HLA could subsequently help B cells to secrete ACPA. However, several alternative hypotheses regarding SE have been proposed that could also explain the contribution of the HLA locus to RA.
These hypotheses include the idea that the HLA-SE molecules shape the T cell repertoire to permit escape from tolerance or survival of autoreactive clones, serve as a target for autoreactive T cells owing to molecular mimicry with a pathogen, or even fail to bind an arthritogenic peptide, leading to an inadequate tolerant immune response (Firestein 2003). In ACPA-positive RA patients, SE alleles have been shown to influence ACPA specificity by predisposing to the development of antibodies against citrullinated vimentin but not to the development of antibodies against citrullinated fibrinogen. This indicates that SE alleles act as classic immune response genes in the ACPA response because they influence both the magnitude and the specificity of this RA-specific antibody response (Verpoort et al. 2007). Because of the heterogeneous nature of RA, it has been suggested that environmental factors, in addition to genetic factors, are important for the development of RA (Firestein 2003). Smoking has been established as a relevant environmental risk factor for ACPA-positive disease, especially in subjects carrying one or two copies of the HLA-DRB1 SE (Kallberg et al. 2011; Linn-Rasker et al. 2006). The association between ACPA-positive RA and smoking could be explained by the observation of citrullinated proteins in the lungs of smokers, which are possibly well presented by HLA molecules carrying the SE (Makrygiannakis et al. 2008).
Fig. 1

The HLA-DR4 molecule with the position of the SE sequence. The crystal structure of HLA-DRB1*04:01/DRA1*01:01 complexed with a human collagen II-derived peptide (dark grey circles) shows the peptide-binding groove created by the helical structures of the HLA-DR alpha and beta chain with (indicated in red) the SE at peptide-binding pocket 4 (Polyview-3D)

Genetic risk factors in addition to HLA

It took almost 30 years after the identification of HLA alleles as a risk factor for RA before an RA-associated gene outside the HLA region was recognized: in 2003, the gene peptidylarginine deiminase type 4 (PADI4) was identified in a Japanese population as a second risk factor for RA (Suzuki et al. 2003). Interestingly, PADI genes encode the enzymes that convert arginine into citrulline, the target of ACPA. The discovery of PADI4 as a risk factor was followed by the identification of protein tyrosine phosphatase non-receptor type 22 (PTPN22) in 2004 (Begovich et al. 2004; Carlton et al. 2005; Gregersen 2005). After 2004, the identification of new genetic risk factors underwent an unexpected acceleration. One year later, in 2005, CTLA4 was found during a candidate gene study (Plenge et al. 2005). In 2007, by means of a candidate gene approach (Kurreeman et al. 2007), a novel genetic risk factor was identified in the 9q33 region of the genome containing TRAF1/C5; it was also detected concurrently in a genome-wide study (Plenge et al. 2007b). Two additional risk loci were discovered in 2007: the signal transducer and activator of transcription 4 (STAT4) gene region on chromosome 2q, following a mapping of genes under a linkage peak (Remmers et al. 2007), and single-nucleotide polymorphisms (SNPs) near the gene TNF-α induced protein 3 (TNFAIP3), following genome-wide association studies (Plenge et al. 2007a; Thomson et al. 2007). The application of this latter technique allowed the identification of new risk alleles for RA to take enormous steps forward (Plenge 2009). In 2008, another seven RA risk loci were detected, six of them in a meta-analysis of three genome-wide association studies (GWAS) (Barton et al. 2008; Raychaudhuri et al. 2008), with the most significant finding localized in the CD40 gene region. By the beginning of 2011, more than 30 established RA risk loci had been identified (Chen et al. 2011; Gregersen et al. 2009; Raychaudhuri et al. 2009; Stahl et al. 2010; Zhernakova et al. 2011) (Fig. 2).
Fig. 2

Susceptibility genes for RA (Barton et al. 2008; Begovich et al. 2004; Chen et al. 2011; Gregersen et al. 1987, 2009; Kurreeman et al. 2007; Plenge et al. 2005, 2007a, b; Raychaudhuri et al. 2008, 2009; Remmers et al. 2007; Stahl et al. 2010; Stastny 1978; Suzuki et al. 2003; Thomson et al. 2007; Zhernakova et al. 2011)

GWAS rapidly expanded the number of SNPs associated with ACPA-positive RA; however, this raised the question of what we could learn from these associations and whether we could use these SNPs to better understand RA pathogenesis. Most of the identified SNPs are located near genes that are linked to immunological pathways, implicating, not surprisingly, immune-related events as an important component of RA. The associated genes are often linked to T or B cell activation or to differentiation and cytokine signaling (Zhernakova et al. 2009). Some of the associated genes, for instance PADI4, are only found to be associated with RA. This directly implicates differences in citrullination as a mechanism important for RA development. Other genetic associations are shared between different autoimmune diseases (Zhernakova et al. 2009). PTPN22, for example, is linked to RA, systemic lupus erythematosus, type 1 diabetes, and other autoimmune diseases (Bottini et al. 2004; Jawaheer et al. 2003; Kyogoku et al. 2004). PTPN22 encodes a tyrosine phosphatase and is hypothesized to affect the threshold for B and T cell receptor signaling (Gregersen 2005). For many of the associated genes, the immunological mechanism explaining the genetic association is difficult to demonstrate. For most of the associated loci, the functional SNP is still unknown. Identifying the functional SNP is important for determining the effect of the functional variation on gene expression and function. Another complicating factor is that most of the proteins expressed by the associated genes play multiple important roles in the immune system and affect different immunological pathways. For example, genetic variants near CD40 have been shown to be associated with RA (Raychaudhuri et al. 2008). CD40 is a member of the TNF-receptor family and is expressed on many different immune cells including B cells, macrophages, and dendritic cells.
It is important not only for the priming of T cells and activation of B cells but also for the activation of macrophages and dendritic cells. Furthermore, the expression of CD40 on RA synovial cells was hypothesized to contribute to joint destruction (Peters et al. 2009). Dissecting these different pathways in a human system to study which pathways are important for RA pathogenesis is a challenge, as it will be difficult to show which CD40-expressing cell associated with certain CD40 genotypes is primarily responsible for the enhanced risk of developing RA. Like CD40, most of the identified genes play complex roles in the immune system, so even when it has been elucidated how a certain genetic variant influences gene expression or function, it remains unknown how that variant contributes to disease. Thus, although the number of SNPs associated with RA is rapidly expanding, the challenge for the future will be to unravel the mechanisms by which the proteins expressed by these genes contribute to RA pathogenesis.

Genetic risk factors for ACPA-negative RA

Many HLA-DRB1 alleles are described as risk factors for ACPA-positive RA. For ACPA-negative disease, the situation is clearly different, as only HLA-DR3 predisposes to ACPA-negative RA (Irigoyen et al. 2005; Verpoort et al. 2005). Another risk factor for ACPA-negative RA, interferon regulatory factor 5 (Sigurdsson et al. 2007), has been described, although this association could not be replicated in other study populations (Garnier et al. 2007; Rueda et al. 2006). Recently, it has been suggested that neuropeptide S receptor gene polymorphisms might be implicated in ACPA-negative RA susceptibility and its clinical manifestations (D’Amato et al. 2010). Although many more genetic risk factors have thus far been described for ACPA-positive RA than for ACPA-negative RA, this does not imply that genetic factors contribute more to ACPA-positive RA. A recent study showed that the heritability of RA among twin pairs is 68% for ACPA-positive RA and 66% for ACPA-negative RA (van der Woude et al. 2009). This suggests that the heritability of ACPA-positive and ACPA-negative disease is comparable, although for ACPA-negative disease the genetic risk factors remain to be identified, since cohorts used to identify genetic risk factors consist mainly of ACPA-positive RA patients.

Protection against the development of RA

Protection against RA is predominantly associated with HLA-DRB1*13:01

In addition to HLA-DRB1 alleles that contribute to RA susceptibility, other HLA-DRB1 alleles confer protection against the disease. These protective HLA-DRB1 alleles are more often present in healthy controls than in RA patients and have been categorized according to several models. One of the classifications put forward in the literature postulates that protection against RA is conferred by the DERAA sequence at positions 70–74 of the HLA-DRB1 allele, which is the same position on the HLA-DRB1 alleles as the SE sequence. Likewise, other models suggest that protection is conferred by isoleucine (I) at position 67 (I67) (de Vries et al. 2002) or propose an association mainly with aspartic acid (D) at position 70 (D70) (del Rincon and Escalante 1999; Mattey et al. 2001; Reviron et al. 2001). Data from a recent study did not support the hypothesis that I67 confers protection against RA; in this study, only the presence of D70 appeared to confer some degree of protection (Morgan et al. 2008). Because of these conflicting results, a meta-analysis was performed involving four European populations with >2,800 patients and >3,000 control subjects to investigate thoroughly which HLA-DRB1 alleles were associated with protection against ACPA-positive RA and ACPA-negative RA. To correct for skewing due to the strong association of the SE alleles with ACPA-positive RA, the analysis for ACPA-positive RA was stratified for the SE alleles. Interestingly, this study showed that the only association that conveyed protection against ACPA-positive RA after stratification for SE involved HLA-DRB1*13 alleles. The protective effect of other classifications, including DERAA and D70, was no longer present after the exclusion of HLA-DRB1*13. This indicates that HLA-DRB1*13 rather than DERAA, D70, or I67 is associated with protection. Among the DRB1*13 alleles, only DRB1*13:01 appeared to be associated with protection. For ACPA-negative RA, no associations with protective HLA-DRB1 alleles were found (van der Woude et al. 2010).
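Because the SE and DERAA classifications both depend on the residues at positions 70–74 of the DRβ1 chain, the lookup can be sketched in a few lines of Python. This is a hypothetical illustration: the function names are invented here, positions are counted from the first residue of whatever string is supplied (an assumption about numbering), and the example chains are placeholders padded with `X`, not real allele sequences.

```python
SHARED_EPITOPE_MOTIFS = {"QKRAA", "QRRAA", "RRRAA"}  # SE sequences named in the text
DERAA_MOTIF = "DERAA"                                 # motif discussed in the text


def motif_at_70_74(chain):
    """Return residues 70-74 (1-based) of a DRβ1 chain given in one-letter codes."""
    if len(chain) < 74:
        raise ValueError("sequence is shorter than 74 residues")
    return chain[69:74].upper()


def classify_motif(motif):
    """Label a five-residue motif using the categories named in the text."""
    if motif in SHARED_EPITOPE_MOTIFS:
        return "shared epitope"
    if motif == DERAA_MOTIF:
        return "DERAA"
    return "other"


# Placeholder chains: 'X' stands in for residues not modelled here.
# These are NOT real allele sequences.
example_se = "X" * 69 + "QKRAA" + "X" * 20
example_deraa = "X" * 69 + "DERAA" + "X" * 20

for name, chain in [("example_se", example_se), ("example_deraa", example_deraa)]:
    print(name, "->", classify_motif(motif_at_70_74(chain)))
```

The labels are descriptive only. As the meta-analysis above indicates, protection appears to track with HLA-DRB1*13:01 rather than with the DERAA motif as such, so a motif match should not be read as a risk prediction.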

Non-inherited maternal antigen

In 1954, a biological effect of non-inherited maternal antigen (NIMA) was reported for the first time. It was demonstrated that rhesus D (RhD)-negative children were tolerant to the RhD antigen if their mother was RhD positive (Owen et al. 1954). In addition to the RhD antigen, HLA is also mentioned as NIMA in a study showing that haploidentical NIMA-mismatched sibling transplants have a graft survival rate similar to that of HLA identical siblings, whereas non-inherited paternal antigen (NIPA)-mismatched sibling transplants did as poorly as the recipients of maternal and paternal grafts (Burlingham et al. 1998). Recently, it was shown that HLA-DRB1 molecules that contain the amino acid sequence DERAA when present as NIMA also have a protective effect on the development of RA (Feitsma et al. 2007). By using a cohort of Dutch RA patients together with their parents, it was shown that the mothers of patients with RA had a significantly lower frequency (16.1%) of DERAA-containing HLA-DRB1 alleles than did the Dutch control population (29.3%). In contrast, the frequencies of DERAA-containing HLA-DRB1 alleles in fathers of the patients with RA (26.2%) were comparable to the frequencies of DERAA-containing HLA-DRB1 alleles in fathers of healthy controls. These findings were replicated in an English cohort (Feitsma et al. 2007). Due to the dominance of HLA-DRB1*13:01 in DERAA-positive HLA alleles, it is not known whether the NIMA effect derives from DERAA-containing HLA-DRB1 alleles or only from HLA-DRB1*13:01. If replicated further, the finding that certain HLA-alleles—when seen as NIMA—can protect offspring is highly interesting, as it suggests directions in which to manipulate the immune system in such a way that it will not be able to precipitate RA.

Conclusions: genetics and RA

The first genetic risk factor for RA was identified in 1978. After the discovery of this association between HLA and RA, it took 25 years until PADI4, the second genetic risk factor for RA, was identified. With the introduction of new techniques, the discovery of new genes associated with the risk of developing RA accelerated. Since 2004, more than 20 genes have been found to be implicated, mainly in ACPA-positive disease. The identification of disease-associated genes could provide valuable insight into the genetic variation present before disease onset and thereby help to identify the pathways important for RA pathogenesis. Future challenges include the translation of genetic associations into the biological pathways that are responsible for RA, as this knowledge may prove to be exceedingly useful for the development of curative therapies for RA.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Marieke Bax (1)
  • Jurgen van Heemst (1)
  • Tom W. J. Huizinga (1)
  • Rene E. M. Toes (1)

  1. Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
