Introduction

The pathogenesis of rheumatoid arthritis (RA) is multifactorial, involving both genetic and environmental factors. Although associations between some HLA-DRB1 alleles and RA were reported nearly three decades ago, the biological mechanism underlying this association remains unknown. The observation that all HLA-DRB1 alleles known to be associated with RA encode the RAA sequence at positions 72–74 of the HLA-DR β chain led to the shared epitope (SE) hypothesis [1]. This hypothesis received support from numerous case-control association studies in both Caucasian and non-Caucasian populations. However, studies testing the SE hypothesis have rejected its simplest form, which stipulates that each SE allele confers the same risk [2–5].

Recently, Tezenas du Montcel and coworkers proposed a model of the SE component in RA [6]. Those investigators reconsidered the SE hypothesis and generated a new classification of HLA-DRB1 alleles using the MASC (Marker Association Segregation Chi Square) method [7] in 100 trio families (one case and both parents) and 132 index cases from affected sibling pair families, all from the French Caucasian population. They proposed that the risk of developing RA depends on whether the RAA sequence occupies positions 72–74 and, if so, on the amino acids at positions 71 and 70. Among RAA alleles, lysine (K) at position 71 conferred the highest risk, arginine (R) an intermediate risk, and alanine (A) or glutamic acid (E) the lowest risk. Glutamine (Q) or arginine (R) at position 70 conferred greater risk than aspartic acid (D). This initially yielded five allele groups, which were simplified to three allele groups defining six genotypes with different RA risks. This study was the first to model the HLA component of RA taking into account both association and linkage data, resulting in a reshaped SE hypothesis.

Here, we tested the validity of this classification by replication in a new, independent sample of 100 French Caucasian trio families, assessing whether the risk hierarchy of the proposed classification was homogeneous with that observed in the initial sample.

Materials and methods

Study design and study population

An association study using conditional logistic regression was performed to investigate the hierarchy of risks associated with HLA-DRB1 genotypes in an independent sample of trio families. The new independent sample (sample B), similar to the one used to generate the new classification (sample A), included 100 trio families (one RA patient and both parents) of French Caucasian origin, defined as all four grandparents being of French Caucasian origin. DNA from all trio families in samples A and B was collected between 1994 and 1998, together with the initial clinical characteristics of the RA index patients. RA diagnosis met the 1987 American College of Rheumatology (formerly the American Rheumatism Association) criteria [8]. All individuals provided written informed consent, and the study was approved by the Hospital Bicêtre ethics committee (Kremlin-Bicêtre, Assistance Publique-Hôpitaux de Paris).

Clinical characteristics were updated in 2001 and 2002 for sample A and in 2004 for sample B. Four RA index patients in sample A and two RA index patients in sample B died between the time of DNA collection and the present study. The updated clinical characteristics of sample B were similar to those of sample A (the initial sample): 90% of RA patients in sample B were female versus 87% in sample A; the mean (± standard deviation) age at RA onset was 31 ± 9 years versus 32 ± 10 years; the mean (± standard deviation) disease duration was 16 ± 8 years versus 18 ± 7 years; erosions were present in 79% versus 90%; serum rheumatoid factor was positive in 76% versus 81%; and nodules were present in 19% versus 31%. Rheumatoid factor was considered positive when at least one positive rheumatoid factor result had been obtained during the course of the disease, by latex fixation, Waaler-Rose assay or laser nephelometry.

HLA-DRB1 genotyping

Blood samples were collected for DNA extraction and genotyping. HLA-DRB1 typing was performed by the polymerase chain reaction with sequence-specific primers (PCR-SSP), using the Dynal Classic SSP DR low-resolution kit and, for subtyping of HLA-DRB1*01, *04, *11, *13 and *15 alleles, the Dynal Classic high-resolution SSP kits (Dynal Biotech, Lake Success, NY, USA). Exon 2 of HLA-DRB1 was sequenced for the four HLA-DRB1*04 alleles that remained ambiguous with the Dynal Classic method. HLA-DRB1 allele frequencies in the control genotypes (obtained by combining the untransmitted parental alleles of each family) were similar between samples and comparable to the allele frequencies reported for the French population in the 11th Histocompatibility Workshop [9].
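Forming a control genotype from the untransmitted parental alleles can be sketched as a multiset difference. The helper names below are hypothetical and alleles are arbitrary string labels. Because the control genotype is simply (maternal alleles + paternal alleles) minus the child's alleles, it does not depend on which parent transmitted which allele, so no phase assignment is needed.

```python
from collections import Counter


def is_mendelian_consistent(mother, father, child):
    """True if one child allele can come from each parent."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def untransmitted_genotype(mother, father, child):
    """Control genotype built from the two parental alleles not passed to the child.

    Uses multisets, (mother + father) - child, so it never needs to decide
    which parent transmitted which allele.
    """
    if not is_mendelian_consistent(mother, father, child):
        raise ValueError("child genotype is not consistent with the parents")
    remaining = Counter(mother) + Counter(father) - Counter(child)
    return tuple(sorted(remaining.elements()))


def allele_frequencies(genotypes):
    """Relative frequency of each allele over a list of two-allele genotypes."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical trio: mother S2/L, father S3P/L, affected child S2/S3P.
control = untransmitted_genotype(("S2", "L"), ("S3P", "L"), ("S2", "S3P"))
print(control)                                      # ('L', 'L')
print(allele_frequencies([control, ("L", "S2")]))   # {'L': 0.75, 'S2': 0.25}
```

Allele frequencies computed this way over all families give the control frequencies that can be compared between samples or with reference data.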

HLA-DRB1 allele classification

HLA-DRB1 alleles were divided into two groups according to the presence (S alleles) or absence (X alleles) of the RAA sequence at positions 72–74 (Table 1). The S alleles were then subdivided into three categories according to the amino acid at position 71: S1 when alanine or glutamic acid was present (A-RAA or E-RAA sequences; A-RAA alleles were too infrequent to be analyzed separately and were pooled with E-RAA alleles, as described previously [6]); S2 when lysine was present (K-RAA sequence); and S3 when arginine was present (R-RAA sequence). S3 alleles were then subdivided according to the amino acid at position 70: S3D alleles encode the D-R-RAA sequence, and S3P alleles encode the Q-R-RAA or R-R-RAA sequence. S2 alleles carry either Q or D at position 70 and therefore encode, in this 70-71-72/74 nomenclature, the Q-K-RAA or D-K-RAA sequence.

Table 1 Classification of HLA-DRB1 alleles
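The grouping rules described above can be written as a small decision function. This is a sketch: `classify_drb1` is a hypothetical helper, the caller is assumed to supply the one-letter residues an allele encodes at positions 70, 71 and 72–74, and Table 1 remains the authoritative allele list. Residue combinations that the text does not describe raise an error rather than being guessed.

```python
def classify_drb1(pos70: str, pos71: str, pos72_74: str) -> str:
    """Assign an HLA-DRB1 allele to X, S1, S2, S3D or S3P from its residues.

    Arguments are one-letter amino-acid codes supplied by the caller
    (an assumption of this sketch, not part of the published method).
    """
    if pos72_74 != "RAA":
        return "X"             # no RAA at 72-74
    if pos71 in ("A", "E"):
        return "S1"            # A-RAA or E-RAA (A-RAA pooled because rare)
    if pos71 == "K":
        return "S2"            # Q-K-RAA or D-K-RAA
    if pos71 == "R":
        if pos70 == "D":
            return "S3D"       # D-R-RAA
        if pos70 in ("Q", "R"):
            return "S3P"       # Q-R-RAA or R-R-RAA
    # Combinations not described in the text are reported, not guessed.
    raise ValueError(f"{pos70}-{pos71}-{pos72_74} is not covered by this scheme")


# Sequences cited in the Discussion: Q-K-RAA (*0401) and R-R-RAA (*1001).
print(classify_drb1("Q", "K", "RAA"))  # S2
print(classify_drb1("R", "R", "RAA"))  # S3P
```

Ordering the checks as position 72–74, then 71, then 70 mirrors the nested subdivision in the text.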

Statistical analysis

We first investigated transmission of the five alleles (S1, S2, S3D, S3P and X) from heterozygous parents to RA patients, using a χ² test with one degree of freedom for each allele. Alleles with significant over-transmission (transmission rate > 50%) are linked to and associated with RA. Alleles with significant under-transmission (< 50%) are not associated with increased RA risk and could be pooled as low-risk alleles for further analysis.
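The text does not spell out the form of this 1-df χ² test. A sketch, assuming it compares alleles transmitted with alleles not transmitted by heterozygous parents (the transmission/disequilibrium test formulation), where the expected count is half the informative transmissions; `transmission_chi2` is a hypothetical name. Fed the sample B counts given in the Results, this formulation reproduces the P values reported there.

```python
from math import erfc, sqrt


def transmission_chi2(transmitted: float, expected: float):
    """1-df chi-square test of transmission from heterozygous parents.

    `expected` is half the number of informative (heterozygous) transmissions,
    so the untransmitted count is 2 * expected - transmitted.
    Returns (statistic, P value).
    """
    informative = 2 * expected
    untransmitted = informative - transmitted
    stat = (transmitted - untransmitted) ** 2 / informative
    p_value = erfc(sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value


# Counts reported for sample B in the Results: (transmitted, expected).
SAMPLE_B = {
    "S2": (53, 33.5),
    "S3P": (47, 33.5),
    "S1": (28, 40),
    "S3D": (11, 18),
    "X": (30, 44),
}

for allele, (t, e) in SAMPLE_B.items():
    stat, p = transmission_chi2(t, e)
    direction = "over" if t > e else "under"
    print(f"{allele}: chi2 = {stat:.2f}, P = {p:.2g} ({direction}-transmitted)")
```

For S2, for example, 67 informative transmissions give 53 transmitted versus 14 untransmitted, so χ² = 39²/67 ≈ 22.7.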

Then, for each genotype i, the odds ratio ORi relative to a reference genotype and its 95% confidence interval (CI) were calculated by conditional logistic regression. In this analysis, the genotype of each RA patient was conditioned on the parents' genotypes [10, 11]: it was compared with those of three pseudo-controls (i.e. the three other genotypes that could be formed from the parental gametes), and effects were tested with likelihood ratio tests. With a reference genotype whose baseline risk is termed β0, each coefficient βi (i = 1, ..., n) was estimated by maximizing the conditional likelihood of the model

\ln(\mathrm{odds}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n

where Xi is an indicator taking the value 1 for genotype i and 0 for the other genotypes, βi = ln ORi, and β0 is the baseline risk for the reference genotype. Because β0 is shared by the patient and the pseudo-controls of a given family, it cancels out of the conditional likelihood. Likelihood computations and estimation were performed in STATA using the program developed by David Clayton (Cambridge, UK) [12].
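The conditioning on parental genotypes can be sketched in a few lines. This is an illustration under stated assumptions, not the program by Clayton [12]: `matched_set` and `conditional_loglik` are hypothetical helpers, alleles are plain strings, and the coefficients are passed in rather than estimated. Each family contributes the probability that the patient, rather than one of the three pseudo-controls, carries the observed genotype.

```python
from itertools import product
from math import exp, log


def genotype(a, b):
    """Unordered genotype label, e.g. genotype('S2', 'L') == ('L', 'S2')."""
    return tuple(sorted((a, b)))


def matched_set(mother, father, child):
    """Case genotype followed by its three pseudo-controls.

    Each maternal allele is paired with each paternal allele, giving four
    possible offspring genotypes; the observed one is removed once.
    """
    possible = [genotype(m, f) for m, f in product(mother, father)]
    case = genotype(*child)
    possible.remove(case)  # ValueError if the trio is Mendelian-inconsistent
    return [case] + possible


def conditional_loglik(trios, beta):
    """Conditional log likelihood of (mother, father, child) trios.

    `beta` maps genotype labels to ln(OR) relative to the reference L/L;
    genotypes missing from `beta` (including L/L) get 0. beta0 is omitted
    because it cancels within each matched set.
    """
    total = 0.0
    for mother, father, child in trios:
        members = matched_set(mother, father, child)
        scores = [beta.get(g, 0.0) for g in members]
        total += scores[0] - log(sum(exp(s) for s in scores))
    return total


# Two hypothetical trios and an illustrative coefficient.
trios = [
    (("S2", "L"), ("S3P", "L"), ("S2", "S3P")),
    (("L", "L"), ("S2", "L"), ("S2", "L")),
]
print(conditional_loglik(trios, {}))                                # 2 * ln(1/4)
print(conditional_loglik(trios, {genotype("S2", "S3P"): log(2.0)}))
```

Maximizing this function over the coefficients, for example with a generic numerical optimizer, gives the maximum likelihood estimates of the βi.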

If the genotype risk hierarchy was replicated, a homogeneity test on the genotypic ORs was performed between the two trio family samples. Under the hypothesis of homogeneity, the statistic

Q = -2\left[\ln(\max L_{AB}) - \left(\ln(\max L_A) + \ln(\max L_B)\right)\right]

approximately follows a χ² distribution with n degrees of freedom (n being the number of βi estimated), where max LA, max LB and max LAB are the likelihoods maximized over the βi in sample A, sample B and the pooled samples A and B, respectively.
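The homogeneity statistic needs only the three maximized log likelihoods. In this sketch, `homogeneity_test` and `chi2_sf` are hypothetical helpers, and the log-likelihood values are assumed to come from conditional logistic regression fits of the kind described above; the χ² upper-tail probability is computed from the power series of the regularized incomplete gamma function so that only the standard library is needed (`scipy.stats.chi2.sf` would serve equally).

```python
from math import exp, lgamma, log


def chi2_sf(x: float, df: int, tol: float = 1e-15, max_iter: int = 10_000) -> float:
    """Upper-tail probability P(chi2_df > x).

    Sums the power series of the regularized lower incomplete gamma
    function P(df/2, x/2) and returns 1 - P; adequate for the small
    degrees of freedom used here.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a           # k = 0 term
    total = term
    k = 0
    while term > tol * total and k < max_iter:
        k += 1
        term *= z / (a + k)  # z**k / (a (a+1) ... (a+k))
        total += term
    lower = total * exp(a * log(z) - z - lgamma(a))
    return max(0.0, 1.0 - lower)


def homogeneity_test(loglik_a: float, loglik_b: float, loglik_pooled: float, n_params: int):
    """Likelihood ratio test that samples A and B share the same beta_i.

    Q = -2 [ln max L_AB - (ln max L_A + ln max L_B)], compared with a
    chi-square distribution with n_params degrees of freedom.
    """
    q = -2.0 * (loglik_pooled - (loglik_a + loglik_b))
    return q, chi2_sf(q, n_params)


# Hypothetical maximized log likelihoods for samples A, B and A+B.
q, p = homogeneity_test(-45.0, -54.0, -100.0, n_params=5)
print(f"Q = {q:.2f}, P = {p:.3f}")  # Q = 2.00, P = 0.849
```

A large Q (small P) would indicate that a single set of βi fits the pooled data noticeably worse than separate fits to each sample.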

If homogeneity between the two samples was confirmed, the classification was considered validated, and ORs (95% CIs) were estimated by conditional logistic regression on the entire sample (samples A and B combined).

Results

Test of the shared epitope allele classification in the new independent sample

We first observed significant over-transmission of S2 alleles (53 S2 alleles transmitted versus 33.5 expected; P = 1.9 × 10⁻⁶) and of S3P alleles (47 S3P alleles transmitted versus 33.5 expected; P = 0.001), as previously reported [6]. S1, S3D and X alleles were under-transmitted: 28 S1 alleles were transmitted versus 40 expected (P = 0.007), 11 S3D alleles were transmitted versus 18 expected (P = 0.02), and 30 X alleles were transmitted versus 44 expected (P = 0.003). As in the initial study, these three low-risk alleles (S1, S3D and X) were pooled as L alleles. Subsequent analyses therefore considered only three alleles (S2, S3P and L), which form six genotypes.
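The pooling step amounts to a mapping from the five allele groups to three, from which the six genotypes follow. A minimal sketch; `POOLED_ALLELE`, `pooled_genotype` and the "S2/L" label format are illustrative choices, not the authors' code.

```python
from itertools import combinations_with_replacement

# Pooling described in the text: S1, S3D and X become the low-risk allele L.
POOLED_ALLELE = {"S1": "L", "S3D": "L", "X": "L", "S2": "S2", "S3P": "S3P"}
ALLELES = ("S2", "S3P", "L")


def pooled_genotype(allele1: str, allele2: str) -> str:
    """Label a genotype with the three pooled alleles, e.g. ('X', 'S2') -> 'S2/L'."""
    a, b = POOLED_ALLELE[allele1], POOLED_ALLELE[allele2]
    # Order by the fixed sequence S2, S3P, L so that S2/L and L/S2 coincide.
    a, b = sorted((a, b), key=ALLELES.index)
    return f"{a}/{b}"


# The six genotypes that three alleles can form (L/L is the reference).
SIX_GENOTYPES = [f"{a}/{b}" for a, b in combinations_with_replacement(ALLELES, 2)]
print(SIX_GENOTYPES)
# ['S2/S2', 'S2/S3P', 'S2/L', 'S3P/S3P', 'S3P/L', 'L/L']
```

With k alleles there are k(k + 1)/2 unordered genotypes, which is why three alleles give six genotypes.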

The conditional logistic regression analysis yielded the following hierarchy of genotype risks, with L/L as the reference genotype (Table 2): the S2/S3P and S2/S2 genotypes were associated with the greatest risk of RA (ORs of 19.5 and 18.0, respectively), followed by the S3P/S3P, S2/L and S3P/L genotypes (ORs of 8.7, 5.3 and 3.1, respectively). This hierarchy was identical to that observed previously [6].

Table 2 Results of the odds ratio calculation on the replication sample (sample B)

Results of the homogeneity test

The homogeneity test on genotypic ORs between the new sample and the initial one gave χ² = 1.3 with five degrees of freedom (P = 0.80). Because this test was not statistically significant, we considered the two samples to be homogeneous and the new classification to be valid.

Odds ratio estimation on the pooled sample of 200 trio families

Because the two samples were homogeneous, ORs were estimated by conditional logistic regression in the pooled sample of 200 trio families (Table 3).

Table 3 Results of the odds ratio calculation on the global sample (samples A and B combined)

Discussion

In the present study we validated the classification of HLA-DRB1 SE alleles in RA proposed by Tezenas du Montcel and coworkers [6]. This is the first study to validate a model of the HLA-DRB1 component of RA based on the SE hypothesis [1], with detailed investigation of the contribution of single SE amino acids to RA susceptibility, taking into account both linkage and association data. This work yields a risk genotype hierarchy, for which we provide OR estimates. The ORs were obtained exclusively from trio families and are therefore unbiased estimates for the sample investigated; this contrasts with estimates derived from case-control studies, in which the population matching between cases and controls can be questioned.

Further studies in other Caucasian and non-Caucasian populations are required to validate this new classification fully and to investigate population-specific effects. The ORs reported here relate to relatively early-onset RA, as is typical of trio families. Because the mean duration of RA was long in both samples (18 years in sample A and 16 years in sample B), selection (survivor) bias cannot be excluded, even though we included the RA index patients who died between DNA collection and the present study. Investigation of a population with common, sporadic RA is needed to assess the potential clinical relevance of this new classification. Larger studies would also narrow the 95% CIs of the ORs. In the present study, non-overlapping 95% CIs were observed only between the highest-risk genotype, S2/S3P (OR 22.2, 95% CI 9.9–49.7), and the lowest-risk associated genotype, S3P/L (OR 4.4, 95% CI 2.3–8.4). Significant differences between the other associated genotypes remain to be established. If such differences could be correlated with distinct pathophysiological mechanisms, they would provide major clues for deciphering the genetic component of RA.

It was recently reported that the SE-RA association is confined to rheumatoid factor positive patients [13] or to patients positive for anti-citrullinated peptide antibodies [14]. The precise relationship between the HLA risk genotypes and rheumatoid factor or anti-citrullinated peptide antibodies should therefore also be determined. The interaction between HLA-DRB1 genotypes and any new RA gene established by association and linkage, such as PTPN22 [15, 16], could be investigated using this new classification. Ultimately, this could help to identify other RA genetic factors that interact specifically with only one of the HLA-DRB1 genotypes. Several previous studies indicated that other genes within the HLA region, such as those in the HLA class III region, probably contribute to RA risk [17, 18]. Searching for interactions between such additional HLA class III variants, not considered in the present study, and HLA-DRB1 genotypes defined by this new classification would be of great interest.
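The confidence interval comparison for the S2/S3P and S3P/L genotypes can be made explicit. This sketch assumes the reported 95% CIs are Wald intervals, symmetric on the log-odds scale as is usual for logistic regression; under that assumption the OR is the geometric mean of the CI limits and the standard error of ln(OR) is the log-width divided by 2 × 1.96. The helper names are hypothetical.

```python
from math import exp, log

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def or_from_ci(lower: float, upper: float):
    """Point OR and SE of ln(OR) implied by a log-symmetric 95% CI (an assumption)."""
    ln_or = (log(lower) + log(upper)) / 2
    se_ln_or = (log(upper) - log(lower)) / (2 * Z_95)
    return exp(ln_or), se_ln_or


def cis_overlap(ci_a, ci_b) -> bool:
    """True if two (lower, upper) intervals share at least one value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Pooled-sample estimates quoted in the Discussion.
s2_s3p = (9.9, 49.7)   # S2/S3P, reported OR 22.2
s3p_l = (2.3, 8.4)     # S3P/L, reported OR 4.4

for name, ci in (("S2/S3P", s2_s3p), ("S3P/L", s3p_l)):
    or_hat, se = or_from_ci(*ci)
    print(f"{name}: implied OR = {or_hat:.1f}, SE of ln(OR) = {se:.2f}")
print(cis_overlap(s2_s3p, s3p_l))  # False
```

Non-overlapping CIs indicate a difference, but overlapping CIs do not rule one out; a formal test of the difference between two genotype ORs would also need the covariance of the two estimates, which is not reported here.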

Larger studies could refine the classification for infrequent alleles. In the present study we were unable to examine rigorously the roles of the amino acids at positions 70 and 71, particularly in the S1 allele group, in which the small sample size prevented study of the different alleles encoding the D-E-RAA motif. This D-E-RAA motif has been reported in the literature to be protective and constitutes an alternative SE hypothesis, although we obtained no support for it in our initial study [19]. The different S2 sequences Q-K-RAA (*0401) and D-K-RAA (*1303) should be evaluated separately, because the presence of aspartic acid at position 70 has been reported to influence susceptibility to RA [20]. Similarly, the S3P sequences Q-R-RAA (*0101, *0102, *0404, *0405, *0408) and R-R-RAA (*1001) should be differentiated. Investigation of other amino acid positions in the third hypervariable region of the HLA-DR β chain would also be of interest, especially position 67 (at which the presence of isoleucine might be important [21]) and position 86 (as proposed by Gao and coworkers [22]).

Because numerous association studies have suggested that the primary role played by the SE might lie in the development of severe RA [23], the relevance of this classification should be evaluated for RA prognosis in prospective cohorts. A first investigation with the new classification already provides some support for a correlation with progression of radiographic damage [24]. Indeed, it would be of great help to be able to identify those RA patients at risk for development of more severe disease, who may require more aggressive therapeutic management than patients with better prognosis.

Conclusion

In the present study we validated a first model of the effect of HLA-DRB1 on RA, which reshapes the SE hypothesis, and provided initial OR estimates for the resulting risk genotypes. Building on this new HLA genotype classification could improve our understanding of the genetics and pathophysiology of RA and may prove useful in its clinical management.