Introduction

Genetic association studies are a well-established method for investigating genetic contributions to disease. In rheumatoid arthritis (RA) [1] and small-vessel vasculitis [2], genetically distinct subsets have been identified that have different associations with the major histocompatibility complex (MHC) region that encodes the HLA-DRB1 alleles. Comparison of HLA-DRB1 associations with RA in different ethnic groups helped to support the original “shared epitope” hypothesis of RA susceptibility [3] based on an amino acid risk motif at positions 67–74 in the third hypervariable region (HVR3) of the class II MHC molecule, encoded by HLA-DRB1. The group of RA “shared epitope” alleles now includes HLA-DRB1*01:01, HLA-DRB1*04:01, HLA-DRB1*04:04, HLA-DRB1*04:08, and HLA-DRB1*10:01; other alleles provide weaker protective effects, additional to the risk effects of the “shared epitope” [4]. Recently, it has been demonstrated that amino acid residues 11 and 13 in the first hypervariable region (HVR1) of class II MHC display the strongest associations with RA susceptibility [5].

Giant cell arteritis (GCA) incidence is highest in populations with Scandinavian ancestry [6–8], which has led to suggestions that genetic factors may be responsible [9, 10]. Susceptibility to GCA has been reported to be associated with carriage of HLA-DRB1*04, but not all studies have shown an association and there are conflicting data as to whether there is an association with specific HLA-DRB1*04 alleles [11]. Under-representation of HLA-DRB1*01 in GCA patients from Rochester, Minnesota, led to a suggestion that the risk of GCA may be due to a DRYF motif at positions 28–31 in the second hypervariable region (HVR2) of MHC class II [12], but a Spanish study failed to replicate this finding [13]. To date, however, no formal meta-analysis has been performed to determine the major susceptibility and protective HLA-DRB1 alleles. The relative contribution of genetic and environmental factors as an explanation for geographical differences in GCA incidence also remains disputed [6, 7]. When genetic diversity within Europe is subjected to principal component analysis, the MHC is one of several genetic regions that are strongly associated with a component that runs along a north-south gradient from Norway/Sweden to Spain [14]. We therefore hypothesised that variations in the frequency of HLA-DRB1 GCA susceptibility alleles may partly explain geographical variations in GCA incidence.

Here, we report new GCA susceptibility data, combine these with the published data using meta-analysis, and propose a new hypothesis regarding a possible amino acid GCA 11-13-33 risk motif in HVR1 and HVR2 of class II MHC. This hypothesis fits the observed data better than previously proposed models.

Methods

Patients

The UK GCA Consortium was designed to support genetic association studies. Investigators, all experienced rheumatologists, recruited cases with a firm clinical diagnosis of GCA, based on all available information. Recruitment was retrospective. A positive temporal artery biopsy was not required as it was not always undertaken in classic presentations or could not be performed within an optimal time window or both. In some centres, the erythrocyte sedimentation rate was unavailable and so fulfilment of the 1990 American College of Rheumatology (ACR) criteria [15], which should not be used for clinical diagnosis of GCA [16], was not a requirement for inclusion. Clinical data on a subset of this cohort have already been published [17]. In this analysis, we included all patients who agreed to give a blood sample for genetic studies up to 2012 and where a sample was available. Written informed consent was provided by all patients, and the study was approved by the York Research Ethics Committee (reference 05/Q1108/28).

DNA extraction and genotyping

DNA was extracted from peripheral blood. HLA-DRB1 genotyping was performed by either single-stranded oligonucleotide polymerisation [18] or allele-specific polymerase chain reactions. The latter used standard primer sequences (HLA DRBplus Typing Kit, Amersham Biosciences, now part of GE Healthcare, Little Chalfont, UK), except for the forward primer of HLA-DRB1*10, which was redesigned as 5'-GCG GTT GCT GGA AAG ACG CG-3'. Direct sequencing was also performed to enable four-digit genotyping of HLA-DRB1*04 subtypes [18] because of previous reports of an HLA-DRB1*04 association of GCA at the two-digit level. The HaplotypeViewer program was developed to facilitate rapid four-digit genotyping from sequence electropherograms and is freely available [19].

Analysis of genotyping data

Control data from the UK Rheumatoid Arthritis Genetics (UKRAG) Consortium were used for this analysis. Initial logistic regression analyses were undertaken by assuming additive genetic models to estimate the effect of each potential susceptibility/protective allele. Adjustments for genetic effects already proposed in the literature (HLA-DRB1*04) were also performed.
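The additive model described above can be sketched as follows. This is a minimal illustration and not the SPSS or Stata code used in the study: the function name `fit_additive_logistic`, the grouped-data layout, and the counts are all hypothetical. It fits logit P(case) = b0 + b1 × (number of allele copies) by Newton-Raphson, so that exp(b1) is the per-allele odds ratio. Adjustment for HLA-DRB1*04 would add a second covariate and is not shown.

```python
import math


def fit_additive_logistic(groups, iterations=25):
    """Fit logit P(case) = b0 + b1 * copies by Newton-Raphson.

    groups: list of (copies, n_cases, n_controls) tuples (hypothetical layout).
    Returns (b0, b1, se_b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for copies, cases, controls in groups:
            n = cases + controls
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * copies)))
            resid = cases - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * copies
            h00 += w
            h01 += w * copies
            h11 += w * copies * copies
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the inverse information matrix
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Hypothetical counts: (copies of the allele, cases, controls).
example_groups = [(0, 50, 50), (1, 40, 20), (2, 40, 10)]
b0, b1, se_b1 = fit_additive_logistic(example_groups)
per_allele_or = math.exp(b1)
ci_low = math.exp(b1 - 1.96 * se_b1)
ci_high = math.exp(b1 + 1.96 * se_b1)
print(f"per-allele OR {per_allele_or:.2f} (95 % CI {ci_low:.2f} to {ci_high:.2f})")
```

In this constructed example the odds of being a case double with each additional copy, so the fitted per-allele odds ratio is 2.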

Meta-analysis of giant cell arteritis susceptibility data

To identify case-control studies of HLA-DRB1 association with GCA susceptibility, a literature search was conducted in PubMed, without language restriction, by using the terms “HLA” and “(giant cell arteritis) OR (temporal arteritis)”. Reference lists of studies identified were also scanned. Publications were included if they provided sufficient detail on cases and controls to perform a meta-analysis. Where there were multiple publications with overlapping datasets, the report with the most complete dataset was chosen. Meta-analysis of the published summary carrier frequency data was performed (i.e., assuming a dominant mode of inheritance) because allele frequency and individual-level patient data were mostly unavailable from the authors of the studies. A random-effects model was used; the overall estimate was calculated by using as weights 1/(v_i + τ), where v_i is the variance of the estimated effect from the ith study and τ is the estimated between-study variance [20].
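A minimal sketch of this pooling step is given below. It assumes the DerSimonian-Laird moment estimator for the between-study variance, which the text does not name explicitly, and it derives each study's log odds ratio and variance from carrier counts without any continuity correction for zero cells. The function names and all numbers are hypothetical. The variable `tau` follows the notation above (many texts write this quantity as τ²).

```python
import math


def log_or_and_variance(case_carriers, case_total, ctrl_carriers, ctrl_total):
    """Log odds ratio for carriage and its variance from a 2x2 table."""
    a = case_carriers
    b = case_total - case_carriers
    c = ctrl_carriers
    d = ctrl_total - ctrl_carriers
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def random_effects_pool(effects, variances):
    """Pool study effects with weights 1 / (v_i + tau)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau = max(0.0, (q - df) / scale)       # between-study variance
    w_star = [1.0 / (v + tau) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau": tau, "i_squared": i_squared}


# Hypothetical studies: (case carriers, cases, control carriers, controls)
studies = [(20, 40, 10, 40), (30, 60, 40, 120), (15, 35, 25, 100)]
effects, variances = zip(*(log_or_and_variance(*s) for s in studies))
result = random_effects_pool(effects, variances)
print(f"pooled OR {math.exp(result['pooled']):.2f}, "
      f"I2 {100 * result['i_squared']:.1f} %")
```

When the estimated between-study variance is zero, the weights reduce to 1/v_i and the result coincides with a fixed-effect analysis.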

Worldwide giant cell arteritis incidence in relation to HLA-DRB1*04 population carrier frequencies

To identify reports of the incidence of GCA in different countries, a second literature search in PubMed was conducted with combinations of the medical subject heading terms “giant cell arteritis”, “temporal arteritis”, and “epidemiology”. Hand-searching was also performed in the reference lists of retrieved articles, review articles, and textbooks. Studies were included if they were available in full-text and included an estimate of the annual incidence, time period of the study, method of case definition, population studied, and geographical location of the study. Where necessary, the incidence figure was recalculated as number of new cases per 100,000 of the over-50 population per year. Studies that appeared to report duplicate or overlapping populations were excluded. Studies completing recruitment before 1980 were excluded in case of time trends in the incidence of GCA and because the quality of the reporting was generally lower for the older studies. Where more than one report existed for a single country (unless in ethnically distinct populations), the one with a later average period of recruitment was preferred. Where a single report included two separate sub-studies (regions or time periods), a weighted mean of the two sub-studies was used to arrive at an overall incidence figure.
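The rescaling and sub-study averaging can be illustrated with a short sketch. The text does not state which weights were used for the weighted mean, so weighting by person-years is an assumption here, and the function names and figures are hypothetical.

```python
def incidence_per_100k(new_cases, population_over_50, years):
    """Annual incidence per 100,000 people aged over 50."""
    return new_cases / (population_over_50 * years) * 100_000


def weighted_mean(values, weights):
    """Weighted mean of sub-study incidence figures."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical report with two sub-studies (regions or time periods):
# (new cases, over-50 population, years of observation)
sub_studies = [(30, 50_000, 3), (45, 60_000, 5)]
rates = [incidence_per_100k(*s) for s in sub_studies]
person_years = [pop * yrs for _, pop, yrs in sub_studies]  # assumed weights
overall = weighted_mean(rates, person_years)
print(f"overall incidence {overall:.1f} per 100,000 over-50s per year")
```

With person-year weights, the weighted mean equals the total number of cases divided by the total person-years, which is one reason this weighting is a natural choice.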

We then sought data on ethnically matched HLA-DRB1 population allele frequencies at the two- and four-digit levels for each geographical region identified in the second literature search. We considered HLA-DRB1*04 alleles and those identified as being potential susceptibility/protective alleles in our own UK dataset. Methods have been reported elsewhere [21]; briefly, we first consulted the Allele Frequency Net Database [22] and then, if necessary, Ovid Medline and Embase. Carrier frequencies for control populations were converted to estimated allele frequencies by using the Hardy-Weinberg equation. Finally, the following predetermined rule was used to generate an estimate of population HLA-DRB1 allele frequencies: reports with over 500 (four-digit typing) or 1000 (two-digit typing) controls were identified and a weighted mean calculated. In the absence of large studies, studies with more than 100 (four-digit typing) or more than 200 (two-digit typing) were identified and a weighted mean calculated. Determination of geographical latitude and linear regression analysis were performed as previously described [21].
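The conversion and the predetermined selection rule can be sketched as follows. Under Hardy-Weinberg equilibrium the carrier frequency is 1 − (1 − p)², so the allele frequency is p = 1 − √(1 − carrier frequency). Weighting the mean by the number of controls is an assumption, since the text does not specify the weights; function names and data are hypothetical.

```python
import math


def carrier_to_allele_frequency(carrier_frequency):
    """Invert carrier = 1 - (1 - p)^2 under Hardy-Weinberg equilibrium."""
    return 1.0 - math.sqrt(1.0 - carrier_frequency)


def pooled_allele_frequency(reports, four_digit):
    """Apply the tiered rule to a list of (n_controls, allele_frequency).

    Large reports are preferred; smaller ones are used only if no large
    report exists. Returns None if no report is big enough.
    """
    large, small = (500, 100) if four_digit else (1000, 200)
    selected = [r for r in reports if r[0] > large]
    if not selected:
        selected = [r for r in reports if r[0] > small]
    if not selected:
        return None
    total = sum(n for n, _ in selected)
    # Weighted mean, weighting by number of controls (an assumption)
    return sum(n * f for n, f in selected) / total


# Hypothetical two-digit reports for one region
reports = [(1200, 0.10), (2400, 0.16), (300, 0.50)]
print(pooled_allele_frequency(reports, four_digit=False))
print(carrier_to_allele_frequency(0.36))
```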

Development of amino acid risk motif model

After determination of susceptibility and protective HLA-DRB1 alleles, amino acid residues in HVR1, HVR2, and HVR3 were obtained from the IMGT (International ImMunoGeneTics Information System) database [23] (accessed 31 January 2012). For samples with only two-digit typing, we estimated amino acid residues on the basis of geographically relevant population frequencies of the four-digit subtypes from the Allele Frequency Net Database [22], assigning a probability to residues when they varied within the four-digit subtypes (only necessary at positions 28, 32, 37, 67, 70, 71, and 74). For each of these, the expected misclassification rate when assigned by using population frequencies is less than 1 %, apart from positions 67 (2 %) and 71 (3 %). For each polymorphic position, samples were assigned a dosage (i.e., the expected number of copies) for each residue. Logistic regression was then used to test for association at each position separately, and degrees of freedom were equal to one less than the number of distinct residues. For the most significant positions, forward stepwise regression was used to identify the residues associated with disease risk at that position.
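The dosage assignment can be illustrated with a small sketch. For a two-digit allele whose four-digit subtypes carry different residues at a position, each residue receives a probability equal to the summed population frequency of the subtypes carrying it; a sample's dosage is then the sum of these probabilities over its two alleles. The subtype frequencies and the residue table below are hypothetical placeholders, not values taken from IMGT or the Allele Frequency Net Database, and the function names are illustrative.

```python
from collections import defaultdict


def residue_probabilities(subtype_freqs, subtype_residue):
    """Probability of each residue at one position, given a two-digit allele.

    subtype_freqs: {four-digit subtype: population frequency}
    subtype_residue: {four-digit subtype: residue at this position}
    """
    total = sum(subtype_freqs.values())
    probs = defaultdict(float)
    for subtype, freq in subtype_freqs.items():
        probs[subtype_residue[subtype]] += freq / total
    return dict(probs)


def residue_dosage(genotype, probs_by_allele):
    """Expected number of copies of each residue for one sample."""
    dosage = defaultdict(float)
    for allele in genotype:
        for residue, p in probs_by_allele[allele].items():
            dosage[residue] += p
    return dict(dosage)


# Hypothetical example at a single position
probs_by_allele = {
    "04": residue_probabilities(
        {"04:01": 0.6, "04:04": 0.3, "04:05": 0.1},   # made-up frequencies
        {"04:01": "K", "04:04": "R", "04:05": "R"},   # made-up residue table
    ),
    "01": {"R": 1.0},  # no variation within subtypes at this position
}
print(residue_dosage(("04", "01"), probs_by_allele))
```

The dosages for any sample sum to two across residues, so they can be entered directly as covariates in the per-position logistic regression.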

We used population HLA-DRB1 frequencies for inferring amino acid residues in both cases and controls. Under the null hypothesis of no association, the frequencies would be the same, and any bias introduced by using population frequencies for cases would be toward the null. Using HLA-DRB1 four-digit frequencies that have been observed in patients with GCA to infer the amino acid residues in the GCA cases could lead to a biased analysis with inflated false-positive rate.

Results that reach a nominal significance level of 0.05 are highlighted. For the exploratory hypotheses, these should be interpreted in the light of multiple testing. Analyses were performed in SPSS 15 (IBM Corporation, Armonk, NY, USA) and Stata SE (StataCorp LP, College Station, TX, USA).

Results

Patients

Two hundred twenty-five patients with GCA from 7 UK centres consented to analysis of genetic material for this study (125 from Leeds hospitals, including 38 from Otley; 33 from Harrogate; 23 from Southend; 17 from York; 16 from Dewsbury; 10 from Pontefract; and one from Ipswich). Their demographics and disease characteristics, including fulfilment of 1990 ACR criteria, are shown in Table 1. Of the 183 temporal artery biopsies performed, 140 (77 %) were positive. Patients were all European Caucasian.

Table 1 Patient characteristics

Analysis of genotyping data

Allele frequencies in cases and 1378 UKRAG controls are shown in Table 2 with per-allele odds ratios with and without adjustment for HLA-DRB1*04, the previously proposed susceptibility allele. Initial analysis was performed at the two-digit level. Four-digit analysis was also performed for the common *04 subtypes, but statistical analysis was not performed on the rarer *04 subtypes.

Table 2 Allele frequencies and per-allele odds ratios in giant cell arteritis cases and controls

In 225 patients with GCA and 1378 controls in the novel UK cohort, a susceptibility effect of HLA-DRB1*04 carriage was confirmed (odds ratio (OR) 2.69, 95 % confidence interval (CI) 2.02 to 3.58, P = 1.5 × 10⁻¹¹) (Table 3). Possible protective effects from HLA-DRB1*01 and HLA-DRB1*15 were noted, but only HLA-DRB1*01 retained significance after adjusting for HLA-DRB1*04 (Table 2). The data were consistent with a dominant effect of HLA-DRB1*04 (OR for one copy 2.78, 95 % CI 2.07 to 3.72, OR for two copies 1.94, 95 % CI 0.95 to 3.96, compared with no copies).

Table 3 Meta-analysis of HLA-DRB1 giant cell arteritis associations in the literature

The effect sizes for HLA-DRB1*04 carriage were similar when restricting analyses to biopsy-positive GCA cases (OR = 2.83, 95 % CI 1.99 to 4.03, P = 7.7 × 10⁻⁹); the number of biopsy-negative GCA cases was too small for a separate analysis.

Meta-analysis of giant cell arteritis susceptibility data

Meta-analysis of previously published data from 14 studies (691 cases, 4038 controls; Table 3) gave ORs of 2.45 (P = 9.2 × 10⁻²⁴), 0.78 (P = 0.11), and 0.71 (P = 0.0019) for HLA-DRB1*04, *01, and *02, respectively (*02 is now reclassified as *15 and *16). There was more heterogeneity noted for some alleles than for others; the HLA-DRB1*04 meta-analysis had an I² statistic of 0 % whereas the meta-analysis for HLA-DRB1*01 had an I² statistic of 39.4 %. When our UK data were included, the meta-analysis still demonstrated protective effects for HLA-DRB1*01 and HLA-DRB1*02 (P = 0.037 and P = 8.2 × 10⁻⁶, respectively) (Table 3). In the six published articles with information on ethnicity, cases and controls were always stated to be either “white” or “Caucasian”. Therefore, subgroup meta-analysis by ethnic group was not possible.

Worldwide giant cell arteritis incidence in relation to HLA-DRB1*04 population carrier frequencies

Reliable population HLA-DRB1 data were not always available, notably for small, native tribes of Alaska and Saskatoon, where extrapolation from small and physically or genetically (or both) isolated communities was felt to be unwarranted. Table 4 summarises the GCA incidence articles included, together with the estimated population allele frequencies and the number of individuals on which these estimates are based. Substantial clinical heterogeneity was identified in the GCA incidence studies, including variations in the methods used to identify GCA cases and confirm GCA diagnosis.

Table 4 Incidence of giant cell arteritis and population HLA-DRB1 allele frequencies in different countries

At the two-digit level (Table 4 and Fig. 1a and b), 17 studies were included in the analysis of worldwide GCA incidence in relation to population HLA-DRB1 allele frequencies. In view of our meta-analysis findings, we extracted data for HLA-DRB1*04, HLA-DRB1*01, and HLA-DRB1*15 population frequencies (Table 4). The majority of these were from Europe and the Mediterranean area. Within this small dataset, HLA-DRB1*15 was more common in the general population at more northerly latitudes (r = 0.52, P = 0.038) whereas no significant association with latitude was seen for population HLA-DRB1*04 or HLA-DRB1*01 (r = 0.47, P = 0.057 and r = 0.39, P = 0.133, respectively).

Fig. 1 Giant cell arteritis (GCA) incidence in relation to population HLA-DRB1 allele frequencies (a) and to latitude (b). Significance levels from a global test of difference in distribution of amino acid frequencies between cases and controls at specific positions (c)

Predictors of GCA incidence in univariable analyses were HLA-DRB1*04 population allele frequency (P = 0.001, adjusted R² = 0.51) and latitude (P = 0.004, adjusted R² = 0.40), whereas HLA-DRB1*15 was non-significant. In multivariable analysis, HLA-DRB1*04 frequency and latitude were both significant predictors and each made independent contributions to the explanatory power of the model (HLA-DRB1*04, P = 0.008; latitude, P = 0.036; adjusted R² of the model = 0.62). HLA-DRB1*15 made no additional contribution to the explanatory power of the model.
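For readers less familiar with the adjusted R² quoted above, the following sketch shows a univariable least-squares fit and the standard adjustment 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The data points are hypothetical and are not the values in Table 4; the function names are illustrative.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares for one predictor; returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def adjusted_r_squared(r_squared, n, n_predictors):
    """Penalise R^2 for the number of predictors in the model."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_predictors - 1)


# Hypothetical allele frequencies and incidence figures (per 100,000 over-50s)
allele_freq = [0.08, 0.10, 0.12, 0.15, 0.18, 0.20]
incidence = [7.0, 9.5, 11.0, 18.0, 19.5, 27.0]
slope, intercept, r2 = simple_linear_regression(allele_freq, incidence)
print(f"slope {slope:.1f}, R2 {r2:.2f}, "
      f"adjusted R2 {adjusted_r_squared(r2, len(allele_freq), 1):.2f}")
```

Because the adjustment grows with the number of predictors relative to the number of observations, a multivariable model improves the adjusted R² only if the added predictor explains enough extra variance.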

Development of model with 11-13-33 amino acid risk motif

Tests for association showed that the most significant position was at 13 (P = 1.2 × 10⁻⁹), followed by 11, 33, 37, and 9 (Fig. 1c). At position 13, the most significant residue was H (OR = 2.11, 95 % CI 1.61 to 2.77, P = 5.5 × 10⁻⁸, equivalent to H at 33 and also to the *04 allele). However, stepwise regression found additional contributions from residues S (OR = 1.38, 95 % CI 1.07 to 1.77, P = 0.014) and F (OR = 0.66, 95 % CI 0.44 to 0.99, P = 0.038) (Table 5). Similarly, multiple residues were found at positions 11 and 37. There is very strong linkage disequilibrium in this region, and so many of these residues at different positions almost always occur together, as illustrated in Additional file 1. For example, we did not have the power to distinguish between the risk effects of V, H, and H at positions 11, 13, and 33, respectively, since the *10 allele, which differs at residue 13, is very rare. The previously proposed DYF motif (positions 28, 30, and 31) in HVR2 (OR = 1.54, 95 % CI 1.21 to 1.96, P = 0.00038) did not explain the observed data as well as simple HLA-DRB1*04 carriage. Similarly, variation in amino acid residues within HVR3 is unlikely to explain the observed GCA susceptibility data (Fig. 1c), especially since the other alleles comprising the “RA shared epitope” were not over-represented in GCA.

Table 5 Results of tests for association of amino acid at the most significant positions: 9, 11, 13, 33, and 37

Discussion

In this study, which includes both new UK data and the first formal meta-analysis of published data on HLA-DRB1 associations of GCA, we not only confirm a strong association of GCA with HLA-DRB1*04 allele carriage, including within our own UK data, but also identify possible protective effects of HLA-DRB1*01 and HLA-DRB1*15, supported by the meta-analysis of previous studies. We were able to impute amino acid residues quite reliably from published allele frequencies, enabling us to analyse amino acid residues even though four-digit typing was not available for every HLA-DRB1 allele. Based on this, it was the amino acid residues 11, 13, and 33 in the first and second hypervariable regions that best explained the observed HLA-DRB1 susceptibility and protective effects, rather than the previously proposed DRYF amino acid motif in the second hypervariable region [12]. We also observed that some non-HLA-DRB1*04 amino acid residues had additional effects (individual amino acid residues that were retained by a multivariable regression model for each separate amino acid position are shown in the last two columns in Table 5), suggesting additional genetic complexities that we did not have the power to investigate in depth. We then systematically extracted data on population prevalence of the identified susceptibility and protective HLA-DRB1 alleles and compared this with reports of GCA incidence in different countries. We found a significant and independent relationship of GCA incidence both with HLA-DRB1*04 and with latitude. Conversely, HLA-DRB1*15 was, if anything, protective and did not contribute to incidence of GCA in the geo-epidemiological study.

Strengths of this work include the presentation of the first UK HLA data in GCA, its presentation in the context of the international literature, the first meta-analysis of HLA-DRB1*04 GCA susceptibility studies, and the novel approach combining a traditional genetic association study with a geo-epidemiology approach. Using logistic regression for the UK data, we could control for already-known HLA-DRB1 susceptibility effects in the per-allele analysis; this has not been done in other datasets, most of which reported only carrier frequencies rather than allele frequencies. Based on this, we were able to suggest HLA-DRB1 amino acid residues that best fit the observed susceptibility/protective allele effects. This is the first synthesis of the literature on reported GCA incidence in relation to population HLA-DRB1 allele frequency and geographical latitude.

Our analysis is based on certain assumptions. Firstly, because many clinicians in the UK do not always request temporal artery biopsy except in cases of diagnostic doubt [24], we had prespecified in the analysis that GCA would be defined clinically rather than limiting inclusion to biopsy-positive cases only. The clinically diagnosed patients, however, had to be firmly diagnosed by an experienced consultant, and there had to be unequivocal clinical features and no alternative explanation for the symptoms after follow-up. Temporal artery biopsy is not 100 % sensitive for GCA; possible reasons for false-negative biopsies in our cohort included delays in obtaining biopsies resulting in resolution of inflammation, suboptimal biopsy length, and biopsy reporting based on the classic pathologic criteria, which may be overly stringent [25]; sometimes the temporal artery is spared in patients with GCA, particularly in those with predominant disease of the aorta and its proximal branches [26]. We conducted a sub-analysis of the biopsy-positive subset and found no difference in the observed effect size for HLA-DRB1*04 association compared with the whole group; with such a small number of biopsy-negative cases, no meaningful statement can be made about the effect size in that group. Our meta-analysis showed that the effect size in the cohort overall was also comparable to that observed in previous reports, some of which included only biopsy-positive cases. If not all the cases truly had GCA, this would have reduced the power of the study (“diluted out” the genetic association) but would have been highly unlikely to introduce artefactual genetic associations because the differential diagnosis of GCA is so wide. Similar pragmatic approaches to case definition for genetics studies, accepting a small, finite rate of misclassification in order to maximise recruitment, have been successfully used in other genetic association studies [27]. We also did not have the power to study whether there are differences in the effect size between regions of the UK, but regional variations in the incidence of diagnosed GCA have been described [28]; it remains unclear how far this is influenced by regional variations in population HLA-DRB1 frequency [29].

In regard to the HLA-DRB1 typing, it is recognised that HLA-DRB1 represents only a small part of the whole MHC and also that not having complete sequence-based four-digit typing may have resulted in some important information being missed. This study focused on HLA-DRB1 and we did not set out to analyse variation elsewhere in the MHC [30]. However, our finding that non-HLA-DRB1*04 residues also contributed significantly to GCA susceptibility/protection (Table 5) suggests that other alleles may also be involved. The MHC is a complex locus with extensive linkage disequilibrium, and an MHC-wide analysis requires larger datasets and specialised analysis methods. A concurrent international, collaborative large-scale genetic analysis of GCA (including samples from this study), using a different genotyping platform (Immunochip) with more extensive coverage of the HLA region [31], shows evidence of wider involvement of the MHC region while confirming the strong association with DRB1*04. Lastly, the literature reviews and meta-analyses are limited by the small number of studies in the literature, many of which were published some years ago, with corresponding variations in case ascertainment and in genotyping assays. Larger datasets using modern genotyping and statistical analysis methods will reveal further GCA susceptibility alleles within the whole HLA locus and allow their pattern of linkage disequilibrium to be analysed.

The P values reported here should be considered in the light of multiple testing, but owing to the a priori suggestion of HLA-DRB1*04 association and lack of consensus as to how to adjust for multiple testing at a multi-allelic locus where the different alleles are not independent of each other, we did not consider a Bonferroni correction to be appropriate here. Nevertheless, model over-fitting is a possibility, and it is essential that our findings be replicated in an independent dataset.

Conclusions

In summary, we report a novel approach to studying genetic influences of disease by combining traditional genetic association studies with geo-epidemiology methods that capitalise on publicly available data. Our new UK data and a synthesis of the published literature suggest that HLA-DRB1*04 might explain part of the observed geographical variation in GCA incidence. This is consistent with an autoimmune aetiology for GCA [32]. However, we found additional variation in susceptibility (Table 5) and incidence (Fig. 1a, b) that is not fully explained by HLA-DRB1*04 and is likely to relate to additional, unknown genetic and environmental factors. Previous studies of GCA have also demonstrated an association between HLA-DRB1*04 and visual loss [33] and also with glucocorticoid resistance [34]. Of interest, in Japan (where HLA-DRB1*04 population frequency is low), large-vessel vasculitis (Takayasu arteritis) is relatively more common than GCA. Takayasu arteritis was associated with alleles containing the 11-13-33 V-H-H motif (HLA-DRB1*0405) in a Turkish population but was not associated with another allele also containing V-H-H (HLA-DRB1*0401) in a European-American population; HLA-DRB1*1502, which was associated with Takayasu arteritis in both populations [35], does not contain the V-H-H motif. Very few patients in our dataset had large-vessel imaging, but genetic characterisation of the subset of GCA patients who have large-vessel involvement or temporal artery sparing or both [26] would be of interest in future studies. From a clinical perspective, further study of well-phenotyped cohorts is required to determine whether HLA-DRB1*04 may serve as a biomarker of pathophysiologically relevant phenotypic disease subsets in order to develop better risk stratification, prediction of response to glucocorticoids, and ultimately targeted therapies.