Jacq and colleagues [1] have presented a candidate gene association study in French and European Caucasian populations, showing evidence that the rs3738919-C variant (major allele) of the gene ITGAV may be associated with rheumatoid arthritis (RA), with an overall odds ratio (OR) for C-containing genotypes of 1.94 and a 95% confidence interval (CI) of 1.3 to 2.9 (P = 0.002). Given the difficulties that surround candidate gene association studies, how did the authors arrive at this result, what remains to be done, and how does the discovery fit into the quest to resolve the genetics of RA?

ITGAV was selected as a candidate gene for RA for two reasons. First, it is localized 194 centimorgans from the p-telomere of chromosome 2, within a region stretching from 193 to 202 centimorgans that has been implicated by an RA genome scan [2]. Second, there is a strong functional hypothesis: ITGAV encodes the protein αv (CD51 antigen) of the integrin family, which combines with β3 to form the vitronectin receptor [3] and has a central role in angiogenesis [4]. Angiogenesis, in turn, is involved in hyperplasia of the synovial membrane in the RA pannus [5]. Modulation of angiogenesis by ITGAV variants is supported by the association of ITGAV with priapism [6].

The genomic context of rs3738919 within ITGAV does not suggest any obvious functional role for this SNP. However, some other SNPs are predicted to lie inside functional elements of ITGAV, and part of the region seems to show variation in copy number. It is possible that rs3738919 reports on these polymorphisms as a result of linkage disequilibrium. Without haplotype analyses and functional studies, it remains unknown whether rs3738919 is itself a disease-modifying variant or whether it is a signpost for a causative variant that has yet to be discovered. Identification of the causative variant, or of a comprehensive haplotype carrying it, would greatly facilitate the replication studies needed to verify this new association.

In their candidate gene association study, Jacq and colleagues employed a family-based, multistage design. The use of patients and their parents inherently avoids problems with unknown population stratification. In the alternative case-control design, identifying stratification requires the typing of many markers and extensive data analysis [7]; that design is therefore especially suitable for genome-wide studies [8], in which such markers are typed anyway.
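To illustrate why a parent-child design is robust to stratification, the following minimal sketch computes a transmission disequilibrium test (TDT), one widely used family-based statistic. The source does not state which test Jacq and colleagues applied, so the choice of the TDT is an assumption, and the transmission counts are hypothetical values chosen only for illustration. The test compares how often heterozygous parents transmit versus do not transmit the allele to an affected child, so each parent serves as its own ancestry-matched control.

```python
import math

def tdt(transmitted, untransmitted):
    """McNemar-type TDT statistic from heterozygous-parent transmissions.

    transmitted:   times the candidate allele was passed to an affected child
    untransmitted: times the other allele was passed instead
    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts, not taken from the study
chi2, p = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {chi2:.2f}, p = {p:.4f}")
```

Under the null hypothesis of no linkage or association, both alleles are transmitted with equal probability regardless of the population the family comes from, which is why no separate correction for stratification is needed.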

In the first stage of a multistage study, associated markers and initial genetic models are identified without correction for multiple testing. Here, association of ITGAV rs3738919-C was found with an allelic OR of 1.5 and a significant increase in the C/C genotype. Further stages serve to refine the genetic models and to weed out false positives, which may arise from random differences in allele frequency in smaller cohorts or from inhomogeneous phenotypes. For complex, heterogeneous diseases such as RA, even clinically very similar patients may represent pathomechanistically different disease subtypes that are not always discernible by current scores and laboratory parameters. If a genetic variant is relevant in only some subtypes (defined by sex, erosion status, auto-antibodies, or other parameters), its effect may be present in one cohort but absent in another, and it may appear with a smaller effect size in large, multi-center cohorts.

In a recent hypothesis-free, genome-wide association study for RA in a British population [8], rs3738919 was imputed from surrounding genotypes. Association of rs3738919-C did not emerge with genome-wide significance. However, the test of the single hypothesis of association of rs3738919-C with RA is significant (allelic OR 1.12, CI 1.02 to 1.22, P = 0.01). In comparison, from the data by Jacq and colleagues an allelic OR of 1.3 (CI 1.06 to 1.59, P = 0.01) can be calculated. Joint analysis of both studies results in an allelic OR of 1.14 (CI 1.06 to 1.24, P = 0.001; Mantel-Haenszel test, fixed effects [9]). This strongly supports a common role of ITGAV in RA, at least for populations with European ancestries. It also shows that both genome-wide studies and candidate gene studies will be important in the ongoing quest to identify genetic factors in complex diseases such as RA.
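The joint estimate can be checked approximately from the published summary figures alone. The sketch below is not the Mantel-Haenszel calculation used for the reported result, which works on allele counts that are not given here. It substitutes inverse-variance fixed-effect pooling of the log odds ratios, with standard errors recovered from the reported 95% CIs. The function names are illustrative, and rounding in the published values limits the precision.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% interval

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se

def pool_fixed(studies):
    """Inverse-variance fixed-effect pooling of (OR, CI low, CI high) tuples."""
    weighted_sum = total_weight = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        beta, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / se ** 2
        weighted_sum += weight * beta
        total_weight += weight
    beta = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p-value
    return (math.exp(beta),
            math.exp(beta - Z95 * se),
            math.exp(beta + Z95 * se),
            p)

studies = [
    (1.12, 1.02, 1.22),  # British genome-wide study [8], imputed rs3738919
    (1.30, 1.06, 1.59),  # calculated from the data of Jacq and colleagues [1]
]
pooled_or, ci_lo, ci_hi, p_value = pool_fixed(studies)
print(f"pooled OR = {pooled_or:.2f} (95% CI {ci_lo:.2f} to {ci_hi:.2f}), p = {p_value:.4f}")
```

The pooled estimate lands close to the reported joint OR. The larger genome-wide study dominates the weighting, which is why the combined OR sits much nearer to 1.12 than to 1.3.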

Associations of HLA-DRB1 alleles and PTPN22 alleles with RA are well established [8, 10, 11]. For other potential RA loci, reported effects are usually small and replication studies have been inconclusive. Notably, one well-powered candidate gene study in a Caucasian population verified the associations of PADI4 and CTLA4 with RA, but not other reported associations [12]. With ORs in the range 1.1 to 1.2, variants of PADI4 and CTLA4 confer a risk similar to that of the new ITGAV variant. Because confirmed associations account for only about 50% of the genetic contribution to RA, more candidate loci conferring similar risks are to be expected. They will probably be found on the basis of functional insights, computational modeling, animal models, expression analysis, and genome-wide association studies. Verification of newly found associations would be greatly aided by the deposition of genotypes and detailed phenotype data in public databases such as dbGaP [13] for use in (disease subtype-specific) meta-analyses.

ITGAV is the latest example of a candidate gene that may be relevant to RA. Increased significance in the combined samples in the study of Jacq and colleagues, corroborated by data from a genome-wide study, suggests that this association may indeed be true and common to different European ancestries. Further research into the associated variant will require detailed haplotype analysis, verification in further studies, and research involving intermediate phenotypes or direct functional experiments. The newly found gene variant may not be useful as a diagnostic marker for RA because of its high frequency in healthy controls. However, it may be another important step on the way to a better understanding of RA etiology and pathomechanisms, in particular the role of angiogenesis.