Background

Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disorder of unknown etiology. Although environmental influences may trigger a response leading to the development of this autoimmune disease, both genetic and environmental factors are implicated in its pathogenesis [1]. It affects approximately 1% of the adult population with a female:male ratio ranging from 2:1 to 4:1 [2]. RA typically has an onset of symmetric joint swelling and reaches a peak incidence in the fourth and fifth decades of life [2]. RA-induced inflammatory response in the synovial membrane is typically chronic and destructive [3]. The main presenting symptoms of RA are pain, marked morning stiffness, impaired physical function, swelling, and tenderness of the joints. Constitutional symptoms of RA are fever, weight loss, and fatigue.

RA is a clinically heterogeneous disease and most likely has complex genetic involvement. The presence of underlying genetic heterogeneity of a trait often masks the effect of genetic markers with disease-predisposing variants; hence, there may not be linkage in families in which the marker is not involved in the disease etiology [4]. One method used to address genetic heterogeneity and strengthen linkage findings is to incorporate phenotypic subsetting of the data [5]. Most phenotypic stratification approaches require that subsets be identified before linkage studies. We have applied this technique to detect linkage in another autoimmune disease, systemic lupus erythematosus (SLE) [6]. Alternatively, one can account for disease heterogeneity by incorporating trait-related covariate data. Therefore, to map genes for a complex trait, genetic analysis methods should acknowledge the presence of genetic heterogeneity when appropriate. In the present analysis, we used ordered-subset analysis (OSA), a powerful technique for linkage analysis of traits characterized by genetic heterogeneity [7]. In OSA, using different covariates based on clinical features of the phenotype or on environmental exposures, one can identify more homogeneous subsets of families. Linkage that would otherwise be missed may then be apparent. Therefore, the goal of OSA is to identify regions with increased linkage in a subset of families. Additionally, by increasing genetic homogeneity, OSA can also reduce the linkage interval, as exemplified by other complex diseases, including Alzheimer disease [8].

The aims of our present analysis are to: 1) identify homogeneous subsets of families and assess linkage and its location, and 2) rigorously analyze the homogeneous subsets of families with statistically significant chromosomal locations to find a parsimonious genetic model.

Data and methods

We analyzed data from the North American Rheumatoid Arthritis Consortium (NARAC) study as part of the Genetic Analysis Workshop 15 (GAW15). Only Caucasian families were used for these analyses. Of the original 637 families, 31 were removed because of mixed ethnicity or because they were uninformative for linkage analysis (a single affected member per family). In larger families, ungenotyped individuals were trimmed to make computation feasible within the time and memory constraints of the computer hardware used. We performed genome-wide linkage analysis of 809 Illumina SNP markers in 5713 individuals from the remaining 606 Caucasian rheumatoid arthritis families. Analyses were performed using FLOSS (Flexible Ordered Subset Analysis), MERLIN, GeneHunter, GeneHunter-Modscore, and GeneHunter-Plus with the ASM (allele sharing model) module. We used several complementary programs to compare the accuracy of our results.

To date, several clinical and epidemiological factors have been identified as potential trait-related covariates for RA. Among them, increasing age of onset has been associated with worse outcome in RA, with evidence that there has recently been a shift towards an older age of onset [9]. There are also age differences in the strength of the association with risk factors like HLA, which might suggest that age has an effect on disease phenotype [10]. Recently, anti-CCP (anti-cyclic citrullinated peptide) antibodies have been identified as highly specific for RA. These antibodies have also demonstrated prognostic utility with regard to radiographic outcomes [11, 12]. Therefore, we selected the covariates 'age of onset' and 'anti-CCP level' and used them in OSA to identify homogeneous subgroups of families for linkage analysis. These covariates were used to assign linkage scores to each family using MERLIN. The mean covariate value of the family members was assigned to each family, and the families were ordered according to their covariate score. Multipoint linkage analysis was performed on all subsets of families with the k smallest or k largest covariate scores. Thus, the subset type used here was extreme. The FLOSS program was used to create a covariate file of family covariate scores for all families and all covariates, and to calculate nonparametric linkage (NPL) scores. Permutation tests were used to assess the null hypothesis of independence of family linkage scores at each locus and family covariate scores. Each subset of homogeneous families that generated a statistically significant linkage was analyzed with GeneHunter to further confirm the NPL score.
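The ordering and permutation steps described above can be illustrated with a minimal Python sketch at a single locus. It is not the FLOSS implementation: the function names, the equal-weight combination of per-family Z scores (sum divided by the square root of the number of families), and the toy values are all assumptions made for illustration only.

```python
import math
import random

def best_ordered_subset(covariates, z_scores, ascending=True):
    """Scan subsets made of the k lowest (or k highest) covariate families.

    Returns (maximum subset statistic, k). The subset statistic is an
    ASSUMED equal-weight combination: sum of family Z scores / sqrt(k).
    """
    order = sorted(range(len(covariates)),
                   key=lambda i: covariates[i],
                   reverse=not ascending)
    best_stat, best_k, running = -math.inf, 0, 0.0
    for k, idx in enumerate(order, start=1):
        running += z_scores[idx]           # add the next family in rank order
        stat = running / math.sqrt(k)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k

def permutation_p_value(covariates, z_scores, n_perm=1000, seed=0, ascending=True):
    """Empirical p-value for the maximum subset statistic.

    Family covariates are shuffled relative to the linkage scores, which
    mimics the null hypothesis that the two are independent.
    """
    rng = random.Random(seed)
    observed, _ = best_ordered_subset(covariates, z_scores, ascending)
    shuffled = list(covariates)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat, _ = best_ordered_subset(shuffled, z_scores, ascending)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: mean age of onset per family and per-family Z scores.
ages = [30, 35, 40, 60, 70, 80]
z = [2.0, 1.5, 1.8, -0.5, -1.0, 0.2]
stat, k = best_ordered_subset(ages, z)
p = permutation_p_value(ages, z, n_perm=500)
```

Because the statistic for all families together does not change under permutation, testing the maximum subset statistic is equivalent here to testing its increase over the full-sample statistic.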

Once we identified the linked genomic region, we attempted to identify the most plausible parametric model (allele frequency, penetrance, and mode of inheritance) at that linked location. For each subset, parametric LOD scores were maximized using GeneHunter-Modscore. The resulting allele frequencies and penetrance values were then used in GeneHunter/GeneHunter-Plus with the ASM module, which provides NPL, nonparametric LOD, parametric LOD, and heterogeneity LOD (HLOD) scores. In addition, information content was provided, which gave an index of the inheritance information extracted at each point in the genome by the markers genotyped. The LIN function of the allele-sharing method (a linear model to evaluate the evidence for linkage, as defined by Kong and Cox [18]) was used to calculate nonparametric LOD scores.
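To make the HLOD score mentioned above concrete, the following Python sketch computes it from per-family LOD scores at a single position using the standard admixture formulation, in which a proportion alpha of families is linked: HLOD = max over alpha of the sum over families of log10(alpha * 10^LOD_i + 1 - alpha). This is not GeneHunter's code; the function name, the grid search over alpha, and the example LOD values are assumptions for illustration, and family LOD scores are assumed to be finite.

```python
import math

def hlod(family_lods, grid_steps=1000):
    """Return (HLOD, alpha) for per-family LOD scores at one position.

    alpha is the assumed proportion of linked families; it is searched on
    an evenly spaced grid over [0, 1] (a simple stand-in for a numerical
    optimizer). At alpha = 0 the score is 0, so HLOD is never negative.
    """
    best_value, best_alpha = 0.0, 0.0
    for step in range(grid_steps + 1):
        alpha = step / grid_steps
        value = sum(math.log10(alpha * 10.0 ** lod + 1.0 - alpha)
                    for lod in family_lods)
        if value > best_value:
            best_value, best_alpha = value, alpha
    return best_value, best_alpha

# Hypothetical per-family LOD scores: two families support linkage, two do not.
score, alpha = hlod([1.5, 1.2, -2.0, -1.8])
```

When every family supports linkage, the maximum is reached at alpha = 1 and the HLOD equals the ordinary summed LOD score, which is consistent with MOD and HLOD scores being nearly equal in a homogeneous subset.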

Results

The results of OSA, along with the other relevant statistics, are provided in Table 1. A significant increase in the evidence of linkage was observed at five chromosomal regions (Fig. 1). Using the covariate 'age of onset', statistically significant evidence of linkage was observed at chromosome 4 (NPL = 4.5, p = 0.000003, peak at 102.03 cM, 472 families) and suggestive evidence was observed at chromosome 9 (NPL = 2.85, p = 0.002, peak at 0.59 cM, 27 families). With the covariate 'anti-CCP level', statistically significant evidence of linkage was identified at chromosome 18 (NPL = 3.81, p = 0.00007, peak at 26.29 cM, 40 families), and suggestive evidence of linkage was observed at chromosome 2 (NPL = 3.66, p = 0.0001, peak at 154.11 cM, 219 families) and chromosome 19 (NPL = 3.28, p = 0.0003, peak at 52.21 cM, 10 families). The information content extracted at the linked regions ranged from 46% to 80%.

Table 1 Summary of ordered subset linkage analysis
Figure 1 Results of NPL analysis across the SNP marker positions. Results are shown for chromosomes 2, 4, 9, 18, and 19, comparing the evidence of linkage in the ordered subset of families based on covariate scores (solid line) with that in all 606 families (dashed line), using GeneHunter.

For each linkage region, the NPL score was significantly increased (p < 0.05) in the ordered subset of families compared with all families. Further, analysis of the ordered subset families with the GeneHunter program produced statistically significant linkage, with NPL scores nearly identical to those obtained by the FLOSS program. Table 2 shows the results for parametric and nonparametric LOD scores obtained by incorporating the allele frequencies and penetrance of the best fitted model into GeneHunter-Plus with the ASM module. Interestingly, these LOD scores are very similar at each linkage peak.

Table 2 Parametric and Non-parametric linkage analysis under the best fitted model

With the covariate 'age of onset', the age range in the ordered subset of families on chromosome 4 is 31.5 to 83 years, whereas on chromosome 9 it is shifted toward older age (59.5 to 83 years). The optimal range of 'anti-CCP level' in the ordered subset of families was higher for chromosomes 2 (133 to 413) and 18 (234 to 413), but lower for chromosome 19 (0.800 to 3.50). The values of the peak maximized LOD (MOD) score and the HLOD score are nearly equal, as expected. After using the allele-sharing model, little difference was seen between the peak MOD score and the nonparametric LOD score produced by ASM, except on chromosome 19.

Discussion

We have identified five linked chromosomal regions (on chromosomes 2, 4, 9, 18, and 19) that may harbor susceptibility genes for RA. Previous studies [13, 14] had identified linkage at chromosomes 2, 4, and 18. Our results also support the possibility of RA susceptibility genes on chromosomes 4 and 9 using the covariate 'age of onset' and on chromosomes 2, 18, and 19 using the covariate 'anti-CCP level'. It is interesting to note that the optimal range of 'anti-CCP level' changes very little between the chromosome 2 and 18 linkages, but is quite different for chromosome 19, with absolutely no overlap. This would suggest an easily identifiable subset of families. However, we have only 10 families in this group; therefore, independent replication is required to assess the validity of this finding.

We considered both nonparametric and parametric linkage analysis in this study. The parametric and nonparametric results are very similar in terms of detecting the peak linkage locations. If we use the nonparametric LOD score, then we have evidence for three statistically significant linkages at chromosomes 2, 4, and 18 that exceed the Lander and Kruglyak criterion (LOD score of 3.3) [15]. However, this threshold is not corrected for multiple testing (at least four different tests were performed: two different covariates and two different linkage methods, nonparametric as well as parametric). To maintain the overall genome-wide significance level (5% level), we have used an ad hoc correction procedure that raised the LOD score threshold to 3.9. This is calculated as LOD(corrected) = LOD(conv) + log10(number of tests) [16, 17], where LOD(conv) = 3.3 is the conventional LOD score threshold for significance; with four tests, 3.3 + log10(4) ≈ 3.9. Interestingly, all three linkages remain significant after correcting for multiple testing.
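The ad hoc threshold correction can be checked with a few lines of Python. The function name is hypothetical; the conventional threshold of 3.3 and the count of four tests are the values stated in the text.

```python
import math

def corrected_lod_threshold(conventional_lod, n_tests):
    """Raise the LOD threshold by log10 of the number of tests performed."""
    return conventional_lod + math.log10(n_tests)

# Two covariates x two linkage methods = four tests.
threshold = corrected_lod_threshold(3.3, 4)
print(round(threshold, 1))  # prints 3.9
```

Since log10(4) is about 0.6, each additional doubling of the number of tests raises the threshold by roughly 0.3 LOD units under this correction.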

For a complex trait like RA, successful identification of genetic risk loci has relied on the ability to minimize disease and genetic heterogeneity to increase the power to detect linkage. One way to account for disease heterogeneity is by incorporating covariate data. Phenotypically similar families may be genetically more homogeneous as well, in which case OSA can greatly improve the power of linkage analysis. Our results show that 'age of onset' and 'anti-CCP level' are two clinical markers that can potentially be useful to detect linkage for RA, and that OSA is an important technique to identify linkage in the presence of heterogeneity. Such linkage findings could now be used for candidate gene as well as fine mapping studies to identify the actual RA susceptibility genes.

Conclusion

A genome-wide OSA was performed to identify linkage for RA. We used two continuous covariates, 'age of onset' and 'anti-CCP level', to identify more homogeneous groups of families. We have identified two statistically significant regions with evidence of linkage at chromosomes 4 and 18 and three regions with suggestive evidence of linkage at chromosomes 2, 9, and 19. Our results demonstrate that OSA is a useful technique to detect linkage under heterogeneity.