Introduction

The history of genome-wide mapping of disease-causing genes began in 1980, when linkage analysis by use of anonymous genetic markers was suggested as a method for conducting 'forward genetics' analyses (hypothesis-free mapping starting from a trait of interest) [1]. This soon led to successful identification of several disease-causing genes, often providing the first information on disease mechanisms.

In principle, there are two approaches to genetic mapping: linkage and association analysis (reviewed in [2]). Linkage analysis is based on inheritance of chromosomal fragments within families with affected and unaffected individuals. It allows genome-wide mapping with limited resources, but it can generally only map loci into large genomic regions that span hundreds of genes and, despite great success in monogenic diseases, linkage analysis seems to be of limited use in mapping of complex traits. Association studies compare large unrelated groups of patients with the healthy population to find regions that are overrepresented in patients. This increases mapping precision dramatically, but it requires large repositories of patient materials and very closely spaced genetic markers, creating a need for correction for multiple testing, which raises the threshold for claiming statistical significance. Until recently, candidate gene studies were the only realistic way to utilize patient materials for association studies. The major disadvantage of candidate studies is the need for a starting hypothesis to choose candidates. The most interesting prospect of gene mapping, however, is that hypothesis-free mapping can point to previously unknown and unexpected disease pathways.

Neither of these strategies has been successful in mapping genes that control complex diseases, such as rheumatoid arthritis (RA), in humans. Mapping in animal models therefore emerged as an attractive alternative. Choosing candidates identified by positional cloning in animal models combines the high power of candidate studies with the benefits of hypothesis-free mapping.

The traditional strategy to map genes in animals is to intercross two inbred strains that differ in the trait of interest for at least two generations, thereby allowing chromosome regions to segregate, and permitting linkage analysis in a setting with minimal genetic and environmental variation (Figure 1). Not only is the mapping power superior to that in human linkage analysis, but also the identified loci can be isolated on a fixed genetic background to confirm the position of the locus by backcrossing to one of the parental strains for several generations to create a congenic strain (an inbred strain with only a defined genetic region originating from another strain). The congenic region can then be minimized by further backcrossing, checking each generation to make sure that the quantitative trait locus (QTL) is still within the congenic fragment, until only the causative gene remains.

Figure 1

Strategies in animal models. Presented are the most common strategies employed to identify and validate a candidate gene using animal models. GWA, genome-wide association; QTL, quantitative trait locus.

As in the tale of the tortoise and the hare, human genetics has been regarded as fast but unreliable, whereas animal genetics is slow and laborious but likely to find the gene sooner or later. However, even though a few victories have been won by the tortoise, thanks to denser genotyping and considerably larger patient cohorts that allow near genome-wide association (GWA) mapping, human genetics has also started to produce strong candidate genes for complex diseases. In light of this success, we must consider how best to use animal models in the future; is there still value in identifying the genes that affect susceptibility to disease in these species as well?

Clearly, major challenges remain in human genetics that can be resolved in animals. Most genes with medium or small effects still need the focused and strategic work of animal geneticists to reveal their secrets, and only animal genetics studies allow controlled, repeated experiments that can determine causality without doubt. Most important, however, is that although human genetics often faces dead ends because the function of the identified gene is unknown, animal models allow us to investigate the role played by the genes and to perform conclusive experiments to investigate disease mechanisms and develop more precise treatments.

Current status of human genetics research

The advent of GWA in humans ushered in a new era in disease genetics. GWA studies have been very successful in identifying with statistical rigour the genes that are responsible for several complex diseases, including arthritis, which is reviewed in detail in other articles in this series (for another review, also see [3]). However, at this stage the human GWA studies still wrestle with severe problems and limitations; this is particularly apparent in arthritis studies, where success has been more moderate than for many other complex diseases.

The major problem is the strict correction for multiple testing needed to exclude false positives after performing hundreds of thousands, or even millions, of tests. It is therefore estimated that materials from tens of thousands of patients and control individuals are needed to identify the majority of genetic effects [4]. Studies combined with retesting in other materials are likely to allow confirmation of the strongest of these associations in the near future, but most are likely to elude mapping. This will be especially true for diseases such as RA, for which studies thus far suggest that the patient population must be stratified into smaller patient groups, resulting in smaller bodies of patient materials and even larger numbers of tests [5, 6]. This problem will be even worse if interactions are to be addressed. This is an important issue because it is likely that much of the genetic influence is through patterns of interacting genes.
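
To give a sense of scale for the correction described above, the following minimal sketch computes a per-test significance threshold. It assumes a Bonferroni correction, which is one common and conservative choice and not necessarily the method used in the studies cited here; the function name and the test counts are illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha.

    Assumption: Bonferroni correction (alpha divided by the number of tests).
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Illustrative test counts in the range mentioned in the text
# (hundreds of thousands to millions of tests).
for n_tests in (100_000, 500_000, 1_000_000):
    threshold = bonferroni_threshold(0.05, n_tests)
    print(f"{n_tests:>9,d} tests: per-test threshold {threshold:.1e}")
```

Under this assumption, every additional stratification of the patient material or added interaction test lowers the threshold further, while the number of patients per group shrinks.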

Another issue is the limited possibilities for follow-up experiments in humans. Many loci found by association mapping are located in intergenic regions, including two of the strongest loci for RA, namely TRAF1-C5 and TNFAIP3-OLIG3, making it difficult to establish causality [7, 8]. TRAF1 and TNFAIP3 have been favoured as candidates based on previous knowledge of their function in tumour necrosis factor signalling [9, 10], which is known to be important in RA (reviewed in [11]). Although it is likely that these genes truly are involved in the pathogenesis of RA, this remains to be proven; as for candidate studies, this type of reasoning is counter to one of the main aims: hypothesis-free generation of new knowledge. Interestingly, C5 has already been implicated, based on studies conducted in mice [12-14], and it should therefore be considered an equally likely candidate. Similar problems have been apparent for half a century in elucidating the major histocompatibility complex (MHC) region, in which the genes may operate as linked units, haplotypes. More precise phenotypic information and biological knowledge are needed to understand these genetic regions.

Animal models and their relevance to rheumatoid arthritis

The value of mapping in animals is dependent on there being good models of human diseases. In this review we focus on RA, a highly heterogeneous autoimmune disease that is known to depend on multiple genes and environmental factors. The disease models should therefore preferably be correspondingly polygenic and dependent on environment. There are a number of available animal models for RA that all mimic various aspects of the disease, possibly reflecting disease pathways that operate in different subgroups of RA patients. Thus, all of these models can be valuable under certain conditions, depending on the question that is to be addressed.

Induced arthritis models

If an antigen is known to induce disease, then this permits studies of the antigen-specific response and allows mapping of the genes involved. Collagen-induced arthritis (CIA) is induced by the major collagen found in cartilage, namely collagen type II (CII), emulsified in adjuvant [15, 16]. Disease develops 2 to 3 weeks after immunization in susceptible strains (H-2q or H-2r) [17]. CIA is the most widely used model for studying arthritis pathology and for testing novel anti-inflammatory therapeutics [18].

Proteoglycan (aggrecan)-induced arthritis (PGIA), characterized by a progressive disease course, is induced by cartilage proteoglycans. PGIA presents with 100% incidence in BALB/c mice (H-2d), which are normally resistant to CIA [19], and manifests in substrains of C3H (H-2k) [20]. CIA and PGIA are the two most commonly used RA models for QTL mapping in mice. Both models are complex, highly polygenic diseases that are dependent on both B and T cells [21-24], and both are associated with MHC class II molecules (MHCII) and a large number of both common and unique non-MHC loci (Figure 2) [17, 25]. Both CIA and PGIA are believed to have relevance to human disease because antibodies to both CII and proteoglycan have been identified in RA patients [26-28].

Figure 2

Overview of CIA, PGIA and STIA loci mapped in mouse. CIA, collagen-induced arthritis; PGIA, proteoglycan (aggrecan)-induced arthritis; STIA, serum transfer-induced arthritis.

Other cartilage structures that can induce arthritis include cartilage oligomeric matrix protein [29, 30] and type XI collagen [31].

Collagen antibody-induced arthritis (CAIA) is induced by injection of specific monoclonal CII antibodies [32]. The model was developed based on the finding that serum from arthritic mice or RA patients could transfer arthritis to naïve mice [33, 34]. CAIA resembles CIA but is more acute and has a rapid onset, a few days after injection. Normally, the disease heals after a month and mice remain healthy. The CAIA model is unique because it is independent of MHC and T and B cells [35, 36]. Instead, neutrophils and macrophages are recruited and activated independent of the adaptive immune system, as a result of antibodies binding to the cartilage surface and fixing complement [36]. This allows investigation of effector mechanisms without involvement of the priming phase.

A number of bacteria also have the capacity to induce arthritis in animals. Mice infected with Borrelia develop a disease similar to RA (B. burgdorferi-associated arthritis) [37] and Staphylococcus aureus causes septic arthritis in both rats and mice [38, 39]. Bacterial components, such as cell wall fragments, DNA and heat shock proteins, can also induce arthritis by themselves, one example being the streptococcal cell wall-induced arthritis model [40]. In rats, exposure to heat-killed Mycobacterium tuberculosis in adjuvant results in Mycobacterium-induced arthritis, often referred to as adjuvant-induced arthritis [41]. This model was developed in 1947 when it was found that a mixture of mineral oils, emulsifier and mycobacteria (complete Freund's adjuvant) was a potent immunological adjuvant. It was later found that a similar mixture but excluding mycobacteria (incomplete Freund's adjuvant) also had arthritogenic capacity (oil-induced arthritis) [42]. In addition, some mineral oils by themselves had the capacity to induce arthritis, including squalene [43] and pristane [44].

Pristane-induced arthritis (PIA) in rats closely resembles many aspects of the human disease: it is chronic and symmetrical, serum rheumatoid factor is present, and radiographic changes are apparent [44, 45]. Even though pristane does not contain peptides that could bind to MHC, PIA has been shown to be T-cell driven and dependent on MHCII [46], suggesting that the arthritogenic T cells recognize a self-antigen on the MHC complex, but thus far no such antigen has been identified.

Genetically altered mice as models of arthritis

There are also animal models that are produced using transgenic techniques and develop arthritis spontaneously, which can be used to map modifier genes. Examples are IL-1 receptor antagonist knockouts, IL-1 over-expressing mice, gp130 knock-ins and human tumour necrosis factor-α transgenic mice [47-50]. K/B×N mice express a transgenic T-cell receptor (KRN) and the NOD-derived Ag7 MHCII allele, and develop severe arthritis spontaneously [51]. The autoantigen is the ubiquitously expressed enzyme glucose-6-phosphate isomerase [52], but inflammation is restricted to the joints, and the disease exhibits many of the characteristics of human RA. Autoantibodies play a pathogenic role in this model, because arthritis can be transferred to a wide range of recipients with serum from K/B×N mice (serum transfer-induced arthritis) [53]. Arthritis can also be induced by injection of recombinant glucose-6-phosphate isomerase in mice [54].

In addition, there are spontaneous models that develop arthritis because of a single mutation. These models can be derived as a result of a spontaneous mutation or following N-ethyl-N-nitrosourea mutagenesis. The causative mutation can then be positionally cloned by means of linkage analysis (Figure 1).

Genetic modifications of animals

With emerging knowledge of the major genes that underlie human disease and improved animal models, it seems straightforward to investigate the in vivo function of these genes in the animal models. To this end, the particular genes can be humanized or modified in mice and the effect of the specific mutations on disease development investigated (Figure 1). Of particular use will be new technologies to modify the genome, which will allow researchers to introduce genes, mutate genes in specific tissues and express proteins flagged with various markers. There are, however, some significant drawbacks that have thus far limited the use of this technology, and these need to be highlighted. First, the effects of the modifications depend on the genetic context (the new genetic modifications will interact with other genes in the genome, specifically mouse genes). Second, to conduct conclusive experiments and compare them between different laboratories, the genetic background must be inbred and standardized. Finally, modifications to the genome can lead to artifacts that interfere with interpretation of the results. Clearly, to use genetic modifications we must obtain better knowledge about the genomic control of the disease in question in mice. We first discuss some of the problems that genetic modifications may cause.

Although transgenic or genetic knockout strategies are appealing, being relatively fast and cost efficient, it is important to appreciate that they carry a high risk of artifacts. Despite the efficiency of inserting a mutation that completely disrupts the function of a gene, most genetic factors in common complex diseases are expected to be noncrucial, coding single nucleotide polymorphisms or expression differences [55]. Complete elimination of a gene does not necessarily have the same effect as a smaller change that affects, for instance, expression kinetics or binding to a target molecule. Accordingly, studies of knockout mice have identified phenotypes that are fundamentally different from what was expected from the naturally occurring locus. This is clearly seen in the case of the Ncf1 gene. Mice with a spontaneous mutation in this gene, resulting in a truncated protein, exhibit increased susceptibility to models of arthritis and even develop arthritis spontaneously [56], whereas knockout of Ncf1 results in chronic granulomatous disease with severe infections as a consequence [57]. The same problems apply to other types of transgenes in which a construct is expressed outside its normal context, possibly with dramatic effects on gene regulation and protein expression. This can also be true in humanized mice, in which human genetic variants have been introduced in an artificial genetic interactive environment. Nevertheless, these mice can be extremely useful in clarifying specific questions. For example, humanized mice have successfully been used to investigate the individual roles of MHC class II molecules (MHCII) in arthritis and were proven to be useful in identifying T-cell epitopes (reviewed in [58]).

Another important issue when studying polygenic diseases is that transgenics cannot normally be made directly in the strain that will be used for experiments. Transgenic mice are instead made in embryonic stem cells, usually from the 129 or C57BL/6 strains, and backcrossed to the strain of interest, thus creating a mixed genome with a 129 or C57BL/6 region surrounding the insert. Even after 10 generations of backcrossing, there is an almost 40% risk that a locus 10 cM from the targeted gene is still within this fragment, a region that could contain hundreds of genes [59]. Based on findings from mappings of CIA in mouse, it is quite likely that this congenic fragment will contain QTLs that affect the trait, making it impossible to know whether the phenotype truly originates from the transgene (Figure 2) [60-62].
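
The figure of almost 40% can be reproduced with a simple calculation, sketched below. This is an illustration under stated assumptions rather than the derivation used in [59]: recombination is modelled with Haldane's map function (no crossover interference), each backcross generation contributes one informative meiosis, and selection acts only on the targeted gene. The function names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane's map function)."""
    distance_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def prob_still_linked(distance_cm: float, n_meioses: int) -> float:
    """Probability that a flanking donor locus has never been separated from
    the selected gene after n_meioses independent meioses."""
    r = haldane_recombination_fraction(distance_cm)
    return (1.0 - r) ** n_meioses


# A locus 10 cM from the targeted gene, followed over 10 meioses
# (assumption: one informative meiosis per backcross generation).
risk = prob_still_linked(distance_cm=10.0, n_meioses=10)
print(f"Risk that the flanking locus is still donor-derived: {risk:.2f}")
```

With these assumptions the result is just under 0.4, in line with the value quoted above. Counting one meiosis fewer or using a different map function shifts the number somewhat, but not the conclusion that a large donor fragment typically remains.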

Such linked QTLs have proven to be a problem in several studies. For example, the osteopontin (Opn) gene was suggested to be involved in autoimmunity based on phenotyping of a knockout strain, but it was later revealed that another Opn knockout had no such phenotype, and that the effect was probably due to linked genes in the 129 fragment [63]. More recently, contradictory data about the role of IL-21 in autoimmunity and differentiation of T-helper-17 cells have led to a similar discussion. In fact, none of the studies using IL-21 or IL-21 receptor knockout mice were set up such that the influence of other genes could be excluded [64]. This is especially problematic if the aim is to confirm the mapping of a candidate gene. Random insertion may affect the usage of the gene whereas targeted insertion will place it within a congenic region that might contain the QTL studied, yielding false-positive confirmation (Figure 1). Most importantly, there is a risk that only hypothesis-confirming results will be reported, without any correction for multiple testing.

Gene findings in animal models

Linkage analysis of segregating crosses between inbred strains with different susceptibilities to arthritis has proven to be very efficient and informative. It has confirmed polygenicity and shown that some, but not all, loci are shared between models and strain combinations. Figure 2 shows loci controlling CIA (40 loci) and PGIA (29 loci) in mice [65]. The majority of these loci were mapped in genome-wide F2 crosses. However, parts of chromosomes 3, 6, 7, 14 and 15 have been fine mapped in partial advanced intercrosses and subcongenic strains, and in all regions studied loci have appeared where nothing was detectable in F2 crosses, suggesting that the locus density could be as high on all chromosomes [60-62, 66]. Similar numbers of loci have been mapped in rat models of arthritis: 29 for CIA, 39 for PIA, eight for oil-induced arthritis and five controlling adjuvant-induced arthritis [67]. These fine-mapping studies suggest that having multiple arthritis loci on a chromosome is the rule rather than the exception; it is especially important to bear this in mind when designing experiments in genetically modified strains.

Another important accomplishment of animal genetics is the study of gene-gene interactions. Studying interactions is statistically challenging because of the enormous number of tests that must be conducted. Animal crosses allow mapping and modelling of multiple locus interactions, which has turned out to be of fundamental importance in some phenotypes. The Cia21 and Cia22 loci increase susceptibility to arthritis in mice only in the presence of RIIIS/J alleles in the Cia32 locus, which also interacts with Cia31 and Cia26 [61]. Including interactions in the analysis has also allowed mapping of several other loci, including Cia41 and Cia42 in mouse and Cia26 in rats [60, 68]. Performing this type of study in humans would require even larger patient populations and computational resources, and will remain unfeasible for many years yet.
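
The scale of the problem can be illustrated by counting the pairwise tests alone. The sketch below is an illustration only; the marker numbers are hypothetical round figures chosen to contrast a sparsely genotyped animal cross with a dense human marker panel, and are not taken from the studies cited here.

```python
import math


def n_pairwise_tests(n_markers: int) -> int:
    """Number of distinct marker pairs when every two-locus interaction is tested."""
    return math.comb(n_markers, 2)


# Hypothetical marker counts: a sparse map for an animal cross
# and a dense panel of the kind used in human association studies.
for label, n_markers in (("animal cross", 200), ("dense human panel", 500_000)):
    print(f"{label}: {n_markers:,d} markers, {n_pairwise_tests(n_markers):,d} pairs")
```

The count grows roughly with the square of the number of markers, so a dense panel yields orders of magnitude more pairs to test than a cross in which long linked segments can be covered by a few hundred markers.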

Positioning of the underlying genes has, as expected, not been achieved with similar ease. Initial expectations of rapid gene identification have been based on an underestimation of the complexity of the disease, even if it is bound to be less extensive than in the human situation. Another problem has been to find relevant recombinations that split the strongly linked genetic fragments controlling disease. The genetic effect may in fact be dependent on haplotypes rather than on single genetic polymorphisms. In spite of this, a number of genes, for example MHCII [17, 69, 70], Ncf1 [56, 71] and Hc (C5) [12-14], have been successfully identified as arthritis-regulating using animal models. Furthermore, the Oia2 locus in rats has been shown to be caused by variation in a gene complex encoding C-type lectin-like receptors (APLEC), but thus far it has not been possible to establish which of the genes is responsible for the effect [72].

The MHCII region was the first locus found to be associated with arthritis in both mice [17, 69] and humans [73], and it remains the strongest association in both species. It was recognized early on that CIA susceptibility was almost exclusively seen in inbred strains that had either H2q or H2r haplotype at the MHC locus [17, 69]. The H2p protein, which renders mice nonsusceptible to CIA, differs from H2q only by four amino acids in the peptide binding groove, and changing these to the corresponding amino acids in the H2q sequence makes the H2p mice susceptible to CIA [70]. Interestingly, the binding groove of the H2q MHC strongly resembles that of the human HLA-DRB1*04 and *01 shared epitope haplotypes, which are associated with increased risk for development of RA. Furthermore, transgenic mice expressing the human risk haplotypes are susceptible to CIA [74].

The C5 gene is a very strong candidate gene for the Cia2 locus, which has been identified in two different F2 crosses, including the NOD.Q and SWR/J strains [12, 13]. It has also been confirmed in an advanced intercross and in congenic lines, although in these situations there is evidence for additional contributing genetic influences closely linked to C5 [14]. These strains are C5 deficient because of a frameshift deletion and early termination of translation [75]. The C5 polymorphism is not found in wild mice, however, although it is widespread in inbred strains, possibly because of a bottleneck effect during domestication. The suspected role of C5 and complement in RA has been confirmed in numerous animal experiments and models (reviewed in [76]). Importance in humans has been suggested by increased complement activity in RA joints compared with joints afflicted with other arthritides [77, 78] and was also supported by the TRAF1-C5 association [7].

The Ncf1 gene, which encodes the p47phox protein of the phagocytic NADPH (nicotinamide adenine dinucleotide phosphate) oxidase complex, has been positionally cloned as the major gene underlying the Pia4 locus in rats. Surprisingly, the mutation – resulting in low production of reactive oxygen species (ROS) – rendered the animals more susceptible to severe arthritis [71] as a result of altered oxidation status of arthritogenic T cells [79]. This finding was reproduced in a mouse strain carrying another spontaneous mutation in Ncf1 and with nearly absent ROS production [56, 80]. Based on knowledge from the animal studies, we conducted a candidate association study in a human case-control study of RA. Because NCF1 is more complex in human than in mouse, with pseudogenes and copy number variations [81, 82], we limited our study to the other subunits of the NADPH oxidase complex. We hypothesized that single nucleotide polymorphisms in any of the other subunits could cause the same reduction in ROS production and thereby affect disease. Accordingly, we found an association with NCF4 (p40phox) in rheumatoid factor negative men [82]. This proves that although not all genetic findings in animals can be directly translated to humans, we can identify pathways in mice that are likely to operate similarly in humans.

A success story for mapping of spontaneous mutations is the SKG mouse, derived from a BALB/c breeding. The SKG mouse strain develops severe chronic arthritis at around 8 weeks of age, because of a mutation in the ZAP70 gene. The SKG model presents with high titres of rheumatoid factor and anti-CII autoantibodies, suggesting that it resembles RA both clinically and serologically [83]. ZAP70 is a key signal transduction molecule in T cells [83, 84] and the mutation alters sensitivity to thymic selection, resulting in positive selection of otherwise negatively selected autoimmune cells. Interestingly, even though autoreactive T cells are present in the periphery, an infectious agent is necessary for disease development [85].

The future of animal genetics

Like genetics research in humans, genetics research in animals has progressed in recent years. A wealth of resources has been developed as a result of collaborative efforts, including bioinformatics tools, sequence and expression databases, and designer animals (for an extensive review of available resources, see [86]). New mouse resources, such as outbred stocks and advanced intercrosses, have been put to use to facilitate QTL mapping, and the first studies have reported breathtaking results on the number of QTLs and interactions between genes and environment [87, 88].

Outbred strains have high-density recombinations that can allow mapping to subcentimorgan levels in one generation, by combining the advantages of association mapping with the power of mapping in animal models. One such resource is heterogeneous stocks, in which several founder strains have been intercrossed for numerous generations, resulting in a fine mosaic of founder strain haplotypes [89, 90]. The known ancestry of the alleles increases mapping power compared with natural populations. Furthermore, compared with crosses of only two strains, heterogeneous stock mice also have a large number of alleles, making it more probable that a QTL segregates in the cross. A number of genes and loci controlling other complex traits have already been mapped in outbred stocks, and studies on arthritis in both mice and rats are under way [87, 91, 92].

Another resource that is under development, the collaborative cross, can make the process even more efficient by minimizing the cost of genotyping. By creating 1,000 recombinant inbred lines from eight founder strains that are first intercrossed to mix the genomes and then inbred, a permanent resource of homozygous mice will be generated that can be carefully genotyped once and then used by research groups all over the world [93]. Production of congenic strains for definite determination of causality will be facilitated by starting from genome-tagged or chromosomal substitution strains (inbred strains in which part of or an entire chromosome has been exchanged for that of another inbred strain by the same methods used for making congenics) [94]. Large-scale projects are working on generating transgenic mouse lines for all genes, which can be used in confirmatory studies. Furthermore, the increasing access to sequence information from more and more inbred strains will facilitate the identification of causative polymorphisms and strengthen the power of in silico methods for QTL analysis [86]. Unfortunately, the use of many of these resources is limited by the strict MHC dependency of most arthritis models.

Another interesting prospect is the use of microarray data to identify expression QTLs [95]. By considering gene expression levels as a quantitative trait, expression QTLs can be mapped directly in crosses, both to identify candidate genes and to indicate the key pathways affected. Of course, animal models have a huge advantage compared with humans because samples can be taken from any tissue or time point in the disease course.

By combining these new resources, mapping in animals could approach the speed of mapping in humans while retaining the advantages of animal experiments.

Relevance of findings made in animal models

It is sometimes argued that findings made in animals are not necessarily relevant to human disease. Naturally, there are several major differences between human disease and animal models. However, it is likely that the majority of genes will operate in a similar way in humans as in animals. A gene identified in animals might not be associated with disease in humans (for example, because it is not polymorphic in the human population), but it could still be part of a pathway that operates similarly in both species, as in the case of NCF4. This gene would not have been picked up by conventional association studies, because the effect is weak and the subpopulation small. However, thanks to the identification of Ncf1 as a disease-regulating gene in rats and mice, we were able to investigate a completely novel pathway in humans.

Even in the odd case in which the animal model operates through completely different pathways than the human disease, important information is gained. Animal models are central to the development and testing of new therapeutic strategies, and a discrepancy in disease mechanisms can lead to catastrophic consequences if a therapy is transferred to the human situation after being proven safe and efficient in animals. This was seen when an anti-CD28 monoclonal antibody unexpectedly induced a life-threatening cytokine storm in volunteers when taken to phase I trials, a tragedy that might have been prevented by a better understanding of the immune system of the model organisms [96].

Another difference is the effect of the environment. Animal studies allow environmental factors to be limited to a minimum by fixed living and eating conditions. Furthermore, the inducing environmental factor is unknown in humans, whereas it is defined in animal models. Although this facilitates experimentation and increases power for the mapping, it can also be limiting in that it excludes environmental factors, some of which may be human specific, that can be pivotal in the pathogenesis of human disease. For example, smoking has been shown to play a role in susceptibility to arthritis and to interact with genetic factors [97].

Conclusion

It is clear that both human and animal genetics have benefits: human genetics in its certain relevance and relatively fast identification procedure; and animal genetics in its ability to limit complexity and so allow identification of loci with smaller effects, its benefit of allowing conclusive confirmation of findings, and its immense advantage in allowing further investigation and manipulation of the genes and pathways identified. In the same way, transgenic animals and congenic strains have advantages and disadvantages that make them more or less suited for each specific question considered. Attempts to elucidate the tight nest of interacting genetic effects that seem to make up the genetic background of truly complex diseases such as RA will greatly benefit from a joint attack along all avenues of research.

The different strategies should therefore not be regarded as competing options, but rather as complementary strategies that, together, could provide a true understanding of the genes and pathways that affect human diseases. They may also permit improved understanding of the animal models that we are so dependent on in the development of safe and efficient drugs.

Note

The Scientific Basis of Rheumatology: A Decade of Progress

This article is part of a special collection of reviews, The Scientific Basis of Rheumatology: A Decade of Progress, published to mark Arthritis Research & Therapy's 10th anniversary.

Other articles in this series can be found at: http://arthritis-research.com/sbr