It is a little-known fact that Gregor Mendel, remembered for his studies of trait inheritance in pea plants, also experimented with breeding mice to understand coat color traits. Had it not been for the disapproval of Bishop Anton Ernst Schaffgotsch, who led the Augustinian monastery where Mendel studied, he might well have been credited as the father of mouse genetics [1]. Instead, CC Little started generating inbred lines of mice half a century later, driven by a desire to understand cancer biology and recognizing the importance of reproducible genetic crosses [1]. From these beginnings more than 300 strains of laboratory mice have been developed; each line has been faithfully replicated and cryo-preserved, making them a renewable genetic resource. Most are the result of the blending of Mus musculus sub-species, including domesticus and musculus, with some contribution of castaneus and molossinus, resulting in a distinctive genetic mosaic of these progenitors in each inbred line [2].

Today's geneticists usually turn to one of these inbred mouse strains when attempting to model human disease because mice offer advantages that few species can match. Importantly, the mouse genome can be easily manipulated with greater speed, scale and sophistication than that of other mammals, and the efforts of the International Mouse Genome Sequencing Consortium have resulted in a high-quality reference genome sequence that is the envy of other model organism users [3]. The future for mouse genetics promises to be even more exciting now that high-throughput sequencing of mouse strain genomes has started, and efforts are under way to systematically disrupt every gene in the mouse genome and phenotype the resulting mutant animals [4]. Here, we outline the tools and technologies that have emerged for using mice to discover and characterize disease genes, and the resources that are being developed to accelerate these discoveries.

Sequencing mouse genomes

In 2002 the International Mouse Genome Sequencing Consortium released the first draft of the genome from C57BL/6J, an inbred strain of the laboratory mouse [3], and a finished genome was released in 2009 [5]. As one of the most globally used lines, C57BL/6J was a wise choice for the reference mouse strain, but it is by no means the only strain used in research. Therefore, subsequent efforts were initiated to generate genomic sequence for other inbred strains. Firstly, four different strains of the laboratory mouse were included by Celera in a whole-genome shotgun sequencing project: A/J, DBA/2J, 129X1/SvJ and 129S1/SvImJ [6]. This resulted in 27.4 million sequencing reads, giving a total of 5.3x coverage of the mouse genome. Secondly, more than 150,000 short insert clones were sequenced from the 129S5/SvEvBrd strain, covering 4.7% of the reference genome [7]. Thirdly, Perlegen Sciences used hybridization to re-sequence 15 inbred mouse strains [8]; this set included 11 classical strains and four strains derived from the wild. Unlike the other resources, Perlegen's approach did not generate sequence reads, and their hybridization sequencing technology queried only 1.49 Gigabases of the reference genome (equivalent to about 58% of the C57BL/6J sequence that is non-repetitive). Furthermore, to generate high accuracy calls, high stringency cutoffs were used, resulting in a false negative rate estimated to be as high as 50% [2]. Therefore, the available sequence data lacked the coverage and breadth of strains needed to make it a widely used resource.
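The read count and fold coverage quoted for the Celera data are linked by the usual relationship: coverage equals number of reads times mean read length, divided by genome size. The sketch below is a rough consistency check only. The genome size of 2.6 Gb is an assumption (it is not stated above), and the function names are illustrative.

```python
# Rough consistency check for the Celera figures quoted in the text.
# ASSUMPTION: a mouse genome size of ~2.6 Gb; this value is not given above.
ASSUMED_GENOME_SIZE_BP = 2.6e9


def fold_coverage(n_reads, mean_read_len_bp, genome_size_bp):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def implied_read_length(n_reads, coverage, genome_size_bp):
    """Mean read length needed to reach a given fold coverage."""
    return coverage * genome_size_bp / n_reads


# Figures from the text: 27.4 million reads giving 5.3x coverage.
celera_reads = 27.4e6
celera_coverage = 5.3

read_len = implied_read_length(celera_reads, celera_coverage, ASSUMED_GENOME_SIZE_BP)
print(f"Implied mean read length: {read_len:.0f} bp")
```

Under that assumed genome size the implied mean read length is roughly 500 bp, which is in the range expected for capillary sequencing reads, so the two quoted figures are mutually consistent.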

The first non-reference mouse chromosomes to be sequenced were A/J and CAST/EiJ chromosome 17, revealing significant variation at the nucleotide level and also considerable structural differences [9]. Building on that work, we commenced the Mouse Genomes Project, which has sequenced the genomes of 17 key mouse strains using next-generation sequencing on the Illumina platform (Box 1). At the last data freeze in December 2009, an average of 25x sequence coverage had been generated for each strain, along with a deep catalog of variants [10]. These data provide a comprehensive insight into the genomes of the 17 strains, allowing immediate access to background genetic information for most mouse models of disease in addition to facilitating the analysis of the molecular basis of complex traits with unparalleled resolution.

Genetic manipulation of mice in the post-genomic era

Technologies for modifying the mouse genome can be split into two broad classes: those for gene-driven analyses and those for random mutagenesis.

The collection and propagation of mice harboring spontaneous mutations with striking phenotypes, such as the obese mouse, served mouse geneticists well for most of the 20th century. When it became clear that the rate of random germline mutation can be significantly increased by exposure to radiation or to chemical mutagens such as N-ethyl-N-nitrosourea (ENU) [11], large-scale mutagenesis programs followed, resulting in an explosion in the number of mutant lines. Phenotypic screens of these lines led to the identification of many hundreds of new mutations and candidate disease genes [12, 13]. One notable example of a successful forward genetic screen, reviewed in [14], identified 89 ENU-induced mutants that influence the immune system, of which at least 69 have now been characterized at the molecular level. However, mapping random mutations and identifying the affected gene can be an arduous process, often taking years; therefore, causal mutations for only a fraction of mutant lines have been identified thus far. Screening DNA from archived mutagenized lines for mutations in a specific gene of interest is a parallel 'gene driven' strategy that has proven successful [15, 16]. With the advent of new sequencing technologies it is now cost effective to sequence mutagenized mouse exomes in their entirety, enabling the rapid identification of candidate disease genes from existing resources and meaning that mutagenesis-driven approaches may return as a powerful tool for studying disease genes. Other methods of random mutagenesis include retroviruses, transposons (reviewed in [17]), and 'gene traps' [18]. These DNA-based mutagens can be easily mapped using approaches such as splinkerette PCR [19] and are discussed in more detail below.

The genome of the mouse can also be manipulated by pronuclear injection of DNA into oocytes or by modification of embryonic stem (ES) cells, which can then be injected into blastocysts to make chimeras, allowing modified alleles to be transmitted through the germline. Direct pronuclear injection results in random integration of the injected DNA [20]; consequently, transgene copy numbers and integration sites differ between lines, potentially resulting in very different phenotypes. Large genomic fragments such as bacterial artificial chromosomes (BACs) may also be injected (reviewed in [21]); these have proven particularly useful in complementation studies or 'rescue' experiments for identifying genes contributing to a genetically mapped disease trait of interest [22]. By contrast, DNA introduced into ES cells in culture can undergo site-specific, homology-directed recombination [23], thus enabling the generation of targeted gain- and loss-of-function alleles as well as the engineering of large-scale rearrangements of entire mouse chromosomes (Figure 1) [24, 25]. Other recently developed techniques include transgenic small hairpin RNAs (shRNAs), which are often delivered by lentiviral transgenesis [26, 27], single-stranded oligonucleotides (ssODNs; reviewed in [28]), and zinc-finger nucleases (ZFNs) [29], which can be used to generate subtle sequence-specific genomic modifications. Here we will address in more detail a few of these technologies, focusing on recent advances uniquely available to mouse geneticists.

Figure 1

Gene targeting strategies used in mouse ES cells. Targeting is achieved by recombination (black crosses) between homology arms (red lines). (a) A knockout vector replaces an entire gene with a selection cassette containing drug resistance (DR), enabling the selection of successfully targeted ES cell clones. (b) A knock-in vector allows the expression of a transgene, such as LacZ or Cre, by the promoter (gray arrow) of the targeted gene. (c) Insertion vectors can interfere with splicing by disrupting a target gene by the introduction of an exon with an early termination codon or a 5' splice acceptor site (SA). They typically target the genome with a single crossover event. (d) A conditional allele with directional DNA sequences (LoxP, green triangles) either side of a critical exon. Recombination between the sites will result in a null allele. (e) LoxP sites can also be targeted megabases apart, either side of a larger cluster of genes, enabling chromosome engineering. (f) Heterospecific Lox sites, such as LoxP and Lox511, are targeted by the site-specific recombinase Cre. Recombinase-mediated cassette exchange (RMCE) enables the efficient swapping of one targeted cassette containing incompatible target sites for another cassette flanked by an identical pair of sites. This enables the rapid generation of new alleles, such as introducing a point mutation in a critical exon.

ES cell gene-targeting

ES cell technology has been a profound advance in mouse genetics (detailed in [30]). Historically, the majority of manipulations have been performed in ES cells derived from 129 sub-strains (Table 1). Recently, robust and highly germline-competent ES cells derived from the popular C57BL/6 strains have been developed, such as JM8 and C2 (Table 1). To assist in tracking the contribution of these ES cells to chimerism, and to identify mice that have transmitted their genome through the germline, a dominant Agouti (yellow) coat color allele was engineered in JM8 cells [31]. This now enables the study of mutant alleles on a common, controlled genetic background without the need for generations of backcrossing.

Table 1 Commonly used ES cell lines for generating genetically modified mice

Gene targeting in mouse ES cells can be achieved by homologous recombination, using replacement, insertion or knock-in vectors, all of which contain a region of homology with the locus to be targeted. In replacement vectors, crucial exons (or entire genes) are replaced by a selection cassette to generate a null knockout allele (Figure 1a). Knock-in vectors are designed such that a transgene or reporter is transcriptionally regulated by the endogenous promoter of the locus (Figure 1b; reviewed in [32]). By contrast, insertion vectors rely on gene rearrangement by interfering with splicing to disrupt a target gene (Figure 1c). Significant resources are available for obtaining suitable genomic DNA for targeting vector construction, including genome-wide end-sequenced BAC libraries for C57BL/6J-derived [33] and 129-derived strains [7]. Homology arms (the part of the vector that aligns with the genome to facilitate recombination) were typically generated by restriction digest of large DNA fragments or by PCR amplification, but increasingly 'recombineering' technologies are being used [34], which make it possible to engineer virtually any mutation into the mouse genome with base pair resolution. In addition, customized targeting vectors can be generated on a contract basis by several companies.

Gene modification with conditions

Conditional gene modification is used to enable spatial and/or temporal control over the modification of the gene of interest. To this end, site-specific recombinase (SSR) systems are used, including Cre-LoxP, Flp-FRT, φC31 integrase-attB/attP and most recently Dre-rox [35]. For a comprehensive review of the use of site-specific recombinases for manipulation of the mouse genome, see [36]. The DNA sequences that the SSRs recognize are typically directional and can either flank the target DNA for excision from the genome or be used to invert segments of DNA. SSRs can be used for the generation of single gene knockouts or rearrangements, and for chromosome engineering on a megabase scale (Figure 1d,e) [37, 38].
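The dependence of the outcome on site orientation can be illustrated with a toy model. The sketch below is a deliberate simplification, not a description of any real tool: a locus is a list of named elements, each with a strand, and the `Element` class and `recombine` function are hypothetical names introduced here. Two sites in the same orientation give excision of the intervening DNA (leaving one site behind); two sites in opposite orientation give inversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A named feature on a toy chromosome; strand is +1 or -1."""
    name: str
    strand: int = 1


def recombine(chromosome, site):
    """Apply one recombination event between the two copies of `site`.

    Same orientation  -> the flanked segment is excised, one site remains.
    Opposite orientation -> the flanked segment is inverted in place.
    """
    idx = [i for i, e in enumerate(chromosome) if e.name == site]
    if len(idx) != 2:
        raise ValueError("this toy model expects exactly two copies of the site")
    i, j = idx
    if chromosome[i].strand == chromosome[j].strand:
        # Excision: keep the left site, drop the segment and the right site.
        return chromosome[: i + 1] + chromosome[j + 1 :]
    # Inversion: reverse the order of the inner elements and flip their strands.
    inner = [Element(e.name, -e.strand) for e in reversed(chromosome[i + 1 : j])]
    return chromosome[: i + 1] + inner + chromosome[j:]


# A conditional ("floxed") allele: loxP sites in the same orientation around exon2.
floxed = [Element("exon1"), Element("loxP"), Element("exon2"),
          Element("loxP"), Element("exon3")]
null_allele = recombine(floxed, "loxP")

# Sites in opposite orientation invert the flanked segment instead.
invertible = [Element("exon1"), Element("loxP", 1), Element("exon2"),
              Element("loxP", -1), Element("exon3")]
inverted = recombine(invertible, "loxP")
```

In the first case the critical exon is lost, as in the conditional allele of Figure 1d; in the second the exon is retained but reversed. The same logic applies whether the sites are a few hundred base pairs or megabases apart.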

SSRs can be expressed from endogenous promoters (as shown in Figure 1b) and in a tissue- or cell-specific manner. This is particularly useful when studying the organ-specific function of genes that are widely expressed and essential for embryonic development. For example, a conditional allele of Sox9, a gene implicated in campomelic dysplasia in humans, is necessary to study its function in cartilage in mice because germline deletion of Sox9 results in perinatal lethality [39]. For somatic mutagenesis, inducible gene-modification systems may be used. These systems allow temporal 'inducible' control of SSR expression. There are several inducible expression systems available, including tetracycline [40], LacZ [41], and the tamoxifen-inducible systems [42]. These systems have been invaluable in studying genes and neural circuits involved in learning and memory, by turning genes and cellular markers 'on' or 'off' during controlled time periods (reviewed in [43]), and in a range of other biological systems.

There are now over 500 tissue- or cell-specific Cre recombinase mice (some of which are inducible) documented in databases such as Cre-Zoo and Cre-X-mice (Table 2) [44]. However, as conditional modification technologies become increasingly sophisticated, the potential for non-specific effects, from mis-regulation of the targeted gene to incomplete recombination by the SSR, must remain a consideration [45, 46]. For example, a recent study highlighted the potential for protein expression from episomal products of Cre recombinase-excised genes, particularly when deletion occurs in cells that have a low population turnover [47].

Table 2 Resources generated from large-scale mouse genetics projects

Recombinase-mediated cassette exchange

Using homologous recombination to introduce genetic material into a desired genetic location in the mouse genome is not always straightforward. The efficiency is often dependent on the nature of the genomic target site and on the design of the targeting vector. Therefore, the ability to efficiently introduce secondary modifications to already successfully targeted cassettes is advantageous. Recombinase-mediated cassette exchange (RMCE) is a process in which site-specific recombinases exchange one gene cassette, flanked by a pair of incompatible target sites, for another cassette flanked by an identical pair of sites (Figure 1f) [48]. Apart from the naturally occurring heterotypic SSR sites (attB and attP for φC31), several variant sites have been developed for Cre and Flp, providing the required heterospecificity crucial for RMCE (for example, LoxP/Lox511 and FRT/FRT3; see [49] for a complete list). In RMCE, typically one cassette is present in the host genome, whereas the other cassette (and the recombinase) is introduced into the host ES cell by electroporation, chemical-mediated or adenoviral-mediated gene transfer [50]. Transient expression of the recombinase will direct integration of the SSR site-flanked cassette, which can then be selected by drug resistance. RMCE-based techniques are proving to be useful in the rapid production of custom allelic series [51]: they have recently been used to compare the impact of different tumor-associated mutations in p53 [52], and to study the effect of multiple enhancer elements on the expression of a targeted cassette [53].
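The logic of cassette exchange can be sketched in a few lines. This is a toy model under stated assumptions: the locus and donor are plain lists of labels, the `rmce` function is a hypothetical name, and the cassette contents (`neo`, `puro`, `exon_with_point_mutation`) are illustrative placeholders rather than a real vector design. The key constraint it encodes is that the two flanking sites must be incompatible with each other yet match the pair already present in the genome.

```python
def rmce(genome, donor):
    """Swap the genomic cassette for the donor cassette between matching sites.

    `donor` is [left_site, payload..., right_site]. The two sites must differ
    (heterospecific), otherwise recombination between them could simply
    excise the cassette rather than exchange it.
    """
    left_site, *payload, right_site = donor
    if left_site == right_site:
        raise ValueError("RMCE needs two incompatible (heterospecific) sites")
    try:
        i = genome.index(left_site)
        j = genome.index(right_site, i + 1)
    except ValueError:
        raise ValueError("genome lacks the matching pair of sites") from None
    # Keep both sites; replace only what lies between them.
    return genome[: i + 1] + payload + genome[j:]


# Hypothetical targeted locus carrying a neo cassette between loxP and lox511.
targeted_locus = ["5' arm", "loxP", "neo", "lox511", "3' arm"]

# Hypothetical donor carrying a mutant exon plus a different selection marker.
donor_cassette = ["loxP", "exon_with_point_mutation", "puro", "lox511"]

exchanged = rmce(targeted_locus, donor_cassette)
# Successful exchange can be selected by the newly acquired drug resistance.
selectable = "puro" in exchanged and "neo" not in exchanged
```

Because both sites survive the exchange, the same locus can in principle be retargeted again with a further donor, which is what makes the approach suited to building an allelic series.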

Transposons for mutagenesis

Unlike most of the methods described so far, which allow manipulation of the genome with base pair precision, transposable elements provide the power to molecularly tag, and therefore rapidly map, random mutagenic events. The application of transposons to the field of mouse genetics has become possible only in the past decade. So far, four distinct DNA transposons have been shown to function in mice: Tol2, Minos, Sleeping Beauty (SB) and PiggyBac (PB) (reviewed in [17]), with the latter two being the most widely used. DNA transposons use a 'cut-and-paste' transposition mechanism. When both the transposase enzyme and a transposon vector are present in the same nucleus, the transposase can mediate excision of the transposon from the donor site and integration into another target site in the host cell genome. RNA-mediated transposition, driven by a 'copy-and-paste' mechanism, has also been introduced into mice for mutagenesis [54].
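The difference between the two mechanisms comes down to whether the donor copy is retained. The following minimal sketch makes that explicit with a toy genome represented as a list; the function names, element labels, and the idea of passing the target position as an argument are all illustrative assumptions, since real integration sites are not chosen by the experimenter.

```python
def cut_and_paste(genome, transposon, target, transposase_present=True):
    """DNA transposon: excise from the donor site and reinsert at `target`.

    `target` is an index into the genome after excision. Without the
    transposase nothing moves, and the copy number never increases.
    """
    if not transposase_present or transposon not in genome:
        return list(genome)
    remaining = [x for x in genome if x != transposon]
    remaining.insert(target, transposon)
    return remaining


def copy_and_paste(genome, element, target):
    """RNA-mediated transposition: the donor copy stays and a new copy is added."""
    if element not in genome:
        return list(genome)
    result = list(genome)
    result.insert(target, element)
    return result


donor_genome = ["geneA", "SB", "geneB", "geneC"]
moved = cut_and_paste(donor_genome, "SB", 3)
unmoved = cut_and_paste(donor_genome, "SB", 3, transposase_present=False)

retro_genome = ["geneA", "retroelement", "geneB"]
copied = copy_and_paste(retro_genome, "retroelement", 3)
```

After cut-and-paste the element is still present exactly once but at a new position, whereas copy-and-paste raises the copy number with each event.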

Transposons can be used for germline mutagenesis in mice (reviewed in [55]). However, this technique is inefficient for genome-wide forward genetic screens, owing to the low rate of transposition (one to three de novo insertions per gamete) and the tendency for local hopping exhibited by most of the transposons; though some researchers have taken advantage of this observation to saturate smaller genomic regions [56, 57]. So far, the most common use for transposons has been in the field of cancer genetics [17]. Retroviral insertional mutagenesis has traditionally been used to study the genetics of hematopoietic and mammary cancers (Box 2), but the study of other tumor types has been limited by viral tropism. Initial studies demonstrating the validity of transposon-mediated insertional mutagenesis (using SB) identified both known and novel cancer genes involved in sarcoma and lymphoma [58, 59]. Since then, transposons have been engineered to produce gain-of-function mutations in epithelial cells resulting in the development of a wide variety of carcinomas [60]. In addition, Cre-inducible SB transposase alleles can restrict mutagenesis to specific tissues, permitting studies into colorectal cancer and hepatocellular carcinoma [61, 62]. More recently, PB has been used for somatic mutagenesis, representing another tool for cancer gene discovery in the mouse [63].

Transposons can also be used to generate transgenic mice by loading them with genetic cargo. SB, PB and Tol2 are all efficient in delivering large transgenes, up to 70 kb in size [64]. PB has also been used together with SSR technology to generate large-scale rearrangements of the mouse genome, including duplications, deletions, and translocations [65]. Recently, transposons have been used to deliver the reprogramming factors required for generating induced pluripotent stem (iPS) cells [66, 67].

Gene trap mutagenesis

Gene trapping in mouse ES cells is an efficient method for mutagenesis of the mammalian genome. Insertion of a gene trap vector can disrupt gene function and/or report gene expression, and because these vectors integrate into the genome they provide a convenient tag that facilitates the identification of their insertion site. A typical strategy involves electroporating into ES cells a vector containing a 5' splice acceptor that splices to the upstream exon of the trapped gene, and thus the endogenous promoter of the trapped gene is used to drive the expression of the reporter gene [18]. However, the vector can also be delivered by retroviral infection or transposon-mediated insertion, and the trap insertion sites in the resultant ES cell clones can be identified by splinkerette PCR (detailed in [68]).

Recent developments in trapping technology involve the use of 'conditional traps', which enable the induced modification of trap alleles, in vitro or in vivo, using SSRs, and using RMCE to exchange trapped vectors with other functional cassettes [69]. Gene trapping strategies have also been successfully developed to screen for genes that have specific expression patterns ('enhancer traps' [70]) or are acting in specific biological pathways ('induction trapping' [71, 72]). Another approach to direct gene trapping toward genes in a specific pathway is to perform a phenotypic screen in ES cells. However, most insertions will cause heterozygous mutations (which will generate detectable phenotypes only for haploinsufficient genes). One strategy to overcome this has been to use ES cells that have a deficiency in the Bloom (Blm) DNA helicase. These cells show high levels of mitotic recombination, which facilitates the generation of homozygosity in cell lines from colonies carrying heterozygous mutations [73].

'ES cell-driven' mouse production

Another advantage that the mouse has over other model organisms is in the rapid generation of mutant mice using ES cell-driven approaches. These enable the production of mice that are entirely, or almost entirely, derived from ES cells without the requirement for germline transmission. These approaches involve the injection of ES cells into eight-cell embryos, or a process called tetraploid complementation, and allow the generation of mutant mice in weeks rather than months [74, 75]. By combining these approaches with shRNA-mediated knockdown, several groups have shown that it is possible to rapidly generate knockdown mice for the analysis of somatic gene function [76, 77]. Mice somatically overexpressing genes in an inducible and regulated way have also been developed using these approaches [78].

Mouse genetics on a grand scale

The success of the genome sequencing consortia over the past two decades established a model for further large-scale, collaborative projects aimed at functionally characterizing genomes (Table 2). Examples include the International Knockout Mouse Consortium (IKMC), and its constituent regional projects, which collectively aim to generate mutant alleles for every protein-coding gene in the mouse genome and to make the resources available to the scientific community [4]. Researchers can now search the IKMC website and acquire, at minimal cost, mice or ES cells that lack a gene of interest [79], thereby accelerating the path from a gene of interest to mutant mouse line. By May 2011, IKMC had over 16,000 ES cell lines with mutations in protein coding genes. Many of these alleles are 'knockout first' alleles, which are designed to introduce a LacZ expression marker into a target gene, and the allele can be tailored by using Cre and Flp to generate null and conditional alleles, respectively [80].

In parallel, a number of past and ongoing standardized phenotyping projects have documented traits in inbred strains and mutant lines for phenotypes relevant to human disease, including the Mouse Phenome Project, the European Mouse Disease Clinic (EUMODIC) and the Mouse Genetics Project (MGP; based at the Wellcome Trust Sanger Institute); see also Table 2 [81-84]. The results from many screens are made available online, enabling researchers to identify potentially interesting phenotypes for detailed analysis (Figure 2). For example, primary MGP analysis of mice lacking the gene Slx4 identified a number of developmental and DNA instability phenotypes. Detailed secondary analysis revealed the mouse to phenocopy a new sub-type of the human genetic illness, Fanconi anemia [85-87].

Figure 2

Phenotypic screening of genetically modified mice. Mutant mice available to the community are systematically screened for a range of phenotypes by the Sanger Mouse Genetics Project and the data published online [10]. Some examples of observed phenotypes are shown. (a) An outwardly protruding xiphoid process in Fam73b<tm1a(KOMP)Wtsi> homozygous mice (top), compared with wild-type controls (bottom). (b) The formation of uroliths (bladder stones) in Cldn16<tm1a(KOMP)Wtsi> homozygous mice (top), not present in controls (bottom). (c) Underdeveloped molars in Sparc<tm1a(EUCOMM)Wtsi> homozygous mice (top), compared with control mice (bottom). (d) Abnormal skeletal muscle in Zfp106<tm1a(KOMP)Wtsi> homozygous mice (top), compared with controls (bottom). (e) A shortened, upturned snout in Smc3<tm1a(EUCOMM)Wtsi> heterozygous mice. (f) Metacarpophalangeal joint fusion in Dnase1l2<tm1(KOMP)Wtsi> homozygous mice. (g) Spots of retinal hyperpigmentation in Slc9a8<tm1a(KOMP)Wtsi> homozygous mice. (h) LacZ reporter gene expression in the adult mammary gland of Myh9<tm1a(EUCOMM)Wtsi> heterozygous mice.

In an effort to identify quantitative trait loci (QTLs), large heterogeneous stocks (HS) of genetically diverse mice have been generated [88, 89]. Individual mice have been phenotyped and genotyped to facilitate high-precision QTL mapping. The Collaborative Cross (CC) is a resource that uses a similar strategy, interbreeding eight strains of mice to generate around 300 new inbred lines [90], which, unlike HS mice, are being cryopreserved for posterity. It is estimated that the CC will capture approximately 90% of the genetic variability in laboratory mice and will allow the mapping of genetic networks that underlie complex diseases. Moreover, the progenitor strains of the CC were selected for sequencing in the Mouse Genomes Project (Box 1), which should allow the QTLs identified by phenotyping CC mice to be rapidly resolved into a list of candidate variants. When complete, the CC will mark a new era in the discovery of the molecular basis of complex traits in the mouse. Meanwhile, large-scale phenotyping of the strains developed so far is well under way. Finally, EuTRACC is a project to generate ES cells that carry a targeted tandem affinity purification tag (TAP-tag). Initially this will cover several hundred transcription factor genes, but the effort is likely to extend genome-wide. This resource will facilitate mass spectrometry of native protein complexes to better understand the mouse 'interactome' [91].

Towards the future

Mouse genetics has a bright future. Genome-wide association studies have identified hundreds of alleles statistically associated with human disease, which now demand detailed functional analysis. Early examples suggest the mouse will be the ideal model in moving from genetic association studies to understanding molecular mechanisms leading to complex disease [92]. The ablation of a large proportion of the coding mouse genome within the next 5 years, at least in ES cells, should rapidly accelerate these studies.

The modular design of modern gene targeting cassettes, together with SSRs and RMCE, makes for an incredibly flexible system of genetic engineering in mice. This is establishing the mouse as a leading model in scientific disciplines previously dominated by work in simpler organisms. For example, gene targeting combined with channelrhodopsin, which enables the control of neural activation using light [93], allows the visualization, and fine manipulation, of precise neural circuits in the mammalian brain, something that was until recently possible only in Drosophila and C. elegans [94, 95].

However, there are also challenges ahead. A significant number of regions across the mouse genome, typically those containing clusters of highly homologous, tightly arrayed genes, are not amenable to efficient gene targeting. Moreover, the same loci are often difficult to sequence, with some lacking complete coverage even in the high quality reference genome [96]. Thus, as much as 5 to 10% of the functional mouse genome may fall through the cracks of the present large-scale projects unless new technologies, or clever combinations of current technologies, are developed and used to investigate these genes. Nevertheless, the mouse is likely to remain the non-human vertebrate with the most sequenced, and best studied, genome for the foreseeable future. Together, the advances described here will underpin an understanding of mouse genetics within the current decade unthinkable to CC Little when he first began generating inbred lines over a century ago [1].

Box 1: A genome for all reasons

The 17 strains being sequenced as part of the Mouse Genomes Project were carefully selected to support other major mouse genetics resources. Three 129 strains were chosen because they serve as the background for thousands of existing gene knock-outs. The C57BL/6N strain is the origin of the highly germline-competent JM8 ES cells that are being used in large-scale gene targeting programs [31]. Nine common lab strains were chosen because of their historical utility, and also because they include the progenitors of the heterogeneous stock and Collaborative Cross mice that are used in dissecting complex traits [88, 89]. Finally, four wild-derived strains have been sequenced because they represent some of the founder sub-species of many inbred laboratory lines, and are also important models of cancer and infection resistance [2].

Box 2: Exploiting viruses in mouse genetics

The first transgenic mice were generated by infecting embryos with viruses [97], and today viral vectors remain an integral part of the mouse genetics toolkit. Lentiviruses integrate their genome into the host's DNA, making them an effective transgene delivery vector. The lentiviral genome, derived from immunodeficiency viruses, has been deconstructed and distributed across multiple plasmids to minimize the potential formation of replication-competent viruses [98]. A transgene of interest may be included in a plasmid containing a viral packaging signal. This is co-transfected into a cell line (typically human embryonic kidney HEK293T cells) with other plasmids expressing proteins required for viral production, such as envelope proteins. Viruses produced in this way can be introduced into oocytes for transgenesis (reviewed in [99]). In its simplest form, this method necessitates only a few weeks between target selection and phenotypic analysis, offering a distinct advantage over other approaches. To enable pooled loss-of-function screens to identify complex genetic interactions, lentiviral short hairpin RNA (shRNA) libraries targeting most mouse genes have been generated [27]. Some groups have recently used ultrasound-guided microinjections of lentiviruses to deliver genes to organs and tissues of early mammalian embryos in utero [100].

Slow transforming retroviruses have been widely used to generate mouse models of cancer [101]. They can re-infect the same cell, randomly inserting their genome into the host DNA multiple times, resulting in an accumulation of mutations. This process of progressive mutagenesis recapitulates the multi-step progression of human tumorigenesis (reviewed in [102]). The development of next-generation sequencing technologies has dramatically enhanced the process of identifying retroviral insertion sites, and databases, such as the Retroviral Tagged Cancer Gene Database, have been developed to map insertion sites to the reference genome [103].