Molecular Characterization of Mutant Mouse Strains Generated from the EUCOMM/KOMP-CSD ES Cell Resource
The Sanger Mouse Genetics Project generates knockout mouse strains using the EUCOMM/KOMP-CSD embryonic stem (ES) cell collection and characterizes the consequences of the mutations using a high-throughput primary phenotyping screen. Upon achieving germline transmission, new strains are subjected to a panel of quality control (QC) PCR- and qPCR-based assays to confirm correct targeting, cassette structure, and the presence of the 3′ loxP site (required for the potential conditionality of the allele). We report that over 86 % of the 731 strains studied showed the correct targeting and cassette structure, of which 97 % retained the 3′ loxP site. We discuss the characteristics of the lines that failed QC and postulate that the majority of these failures may be due to mixed ES cell populations that were not detectable with the original screening techniques employed when creating the ES cell resource.
The EUCOMM/KOMP-CSD collection, together with the collections generated by Regeneron and the Canadian NorComm programme, forms the International Knockout Mouse Consortium (IKMC) resource (Collins et al. 2007; Ringwald et al. 2011; Bradley et al. 2012), which is the main source of ES cells used for mouse production by the International Mouse Phenotyping Consortium (IMPC) (Brown and Moore 2012).
The goal of the IMPC is to generate knockout strains for all protein-coding genes in the mouse on a pure C57BL/6N genetic background, and to elucidate gene function by use of a broad-spectrum high-throughput primary phenotyping screen. These phenotypes can then be studied in more depth by the scientific community at large within specialized areas of interest.
The aims of the IMPC overlap with those of the Wellcome Trust Sanger Mouse Genetics Project (Sanger MGP) (White et al. 2013), which was formed in 2006 to generate and phenotype 200 mutant mouse strains per year using a battery of tests designed to detect changes in a variety of systems, including metabolism, dysmorphology, behaviour, cardiovascular function, immunity, visual and auditory response, viability, and homozygous lethality (Ayadi et al. 2012). Strains are available to the scientific community directly from the Sanger Institute while colonies are actively breeding, and from the European Mutant Mouse Archive (Wilkinson et al. 2010) or KOMP Repository (Lloyd 2011) once archived. The primary phenotypic data are also readily available at the Sanger Mouse Portal (http://www.sanger.ac.uk/mouseportal).
At the time of writing, the EUCOMM/KOMP-CSD ES clone collection consisted of targeted clones for 12,350 genes, 56 % of the 22,147 CCDS (Pruitt et al. 2009) gene models present in Ensembl (Flicek et al. 2012). The resource was generated by use of a high-throughput modular gateway-based vector construction and positive–negative selection for high-efficiency targeting in ES cells (Skarnes et al. 2011). Clones were then screened by long-range PCR and sequencing to confirm targeting and the presence of the 3′ loxP site that is required for the conditionality of the mutant allele. Although this approach is appropriate for a high-throughput pipeline in terms of cost and speed, it does have its limitations. For example, long-range PCR is likely to miss mutations within the cassette and is not able to detect mixed ESC populations. As the resource is exploited to generate mouse lines, it will be important to ascertain the molecular structure of the alleles transmitted to mice.
Here we present a detailed and extensive molecular characterization of the mutant alleles in mouse strains generated from the resource. We demonstrate that although the majority of the mouse lines produced by Sanger MGP from the EUCOMM/KOMP-CSD collection are correct, some problematic events were detected. We have developed a set of quality control (QC) criteria and assays to screen out affected strains as early as possible following germline transmission of the incorrect alleles.
Materials and Methods
The care and use of all mice in this study were in accordance with the UK Home Office regulations, UK Animals (Scientific Procedures) Act of 1986, and were approved by the Wellcome Trust Sanger Institute Ethical Review Committee.
A Minimum Standard for Mouse QC
IKMC minimum allele QC standards (at least one QC test per category)
Confirm targeting of the allele: Southern blot with neo or external probe (ESC or mice); loss of wild-type allele (LoA) qPCR; 5′ and 3′ LRPCR; absence of a WT-specific short-range PCR (srPCR) product in homozygous mice; gene expression analysis on mRNA or protein
Confirm structure of the cassette: srPCR on various parts of the cassette (e.g., mutant-specific srPCR, lacZ, neo, cassette ends); neo or lacZ count by qPCR
Confirm conditionality of the tm1a allele: gene-specific or universal srPCR to detect the loxP site 3′ to the CE
Confirm absence of additional insertions: Southern blot with neo probe (ESC or mice); neo or lacZ count by qPCR + vector backbone PCR
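The "at least one test per category" rule can be expressed as a simple lookup. The following is a minimal sketch only: the category keys, the short test identifiers, and the function name `missing_categories` are hypothetical labels chosen for illustration, not official IKMC names.

```python
# Hypothetical encoding of the IKMC minimum allele QC categories listed above.
# The short test identifiers are illustrative labels, not official IKMC names.
MINIMUM_QC = {
    "targeting": {
        "southern_neo_or_external_probe",
        "loa_qpcr",
        "lrpcr_5prime_and_3prime",
        "wt_srpcr_absent_in_homozygotes",
        "gene_expression",
    },
    "cassette_structure": {"cassette_srpcr", "neo_or_lacz_count_qpcr"},
    "conditionality_tm1a": {"loxp_srpcr"},
    "no_additional_insertions": {
        "southern_neo_probe",
        "neo_or_lacz_count_qpcr_plus_backbone_pcr",
    },
}


def missing_categories(passed_tests, conditional_allele=True):
    """Return the QC categories for which no listed test has been passed."""
    passed = set(passed_tests)
    missing = []
    for category, accepted in MINIMUM_QC.items():
        if category == "conditionality_tm1a" and not conditional_allele:
            continue  # the loxP check only applies to tm1a alleles
        if not accepted & passed:
            missing.append(category)
    return missing


# A line with one passed test in every category meets the minimum standard.
print(missing_categories(
    {"loa_qpcr", "cassette_srpcr", "loxp_srpcr", "southern_neo_probe"}
))
```

An empty list means every category is covered; any returned category names indicate where a further assay would be needed under this reading of the standard.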
High-Throughput Genotyping and QC Tests used by the Sanger Institute MGP
Further details of the tests used at the Sanger Institute, including primer sequences and reaction conditions, can be found in Supplementary Information S1 and also in the IKMC knowledge base (http://www.knockoutmouse.org/kb/2).
During the period between September 2006 and November 2011, a total of 731 EUCOMM/KOMP ESC clones were microinjected (582 MGP, 94 EUMODIC, and 48 KOMP2-funded) and subsequently achieved germline transmission, of which 632 mouse colonies (86 %) passed QC.
Analysis of Lines that Failed QC
Correct Gene Targeting (Gene id and Mutation Structure)
QC failures in mouse colonies
Reason for QC failure; No. of lines; % total lines
5′ end of cassette missing and incorrect targeting
Incorrect neo count
Incorrect neo count and incorrect targeting
5′ end of cassette missing
Most cases of QC failures involving the cassette structure were due to a deletion of the 5′ end. To investigate whether the size of the deletion was variable or from a fixed point, a tiling PCR assay covering the length of the L1L2_Bact_P cassette (the most frequently used in the EUCOMM/KOMP-CSD resource) was designed and tested. We found that the amount of genetic material deleted was not constant between the QC-failed lines tested; for example, the 2210012G02Rik EPD0131_3_F05 line showed a deletion of the splice acceptor and most of the IRES element (Supplementary Information S2), whereas Myo10 EPD0272_4_C10 had a deletion of the entire cassette up to the neo selection marker. Some issues were also observed internal to the cassette; Btbd11 EPD0463_1_A11 was shown to carry a deletion of 929 bp located 940 bp 3′ of the lacZ gene initiation site. The original “final vector” DNA and four alternative ESC clones were checked and did not carry this deletion, suggesting that it occurred during electroporation and subsequent homologous recombination (Supplementary Information S3).
We found that the ES cell line used also had a significant effect on subsequent QC status. Colonies from the JM8.F6 cell line showed 50 % fewer QC failures (12/159, P = 0.0216), whereas those from the JM8A1.N3 cell line showed a failure rate more than twice the average (13/43, 30 %, P = 0.0182). This suggests that JM8A1.N3 cultures may contain a greater proportion of mixed populations of targeted and nontargeted clones than the other lines, with the nontargeted cells then going on to constitute the germ cells of the chimeras. The JM8A3.N1 and JM8.N4 cell lines did not show a significant difference (P = 0.202 and P = 0.582, respectively).
No significant difference in QC failure rate (P = 0.123) was detected between conditional-ready designs (91/608 lines failed) and deletion-based designs (8/27 lines failed).
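The per-cell-line comparison can be reproduced as simple proportions. This is a rough sketch, not the authors' statistical analysis: the overall failure count of 99 is derived here as 731 − 632 from the totals reported earlier, and the choice of "all other colonies" as the comparison group is an assumption, since the text does not state which baseline or test produced the P values.

```python
# Failure counts quoted in the text: (QC failures, colonies tested).
# The overall count is derived: 731 colonies, of which 632 passed QC.
overall = (731 - 632, 731)
cell_lines = {
    "JM8.F6": (12, 159),
    "JM8A1.N3": (13, 43),
}


def rate(failed, total):
    """Fraction of colonies that failed QC."""
    return failed / total


def rate_in_other_lines(failed, total):
    """Failure rate among colonies NOT derived from the given cell line."""
    return (overall[0] - failed) / (overall[1] - total)


for name, (failed, total) in cell_lines.items():
    own = rate(failed, total)
    others = rate_in_other_lines(failed, total)
    print(
        f"{name}: {own:.1%} failed vs {others:.1%} in other lines "
        f"(ratio {own / others:.2f}); overall {rate(*overall):.1%}"
    )
```

Under these assumptions the JM8.F6 failure rate comes out at roughly half that of the remaining colonies, and the JM8A1.N3 rate at more than twice the overall average, in line with the figures quoted above.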
Loss of the 3′ loxP Site
The 3′ loxP site can be lost from the mutant allele during homologous recombination because it is embedded within the 3′ homology arm, at a distance from the selection cassette. Of a total of 600 “knockout-first conditional ready” tm1a lines tested, 18 (3 %) did not carry the 3′ loxP site that had been detected during the original ES cell production screen, possibly due to mixed ESC colonies. As expected, this event is ESC clone-specific; thus, where duplicate microinjections were performed, differences in the presence or absence of the 3′ loxP site were observed in some cases between clones from the same electroporation for the same gene. For example, for the genes Mlec, Smyd5, and Pabpc4, mice derived from clones EPD0600_1_A06, EPD0027_5_G01, and EPD0025_3_C07, respectively, did not possess a 3′ loxP site, whereas those derived from EPD0600_1_H03, EPD0027_5_A02, and EPD0025_3_C08 did. These results highlight the need to reconfirm the presence of the 3′ loxP site in mice generated from the ESC resource if conditional mutants are needed in downstream research. Lines that do not possess the 3′ loxP site are still useful and are made available to the scientific community by Sanger MGP, but as tm1e “targeted nonconditional” mutants.
Evidence for Mixed ESC Populations
The discordance between the targeting screens performed at ESC clone production and the subsequent failure rate in mouse colonies may be due to (1) a mixture of targeted and nontargeted clones in the ESC population (where the nontargeted cell contamination preferentially contributes to the germline in the chimera), (2) a higher than expected false-positive PCR rate in the ESC screening during production, or (3) incomplete assessment of the ES cells resulting in structural and targeting issues being missed even if the cell population was pure.
An additional long-range PCR QC step on the ES cells, based on either the 5′ or the 3′ homology arm of the mutant allele and performed prior to microinjection, did not reduce the subsequent failure rate in mouse colonies. This was unexpected and suggests that mixed-cell populations are a major factor; the end-point-based LRPCR reaction detects the targeted cells but does not reveal whether nontargeted cells are also present.
Further evidence for mixed populations of ESC colonies was detected in a small number of mouse colonies, where transmission of two different alleles was detected by analysis of the G1 (chimera × C57BL/6N Taconic) animals. In most cases, these originated from different chimeras [examples include Tmem126a EPD0409_3_A09 (Supplementary Information S4), Slc25a21 EPD0085_1_D04, G3bp2 EPD0598_4_D01, Ide EPD0158_4_G09, and Mtap2 EPD0416_2_A02]. These incorrect alleles were not selected for further expansion. Multiple-targeting events originating from the same chimera were also observed [e.g., Srrm4 EPD0538_3_A07 (Supplementary Information S4), Bai1 EPD0675_3_C01, and Rftn2 EPD0176_4_A01], where some offspring showed correct targeting and cassette structure and other heterozygous littermates did not.
Incorrect targeting of the mutant allele is exemplified by Tcf7l2 EPD0130_2_C06 and Crtc2 EPD0197_3_C08; both mouse lines passed the 5′ and 3′ LRPCR QC assays, but qPCR failed to detect a loss in copy number of the WT allele. An additional copy of the floxed CE region was also detected, suggesting that the mutant allele had targeted the correct locus but had not completely replaced the endogenous form.
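The loss-of-allele readout can be illustrated with a relative copy-number calculation. This is a sketch under stated assumptions: it uses the widely used 2^-ΔΔCt method with a reference assay and a wild-type calibrator sample, which may differ from the assay design actually used (see Supplementary Information S1). The function name and all Ct values are hypothetical, and 100 % PCR efficiency is assumed.

```python
def wt_allele_copies(ct_target, ct_reference, ct_target_cal, ct_reference_cal,
                     calibrator_copies=2):
    """Estimate WT-allele copy number with the 2^-ddCt method.

    ct_target and ct_reference are the Ct values of the WT-specific and
    reference assays in the test sample; the *_cal values are the same
    assays in a wild-type calibrator carrying `calibrator_copies` copies.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    ddct = delta_sample - delta_calibrator
    return calibrator_copies * 2 ** (-ddct)


# Illustrative Ct values (not taken from the study).
# A correctly targeted heterozygote: WT assay comes up one cycle later.
correct_het = wt_allele_copies(26.0, 24.0, 25.0, 24.0)
# A cassette-positive mouse with no loss of the WT allele.
no_loss = wt_allele_copies(25.0, 24.0, 25.0, 24.0)

print(correct_het, no_loss)
```

In this scheme a cassette-positive animal that still returns about two WT copies, like the second example, would be flagged as not having lost the endogenous allele, which is the pattern described for Tcf7l2 and Crtc2.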
These results underline the need to carefully check each G1 individual used for expanding the colony, as transmission of the incorrect allele may seriously affect the utility of the mouse line or give misleading phenotyping results. With a few additional QC steps, however, any issues discovered at this early stage can easily be filtered out and the correctly targeted mice then used to expand the colony. Although these mixed events were a small percentage (~2 %) of the overall numbers of lines produced, they can result in a disproportionate amount of effort and costs needed to correct them once the colony has expanded, if they are detected at all.
However, one incorrect clone does not mean that all clones for that gene are incorrect; in some cases where lines had failed QC, alternative clones were microinjected and subsequently passed. For example, for the gene Trim66, mice derived from ESC clone EPD0027_3_D06 failed targeting QC (LoA qPCR failed, homozygotes by qPCR not confirmed by srPCR), whereas mice derived from clone EPD0155_5_A11, which uses an alternative design, passed (LoA qPCR passed, homozygotes by qPCR confirmed by srPCR). Another example is the gene Twf1: mice derived from ESC clone EPD0127_5_C07 failed targeting QC (homozygotes by qPCR not confirmed by srPCR) and neo qPCR QC, but the line derived from EPD0127_5_E05 passed (5′ and 3′ LRPCR amplification, homozygotes by qPCR confirmed by srPCR). These experiments help validate the resource as a whole and show that even if one clone is incorrect, others in the collection for that gene may be correctly targeted.
With all high-throughput projects there is an expected degree of trade-off between the accuracy of the resource and the rate of generation (Gerhard et al. 2004; Ryder et al. 2004). The main method used for the EUCOMM/KOMP resource in screening the ES cell clones during production was by long-range PCR and sequencing, using one primer in the cassette and one beyond the limit of the homology arms of the construct design (most frequently at the 3′ end). Although this method allows rapid detection of correct targeting, it cannot detect a mix of targeted and nontargeted clones, which would require a quantitative PCR approach or Southern blot analysis.
We found that the use of additional long-range PCR assays across the 5′ homology arm performed on ESC colonies did not provide any improvement in the transmission of correctly targeted events, which suggests that mixed ESC clones may be the cause of most of the targeting issues observed. To estimate the frequency of potentially mixed clones, we selected the subset of clones that passed additional LRPCR QC (by either the 5′ or the 3′ end) prior to microinjection and calculated how many then failed QC at the mouse stage (Supplementary Information S5). This method, of course, would not detect mixed clones which then contributed the correct cells to the mouse embryo, so this calculation may be an underrepresentation of the true value.
The reason for mixed-cell populations is most likely the practical limitations of the very-high-throughput nature of ESC generation in the EUCOMM/KOMP-CSD project, where colonies are manually picked from culture plates. For example, the JM8A1.N3 cells were much harder to culture and process in the laboratory, which may account for the higher percentage of mixed clones compared to the other lines. However, the contribution of this particular cell line to the total number of targeted alleles in the EUCOMM/KOMP-CSD collection is less than 15 %, compared to 60 % from JM8A3.N1. More quantitative, yet practical, pre-microinjection QC methods such as loss-of-allele assays (Valenzuela et al. 2003) are required to reduce the transmission of incorrect alleles. QC failure does not represent a problem for the resource since in the great majority of cases there are alternative clones that can be injected for each allele. If alternative clones are not available, however, mixed clones may be rescued by subcloning. When the presence of the 3′ loxP site in “conditional ready” mutants in the collection was analysed, 97 % of the strains tested displayed the expected result. The small number of conflicts with the loxP results could be due to a mixed colony of conditional and nonconditional targeted clones or to a low rate of false-positive PCRs during the screening.
Our results highlight the importance of confirming the structure of the targeted mutation in strains derived from the EUCOMM/KOMP-CSD resource. Ideally, this can be achieved with Southern blot analysis of the targeted mutation using external probes. In a high-throughput environment we have replaced this technique with a suite of PCR and qPCR assays that yield the same level of QC. All QC assay results performed on mouse lines are displayed on the IKMC (www.knockoutmouse.org) and EMMA (www.emmanet.org) websites. It is important to note that genotyping mice purely by short-range PCR without reconfirming the targeting is risky; nontargeted lines may appear to be homozygous-lethal, as the WT-specific assay will always amplify a product.
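The risk of genotyping by short-range PCR alone can be shown with a small decision sketch. It is illustrative only: the function names and the expected WT copy numbers (2, 1, 0) are assumptions made for this example, and it assumes the mutant-specific srPCR amplifies wherever the cassette has inserted.

```python
def call_by_srpcr(wt_band, mutant_band):
    """Genotype call from short-range PCR bands alone."""
    if wt_band and mutant_band:
        return "het"
    if mutant_band:
        return "hom"
    return "wt"


def call_with_loa(wt_band, mutant_band, wt_copies):
    """Same call, cross-checked against the WT copy number from LoA qPCR."""
    expected = {"wt": 2, "het": 1, "hom": 0}
    call = call_by_srpcr(wt_band, mutant_band)
    if round(wt_copies) != expected[call]:
        return "discordant: recheck targeting"
    return call


# Nontargeted (e.g., randomly inserted) cassette: both WT alleles are intact,
# so the WT band is always present and no animal is ever called "hom".
print(call_by_srpcr(True, True))
print(call_with_loa(True, True, wt_copies=2.0))
# Correctly targeted heterozygote and homozygote.
print(call_with_loa(True, True, wt_copies=1.05))
print(call_with_loa(False, True, wt_copies=0.1))
```

With srPCR alone, the nontargeted animal is called heterozygous, and an intercross of such animals would yield no apparent homozygotes; adding the copy-number check exposes the discordance instead of suggesting homozygous lethality.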
Proposal for a serial code for rapid and comprehensive display of mouse QC
No confirmation beyond ESC screen/QC
No verification beyond ESC screen/QC
No verification beyond ESC screen/QC
No verification beyond ESC screen/QC
Either 5′ or 3′ LRPCR amplification of a band
Amplification using qPCR-based universal assay
srPCR based assays at various points along cassette (e.g., lacZ, neo, 5′ FRT)
Vector backbone PCR
Both 5′ and 3′ LRPCR amplification of a band
Amplification using srPCR-based universal assay
qPCR based assays at various points along cassette (e.g., lacZ, neo, 5′ FRT); exclusive or in combination with step B
neo or lacZ count qPCR plus step B
Step 3 plus end sequence confirmation
Amplification using gene-specific srPCR-assay
Amplification of PCR tiling array across whole cassette
Loss of WT allele qPCR and/or srPCR confirmation of homozygotes
Sequencing of PCR product from C or D
Genome sequencing of mouse
Southern blot or steps 3 and 5
No loxP in design or no loxP detected
Full sequencing of cassette
Steps 3 and 5 (or step 6), and gene expression analysis showing knockout/down of targeted allele
Genome sequencing of mouse
The EUCOMM/KOMP-CSD mutant ES cell collection is an extremely valuable resource for the scientific community. Our data suggest that, in the absence of any additional pre-microinjection QC, 86 % of the ESC clones that achieve GLT produce strains with correctly targeted events, and that a few simple QC assays at the G1 chimera progeny stage can rapidly screen out the majority of incorrect events. For scientists ordering ESC clones from repositories, requesting three clones should give a 99.7 % chance that at least one is correctly targeted. This will not only save money and effort but will also help reduce the number of experimental animals used, in compliance with the 3Rs (Fenwick et al. 2011; National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs) Mission and Strategy 2012).
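The 99.7 % figure follows from the 86 % per-clone success rate if each clone is treated as an independent draw. That independence is an assumption of this sketch (clones from the same electroporation might not be fully independent), and the function name is hypothetical.

```python
def p_at_least_one_correct(n_clones, p_correct=0.86):
    """Chance that at least one of n independently drawn clones is correct."""
    return 1 - (1 - p_correct) ** n_clones


for n in (1, 2, 3):
    print(n, f"{p_at_least_one_correct(n):.1%}")
```

For three clones the calculation is 1 − 0.14³ = 1 − 0.002744, which rounds to 99.7 %.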
We thank staff from the Sanger Institute’s Research Support Facility, Mouse Genetics Project, and Mouse Informatics Group for their excellent support. Alternative ESC clones for the Btbd11 analysis were kindly supplied by Wendy Bushell and Jackie Bryant. This work was supported by the Wellcome Trust under Grant No. WT098051.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.