The Collaborative Cross at Oak Ridge National Laboratory: developing a powerful resource for systems genetics
- Cite this article as:
- Chesler, E.J., Miller, D.R., Branstetter, L.R. et al. Mamm Genome (2008) 19: 382. doi:10.1007/s00335-008-9135-8
Complex traits and disease comorbidity in humans and in model organisms are the result of naturally occurring polymorphisms that interact with each other and with the environment. To ensure the availability of resources needed to investigate biomolecular networks and systems-level phenotypes underlying complex traits, we have initiated breeding of a new genetic reference population of mice, the Collaborative Cross. This population has been designed to optimally support systems genetics analysis. Its novel and important features include a high level of genetic diversity, a large population size to ensure sufficient power in high-dimensional studies, and high mapping precision through accumulation of independent recombination events. Implementation of the Collaborative Cross has been ongoing at the Oak Ridge National Laboratory (ORNL) since May 2005. Production has been systematically managed using a software-assisted breeding program with fully traceable lineages, performed in a controlled environment. Currently, there are 650 lines in production, and close to 200 lines are now beyond their seventh generation of inbreeding. Retired breeders enter a high-throughput phenotyping protocol and DNA samples are banked for analyses of recombination history, allele drift and loss, and population structure. Herein we present a progress report of the Collaborative Cross breeding program at ORNL and a description of the kinds of investigations that this resource will support.
History of the Collaborative Cross
The Collaborative Cross (CC) is a large, multiparental, recombinant inbred (RI) strain panel that was motivated by the need among the mouse genetics community for a high-precision genetic resource that could serve as a common integration point for the multitude of mouse genetic studies that were sure to follow in the wake of the complete sequencing of the mouse and human genomes. The concept for this common mouse genetic reference population was first proposed at the Edinburgh meeting of the International Mouse Genome Conference in October of 2001 and in print by founding members (Threadgill et al. 2002) of the Complex Trait Consortium (CTC).
An RI strain panel provides significant advantages as a resource because it is a reproducible population for cumulative data integration. Existing RI panels have limited statistical power due to their small size and capture only limited allelic diversity because all current RI sets originate from only two inbred progenitor strains. A large RI panel derived from multiple strains could capture significantly more genetic diversity and would provide sufficient power and resolution for genetic dissection of polygenic traits and construction of systems genetic networks. The CC breeding design was proposed as a strategy to rapidly and randomly mix the genomes of eight founder strains to create independent breeding lines (Churchill et al. 2004). Five classical inbred strains (A/J, C57BL/6J, 129S1/SvImJ, NOD/LtJ, and NZO/HlLtJ) and three wild-derived strains (CAST/EiJ, PWK/PhJ, and WSB/EiJ) were selected to be the eight founders of the CC. Analysis of the allelic variation in mouse inbred strains demonstrates that the eight CC founder strains capture on average 90% of the known allelic diversity across all 1-Mb intervals spanning the entire mouse genome (Roberts et al. 2007). Simulations of the power (Valdar et al. 2006a) and precision (Broman 2005) of genetic mapping with the CC population indicated superior performance to alternative strategies and provided guidelines for sample sizes.
Construction of the Collaborative Cross at ORNL
To initiate construction of the CC, the eight progenitor strains, referred to as the G0 generation, were first intercrossed to generate 56 possible G1 hybrid combinations. G1 progeny are crossed to create the four-way G2 generation, and a G2 × G2 cross yields the first eight-way progeny, the G2:F1s. G2:F1s are then propagated by sib-mating through the G2:Fn generations until they are fully inbred, at approximately G2:F22 (Broman 2005). Each resulting CC breeding funnel will be a unique and independent combination of the eight founder genomes. A major goal of the ORNL implementation of this breeding scheme has been to minimize two sources of bias by using the breeding software described below: clustering of recombination sites that results from the strain pair-specific hotspots found in most mating designs (Kelmenson et al. 2005), and selection for single or multiple loci associated with viability, behavioral, and fertility traits. The G0 progenitor strains were obtained from The Jackson Laboratory (TJL), and the G1 animals were produced either at TJL or at ORNL from stock obtained directly from TJL. The colony is restocked periodically to avoid drift in the progenitor lines.
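The count of 56 G1 hybrid combinations follows from treating each cross as an ordered (dam, sire) pair of two different founders, so that reciprocal crosses are counted separately. The short Python sketch below enumerates them; the variable names are illustrative and are not taken from the ORNL breeding software.

```python
from itertools import permutations

# The eight CC founder strains named in the text.
FOUNDERS = [
    "A/J", "C57BL/6J", "129S1/SvImJ", "NOD/LtJ", "NZO/HlLtJ",
    "CAST/EiJ", "PWK/PhJ", "WSB/EiJ",
]

# Each G1 hybrid is an ordered (dam, sire) pair of two different founders,
# so reciprocal crosses such as (A/J, CAST/EiJ) and (CAST/EiJ, A/J) are distinct.
g1_crosses = list(permutations(FOUNDERS, 2))

print(len(g1_crosses))  # 8 * 7 ordered pairs
```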
Design for balance
With this systematic breeding design, the genetic contribution of each of the eight CC founder strains to each line is equivalent, and when averaged across all CC lines, the allele frequency at each locus will ideally be 0.125 (1/8). While this is automatically true for autosomal loci, it is not necessarily true for mitochondrial genomes and sex chromosomes. Therefore, of the possible funnels, a balanced set is chosen to produce equal contributions for each single factor (X, Y, and mitochondria) as well as for all pairwise combinations of factors across all the breeding funnels.
A second way in which progenitor position determines genetic contribution is through the contribution of X chromosomal material. Males contribute their X chromosome to female offspring and their Y chromosome to male offspring. Females contribute an X chromosome to every offspring. Consider a funnel written as [(A × B) × (C × D)] × [(E × F) × (G × H)], with the female listed first in each cross: a female is chosen from the A × B cross, and a male is chosen from the C × D cross. The female has X chromosomal material from parents in both the A and B positions, but the male inherits an X chromosome only from the parent in the C position.
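These inheritance rules can be traced mechanically. The sketch below assumes a funnel of the form [(A × B) × (C × D)] × [(E × F) × (G × H)] with the dam written first in every cross, represented as nested (dam, sire) tuples. The function names are hypothetical and serve only to illustrate the rules stated above, together with maternal inheritance of mitochondria.

```python
def mito_source(animal):
    """Mitochondria are inherited through the maternal line."""
    return animal if isinstance(animal, str) else mito_source(animal[0])


def y_source(male):
    """The Y chromosome is inherited through the paternal line."""
    return male if isinstance(male, str) else y_source(male[1])


def x_sources(animal, sex):
    """Founder positions that can contribute X material to an animal of this sex."""
    if isinstance(animal, str):          # a founder carries only its own X
        return {animal}
    dam, sire = animal
    sources = x_sources(dam, "F")        # every offspring receives an X from the dam
    if sex == "F":                       # only daughters receive the sire's X
        sources |= x_sources(sire, "M")
    return sources


# Assumed notation: (dam, sire) at every level of the funnel.
funnel = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

print(x_sources(("C", "D"), "M"))        # the C x D male carries X from C only
print(sorted(x_sources(funnel, "F")))    # X sources for females of the eight-way line
print(mito_source(funnel), y_source(funnel))
```

Under this assumed notation, the eight-way line can carry X material from positions A, B, C, E, and F, mitochondria from position A, and the Y chromosome from position H.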
Finally, the pairing of progenitors in the earliest generations may confer specific biases in the location and density of recombinations. Meiotic recombination events are cumulative. Those recombinations that occur in early generations are retained in subsequent generations. Additional recombinations occur in each subsequent generation and have the potential to accumulate detectably in each generation prior to inbreeding. In a particular segment of DNA, if alleles are similar in the two progenitors of a cross (identical by state) or fixed through inbreeding (identical by descent), recombination events are not detectable because identical strands of DNA are being broken and recombined. The accumulation of genetic recombination is influenced by genetic background (Kelmenson et al. 2005), and recombinations do not occur with equal probability across the genome. Consequently, each of the eight progenitor strains should occupy the A through H positions an approximately equal number of times when the entire CC panel is considered. Furthermore, it is desirable to balance higher-order combinations of mitochondrial and Y-chromosomal material by balancing all two-way combinations of these factors. Lastly, balance of the combinations of parents in the first generation of crosses will minimize potential bias in recombination accumulation.
Pairwise balance of the founder strain combinations is achieved when all 56 pairwise combinations occur at equal frequency across the set of funnels. This will reduce the impact of systematic allele incompatibilities within and across loci. The variance of the number of lines across 56 pairwise combinations, therefore, gives a single number by which to evaluate pairwise balance in a set of funnels. The controlled pairwise combinations are discussed in the following subsections.
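As a minimal illustration of this balance statistic, the Python sketch below counts how often each of the 56 ordered strain pairs is observed and returns the population variance of those counts. Zero indicates perfect pairwise balance. The function name and the abbreviated strain labels are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations
from statistics import pvariance

# Abbreviated labels for the eight founders (illustrative only).
STRAINS = ["AJ", "B6", "129", "NOD", "NZO", "CAST", "PWK", "WSB"]
ALL_PAIRS = list(permutations(STRAINS, 2))   # the 56 ordered pairs


def pairwise_balance(observed_pairs):
    """Variance of the per-pair counts; pairs never observed count as zero."""
    counts = Counter(observed_pairs)
    return pvariance([counts[pair] for pair in ALL_PAIRS])


# Every pair used exactly once: perfectly balanced.
balanced = pairwise_balance(ALL_PAIRS)

# The same number of observations concentrated on a single pair: unbalanced.
skewed = pairwise_balance([("AJ", "B6")] * 56)

print(balanced, skewed)
```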
Chr Y-mitochondria combinations
Progenitors A and H, respectively, contribute mitochondria and Y chromosome to the funnel line, and progenitors E and D, respectively, contribute mitochondria and Y chromosome to the reciprocal line. Y-mitochondria combinations will be balanced if all 56 pairwise combinations of strains appear with equal frequency in the progenitor pairs (A:H) and (D:E).
Chr X-Chr Y combinations
Since five progenitors (A, B, C, E, or F in the funnel) may contribute X chromosomes to a line, strain frequency must be averaged over ten two-strain progenitor combinations, with double weight for two combinations (C:E and C:F in the funnel), to evaluate the balance of X-Y combinations.
Genomes of strains paired in the first generation have an early opportunity for recombination (Broman 2005). To balance any effect of this opportunity, pairwise strain combinations can be balanced over the four matings in the first generation involving progenitor pairs (A:B), (C:D), (E:F), and (G:H).
Balanced CC design schemes are created using customized software (CC8scheme) which systematically tests all available funnel pairs to optimize over the above parameters. CC8scheme builds a design scheme by stepwise addition of the funnel pair that would most improve the balance of the existing set of funnel pairs. The resulting designs avoid using strain combinations known to be infertile or unproductive while still achieving the best possible balance. The current design avoids (NZO × CAST) and (NZO × PWK) hybrids, which are reproductively incompatible, and (PWK × 129) males, which are infertile.
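The stepwise strategy can be sketched as a greedy search. The code below is not CC8scheme; it is a simplified illustration under several assumptions: funnels are eight-strain orderings (positions A through H), a funnel pair is a funnel plus its reciprocal, the score sums the variances of the first-generation pair counts and the mitochondria-Y pair counts only, candidates are drawn from a random pool instead of all available funnel pairs, and the excluded hybrids are avoided in either cross direction.

```python
import random
from collections import Counter
from itertools import permutations
from statistics import pvariance

STRAINS = ["AJ", "B6", "129", "NOD", "NZO", "CAST", "PWK", "WSB"]  # abbreviated labels
ALL_PAIRS = list(permutations(STRAINS, 2))


def reciprocal(funnel):
    """Swap the two four-way halves: ABCDEFGH becomes EFGHABCD."""
    return funnel[4:] + funnel[:4]


def g1_pairs(funnel):
    """First-generation matings (A:B), (C:D), (E:F), (G:H)."""
    return [(funnel[i], funnel[i + 1]) for i in range(0, 8, 2)]


def allowed(funnel):
    """Reject funnels that need hybrids reported as incompatible or infertile."""
    pairs = [set(p) for p in g1_pairs(funnel)]
    if {"NZO", "CAST"} in pairs or {"NZO", "PWK"} in pairs:
        return False
    # G1 males come from the (C:D) and (G:H) matings.
    return {"PWK", "129"} not in (pairs[1], pairs[3])


def balance_score(funnels):
    """Sum of count variances for G1 pairs and (mitochondria, Y) pairs."""
    g1, mito_y = Counter(), Counter()
    for f in funnels:
        g1.update(g1_pairs(f))
        mito_y[(f[0], f[7])] += 1        # position A gives mitochondria, H gives Y
    return (pvariance([g1[p] for p in ALL_PAIRS])
            + pvariance([mito_y[p] for p in ALL_PAIRS]))


def greedy_design(n_pairs, pool_size=200, seed=0):
    """Add, one at a time, the funnel pair that gives the lowest score."""
    rng = random.Random(seed)
    pool = []
    while len(pool) < pool_size:
        candidate = tuple(rng.sample(STRAINS, 8))
        if allowed(candidate) and candidate not in pool:
            pool.append(candidate)
    design = []
    for _ in range(n_pairs):
        best = min((c for c in pool if c not in design),
                   key=lambda c: balance_score(design + [c, reciprocal(c)]))
        design += [best, reciprocal(best)]
    return design


design = greedy_design(n_pairs=7)
print(len(design), balance_score(design))
```

A greedy search of this kind does not guarantee a globally optimal design; it only ensures that each added funnel pair is the best available choice given the funnels already selected.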
At each generation, two to seven matings are started from a randomly selected litter from the previous generation within each funnel. Normally, a litter from the first-priority mating will provide the next generation. If the first-priority mating fails to produce a litter by the time that one of the other matings has produced a second litter, then the lower-priority litter will be used for the next generation. Thus, litter selection is pseudorandom, since selection for fecundity is allowed only if the survival of the line is threatened. Once a litter is chosen, mice within the litter are randomly assigned mates from available siblings, avoiding inadvertent selection for docility or other behavioral characteristics.
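The litter-selection rule can be expressed as a small decision function. The sketch below is one reading of the rule, not the logic of the ORNL breeding software: matings are listed in priority order, the first-priority mating is used as soon as it has a litter, and a lower-priority mating is used only once it has produced a second litter while the first-priority mating has none. All function names are hypothetical.

```python
import random


def choose_litter_source(litters_per_mating):
    """Return the index of the mating that supplies the next generation.

    litters_per_mating[0] is the first-priority mating. Returns None when
    the rule says to keep waiting.
    """
    if litters_per_mating[0] >= 1:
        return 0
    for index, n_litters in enumerate(litters_per_mating[1:], start=1):
        if n_litters >= 2:               # a lower-priority mating has a second litter
            return index
    return None


def assign_mates(females, males, rng):
    """Pair siblings at random so that no trait influences mate choice."""
    females, males = list(females), list(males)
    rng.shuffle(females)
    rng.shuffle(males)
    return list(zip(females, males))     # unpaired animals are left over


print(choose_litter_source([1, 0, 2]))   # first-priority litter is available
print(choose_litter_source([0, 2, 0]))   # fall back to the second mating
print(choose_litter_source([0, 1, 1]))   # no decision yet
print(assign_mates(["f1", "f2", "f3"], ["m1", "m2"], random.Random(1)))
```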
Status of the Collaborative Cross at ORNL
Characterizing the Collaborative Cross
Phenotype distributions and heritability
Genotyping the Collaborative Cross
Several genotyping efforts are underway, including an effort funded by the Tennessee Mouse Genome Consortium and DOE to genotype and characterize the G2:F7 generation. In this project one female and one male of each line will be genotyped on a custom array of 13,000 SNPs that uniquely identify all eight progenitor haplotypes at over 1200 regions of the genome. This single-generation cross section will enable the analysis of population structure and recombination rate, and the detection of systematically linked loci. Any such loci that are identified should be the result of actual biological selection, because the CCDB software-assisted breeding has eliminated many effects of human selection for docility and reproductive behavior. A second effort underway will entail in-depth genotyping and phenotyping of individuals from all extant strains at various generations.
The future of the Collaborative Cross at ORNL
The CC mice and derivatives of their breeders are currently available through collaborative arrangements with the ORNL Mouse Genetics Research Facility. Plans are being made to ensure that finished lines will be available from a network of phenotyping centers, each of which will have all inbred CC lines on site. CC funnels will continue to be initiated and maintained until the target population size of 1000 CC lines has been met or exceeded. Archiving of strains through cryopreservation will be performed as inbreeding is advanced. The early studies performed on intermediate generations will provide a valuable resource for integrative genetics and genomics, and will yield a demonstration of the utility of the CC. As the CC genotypes are generated and regions of residual heterozygosity identified, mapping studies can be undertaken in their progeny.
The Collaborative Cross at ORNL is supported by grants from The Ellison Medical Foundation, the Department of Energy (Field Work Proposal ERKP804 “Mouse Genetics and Mutagenesis for Functional Genomics”), and the National Institutes of Health (U01CA134240). CCWorks development was supported by P41 HG001656, U01 CA105417, P20 DA021131, and the Center of Genomics and Bioinformatics at UTHSC. The MouseTrack System was originally developed with support from the National Institutes of Health (5U01MH061971). The authors gratefully acknowledge the superb technical efforts of the ORNL staff, including Sarah Shinpock, Lori Easter, Ginger Shaw, Carmen Foster, Jason Spence, Melissa Beckmann, K. T. Cain, and Patricia R. Hunsicker, all at the ORNL, without whose diligence this project would not be possible.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.