Informatics resources for the Collaborative Cross and related mouse populations

Mammalian Genome

References

URLs

  • BAGPIPE. http://valdarlab.unc.edu/software/bagpipe

  • BAGPHENOTYPE. http://valdarlab.unc.edu/bagphenotype.html

  • Collaborative Cross Status website. http://www.csbio.unc.edu/CCstatus/

  • Collaborative Cross Viewer. http://www.csbio.unc.edu/CCstatus/index.py?run=CCV

  • DOQTL. http://www.bioconductor.org/packages/release/bioc/html/DOQTL.html

  • GECCO gene expression browser. http://csbio.unc.edu/gecco/

  • MDA genotypes for 100 inbred strains. http://cgd.jax.org/datasets/popgen/diversityarray/yang2011.shtml

  • MegaMUGA genotypes for CC founder strains. http://csbio.unc.edu/CCstatus/index.py?run=GeneseekMM

  • modtools + lapels + suspenders pipeline. http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo

  • Mouse Imputation Resource. http://csbio.unc.edu/imputation/

  • Mouse Phylogeny Viewer. http://msub.csbio.unc.edu/

  • Sanger Mouse Genomes Project. http://www.sanger.ac.uk/resources/mouse/genomes/

  • Searchable index of sequencing reads from CC founder strains. http://www.csbio.unc.edu/CEGSseq/index.py?run=MsbwtTools

  • Seqnature. https://github.com/jaxcs/Seqnature

Acknowledgments

The development of the CC population and related tools at the UNC Systems Genetics Core Facility was supported by Grants from the National Institutes of Health (U01CA134240, P50MH090338, P50HG006582, and U54AI081680), the Ellison Medical Foundation (Grant AG-IA-0202-05), and the National Science Foundation (Grants IIS0448392 and IIS0812464). Essential support was provided by the Dean of the UNC School of Medicine, the UNC Mutant Mouse Regional Resource Center (U42OD010924), the Lineberger Comprehensive Cancer Center at UNC (U01CA016086 from the National Cancer Institute), and the University Cancer Research Fund from the state of North Carolina. Development of the Mouse Diversity Array was supported by NIH Grant P50GM076468. Support for development of the MegaMUGA array was provided by Neogen Corporation, Lincoln, NE. APM was supported by Grants T32GM067553 and F30MH103925. The authors thank Darla Miller, the UNC Systems Genetics Group, and the Center for Genome Dynamics at the Jackson Laboratory for helpful discussions, and Fernando Pardo-Manuel de Villena and Leonard McMillan for their mentorship and for their comments on this manuscript.

Author information

Corresponding author

Correspondence to Catherine E. Welsh.

Appendix: terms and definitions

Relatedness: In the genetic sense, relatedness refers to the proportion of alleles shared between two individuals. The degree to which two individuals are genetically related depends on the number of common ancestors they share and the number of generations that have elapsed since they shared them. A pedigree describes the expected relatedness between individuals: first-degree relatives (parents or siblings) share, on average, half of their alleles; second-degree relatives (such as grandparents) one-fourth; and so on. With dense genotype data, we can instead compute realized relatedness as the proportion of shared, unlinked alleles.

Using dense genotypes, we can define relatedness both at the genome-wide and at the local scale. In the presence of admixture or introgression (see below), local relatedness in different regions of the genome may deviate from the genome-wide average.
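
The distinction between expected and realized relatedness can be illustrated with a minimal Python sketch. It assumes genotypes coded as counts (0, 1, 2) of an arbitrary reference allele, with `None` for missing calls, and it uses simple identity-by-state allele sharing as a stand-in for realized relatedness. The function names and the windowing scheme are hypothetical and are not taken from any tool discussed in this review.

```python
def expected_relatedness(degree):
    """Pedigree expectation: relatives of degree d share (1/2)**d of alleles."""
    return 0.5 ** degree


def realized_sharing(geno_a, geno_b):
    """Mean proportion of alleles shared identical-by-state across markers.

    Genotypes are allele counts (0, 1, 2); markers with a missing call
    (None) in either individual are skipped.
    """
    shared = 0.0
    used = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue
        # 0 vs 2 shares nothing, 0 vs 1 shares half, equal genotypes share all.
        shared += 1.0 - abs(a - b) / 2.0
        used += 1
    if used == 0:
        raise ValueError("no markers with calls in both individuals")
    return shared / used


def local_sharing(geno_a, geno_b, window):
    """Sharing in consecutive, non-overlapping windows of `window` markers."""
    return [
        realized_sharing(geno_a[i:i + window], geno_b[i:i + window])
        for i in range(0, len(geno_a), window)
    ]
```

Comparing the output of `local_sharing` with the genome-wide value from `realized_sharing` shows where local relatedness departs from the average, as it would in an admixed or introgressed region.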

Population structure: A population is “structured” when it has experienced deviations from random mating, or equivalently, when it is divided into subpopulations with restricted genetic exchange between them. In a structured population, some groups of individuals are more closely related to (share more alleles with) each other than with other groups. Geography and mating behavior generate at least some degree of structure in most natural populations. Population structure in laboratory mouse strains is widespread: for instance, the 129 and C57BL strain groups form a genetic cluster distinct from so-called “Swiss mice” including FVB/NJ, the NOD substrains, and ICR outbred stock (Beck et al. 2000). Failure to account for population structure can lead to false-positive QTL in genetic mapping of complex traits.

Linkage disequilibrium (LD): Two loci are said to be in LD if the frequencies of two-locus allele combinations (haplotypes) depart from those expected if alleles were sampled independently at each locus. Recombination erodes LD, which therefore generally decreases with time and with physical distance between loci. Unlinked markers are expected to be in linkage equilibrium, but non-random mating can produce “long-range” LD between unlinked loci in structured populations.
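
As a rough illustration of this definition, the Python sketch below computes two common LD summaries from phased two-locus haplotypes: the coefficient D (observed haplotype frequency minus the product of the allele frequencies) and its normalized form r². The input encoding (strings such as "AB" or "ab", with uppercase marking one allele at each locus) and the function name are assumptions made for the example.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic loci.

    `haplotypes` is a list of two-character strings, one per sampled
    chromosome: first character 'A' or 'a', second character 'B' or 'b'.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # frequency of allele A
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # frequency of allele B
    p_ab = sum(h == "AB" for h in haplotypes) / n    # frequency of haplotype AB
    d = p_ab - p_a * p_b                             # departure from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic in the sample")
    return d, d * d / denom
```

A sample containing only "AB" and "ab" haplotypes gives r² = 1 (complete LD), while a sample with all four haplotypes at equal frequency gives D = 0 (linkage equilibrium).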

Haplotype block: A haplotype block is a chromosomal segment in which there is no evidence for recombination during the history of a sample of individuals. Within a block, individuals in a population can be collapsed into one of a small (relative to the population size) number of ancestral haplotypes (Wall et al. 2003). LD is relatively high between loci within a block, but relatively low between loci in adjacent blocks.

Although many schemes have been proposed for defining haplotype blocks, the one discussed in this review is the four-gamete test (Hudson et al. 1985). Consider two loci A and B with alleles A,a and B,b, respectively. There are four possible haploid genotypes (gametes): AB, aB, Ab, and ab. If all four are observed in a sample, then, assuming no recurrent mutation at either locus, recombination between A and B must have occurred at least once in the past.
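
The four-gamete test lends itself to a short sketch. The Python code below checks a pair of biallelic sites for all four gametes and then uses a simple greedy left-to-right scan to cut a set of phased haplotypes into blocks. The greedy partition is only one of several possible schemes and is an assumption of this example; it is not claimed to be the procedure used in the studies cited here.

```python
def all_four_gametes(site_a, site_b):
    """True if all four allele combinations occur at two biallelic sites."""
    return len(set(zip(site_a, site_b))) == 4


def four_gamete_blocks(haplotypes):
    """Greedily partition sites into blocks with no four-gamete violation.

    `haplotypes` is a list of equal-length strings, one per chromosome,
    with one character per site. Returns (start, end) index pairs, with
    `end` inclusive.
    """
    if not haplotypes or not haplotypes[0]:
        return []
    n_sites = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_sites)]
    blocks = []
    start = 0
    for site in range(1, n_sites):
        # Close the block if the new site conflicts with any site already in it.
        if any(all_four_gametes(columns[prev], columns[site])
               for prev in range(start, site)):
            blocks.append((start, site - 1))
            start = site
    blocks.append((start, n_sites - 1))
    return blocks
```

For the four haplotypes "000", "110", "001", and "111", the first two sites show only two gametes and stay together, while the third site shows all four gametes with the first and therefore starts a new block.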

Haplotype blocks are a useful means of investigating patterns of genetic diversity at intermediate timescales since a common ancestor, such as among classical inbred strains of mice (Yang et al. 2011). But because recombination events accumulate and LD decreases with time, haplotype blocks shared between two individuals with a common ancestor far in the past—for example, a wild-derived inbred strain and a classical laboratory strain—will be very short. For this reason, haplotype blocks were not inferred for the wild mice and wild-derived strains in Yang et al. (2011).

Identity by descent (IBD): A chromosomal segment is shared identical-by-descent between two individuals if it was inherited from their common ancestor without recombination. The notion of IBD is closely related to the haplotype block.

Admixture: Admixture refers to inter-breeding between individuals from populations which were previously genetically isolated from one another. Admixture facilitates gene flow between populations, and in the process creates heterogeneity of relatedness across the genome.

Introgression: Introgression refers to the introduction of a chromosomal segment from one population into a separate, genetically distinct population. It is often used to describe gene flow between species or subspecies which can still form fertile hybrids. Unlike admixture, which describes ongoing inter-breeding, introgression describes events which are episodic in nature. In this review, we refer to genetic exchange between mouse subspecies, which do not interbreed in the wild except at narrow hybrid zones (Ursin 1952), as introgression.

Ancestry inference: Broadly speaking, an ancestry-inference procedure steps along the genome of an individual and attempts to assign each segment to one of a few ancestral clusters. These clusters may represent ancestral population groups, for samples from natural populations, or founder haplotypes in laboratory populations. Examples of ancestry inference discussed in this review include assignment of subspecific origin in wild mice (Yang et al. 2011), which labels genomic regions with one of three subspecies; and haplotype reconstruction on the CC and DO (Fu et al. 2012), which assigns genomic regions to one of those populations’ eight founder strains.

Hidden Markov model (HMM): A hidden Markov model is a probabilistic model which describes how an observed sequence can be generated from an underlying, unknown sequence of “hidden states” (Baum and Petrie 1966; Rabiner 1989). Efficient algorithms can be used to “decode” the sequence of hidden states given an observed sequence. In this review, we discuss HMMs in which the observed sequences are genotypes along a chromosome, and the hidden states are founder haplotypes.
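
To make the decoding step concrete, the following is a toy Viterbi decoder in Python for a deliberately simplified haploid model: each hidden state is one founder, the observation at each marker is a single allele (0 or 1), the founder switches between adjacent markers with a fixed probability, and an allele mismatches its founder with a fixed error rate. All names and parameter values (`switch_prob`, `error_prob`) are hypothetical. The haplotype-reconstruction models used for real CC and DO data are considerably richer than this sketch.

```python
import math


def viterbi_founders(observed, founder_alleles, switch_prob=0.01, error_prob=0.01):
    """Return the most likely founder at each marker (toy haploid model).

    `observed` is a list of alleles (0/1), one per marker.
    `founder_alleles` maps each founder name to its alleles at the same
    markers. At least two founders are required.
    """
    founders = list(founder_alleles)
    k = len(founders)
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob / (k - 1))

    def log_emit(founder, marker):
        match = founder_alleles[founder][marker] == observed[marker]
        return math.log(1.0 - error_prob if match else error_prob)

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform prior over founders at the first marker.
    score = {f: -math.log(k) + log_emit(f, 0) for f in founders}
    back = []
    for m in range(1, len(observed)):
        new_score, pointers = {}, {}
        for f in founders:
            best_prev = max(founders, key=lambda g: score[g] + log_trans(g, f))
            new_score[f] = score[best_prev] + log_trans(best_prev, f) + log_emit(f, m)
            pointers[f] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(founders, key=lambda f: score[f])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With two founders whose alleles are all 0 and all 1, respectively, an observed sequence `[0, 0, 0, 1, 1, 1]` decodes to three markers from the first founder followed by three from the second, because a single switch is more probable than three genotyping errors.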

Cite this article

Morgan, A.P., Welsh, C.E. Informatics resources for the Collaborative Cross and related mouse populations. Mamm Genome 26, 521–539 (2015). https://doi.org/10.1007/s00335-015-9581-z

  • DOI: https://doi.org/10.1007/s00335-015-9581-z
