References
Aylor DL, Valdar W, Foulds-Mathes W et al (2011) Genetic analysis of complex traits in the emerging Collaborative Cross. Genome Res 21:1213–1222. doi:10.1101/gr.111310.110
Bailey JA, Gu Z, Clark RA et al (2002) Recent segmental duplications in the human genome. Science 297:1003–1007. doi:10.1126/science.1072047
Bailey JA, Baertsch R, Kent WJ et al (2004) Hotspots of mammalian chromosomal evolution. Genome Biol 5:R23. doi:10.1186/gb-2004-5-4-r23
Baker CL, Kajita S, Walker M et al (2015) PRDM9 drives evolutionary erosion of hotspots in Mus musculus through haplotype-specific initiation of meiotic recombination. PLoS Genet 11:e1004916. doi:10.1371/journal.pgen.1004916
Bauer MJ, Cox AJ, Rosone G et al (2013) Lightweight algorithms for constructing and inverting the BWT of string collections. Theor Comput Sci 483:134–148. doi:10.1016/j.tcs.2012.02.002
Baum LE, Petrie T (1966) Statistical inference for probabilistic functions of finite state Markov chains. Ann Math Stat 37:1554–1563
Beck JA, Lloyd S, Hafezparast M et al (2000) Genealogies of mouse inbred strains. Nat Genet 24:23–25. doi:10.1038/71641
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57:289–300
Bennett BJ, Farber CR, Orozco L et al (2010) A high-resolution association mapping panel for the dissection of complex traits in mice. Genome Res 20:281–290. doi:10.1101/gr.099234.109
Boursot P, Auffray JC, Britton-Davidian J, Bonhomme F (1993) The evolution of house mice. Annu Rev Ecol Syst 24:119–152
Broman KW, Wu H, Sen S, Churchill GA (2003) R/qtl: QTL mapping in experimental crosses. Bioinformatics 19:889–890
Calaway JD, Lenarcic AB, Didion JP et al (2013) Genetic architecture of skewed X inactivation in the laboratory mouse. PLoS Genet 9:e1003853. doi:10.1371/journal.pgen.1003853
CCC (2012) The genome architecture of the Collaborative Cross mouse genetic reference population. Genetics 190:389–401. doi:10.1534/genetics.111.132639
Chaisson MJ, Pevzner PA (2008) Short read fragment assembly of bacterial genomes. Genome Res 18:324–330. doi:10.1101/gr.7088808
Chesler EJ (2014) Out of the bottleneck: the Diversity Outcross and Collaborative Cross mouse populations in behavioral genetics research. Mamm Genome 25:3–11. doi:10.1007/s00335-013-9492-9
Church DM, Schneider VA, Steinberg KM et al (2015) Extending reference assembly models. Genome Biol 16:13. doi:10.1186/s13059-015-0587-3
Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138:963–971
Churchill GA, Airey DC, Allayee H et al (2004) The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet 36:1133–1137. doi:10.1038/ng1104-1133
Clark AG, Hubisz MJ, Bustamante CD et al (2005) Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15:1496–1502. doi:10.1101/gr.4107905
Cook MN, Bolivar V, McFadyen MP, Flaherty L (2002) Behavioral differences among 129 substrains: implications for knockout and transgenic mice. Behav Neurosci 116:600–611. doi:10.1037/0735-7044.116.4.600
Crowley JJ, Zhabotynsky V, Sun W et al (2015) Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance. Nat Genet. doi:10.1038/ng.3222
Daetwyler HD, Calus MPL, Pong-Wong R et al (2013) Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 193:347–365. doi:10.1534/genetics.112.147983
Didion JP, Yang H, Sheppard K et al (2012) Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias. BMC Genom 13:34. doi:10.1186/1471-2164-13-34
Didion JP, de Villena FP-M (2013) Deconstructing Mus gemischus: advances in understanding ancestry, structure, and variation in the genome of the laboratory mouse. Mamm Genome 24:1–20. doi:10.1007/s00335-012-9441-z
Dobzhansky T (1936) Studies on hybrid sterility. II. Localization of sterility factors in Drosophila pseudoobscura hybrids. Genetics 21:113–135
Ferguson B, Ram R, Handoko HY et al (2014) Melanoma susceptibility as a complex trait: genetic variation controls all stages of tumor progression. Oncogene. doi:10.1038/onc.2014.227
Ferris MT, Aylor DL, Bottomly D et al (2013) Modeling host genetic regulation of influenza pathogenesis in the Collaborative Cross. PLoS Pathog 9:e1003196. doi:10.1371/journal.ppat.1003196
Flicek P, Ahmed I, Amode MR et al (2013) Ensembl 2013. Nucleic Acids Res 41:D48–D55. doi:10.1093/nar/gks1236
Forejt J, Ivanyi P (1974) Genetic studies on male sterility of hybrids between laboratory and wild mice (Mus musculus L.). Genet Res 24:189–206
Frazer KA, Eskin E, Kang HM et al (2007) A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature 448:1050–1053. doi:10.1038/nature06067
Fu C-P, Welsh CE, de Villena FP-M, McMillan L (2012) Inferring ancestry in admixed populations using microarray probe intensities. In: Proceedings of the ACM conference on bioinformatics, computational biology and biomedicine (BCB ’12). ACM Press, New York, pp 105–112
Gatti DM, Svenson KL, Shabalin A et al (2014) Quantitative trait locus mapping methods for Diversity Outbred mice. G3 4:1623–1633. doi:10.1534/g3.114.013748
Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, Cambridge
Geraldes A, Basset P, Gibson B et al (2008) Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes. Mol Ecol 17:5349–5363. doi:10.1111/j.1365-294X.2008.04005.x
Ghazalpour A, Rau CD, Farber CR et al (2012) Hybrid Mouse Diversity Panel: a panel of inbred mouse strains suitable for analysis of complex genetic traits. Mamm Genome 23:680–692. doi:10.1007/s00335-012-9411-5
Gonzales NM, Palmer AA (2014) Fine-mapping QTLs in advanced intercross lines and other outbred populations. Mamm Genome 25:271–292. doi:10.1007/s00335-014-9523-1
Good JM, Dean MD, Nachman MW et al (2008) A complex genetic basis to X-linked hybrid male sterility between two species of house mice. Genetics 179:2213–2228. doi:10.1534/genetics.107.085340
Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652. doi:10.1038/nbt.1883
Grubb SC, Bult CJ, Bogue MA et al (2014) Mouse phenome database. Nucleic Acids Res 42:D825–D834. doi:10.1093/nar/gkt1159
Haley CS, Knott SA (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315–324. doi:10.1038/hdy.1992.131
Harrow J, Denoeud F, Frankish A et al (2006) GENCODE: producing a reference annotation for ENCODE. Genome Biol 7(Suppl 1):S4.1–S4.9. doi:10.1186/gb-2006-7-s1-s4
Holt J, McMillan L (2014) Merging of multi-string BWTs with applications. Bioinformatics 30:3524–3531. doi:10.1093/bioinformatics/btu584
Huang S, Holt J, Kao C-Y et al (2014) A novel multi-alignment pipeline for high-throughput sequencing data. Database 2014:bau057. doi:10.1093/database/bau057
Hudson RR, Kaplan NL (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147–164
Iraqi FA, Athamni H, Dorman A et al (2014) Heritability and coefficient of genetic variation analyses of phenotypic traits provide strong basis for high-resolution QTL mapping in the Collaborative Cross mouse genetic reference population. Mamm Genome 25:109–119. doi:10.1007/s00335-014-9503-5
Kang HM, Zaitlen NA, Wade CM et al (2008) Efficient control of population structure in model organism association mapping. Genetics 178:1709–1723. doi:10.1534/genetics.107.080101
Karolchik D, Barber GP, Casper J et al (2014) The UCSC genome browser database: 2014 update. Nucleic Acids Res 42:D764–D770. doi:10.1093/nar/gkt1168
Keane TM, Goodstadt L, Danecek P et al (2011) Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477:289–294. doi:10.1038/nature10413
Kelada SNP, Aylor DL, Peck BCE et al (2012) Genetic analysis of hematological parameters in incipient lines of the Collaborative Cross. G3 2:157–165. doi:10.1534/g3.111.001776
Kelada SNP, Carpenter DE, Aylor DL et al (2014) Integrative genetic analysis of allergic inflammation in the murine lung. Am J Respir Cell Mol Biol 51:436–445. doi:10.1165/rcmb.2013-0501OC
Lenarcic AB, Svenson KL, Churchill GA, Valdar W (2012) A general Bayesian approach to analyzing diallel crosses of inbred strains. Genetics 190:413–435. doi:10.1534/genetics.111.132563
Lippert C, Listgarten J, Liu Y et al (2011) FaST linear mixed models for genome-wide association studies. Nat Methods 8:833–835. doi:10.1038/nmeth.1681
Liu EY, Zhang Q, McMillan L et al (2010) Efficient genome ancestry inference in complex pedigrees with inbreeding. Bioinformatics 26:i199–i207. doi:10.1093/bioinformatics/btq187
Liu EY, Morgan AP, Chesler EJ et al (2014) High-resolution sex-specific linkage maps of the mouse reveal polarized distribution of crossovers in male germline. Genetics 197:91–106. doi:10.1534/genetics.114.161653
McLaren W, Pritchard B, Rios D et al (2010) Deriving the consequences of genomic variants with the Ensembl API and SNP effect predictor. Bioinformatics 26:2069–2070. doi:10.1093/bioinformatics/btq330
Mott R, Talbot CJ, Turri MG et al (2000) A method for fine mapping quantitative trait loci in outbred animal stocks. Proc Natl Acad Sci USA 97:12649–12654. doi:10.1073/pnas.230304397
Munger SC, Raghupathy N, Choi K et al (2014) RNA-seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations. Genetics 198:59–73. doi:10.1534/genetics.114.165886
Orth A, Adama T, Din W, Bonhomme F (1998) Natural hybridization between two subspecies of the house mouse, Mus musculus domesticus and Mus musculus castaneus, near Lake Casitas, California. Genome 41:104–110
Patro R, Mount SM, Kingsford C (2014) Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat Biotechnol 32:462–464. doi:10.1038/nbt.2862
Petkov PM, Ding Y, Cassell MA et al (2004) An efficient SNP system for mouse genome scanning and elucidating strain relationships. Genome Res 14:1806–1811. doi:10.1101/gr.2825804
Phillippi J, Xie Y, Miller DR et al (2014) Using the emerging Collaborative Cross to probe the immune system. Genes Immun 15:38–46. doi:10.1038/gene.2013.59
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77:257–286
Rasmussen AL, Okumura A, Ferris MT et al (2014) Host genetic diversity enables Ebola hemorrhagic fever pathogenesis and resistance. Science. doi:10.1126/science.1259595
Rogala AR, Morgan AP, Christensen AM et al (2014) The Collaborative Cross as a resource for modeling human disease: CC011/Unc, a new mouse model for spontaneous colitis. Mamm Genome 25:95–108. doi:10.1007/s00335-013-9499-2
She X, Cheng Z, Zöllner S et al (2008) Mouse segmental duplication and copy number variation. Nat Genet 40:909–914. doi:10.1038/ng.172
Simecek P, Churchill GA, Yang H et al (2015) Genetic analysis of substrain divergence in NOD mice. G3 5:771–775. doi:10.1534/g3.115.017046
Soh YQS, Alföldi J, Pyntikova T et al (2014) Sequencing the mouse Y chromosome reveals convergent gene acquisition and amplification on both sex chromosomes. Cell 159:800–813. doi:10.1016/j.cell.2014.09.052
Svenson KL, Gatti DM, Valdar W et al (2012) High-resolution genetic mapping using the mouse Diversity Outbred population. Genetics 190:437–447. doi:10.1534/genetics.111.132597
Taylor BA, Heiniger HJ, Meier H et al (1973) Genetic analysis of resistance to cadmium-induced testicular damage in mice. Proc Soc Exp Biol Med 143:629–633
Ursin E (1952) Occurrence of voles, mice, and rats (Muridae) in Denmark, with a special note on a zone of intergradation between two subspecies of the house mouse (Mus musculus L.). Vid Medd Dansk Naturhist Foren 114:217–244
Valdar W, Flint J, Mott R et al (2006a) Simulating the Collaborative Cross: power of quantitative trait loci detection and mapping resolution in large sets of recombinant inbred strains of mice. Genetics 172:1783–1797. doi:10.1534/genetics.104.039313
Valdar W, Solberg LC, Gauguier D et al (2006b) Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet 38:879–887. doi:10.1038/ng1840
Valdar W, Holmes CC, Mott R, Flint J (2009) Mapping in structured populations by resample model averaging. Genetics 182:1263–1277. doi:10.1534/genetics.109.100727
Wade CM, Kulbokas EJ, Kirby AW et al (2002) The mosaic structure of variation in the laboratory mouse genome. Nature 420:574–578. doi:10.1038/nature01252
Wall JD, Pritchard JK (2003) Haplotype blocks and linkage disequilibrium in the human genome. Nat Rev Genet 4:587–597. doi:10.1038/nrg1123
Wang J, Moore KJ, Zhang Q et al (2010) Genome-wide compatible SNP intervals and their properties. In: Proceedings of the first ACM international conference on bioinformatics and computational biology (BCB ’10). ACM Press, New York, p 43
Wang JR, de Villena FP-M, Lawson HA et al (2012a) Imputation of single-nucleotide polymorphisms in inbred mice using local phylogeny. Genetics 190:449–458. doi:10.1534/genetics.111.132381
Wang JR, de Villena FP-M, McMillan L et al (2012b) Comparative analysis and visualization of multiple collinear genomes. BMC Bioinform 13(Suppl 3):S13. doi:10.1186/1471-2105-13-S3-S13
Waterston RH, Lindblad-Toh K, Birney E et al (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562. doi:10.1038/nature01262
Weiser M, Mukherjee S, Furey TS et al (2014) Novel distal eQTL analysis demonstrates effect of population genetic architecture on detecting and interpreting associations. Genetics 198:879–893. doi:10.1534/genetics.114.167791
Williams RW, Gu J, Qi S, Lu L (2001) The genetic structure of recombinant inbred mice: high-resolution consensus maps for complex trait analysis. Genome Biol 2:46. doi:10.1186/gb-2001-2-11-research0046
Williams RW, Bennett B, Lu L et al (2004) Genetic structure of the LXS panel of recombinant inbred mouse strains: a powerful resource for complex trait analysis. Mamm Genome 15:637–647. doi:10.1007/s00335-004-2380-6
Wilming LG, Gilbert JGR, Howe K et al (2008) The vertebrate genome annotation (Vega) database. Nucleic Acids Res 36:D753–D760. doi:10.1093/nar/gkm987
Yang H, Bell TA, Churchill GA, de Villena FP-M (2007) On the subspecific origin of the laboratory mouse. Nat Genet 39:1100–1107. doi:10.1038/ng2087
Yang H, Ding Y, Hutchins LN et al (2009) A customized and versatile high-density genotyping array for the mouse. Nat Methods 6:663–666. doi:10.1038/nmeth.1359
Yang H, Wang JR, Didion JP et al (2011) Subspecific origin and haplotype diversity in the laboratory mouse. Nat Genet 43:648–655. doi:10.1038/ng.847
Zhang Z, Wang W, Valdar W et al (2014) Bayesian modeling of haplotype effects in multiparent populations. Genetics 198:139–156. doi:10.1534/genetics.114.166249
URLs
BAGPIPE. http://valdarlab.unc.edu/software/bagpipe
BAGPHENOTYPE. http://valdarlab.unc.edu/bagphenotype.html
Collaborative Cross Status website. http://www.csbio.unc.edu/CCstatus/
Collaborative Cross Viewer. http://www.csbio.unc.edu/CCstatus/index.py?run=CCV
DOQTL. http://www.bioconductor.org/packages/release/bioc/html/DOQTL.html
GECCO gene expression browser. http://csbio.unc.edu/gecco/
MDA genotypes for 100 inbred strains. http://cgd.jax.org/datasets/popgen/diversityarray/yang2011.shtml
MegaMUGA genotypes for CC founder strains. http://csbio.unc.edu/CCstatus/index.py?run=GeneseekMM
modtools + lapels + suspenders pipeline. http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo
Mouse Imputation Resource. http://csbio.unc.edu/imputation/
Mouse Phylogeny Viewer. http://msub.csbio.unc.edu/
Sanger Mouse Genomes Project. http://www.sanger.ac.uk/resources/mouse/genomes/
Searchable index of sequencing reads from CC founder strains. http://www.csbio.unc.edu/CEGSseq/index.py?run=MsbwtTools
Seqnature. https://github.com/jaxcs/Seqnature
Acknowledgments
The development of the CC population and related tools at the UNC Systems Genetics Core Facility was supported by grants from the National Institutes of Health (U01CA134240, P50MH090338, P50HG006582, and U54AI081680), the Ellison Medical Foundation (Grant AG-IA-0202-05), and the National Science Foundation (Grants IIS0448392 and IIS0812464). Essential support was provided by the Dean of the UNC School of Medicine, the UNC Mutant Mouse Regional Resource Center (U42OD010924), the Lineberger Comprehensive Cancer Center at UNC (U01CA016086 from the National Cancer Institute), and the University Cancer Research Fund from the state of North Carolina. Development of the Mouse Diversity Array was supported by NIH Grant P50GM076468. Support for development of the MegaMUGA array was provided by Neogen Corporation, Lincoln, NE. APM was supported by Grants T32GM067553 and F30MH103925. The authors thank Darla Miller, the UNC Systems Genetics Group, and the Center for Genome Dynamics at the Jackson Laboratory for helpful discussions, and Fernando Pardo-Manuel de Villena and Leonard McMillan for their mentorship and for their comments on this manuscript.
Appendix: terms and definitions
Relatedness
Relatedness, in the genetic sense, refers to the proportion of alleles shared between two individuals. The degree to which two individuals are genetically related depends on the number of common ancestors they share and the number of generations that have elapsed since those ancestors. A pedigree describes the expected relatedness between individuals: first-degree relatives (parents and offspring, or full siblings) share, on average, half of their alleles; second-degree relatives (for example, grandparents and grandchildren) share one-fourth; and so on. With dense genotype data, we can instead compute the realized relatedness as the proportion of alleles actually shared across many unlinked markers.
Using dense genotypes, we can define relatedness at both the genome-wide and the local scale. In the presence of admixture or introgression (see below), local relatedness in particular regions of the genome may deviate from the genome-wide average.
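The Python below is a minimal sketch of the difference between expected and realized relatedness, at both the genome-wide and local scale. It assumes biallelic markers coded 0/1/2 (copies of one allele) and, for simplicity, treats all markers as if they were unlinked. The function names are hypothetical. The sketch counts alleles shared identical-by-state, which includes alleles shared by chance as well as by descent. It is therefore only a crude stand-in for a proper relatedness estimator.

```python
from typing import List, Sequence


def expected_relatedness(degree: int) -> float:
    """Expected proportion of alleles shared by descent between relatives of a
    given degree (1 = parent/offspring or full sibling, 2 = grandparent, ...)."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return 0.5 ** degree


def allele_sharing(g1: Sequence[int], g2: Sequence[int]) -> float:
    """Proportion of alleles shared identical-by-state by two individuals.

    Genotypes are coded 0/1/2 (copies of one allele) at each marker; at a
    single marker the pair shares 2 - |g1 - g2| alleles (0, 1, or 2).
    """
    if len(g1) != len(g2) or len(g1) == 0:
        raise ValueError("genotype vectors must be non-empty and equal in length")
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def local_sharing(g1: Sequence[int], g2: Sequence[int], window: int) -> List[float]:
    """Allele sharing in consecutive, non-overlapping windows of `window` markers."""
    return [allele_sharing(g1[i:i + window], g2[i:i + window])
            for i in range(0, len(g1), window)]
```

Consider two individuals who have identical genotypes over the first half of a chromosome and opposite homozygous genotypes over the second half. Genome-wide, they share exactly half of their alleles. Windowed sharing, however, is 1.0 in the first half and 0.0 in the second, which is the kind of local deviation that admixture or introgression can produce.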
Population structure
A population is “structured” when it has experienced deviations from random mating or, equivalently, when it is divided into subpopulations with restricted genetic exchange between them. In a structured population, some groups of individuals are more closely related to (share more alleles with) each other than to other groups. Geography and mating behavior generate at least some degree of structure in most natural populations. Population structure among laboratory mouse strains is widespread. For instance, the 129 and C57BL strain groups form a genetic cluster distinct from the so-called “Swiss mice”, which include FVB/NJ, the NOD substrains, and ICR outbred stock (Beck et al. 2000). Failure to account for population structure can produce false-positive QTL in genetic mapping of complex traits.
Linkage disequilibrium (LD)
Two loci are said to be in LD if the frequencies of two-locus haplotypes depart from those expected if alleles at each locus were combined at random, that is, from the products of the single-locus allele frequencies. Recombination breaks down LD, so LD generally decreases with time and with physical distance between loci. Unlinked markers are expected to be in linkage equilibrium, but non-random mating can produce “long-range” LD between unlinked loci in structured populations.
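The definition can be made concrete with the standard two-locus measures D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). Here p_A and p_B are allele frequencies and p_AB is the frequency of the AB haplotype. The Python below is a minimal sketch that assumes phased haplotypes at two biallelic loci, coded 1 for alleles A and B and 0 for a and b. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def ld_stats(haplotypes: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (D, r^2) for two biallelic loci from phased two-locus haplotypes.

    Each haplotype is (allele at locus A, allele at locus B), coded 1 for
    allele A or B and 0 for allele a or b.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele A
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele B
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n  # AB haplotypes
    d = p_ab - p_a * p_b                              # departure from random association
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic in the sample")
    return d, d * d / denom
```

If a sample contains only AB and ab haplotypes in equal numbers, the loci are in complete LD (D = 0.25, r² = 1). If all four haplotypes occur at equal frequency, D = 0 and the loci are in linkage equilibrium.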
Haplotype block
A haplotype block is a chromosomal segment in which there is no evidence for recombination during the history of a sample of individuals. Within a block, the chromosomes carried by individuals in the sample can be collapsed into a small number (relative to the number of individuals) of ancestral haplotypes (Wall and Pritchard 2003). LD is relatively high between loci within a block, but relatively low between loci in adjacent blocks.
Although many schemes have been proposed for defining haplotype blocks, the one discussed in this review is based on the four-gamete test (Hudson and Kaplan 1985). Consider two loci A and B with alleles A/a and B/b, respectively. There are four possible haploid genotypes (gametes): AB, aB, Ab, and ab. Assume that each mutation arose only once. Then, if all four gametes are observed in a sample, recombination between A and B must have occurred at least once in the past.
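The four-gamete test, and one simple way to turn it into block boundaries, can be sketched in Python. The greedy left-to-right partition is an illustrative assumption and is not necessarily the block-definition procedure used in the studies cited in this review. SNPs are assumed biallelic and coded 0/1. Each SNP is given as the column of alleles across the sampled haplotypes. The function names are hypothetical.

```python
from typing import List, Sequence


def four_gametes(snp_i: Sequence[int], snp_j: Sequence[int]) -> bool:
    """True if all four two-locus gametes (00, 01, 10, 11) occur among the
    sampled haplotypes at two biallelic SNPs, i.e. evidence of recombination."""
    return len(set(zip(snp_i, snp_j))) == 4


def greedy_blocks(snps: Sequence[Sequence[int]]) -> List[range]:
    """Split SNPs (in chromosome order) into blocks of SNP indices.

    A new block starts whenever the next SNP shows all four gametes with
    any SNP already in the current block.
    """
    blocks: List[range] = []
    start = 0
    for k in range(1, len(snps)):
        if any(four_gametes(snps[m], snps[k]) for m in range(start, k)):
            blocks.append(range(start, k))
            start = k
    if len(snps) > 0:
        blocks.append(range(start, len(snps)))
    return blocks
```

Scanning greedily from one end makes every block free of four-gamete violations internally. The resulting boundaries can depend on the scan direction, however, which is one reason many alternative block definitions exist.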
Haplotype blocks are useful for investigating patterns of genetic diversity among samples whose common ancestor lived at an intermediate time in the past, such as the classical inbred strains of mice (Yang et al. 2011). Recombination events accumulate and LD decreases with time. Consequently, haplotype blocks shared between two individuals whose common ancestor lived far in the past (for example, a wild-derived inbred strain and a classical laboratory strain) will be very short. For this reason, haplotype blocks were not inferred for the wild mice and wild-derived strains in Yang et al. (2011).
Identity by descent (IBD)
A chromosomal segment is shared identical-by-descent between two individuals if it was inherited from their common ancestor without recombination. The notion of IBD is closely related to that of the haplotype block.
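IBD itself is not observed, so a common first approximation is to look for long runs of marker identity. The Python below is a hypothetical sketch for two haploid (for example, fully inbred) genomes. It reports maximal runs of identical alleles that are at least `min_len` markers long. Identity by state is necessary but not sufficient for IBD, so short runs in particular may reflect chance rather than shared ancestry.

```python
from typing import List, Sequence, Tuple


def shared_runs(h1: Sequence[int], h2: Sequence[int],
                min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) marker indices (end exclusive) of maximal runs in
    which two haploid genomes carry identical alleles for >= min_len markers."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same markers")
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    runs: List[Tuple[int, int]] = []
    start = 0
    for i in range(len(h1) + 1):
        # A run ends at the first mismatch or at the end of the chromosome.
        if i == len(h1) or h1[i] != h2[i]:
            if i - start >= min_len:
                runs.append((start, i))
            start = i + 1
    return runs
```

The threshold `min_len` is a tuning assumption. A longer threshold trades sensitivity to older, shorter IBD segments for fewer chance matches.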
Admixture
Admixture refers to interbreeding between individuals from populations that were previously genetically isolated from one another. Admixture facilitates gene flow between populations and, in the process, creates heterogeneity of relatedness across the genome.
Introgression
Introgression refers to the introduction of a chromosomal segment from one population into a separate, genetically distinct population. The term is often used to describe gene flow between species or subspecies that can still form fertile hybrids. Unlike admixture, which describes ongoing interbreeding, introgression describes events that are episodic in nature. In this review, we refer to genetic exchange between mouse subspecies, which do not interbreed in the wild except in narrow hybrid zones (Ursin 1952), as introgression.
Ancestry inference
Broadly speaking, an ancestry-inference procedure steps along the genome of an individual and attempts to assign each segment to one of a few ancestral clusters. These clusters may represent ancestral population groups, for samples from natural populations, or founder haplotypes, for laboratory populations. Examples of ancestry inference discussed in this review include:
- assignment of subspecific origin in wild mice (Yang et al. 2011), which labels each genomic region with one of three subspecies;
- haplotype reconstruction in the CC and DO (Fu et al. 2012), which assigns each genomic region to one of those populations’ eight founder strains.
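A deliberately crude, non-probabilistic version of haplotype reconstruction takes only a few lines. It cuts the chromosome into fixed windows and labels each window with the founder whose alleles best match the sample. This Python sketch assumes haploid (inbred) sample and founder sequences coded 0/1, and the names are hypothetical. It ignores genotyping error and the fact that switches between founders are rare along a chromosome. These limitations are why probabilistic models are preferred in practice.

```python
from typing import Dict, List, Sequence


def assign_windows(sample: Sequence[int], founders: Dict[str, Sequence[int]],
                   window: int) -> List[str]:
    """Label each non-overlapping window of `window` markers with the founder
    whose alleles mismatch the sample's least; ties go to the founder listed first."""
    labels: List[str] = []
    for i in range(0, len(sample), window):
        seg = sample[i:i + window]
        # min() returns the first founder (in dict order) with the fewest mismatches.
        best = min(founders, key=lambda name: sum(
            a != b for a, b in zip(seg, founders[name][i:i + window])))
        labels.append(best)
    return labels
```

The window size is the key design choice. Small windows follow recombination breakpoints closely but are easily misled by a single genotyping error, while large windows are robust but blur breakpoints.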
Hidden Markov model (HMM)
A hidden Markov model is a probabilistic model that describes how an observed sequence can be generated from an underlying, unobserved sequence of “hidden states” (Baum and Petrie 1966; Rabiner 1989). Efficient dynamic-programming algorithms can be used to “decode” the hidden states given an observed sequence. These include the Viterbi algorithm, which finds the single most probable sequence of hidden states, and the forward–backward algorithm, which computes the posterior probability of each hidden state at each position. In this review, we discuss HMMs in which the observed sequences are genotypes along a chromosome and the hidden states are founder haplotypes.
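A minimal Viterbi decoder for an HMM whose hidden states are founder haplotypes is sketched below in Python. The model is an assumption chosen for illustration and is not the model of any specific tool:
- a haploid observed sequence;
- a uniform initial distribution over founders;
- a fixed per-marker probability `switch` of jumping to a different founder;
- a per-marker genotyping error rate `err`.

Practical tools typically also model heterozygosity, varying genetic distances between markers, or array probe intensities. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def viterbi_founders(obs: Sequence[int], founders: Sequence[Sequence[int]],
                     err: float = 0.01, switch: float = 0.05) -> List[int]:
    """Most probable founder index at each marker for a haploid observed sequence.

    Assumed model (for illustration only): uniform initial state; between
    adjacent markers, stay on the same founder with probability 1 - switch or
    jump to one of the other K - 1 founders uniformly; emit the current
    founder's allele with probability 1 - err.
    """
    k = len(founders)
    if k < 2:
        raise ValueError("need at least two founders")
    if not obs:
        return []
    log_stay = math.log(1 - switch)
    log_move = math.log(switch / (k - 1))

    def log_emit(s: int, t: int) -> float:
        return math.log(1 - err) if founders[s][t] == obs[t] else math.log(err)

    def log_trans(q: int, s: int) -> float:
        return log_stay if q == s else log_move

    # score[s]: best log probability of any state path ending in s at marker t.
    score = [math.log(1 / k) + log_emit(s, 0) for s in range(k)]
    back: List[List[int]] = []  # back[t - 1][s]: best predecessor of s at marker t
    for t in range(1, len(obs)):
        prev = score
        ptr = [max(range(k), key=lambda q: prev[q] + log_trans(q, s)) for s in range(k)]
        score = [prev[ptr[s]] + log_trans(ptr[s], s) + log_emit(s, t) for s in range(k)]
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Take two founders and an observed sequence that matches founder 0 for three markers and founder 1 for the next three. With the default parameters, the decoder returns [0, 0, 0, 1, 1, 1]. By contrast, a single mismatching marker inside a long run is attributed to genotyping error rather than to two founder switches. This smoothing is what distinguishes HMM decoding from a marker-by-marker or window-by-window best match.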
Cite this article
Morgan, A.P., Welsh, C.E. Informatics resources for the Collaborative Cross and related mouse populations. Mamm Genome 26, 521–539 (2015). https://doi.org/10.1007/s00335-015-9581-z
DOI: https://doi.org/10.1007/s00335-015-9581-z