Abstract
We have created a high-density SNP resource encompassing 7.87 million polymorphic loci across 49 inbred strains of the laboratory mouse. We built it by combining data available from public databases and training a hidden Markov model to impute missing genotypes in the combined data. The strong linkage disequilibrium found in dense sets of SNP markers in the laboratory mouse provides the basis for accurate imputation. Using genotypes from eight independent SNP resources, we empirically validated the quality of the imputed genotypes and demonstrated that they are highly reliable for most inbred strains. The imputed SNP resource will be useful for studies of natural variation and complex traits. By providing high-density SNP genotypes for large numbers of mouse strains, it will also facilitate the design of association studies. We anticipate that this resource will continue to evolve as new genotype data become available for laboratory mouse strains. The data are available for bulk download or query at http://cgd.jax.org/.
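The abstract names a hidden Markov model for imputation but does not specify it. As a rough illustration only, the sketch below uses a simplified "copying" HMM, which is an assumption and not the authors' model. Because inbred strains are essentially homozygous, each strain is treated as a single 0/1 allele sequence. The hidden state at each locus is the fully genotyped reference strain that the target strain shares a haplotype with. Strong linkage disequilibrium is reflected by a small switch probability between adjacent loci. Viterbi decoding finds the most likely path, and each missing call is filled from the reference strain on that path. The function names (`viterbi_copy_path`, `impute`, `concordance`), the parameters `switch_prob` and `error_prob`, and the toy strains are all hypothetical. `concordance` loosely mirrors validation against independent genotypes by scoring held-out positions.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical encoding: 0/1 allele call per locus for an inbred (homozygous) strain; None = missing.
Genotype = Optional[int]


def viterbi_copy_path(
    target: Sequence[Genotype],
    refs: Sequence[Sequence[int]],
    switch_prob: float = 0.01,  # assumed per-interval chance of switching reference strain
    error_prob: float = 0.01,   # assumed genotyping/mutation mismatch rate
) -> List[int]:
    """Most likely reference strain copied by the target at each locus (Viterbi decoding)."""
    n_refs = len(refs)
    log_stay = math.log(1.0 - switch_prob)
    # A switch is spread uniformly over the other reference strains.
    log_switch = math.log(switch_prob / (n_refs - 1)) if n_refs > 1 else float("-inf")
    log_match, log_mismatch = math.log(1.0 - error_prob), math.log(error_prob)

    def log_emit(k: int, i: int) -> float:
        obs = target[i]
        if obs is None:  # a missing genotype carries no information
            return 0.0
        return log_match if refs[k][i] == obs else log_mismatch

    def log_trans(j: int, k: int) -> float:
        return log_stay if j == k else log_switch

    # Uniform start distribution over reference strains.
    score = [-math.log(n_refs) + log_emit(k, 0) for k in range(n_refs)]
    backptrs: List[List[int]] = []
    for i in range(1, len(target)):
        new_score, ptrs = [], []
        for k in range(n_refs):
            best_j = max(range(n_refs), key=lambda j: score[j] + log_trans(j, k))
            new_score.append(score[best_j] + log_trans(best_j, k) + log_emit(k, i))
            ptrs.append(best_j)
        score = new_score
        backptrs.append(ptrs)

    # Trace back from the highest-scoring final state.
    state = max(range(n_refs), key=lambda k: score[k])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


def impute(target: Sequence[Genotype], refs: Sequence[Sequence[int]], **kwargs) -> List[int]:
    """Keep observed calls; fill missing ones from the reference strain on the Viterbi path."""
    path = viterbi_copy_path(target, refs, **kwargs)
    return [obs if obs is not None else refs[k][i]
            for i, (obs, k) in enumerate(zip(target, path))]


def concordance(imputed: Sequence[int], truth: Sequence[int], positions: Sequence[int]) -> float:
    """Fraction of held-out positions where the imputed call matches an independent genotype."""
    return sum(imputed[i] == truth[i] for i in positions) / len(positions)


if __name__ == "__main__":
    refs = [
        [0, 0, 0, 0, 0, 0, 0, 0],  # hypothetical reference strain A
        [1, 1, 1, 1, 1, 1, 1, 1],  # hypothetical reference strain B
        [0, 1, 0, 1, 0, 1, 0, 1],  # hypothetical reference strain C
    ]
    target = [0, 0, None, 0, 1, 1, None, 1]  # copies A, then B
    print(impute(target, refs))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy example the decoded path copies strain A for the first four loci and strain B for the last four. The two missing calls are therefore filled with 0 and 1, respectively. A real imputation model would also need to handle missing reference calls, heterozygosity, and locus-specific switch rates, none of which this sketch attempts.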
Acknowledgments
This work was supported by the U.S. National Institute of General Medical Sciences as part of the Center of Excellence in Systems Biology (1P50 GM076468). The authors thank Tim Wiltshire for sharing genotyping data prior to publication and Jesse Hammer and Susan Moxley for graphics assistance.
Cite this article
Szatkiewicz, J.P., Beane, G.L., Ding, Y. et al. An imputed genotype resource for the laboratory mouse. Mamm Genome 19, 199–208 (2008). https://doi.org/10.1007/s00335-008-9098-9