Collaborative Cross and Diversity Outbred data resources in the Mouse Phenome Database
The Mouse Phenome Database was originally conceived as a platform for the integration of phenotype data collected on a defined collection of 40 inbred mouse strains, the “phenome panel.” This model provided an impetus for community data sharing, and integration was readily achieved through the reproducible genotypes of the phenome panel strains. Advances in the development of mouse populations led to an expanded role for the Mouse Phenome Database, which now encompasses new strain panels and inbred strain crosses. The recent introduction of the Collaborative Cross and Diversity Outbred mice, which share an extensive pool of genetic variation from eight founder inbred strains, presents new opportunities and challenges for community data resources. A wide variety of molecular and clinical phenotypes are being collected across genotypes, tissues, ages, environmental exposures, interventions, and treatments. The Mouse Phenome Database provides a framework for retrieval, integration, analysis, and display of these data, enabling them to be evaluated in the context of existing data from standard inbred strains. Primary data in the Mouse Phenome Database are supported by extensive metadata on protocols and procedures. These are centrally curated to ensure accuracy and reproducibility and to provide data in consistent formats. The Mouse Phenome Database represents an established and growing community data resource for mouse phenotype data and encourages submissions from new mouse resources, enabling investigators to integrate existing data into their studies of the phenotypic consequences of genetic variation.
Keywords: Quantitative Trait Locus, West Nile Virus, Inbred Strain, Collaborative Cross, Founder Strain
Understanding the causes of variation in complex disease-related phenotypes, how these traits relate to one another, and which phenotypic outcomes most resemble human disease requires detailed characterization and integration of data across phenotyping domains. Deep phenotyping of model organisms is a powerful approach to basic and translational research of human disease. The laboratory mouse is an especially efficient and versatile model system. Mice have a relatively short lifespan, 99 % of mouse genes are shared with humans (Boguski 2002), and a rich repertoire of phenotyping modalities is available to study the physiology, behavior, and genetics of mice in normal development, aging, and disease. Mice present many advantages over direct study of diseases in humans including precise control of experimental conditions, low costs, access to tissues and interventions, and repeatability of experimentation. As a result, the laboratory mouse remains the most widely studied and most well-characterized model organism.
Thousands of inbred and genetically modified strains are currently available as live animals or as cryopreserved stocks, and more are being created and phenotyped (Ringwald et al. 2011; Brown and Moore 2012). Crosses of inbred mouse strains have revealed the genetic basis of numerous complex traits through quantitative trait locus (QTL) mapping (for numerous examples, see QTL Archive at phenome.jax.org). New genetic reference populations have been developed including the Collaborative Cross (CC) inbred strains (Churchill et al. 2004; Chesler et al. 2008; Iraqi et al. 2008; Morahan et al. 2008; Welsh et al. 2012; Threadgill and Churchill 2012) and their complementary high-precision mapping population, the Diversity Outbred (DO) mice (Churchill et al. 2012; Svenson et al. 2012a; Chesler 2013). The genomes of many widely used mouse strains, including the founders of the CC and DO, have been fully sequenced (Keane et al. 2011; Yalcin et al. 2012; Wong et al. 2012; Ananda et al. 2014), and high-density genotyping arrays are available (Yang et al. 2009).
Unlike the human population, genetic variation present in the mouse has been stabilized, characterized, and segregated (both randomly and non-randomly) across a variety of different populations. The shared genetic variation in mouse populations provides a basis for data integration and a means to discover the causal genetic variants for disease-related phenotypes. Dense genotyping and sequencing technologies enable characterization of genomic similarity of individual mice and strains. Our ability to relate this complete picture of genetic variation to phenotypic observations of individual mice enables identification and validation of the genetic basis of complex, disease-related traits, increasingly with single nucleotide resolution.
Collecting data on widely used populations provides significant opportunities to extend findings through data reuse but only if the data are harmonized and integrated. Dissemination of primary mouse phenotype data is an imperative complement to research publications; however, additional effort—beyond releasing data in supplemental files—is needed to facilitate coherent integrative analyses across multiple studies.
Primary data access is crucial for three reasons: (1) integrative analysis to find consensus among diverse studies, (2) reanalysis in light of new developments, and (3) reproducibility. Unfortunately, phenotypic data often exist in diverse and sometimes non-computable stores with insufficient documentation and restricted access. With increasing potential for data integration to provide new insights, it is critical that we provide access to carefully curated data in standardized formats that allow researchers to build upon previous studies. Recent advances in meta-analysis techniques have demonstrated the value of combining primary data across multiple mouse studies (Kang et al. 2014; Bubier et al. 2014).
The Mouse Phenome Database
The Mouse Phenome Database (MPD; http://phenome.jax.org) (Grubb et al. 2014) stores harmonized primary data, including per-mouse phenotypes acquired over multiple trials, measures, or conditions. Now in its 14th year, MPD collects, annotates, and disseminates quantitative phenotype data and protocols in an integrated relational database that supports faceted search and other capabilities. MPD provides a repository of mouse phenotype data and a suite of tools for comparative and quantitative analysis. Originally developed as a repository for phenotype data collected on a small and defined set of inbred strains (Paigen and Eppig 2000), MPD has expanded in scope and role to become a general repository for primary data collected on individual mice. MPD data are organized around a catalog of phenotype ontology terms; assays and protocols are extensively documented; analysis tools provide summary statistics and data visualization; and, importantly, the common data framework enables easy access to, and integration of, data from multiple labs and experiments. Data come from investigators around the world and represent a broad scope of physiological and behavioral traits in naïve mice and those exposed to drugs, environmental agents, or other treatments. Access to phenotype data and protocols from different sources enables researchers to reproduce experiments, reanalyze data with new algorithms and up-to-date bioinformatics resources, and discover unexpected relationships among trait data that were collected at different times and places. The high standards of documentation and curation and the stability of the program at The Jackson Laboratory have made MPD a primary resource for investigators to archive and retrieve quantitative mouse phenotypic data and protocols.
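The kind of cross-study integration that reproducible inbred genotypes make possible can be illustrated with a minimal sketch. It assumes per-mouse records of the form (strain, value) from two hypothetical studies; the record layout, the function names, and all numeric values are invented for illustration and do not reflect MPD's actual schema or tools. Per-mouse values are reduced to strain means, and the two traits are then correlated across the strains that both studies measured.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical per-mouse records (strain, measured value); values are invented.
study_a = [
    ("A/J", 10.0), ("A/J", 12.0),
    ("C57BL/6J", 20.0), ("C57BL/6J", 22.0),
    ("CAST/EiJ", 30.0), ("CAST/EiJ", 34.0),
    ("WSB/EiJ", 15.0),           # measured only in study A
]
study_b = [
    ("A/J", 1.0), ("A/J", 1.2),
    ("C57BL/6J", 2.0), ("C57BL/6J", 2.2),
    ("CAST/EiJ", 3.1), ("CAST/EiJ", 3.3),
    ("PWK/PhJ", 2.5),            # measured only in study B
]


def strain_means(records):
    """Collapse per-mouse values to one mean per strain."""
    by_strain = defaultdict(list)
    for strain, value in records:
        by_strain[strain].append(value)
    return {s: sum(v) / len(v) for s, v in by_strain.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlate_studies(a, b):
    """Correlate strain means over the strains shared by both studies."""
    means_a, means_b = strain_means(a), strain_means(b)
    shared = sorted(set(means_a) & set(means_b))
    r = pearson([means_a[s] for s in shared], [means_b[s] for s in shared])
    return shared, r


shared, r = correlate_studies(study_a, study_b)
print(shared, round(r, 3))
```

Only strains present in both studies contribute, which is why a shared, genetically reproducible strain panel is the natural key for joining independent datasets.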
Collaborative Cross strains, Diversity Outbred mice, and related populations
Diversity Outbred (DO) mice are derived from incipient Collaborative Cross strains and represent a complementary resource with the same allelic diversity as the CC strains, maintained in a heterogeneous stock (Churchill et al. 2012; Chesler 2013). This population is an ideal resource for genetic mapping and selective breeding studies. The high fecundity and large stock population enable sampling of extremely large numbers of unique genomes derived from the same eight founders. Recombination events accumulate over successive generations, which supports ultra-high-precision mapping studies (Svenson et al. 2012a; Logan et al. 2013b; Recla et al. 2014; Kelada et al. 2014; Smallwood et al. 2014; Gatti et al. 2014; Church et al. 2014; French et al. 2015; Church et al. 2015). Each animal is genetically unique with high levels of heterozygosity, enabling precise estimation of QTL location and allelic effect. Each DO mouse requires genotyping (see below) and haplotype reconstruction for QTL analysis (Gatti et al. 2014). CC strains and DO populations have been used to map QTLs to intervals that are less than 5 and 2 Mb, respectively (Philip et al. 2011; Logan et al. 2013b).
Integration of data from DO phenotyping studies is not as direct as with the CC strains. Aggregation and integration of data across DO studies can be achieved through common phenotypes and through shared genotypes at specific genetic loci. DO mice provide a unique opportunity to evaluate the effects of a causal genetic variant in the context of a variety of genetic backgrounds.
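One way to picture integration through shared genotypes at a locus is the following sketch. It assumes each DO mouse has already been assigned founder haplotype dosages at the locus of interest (summing to 2 per diploid mouse) and that both studies measured a common phenotype. The record layout, the single-letter founder labels, the function names, and all values are hypothetical. The estimator shown, a dosage-weighted mean of study-centered phenotypes, is a deliberate simplification and is not the haplotype reconstruction or regression-based mapping described by Gatti et al. (2014).

```python
from collections import defaultdict

# Hypothetical DO mice from two studies. "dosage" maps a founder label to the
# number of haplotype copies carried at one locus (copies sum to 2 per mouse).
mice = [
    {"study": "s1", "phenotype": 5.0,  "dosage": {"A": 1, "B": 1}},
    {"study": "s1", "phenotype": 7.0,  "dosage": {"B": 2}},
    {"study": "s2", "phenotype": 10.0, "dosage": {"A": 2}},
    {"study": "s2", "phenotype": 14.0, "dosage": {"B": 1, "C": 1}},
]


def center_within_study(records):
    """Subtract each study's mean phenotype to remove study-level offsets."""
    by_study = defaultdict(list)
    for m in records:
        by_study[m["study"]].append(m["phenotype"])
    study_mean = {s: sum(v) / len(v) for s, v in by_study.items()}
    return [dict(m, centered=m["phenotype"] - study_mean[m["study"]])
            for m in records]


def pooled_founder_means(records):
    """Dosage-weighted mean of centered phenotype for each founder haplotype.

    Founders never observed at the locus are omitted from the result.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for m in center_within_study(records):
        for founder, dose in m["dosage"].items():
            num[founder] += dose * m["centered"]
            den[founder] += dose
    return {f: num[f] / den[f] for f in den if den[f] > 0}


effects = pooled_founder_means(mice)
for founder in sorted(effects):
    print(founder, round(effects[founder], 3))
```

Centering within each study before pooling is a design choice made here to keep study-level differences (for example, protocol or site offsets) from being mistaken for founder allele effects; a real analysis would also need to account for kinship and covariates.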
In addition to CC inbred strains and individual DO mice, there are numerous possibilities for study of related genotypes (Fig. 1) including the eight founder inbred strains, F1 hybrids of the founder strains, F1 hybrids of CC strains (CC-RIX), and DO mice backcrossed to inbred strains. Recombinant inbred crosses of CC-RIX provide genetically retrievable and defined F1 hybrids of CC strains, which carry all the benefits of inbred genetic reference populations with the added benefits of heterozygosity. The mapping resolution of CC-RIX is theoretically identical to that of a population of CC strains due to the fixed location of recombinations in the population. Together, these advanced mouse populations make a rich and powerful genetic resource with the potential to refine trait correlations and improve trait mapping. Founder strains and DO mice are available through The Jackson Laboratory (http://www.jax.org/); CC strains are available through the UNC Systems Genetics website (http://csbio.unc.edu/CCstatus/).
Data resources for the CC and DO
Phenotype data from early CC and DO studies have been disseminated through a variety of platforms including the MPD. In order to reap the benefits of integrated access, MPD will collect and harmonize these data in a common repository. There, the new data can be directly integrated with the QTL Archive (phenome.jax.org) and inbred strain phenotypes that make up the original MPD data resource. Integration of data from the CC and DO together with data from inbred strains, crosses, and other sets of genetically defined mice provides an opportunity for new discoveries of the function of the vast numbers of genetic variants available in laboratory mice. Access to primary data from these populations will enable and encourage reanalysis of historical data.
Mouse Phenome Database (MPD) enables users to share and reanalyze data from the CC and DO and to compare results with those obtained on other mouse strains and strain panels. MPD currently houses primary phenotype data from CC founder strains, F1 hybrids of founder strains, incipient CC strains, and DO mice covering a range of phenotypic domains, including behavior, hematology, renal function, disease susceptibility, and exercise and endurance. Additional datasets will be added to MPD as they become available. Users of these resources are encouraged to contact us about potential data submissions (see below).
MPD visualization and analysis tools
From the MPD homepage, users can search or navigate data by strain, strain panel, investigator, phenotype, or ontology terms. There are several options for downloading data—from a single measurement to bulk download of the entire database. Video tutorials and tool demos are available to walk users through the most common tasks.
CC founder F1 hybrids
Each of the examples above represents a means to review and interpret the results of phenotypic characterization within populations. Several compelling studies have identified genetic variants through the integration of experiments across multiple mapping populations. These include multi-population studies of HDL levels (Burgess-Herbert et al. 2008), efforts to integrate recombinant inbred and standard inbred populations in a single analysis (Bennett et al. 2010; Ghazalpour et al. 2012), and evaluations of SNP segregation patterns across overlapping QTLs (Bubier et al. 2014). Developments in genetic analysis and computation have greatly increased the rigor with which investigators can combine studies across populations through the integration of population structure into mapping analysis (Devlin and Roeder 1999; Yu et al. 2006). Further, the availability of large numbers of densely genotyped strains and deep sequencing for many other strains renders feasible the reconstruction, albeit coarse in some cases, of virtually any mouse genome. With a common framework of sequence variation, it is now possible to integrate a much broader range of mouse phenomic studies, enabling discovery of the role of millions of genetic variants in the development of phenotypic variation and disease. The Collaborative Cross and Diversity Outbred populations provide mouse resources with the power, precision, and diversity to characterize the roles of these variants in molecular regulation, biological processes, and whole-organism disease-related phenotypes.
To submit data from mapping and reference populations for curation, integration, and community access through MPD, please contact us at firstname.lastname@example.org. We acquire new data generated by members of the community; incorporate evolving technologies for archiving, integrating, and analyzing new and existing data; and expand activities that promote research reproducibility within and across resources.
The Mouse Phenome Database was supported by NIH DA028420 (MB). Genetic analysis in advanced mouse populations was supported by NIH GM076468 (GAC, EJC), NIH AG038070 (GAC), and NIH DA037927 (EJC).
- Bennett BJ, Farber CR, Orozco L, Kang HM, Ghazalpour A, Siemers N, Neubauer M, Neuhaus I, Yordanova R, Guan B, Truong A, Yang WP, He A, Kayne P, Gargalovic P, Kirchgessner T, Pan C, Castellani LW, Kostem E, Furlotte N, Drake TA, Eskin E, Lusis AJ (2010) A high-resolution association mapping panel for the dissection of complex traits in mice. Genome Res 20:281–290
- Bubier JA, Jay JJ, Baker CL, Bergeson SE, Ohno H, Metten P, Crabbe JC, Chesler EJ (2014) Identification of a QTL in Mus musculus for alcohol preference, withdrawal, and Ap3m2 expression using integrative functional genomics and precision genetics. Genetics 197:1377–1393
- Chesler EJ, Miller DR, Branstetter LR, Galloway LD, Jackson BL, Philip VM, Voy BH, Culiat CT, Threadgill DW, Williams RW, Churchill GA, Johnson DK, Manly KF (2008) The Collaborative Cross at Oak Ridge National Laboratory: developing a powerful resource for systems genetics. Mamm Genome 19:382–389
- Church RJ, Wu H, Mosedale M, Sumner SJ, Pathmasiri W, Kurtz CL, Pletcher MT, Eaddy JS, Pandher K, Singer M, Batheja A, Watkins PB, Adkins K, Harrill AH (2014) A systems biology approach utilizing a mouse diversity panel identifies genetic differences influencing isoniazid-induced microvesicular steatosis. Toxicol Sci
- Church RJ, Gatti DM, Urban TJ, Long N, Yang X, Shi Q, Eaddy JS, Mosedale M, Ballard S, Churchill GA, Navarro V, Watkins PB, Threadgill DW, Harrill AH (2015) Sensitivity to hepatotoxicity due to epigallocatechin gallate is affected by genetic background in diversity outbred mice. Food Chem Toxicol 76:19–26
- Churchill GA, Airey DC, Allayee H, Angel JM, Attie AD, Beatty J, Beavis WD, Belknap JK, Bennett B, Berrettini W, Bleich A, Bogue M, Broman KW, Buck KJ, Buckler E, Burmeister M, Chesler EJ, Cheverud JM, Clapcote S, Cook MN, Cox RD, Crabbe JC, Crusio WE, Darvasi A, Deschepper CF, Doerge RW, Farber CR, Forejt J, Gaile D, Garlow SJ, Geiger H, Gershenfeld H, Gordon T, Gu J, Gu W, de Haan G, Hayes NL, Heller C, Himmelbauer H, Hitzemann R, Hunter K, Hsu HC, Iraqi FA, Ivandic B, Jacob HJ, Jansen RC, Jepsen KJ, Johnson DK, Johnson TE, Kempermann G, Kendziorski C, Kotb M, Kooy RF, Llamas B, Lammert F, Lassalle JM, Lowenstein PR, Lu L, Lusis A, Manly KF, Marcucio R, Matthews D, Medrano JF, Miller DR, Mittleman G, Mock BA, Mogil JS, Montagutelli X, Morahan G, Morris DG, Mott R, Nadeau JH, Nagase H, Nowakowski RS, O’Hara BF, Osadchuk AV, Page GP, Paigen B, Paigen K, Palmer AA, Pan HJ, Peltonen-Palotie L, Peirce J, Pomp D, Pravenec M, Prows DR, Qi Z, Reeves RH, Roder J, Rosen GD, Schadt EE, Schalkwyk LC, Seltzer Z, Shimomura K, Shou S, Sillanpaa MJ, Siracusa LD, Snoeck HW, Spearow JL, Svenson K, Tarantino LM, Threadgill D, Toth LA, Valdar W, de Villena FP, Warden C, Whatley S, Williams RW, Wiltshire T, Yi N, Zhang D, Zhang M, Zou F (2004) The collaborative cross, a community resource for the genetic analysis of complex traits. Nat Genet 36:1133–1137
- Collins FS (2012) Hematological parameters in 8 inbred founder strains and 130+ emerging lines (pre-CC) of the Collaborative Cross. MPD:Collins1, Mouse Phenome Database (phenome.jax.org), Bar Harbor, Maine USA
- Ferris MT, Aylor DL, Bottomly D, Whitmore AC, Aicher LD, Bell TA, Bradel-Tretheway B, Bryan JT, Buus RJ, Gralinski LE, Haagmans BL, McMillan L, Miller DR, Rosenzweig E, Valdar W, Wang J, Churchill GA, Threadgill DW, McWeeney SK, Katze MG, Pardo-Manuel de Villena F, Baric RS, Heise MT (2013) Modeling host genetic regulation of influenza pathogenesis in the collaborative cross. PLoS Pathog 9:e1003196
- French JE, Gatti DM, Morgan DL, Kissling GE, Shockley KR, Knudsen GA, Shepard KG, Price HC, King D, Witt KL, Pedersen LC, Munger SC, Svenson KL, Churchill GA (2015) Diversity outbred mice identify population-based exposure thresholds and genetic factors that influence benzene-induced genotoxicity. Environ Health Perspect 123:237–245
- Ghazalpour A, Rau CD, Farber CR, Bennett BJ, Orozco LD, van Nas A, Pan C, Allayee H, Beaven SW, Civelek M, Davis RC, Drake TA, Friedman RA, Furlotte N, Hui ST, Jentsch JD, Kostem E, Kang HM, Kang EY, Joo JW, Korshunov VA, Laughlin RE, Martin LJ, Ohmen JD, Parks BW, Pellegrini M, Reue K, Smith DJ, Tetradis S, Wang J, Wang Y, Weiss JN, Kirchgessner T, Gargalovic PS, Eskin E, Lusis AJ, LeBoeuf RC (2012) Hybrid mouse diversity panel: a panel of inbred mouse strains suitable for analysis of complex genetic traits. Mamm Genome 23:680–692
- Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, Furlotte NA, Eskin E, Nellaker C, Whitley H, Cleak J, Janowitz D, Hernandez-Pliego P, Edwards A, Belgard TG, Oliver PL, McIntyre RE, Bhomra A, Nicod J, Gan X, Yuan W, van der Weyden L, Steward CA, Bala S, Stalker J, Mott R, Durbin R, Jackson IJ, Czechanski A, Guerra-Assuncao JA, Donahue LR, Reinholdt LG, Payseur BA, Ponting CP, Birney E, Flint J, Adams DJ (2011) Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477:289–294
- Logan RW, Robledo RF, Recla JM, Chesler EJ (2013a) Behavioral and nociception traits in a diversity outbred (DO) mouse population and 8 founder inbred strains. MPD:Chesler4, Mouse Phenome Database (phenome.jax.org), Bar Harbor, Maine USA
- Morgan AP, Welsh CE (2015) Informatics resources for the collaborative cross and related mouse populations. Mamm Genome
- Philip VM, Sokoloff G, Ackert-Bicknell CL, Striz M, Branstetter L, Beckmann MA, Spence JS, Jackson BL, Galloway LD, Barker P, Wymore AM, Hunsicker PR, Durtschi DC, Shaw GS, Shinpock S, Manly KF, Miller DR, Donohue KD, Culiat CT, Churchill GA, Lariviere WR, Palmer AA, O’Hara BF, Voy BH, Chesler EJ (2011) Genetic analysis in the collaborative cross breeding population. Genome Res 21:1223–1238
- Rasmussen AL, Okumura A, Ferris MT, Green R, Feldmann F, Kelly SM, Scott DP, Safronetz D, Haddock E, LaCasse R, Thomas MJ, Sova P, Carter VS, Weiss JM, Miller DR, Shaw GD, Korth MJ, Heise MT, Baric RS, de Villena FP, Feldmann H, Katze MG (2014) Host genetic diversity enables Ebola hemorrhagic fever pathogenesis and resistance. Science 346:987–991
- Ringwald M, Iyer V, Mason JC, Stone KR, Tadepally HD, Kadin JA, Bult CJ, Eppig JT, Oakley DJ, Briois S, Stupka E, Maselli V, Smedley D, Liu S, Hansen J, Baldock R, Hicks GG, Skarnes WC (2011) The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium. Nucleic Acids Res 39:D849–D855
- Smallwood TL, Gatti DM, Quizon P, Weinstock GM, Jung KC, Zhao L, Hua K, Pomp D, Bennett BJ (2014) High-resolution genetic mapping in the diversity outbred mouse population identifies Apobec1 as a candidate gene for atherosclerosis. G3 (Bethesda) 4:2353–2363
- Svenson KL, Lenarcic AB, Churchill GA, Valdar W (2012b) Multi-system survey of mouse physiology in 8 inbred founder strains and 54 F1 hybrids of the Collaborative Cross. MPD:CGDpheno3, Mouse Phenome Database (phenome.jax.org), Bar Harbor, Maine USA
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.