
BioGPS and GXD: mouse gene expression data—the benefits and challenges of data integration

Abstract

Mouse gene expression data are complex and voluminous. To maximize the utility of these data, they must be made readily accessible through databases, and those databases need to place the expression data in the larger biological context. Here we describe two community resources that approach these problems in different but complementary ways: BioGPS and the Mouse Gene Expression Database (GXD). BioGPS uses plugins to connect its large, homogeneous microarray gene expression reference data sets to a heterogeneous collection of external gene-centric resources, thus casting a wide but loose net. GXD acquires different types of expression data from many sources and integrates these data tightly with other types of data in the Mouse Genome Informatics (MGI) resource, with a strong emphasis on consistency checks and manual curation. We describe and contrast the “loose” and “tight” data integration strategies employed by BioGPS and GXD, respectively, and discuss the challenges and benefits of data integration. BioGPS is freely available at http://biogps.org. GXD is freely available through the MGI web site (www.informatics.jax.org) or directly at www.informatics.jax.org/expression.shtml.
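The contrast between the two integration strategies can be made concrete with a small Python sketch. It assumes a simplified model and is not the actual BioGPS plugin format or the GXD curation pipeline. In the "loose" style, an external resource is represented only by a link template that is filled in with the current gene's identifiers, and nothing from the remote resource is checked or stored. In the "tight" style, each incoming expression record is checked against controlled reference data (known genes, known anatomical structures, and the stages at which each structure exists) before it is accepted. All plugin names, URL templates, gene identifiers, structure names, and stage ranges are hypothetical placeholders.

```python
"""Toy contrast of "loose" vs "tight" integration of gene expression data.

Every name, URL template, identifier, and stage range here is a hypothetical
placeholder; none of it is taken from BioGPS or GXD.
"""
from dataclasses import dataclass

# --- Loose integration -------------------------------------------------------
# Each external resource is described only by a URL template (hypothetical).
PLUGIN_TEMPLATES = {
    "ExampleGeneWiki": "https://example.org/wiki/{symbol}",
    "ExampleMarkerLookup": "https://example.org/marker/{mgi_id}",
}


def render_plugins(gene_ids: dict) -> dict:
    """Fill each template with the gene's identifiers; skip plugins that need
    an identifier the gene does not have. Remote content is never validated."""
    links = {}
    for name, template in PLUGIN_TEMPLATES.items():
        try:
            links[name] = template.format(**gene_ids)
        except KeyError:
            # Required identifier missing: the link is silently omitted.
            continue
    return links


# --- Tight integration -------------------------------------------------------
@dataclass(frozen=True)
class ExpressionResult:
    gene_id: str
    structure: str
    stage: int        # hypothetical developmental stage number
    detected: bool


# Hypothetical controlled reference data standing in for curated vocabularies.
KNOWN_GENES = {"GENE:1", "GENE:2"}
STRUCTURE_STAGES = {"structure_A": range(10, 29), "structure_B": range(15, 20)}


def validate(result: ExpressionResult) -> list:
    """Return a list of consistency problems (empty if the record is clean)."""
    errors = []
    if result.gene_id not in KNOWN_GENES:
        errors.append(f"unknown gene {result.gene_id}")
    stages = STRUCTURE_STAGES.get(result.structure)
    if stages is None:
        errors.append(f"unknown structure {result.structure!r}")
    elif result.stage not in stages:
        errors.append(f"{result.structure!r} is not defined at stage {result.stage}")
    return errors


def load_tightly(results):
    """Accept only records that pass every check; route the rest to review."""
    accepted, rejected = [], []
    for r in results:
        problems = validate(r)
        if problems:
            rejected.append((r, problems))
        else:
            accepted.append(r)
    return accepted, rejected


if __name__ == "__main__":
    print(render_plugins({"symbol": "Gene1", "mgi_id": "GENE:1"}))
    print(render_plugins({"symbol": "Gene2"}))  # marker link is omitted
    ok, needs_review = load_tightly([
        ExpressionResult("GENE:1", "structure_A", 12, True),
        ExpressionResult("GENE:9", "structure_A", 12, True),
        ExpressionResult("GENE:2", "structure_B", 30, False),
    ])
    print(len(ok), "accepted;", [p for _, p in needs_review])
```

The difference shows up in the failure modes: the loose version quietly drops a link when an identifier is missing and never inspects what lies behind it, whereas the tight version rejects and reports inconsistent records, which in a curated resource is the point where manual review would take over.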


Fig. 1
Fig. 2
Fig. 3


Acknowledgments

The authors thank Drs. Joel Richardson, Constance Smith, and Benjamin Good for their helpful comments and discussions on the manuscript. The authors also thank all the members of the GXD and BioGPS teams for their dedicated work, as well as the members of other MGI projects for their contributions to GXD and to the larger MGI Resource. The authors acknowledge support from the National Institute of General Medical Sciences (GM083924 to AIS) and from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (HD062499 to MR).

Author information

Corresponding authors

Correspondence to Martin Ringwald or Andrew I. Su.

About this article

Cite this article

Ringwald, M., Wu, C. & Su, A.I. BioGPS and GXD: mouse gene expression data—the benefits and challenges of data integration. Mamm Genome 23, 550–558 (2012). https://doi.org/10.1007/s00335-012-9408-0

  • DOI: https://doi.org/10.1007/s00335-012-9408-0

Keywords

  • Expression Data
  • Data Integration
  • Gene Identifier
  • Mouse Genome Informatics
  • Anatomical Ontology