Mammalian Genome, Volume 23, Issue 9–10, pp 550–558

BioGPS and GXD: mouse gene expression data—the benefits and challenges of data integration

Abstract

Mouse gene expression data are complex and voluminous. To maximize their utility, these data must be made readily accessible through databases, and those resources need to place the expression data in the larger biological context. Here we describe two community resources that approach these problems in different but complementary ways: BioGPS and the Mouse Gene Expression Database (GXD). BioGPS connects its large and homogeneous microarray gene expression reference data sets via plugins with a heterogeneous collection of external gene-centric resources, thus casting a wide but loose net. GXD acquires different types of expression data from many sources and integrates these data tightly with other types of data in the Mouse Genome Informatics (MGI) resource, with a strong emphasis on consistency checks and manual curation. We describe and contrast the “loose” and “tight” data integration strategies employed by BioGPS and GXD, respectively, and discuss the challenges and benefits of data integration. BioGPS is freely available at http://biogps.org. GXD is freely available through the MGI web site (www.informatics.jax.org) or directly at www.informatics.jax.org/expression.shtml.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. The Jackson Laboratory, Bar Harbor, USA
  2. The Scripps Research Institute, La Jolla, USA
