Current challenges and approaches for the synergistic use of systems biology data in the scientific community

Part of the Experientia Supplementum book series (EXS, volume 97)

Abstract

Today’s rapid development and broad application of high-throughput analytical technologies are transforming biological research, providing a volume of data and a range of analytical opportunities for understanding the fundamentals of biological processes that were undreamt of in past years. To fully exploit the potential of these data, scientists must be able to understand and interpret the information in an integrative manner. While the sheer data volume and the heterogeneity of technical platforms within each discipline already pose a significant challenge, the heterogeneity of platforms and data formats across disciplines makes the integrative management, analysis, and interpretation of data significantly more difficult. This challenge thus lies at the heart of systems biology, which aims at a quantitative understanding of biological systems to the extent that systemic features can be predicted. In this chapter, we discuss several key issues that need to be addressed in order to put integrated systems biology data analysis and mining within reach.

Keywords

  • Gene Expression Omnibus
  • Systems Biology Markup Language
  • Protein Interaction Data
  • Open Biomedical Ontology
  • Gene Expression Database

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© 2007 Birkhäuser Verlag/Switzerland

About this chapter

Cite this chapter

Ahrens, C.H., Wagner, U., Rehrauer, H.K., Türker, C., Schlapbach, R. (2007). Current challenges and approaches for the synergistic use of systems biology data in the scientific community. In: Baginsky, S., Fernie, A.R. (eds) Plant Systems Biology. Experientia Supplementum, vol 97. Birkhäuser Basel. https://doi.org/10.1007/978-3-7643-7439-6_12
