Data Standards for Omics Data: The Basis of Data Sharing and Reuse

Abstract

To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.

Author information

Corresponding author

Correspondence to Catherine A. Ball.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Chervitz, S.A. et al. (2011). Data Standards for Omics Data: The Basis of Data Sharing and Reuse. In: Mayer, B. (ed.) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_2

  • DOI: https://doi.org/10.1007/978-1-61779-027-0_2

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-026-3

  • Online ISBN: 978-1-61779-027-0

  • eBook Packages: Springer Protocols
