Data Standards for Omics Data: The Basis of Data Sharing and Reuse

  • Stephen A. Chervitz
  • Eric W. Deutsch
  • Dawn Field
  • Helen Parkinson
  • John Quackenbush
  • Philippe Rocca-Serra
  • Susanna-Assunta Sansone
  • Christian J. Stoeckert Jr.
  • Chris F. Taylor
  • Ronald Taylor
  • Catherine A. Ball
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 719)

Abstract

To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.

Key words

Data sharing, Data exchange, Data standards, MGED, MIAME, Ontology, Data format, Microarray, Proteomics, Metabolomics

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Stephen A. Chervitz (1)
  • Eric W. Deutsch (2)
  • Dawn Field (3)
  • Helen Parkinson (4)
  • John Quackenbush (5)
  • Philippe Rocca-Serra (4)
  • Susanna-Assunta Sansone (4)
  • Christian J. Stoeckert Jr. (6)
  • Chris F. Taylor (4)
  • Ronald Taylor (7)
  • Catherine A. Ball (8)

  1. Affymetrix, Inc., Santa Clara, USA
  2. Institute for Systems Biology, Seattle, USA
  3. NERC Centre for Ecology and Hydrology, Oxford, UK
  4. EMBL-EBI, Cambridge, UK
  5. Department of Biostatistics, Dana-Farber Cancer Institute, Boston, USA
  6. Department of Genetics and Center for Bioinformatics, University of Pennsylvania School of Medicine, Philadelphia, USA
  7. Computational Biology & Bioinformatics Group, Pacific Northwest National Laboratory, Richland, USA
  8. Department of Genetics, Stanford University School of Medicine, Stanford, USA