An Open Repository Model for Acquiring Knowledge About Scientific Experiments

  • Martin J. O’Connor
  • Marcos Martínez-Romero
  • Attila L. Egyedi
  • Debra Willrett
  • John Graybeal
  • Mark A. Musen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10024)


The availability of high-quality metadata is key to facilitating discovery in the large variety of scientific datasets that are increasingly becoming publicly available. However, despite the recent focus on metadata, the diversity of metadata representation formats and the poor support for semantic markup typically result in metadata that are of poor quality. There is a pressing need for a metadata representation format that provides strong interoperation capabilities together with robust semantic underpinnings. In this paper, we describe such a format, together with open-source Web-based tools that support the acquisition, search, and management of metadata. We outline an initial evaluation using metadata from a variety of biomedical repositories.


Control Term, Template Model, Link Open Data, Metadata Model, Metadata Repository



CEDAR is supported by the National Institutes of Health through an NIH Big Data to Knowledge program under grant 1U54AI117925. NCBO is supported by the NIH Common Fund under grant U54HG004028. We appreciate the collaborations offered by the ImmPort, BioSharing, HIPC, and LINCS communities.


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Martin J. O’Connor (1)
  • Marcos Martínez-Romero (1)
  • Attila L. Egyedi (1)
  • Debra Willrett (1)
  • John Graybeal (1)
  • Mark A. Musen (1)

  1. Stanford Center for Biomedical Informatics Research, Stanford, USA
