Semantic Data and Models Sharing in Systems Biology: The Just Enough Results Model and the SEEK Platform
Research in Systems Biology involves integrating data and knowledge about the dynamic processes in biological systems in order to understand and model them. Semantic web technologies should be ideal for exploring the complex networks of interacting genes, proteins and metabolites, but much of this data is not natively available to the semantic web. Data is typically collected and stored with free-text annotations in spreadsheets, many of which do not conform to existing metadata standards and are often not publicly released.
Along with initiatives to promote more data sharing, one of the main challenges is therefore to semantically annotate and extract this data so that it is available to the research community. Data annotation and curation are expensive and undervalued tasks that have enormous benefits to the discipline as a whole, but fewer benefits to the individual data producers.
By embedding semantic annotation into spreadsheets, however, and automatically extracting this data into RDF at the time of repository submission, standards-compliant data that is available for semantic web querying can be produced without adding overheads to laboratory data management. This paper describes these strategies in the context of semantic data management in the SEEK. The SEEK is a web-based resource for sharing and exchanging Systems Biology data and models that is underpinned by the JERM ontology (Just Enough Results Model), which describes the relationships between data, models, protocols and experiments. The SEEK was originally developed for SysMO, a large European Systems Biology consortium studying micro-organisms, but it has since been widely adopted across European Systems Biology.
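As a rough illustration of the extraction step described above, the following minimal sketch turns rows of an annotated spreadsheet (exported as CSV) into N-Triples lines. It is not the SEEK's actual extraction code: the namespaces, the `Sample` class, the property names, the `sample_id` column and the column-to-property mapping are all hypothetical stand-ins for the annotation that would be embedded in a real template.

```python
import csv
import io

# Hypothetical namespaces for illustration only; not the real JERM or SEEK URIs.
JERM = "http://example.org/jerm#"
DATA = "http://example.org/seek/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from column headers to ontology properties, standing in
# for the semantic annotation embedded in a spreadsheet template.
COLUMN_TO_PROPERTY = {
    "organism": JERM + "hasOrganism",
    "protocol": JERM + "usesProtocol",
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text):
    """Convert annotated spreadsheet rows (as CSV text) to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{DATA}{row['sample_id']}>"
        # Type every row as a (hypothetical) Sample.
        triples.append(f"{subject} <{RDF_TYPE}> <{JERM}Sample> .")
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = (row.get(column) or "").strip()
            if not value:
                continue  # skip empty cells rather than emit empty literals
            # Cells holding an ontology term URI become resources;
            # anything else is kept as a plain string literal.
            if value.startswith(("http://", "https://")):
                obj = f"<{value}>"
            else:
                obj = f'"{escape_literal(value)}"'
            triples.append(f"{subject} <{prop}> {obj} .")
    return triples
```

Calling `rows_to_ntriples` on the CSV export of a submitted sheet would yield one type triple per row plus one triple per annotated, non-empty cell. Cells restricted to ontology terms come out as linked resources, while free text remains a literal.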
Keywords: Semantic Systems Biology; Semantic Data Management; OWL Ontology; RDF Extraction from spreadsheets; Standard Metadata