Annotation-Based Feature Extraction from Sets of SBML Models

  • Rebekka Alm
  • Dagmar Waltemath
  • Olaf Wolkenhauer
  • Ron Henkel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8574)


Model repositories such as BioModels Database provide computational models of biological systems to the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics can then help to classify models, to identify additional features for model retrieval tasks, or to enable the comparison of sets of models. In this paper, we present four methods for annotation-based feature extraction from model sets. We applied all four methods to four different sets of SBML models taken from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three ontologies frequently used to annotate SBML models, namely Gene Ontology, ChEBI and SBO. We find that three of the four tested methods are suitable for determining characteristic features of model sets. The selected features vary with the underlying model set and are specific to it. We show that the identified features map to concepts that lie higher in the ontology hierarchies than the concepts used for model annotation. Our analysis also reveals that the information content of ontology concepts does not correlate with their usage in model annotation.
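The abstract does not spell out the four extraction methods. As a rough illustration only, the following sketch shows one generic way annotation-based features could be derived from a model set. Each model's ontology annotations are propagated to their ancestors along is_a links, and concepts are then ranked by how much more often they occur in a target set than in a background set. The toy ontology, concept names, model IDs, the `min_support` threshold and the frequency-difference score are all hypothetical stand-ins. They are not the authors' method and not real GO, ChEBI or SBO identifiers. The sketch also includes an intrinsic information-content function, the log of the inverse fraction of ontology concepts a term subsumes, as an assumed variant for comparison with concept usage.

```python
import math
from collections import Counter

# Hypothetical toy ontology: each concept maps to its is_a parents.
# Names are illustrative only, not real GO/ChEBI/SBO identifiers.
PARENTS = {
    "biological_process": [],
    "metabolic_process": ["biological_process"],
    "carbohydrate_metabolism": ["metabolic_process"],
    "glycolysis": ["carbohydrate_metabolism"],
    "gluconeogenesis": ["carbohydrate_metabolism"],
    "cell_cycle": ["biological_process"],
    "mitotic_cell_cycle": ["cell_cycle"],
    "cell_cycle_checkpoint": ["cell_cycle"],
}


def ancestors(term):
    """Return the term itself plus every concept reachable via is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def concept_profile(model_set):
    """Count, per concept, how many models carry it directly or via a descendant."""
    counts = Counter()
    for annotations in model_set.values():
        expanded = set()
        for term in annotations:
            expanded |= ancestors(term)
        counts.update(expanded)  # each model counts at most once per concept
    return counts


def characteristic_features(target, background, min_support=0.5):
    """Rank concepts by frequency in `target` minus frequency in `background`.

    Only concepts present in at least `min_support` of the target models are kept.
    """
    target_counts = concept_profile(target)
    background_counts = concept_profile(background)
    scores = {}
    for concept, n in target_counts.items():
        target_freq = n / len(target)
        background_freq = background_counts.get(concept, 0) / len(background)
        if target_freq >= min_support:
            scores[concept] = target_freq - background_freq
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def information_content(term):
    """Intrinsic IC (assumed variant): log(N / number of concepts subsumed by term)."""
    subsumed = sum(1 for concept in PARENTS if term in ancestors(concept))
    return math.log(len(PARENTS) / subsumed)


# Hypothetical model sets: model ID -> list of annotated concepts.
metabolism_models = {
    "m1": ["glycolysis"],
    "m2": ["gluconeogenesis"],
    "m3": ["glycolysis", "mitotic_cell_cycle"],
}
cell_cycle_models = {
    "c1": ["mitotic_cell_cycle"],
    "c2": ["cell_cycle_checkpoint"],
}

if __name__ == "__main__":
    for concept, score in characteristic_features(metabolism_models, cell_cycle_models):
        print(f"{concept:25s} score={score:.2f} IC={information_content(concept):.2f}")
```

With these toy sets, the two top-ranked features are carbohydrate_metabolism and metabolic_process. No model is annotated with either concept directly, which mirrors the shift toward more general concepts that the abstract reports. The shared root, biological_process, scores 0 because it occurs in every model of both sets. The frequency-difference score is only a placeholder for whichever selection criterion a real method would use.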


Gene Ontology, Feature Extraction, Model Retrieval, Semantic Annotation, Model Annotation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Rebekka Alm (1, 2)
  • Dagmar Waltemath (3)
  • Olaf Wolkenhauer (3, 4)
  • Ron Henkel (3)
  1. Dept. of Multimedia Communication, University of Rostock, Germany
  2. Fraunhofer Institute for Computer Graphics, Rostock, Germany
  3. Dept. of Systems Biology and Bioinformatics, University of Rostock, Germany
  4. Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, Stellenbosch, South Africa
