From a Conceptual Model to a Knowledge Graph for Genomic Datasets

  • Anna Bernasconi
  • Arif Canakoglu
  • Stefano Ceri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11788)


Data access at genomic repositories is problematic, as data is described by heterogeneous and hardly comparable metadata. We previously introduced a unified conceptual schema, collected metadata in a single repository, and provided classical search methods over them. Here we propose a new paradigm to support semantic search of integrated genomic metadata, based on the Genomic Knowledge Graph, a semantic graph of genomic terms and concepts that combines the original information provided by each source with curated terminological content from specialized ontologies.

Commercial knowledge-assisted search is designed to support keyword-based search transparently, without explaining its inferences; in biology, instead, understanding inferences is critical. For this reason, we propose a graph-based visual search for data exploration: expert users can navigate the semantic graph along the conceptual schema, enriched with simple forms of homonyms and term hierarchies, and thus understand the semantic reasoning behind query results.
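The abstract does not say how the Genomic Knowledge Graph stores term equivalences, hierarchies, or dataset annotations, so the following is only a minimal Python sketch under loudly stated assumptions: a toy in-memory graph in which the vocabulary, the `SYNONYMS`, `CHILDREN`, and `ANNOTATIONS` tables, the dataset ids, and the `expand`/`search` functions are all hypothetical. It shows one way a keyword query could be expanded through a term-equivalence link and its is_a descendants while keeping the chain of inferences that justifies each match, which is the kind of explanation the abstract argues biologists need.

```python
from collections import deque

# Hypothetical toy vocabulary: surface keyword (label or synonym) -> concept.
SYNONYMS = {
    "neoplasm": "neoplasm",
    "tumor": "neoplasm",
    "carcinoma": "carcinoma",
    "breast carcinoma": "breast carcinoma",
    "breast cancer": "breast carcinoma",
}

# Hypothetical is_a hierarchy: concept -> direct children (more specific concepts).
CHILDREN = {
    "neoplasm": ["carcinoma"],
    "carcinoma": ["breast carcinoma"],
}

# Hypothetical dataset annotations: dataset id -> annotated concept.
ANNOTATIONS = {
    "ds1": "breast carcinoma",
    "ds2": "carcinoma",
    "ds3": "healthy tissue",
}


def expand(keyword):
    """Map the keyword to a concept, then collect all descendants with their paths."""
    kw = keyword.lower()
    start = SYNONYMS.get(kw)
    if start is None:
        return {}
    if kw == start:
        step = f"'{kw}' matches '{start}'"
    else:
        step = f"'{kw}' is a synonym of '{start}'"
    paths = {start: [step]}
    queue = deque([start])
    while queue:  # breadth-first walk down the is_a hierarchy
        concept = queue.popleft()
        for child in CHILDREN.get(concept, []):
            if child not in paths:
                paths[child] = paths[concept] + [f"'{child}' is_a '{concept}'"]
                queue.append(child)
    return paths


def search(keyword):
    """Return matching dataset ids, each with the inference chain behind the match."""
    paths = expand(keyword)
    return {ds: paths[c] for ds, c in ANNOTATIONS.items() if c in paths}


if __name__ == "__main__":
    for ds, why in search("tumor").items():
        print(ds, "<-", "; ".join(why))
```

Searching for "tumor" returns `ds1` and `ds2` but not `ds3`, and each result carries the synonym and is_a steps that led to it, so a user can inspect why a dataset matched instead of receiving an unexplained hit list.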


Keywords: Knowledge graph, Semantic search, Conceptual model, Data integration, Genomics, Next Generation Sequencing, Open data



This research is funded by the ERC Advanced Grant 693174 GeCo (Data-Driven Genomic Computing), 2016-2021.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
