Linked Biomedical Dataspace: Lessons Learned Integrating Data for Drug Discovery

  • Ali Hasnain
  • Maulik R. Kamdar
  • Panagiotis Hasapis
  • Dimitris Zeginis
  • Claude N. Warren Jr.
  • Helena F. Deus
  • Dimitrios Ntalaperas
  • Konstantinos Tarabanis
  • Muntazir Mehdi
  • Stefan Decker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8796)


The growing volume and heterogeneity of biomedical data sources have motivated researchers to adopt Linked Data (LD) technologies to address the resulting integration challenges and to enhance information discovery. As an integral part of the EU GRANATUM project, a Linked Biomedical Dataspace (LBDS) was developed to semantically interlink data from multiple sources and to augment the design of in silico experiments for cancer chemoprevention drug discovery. The components of the LBDS enable both bioinformaticians and biomedical researchers to publish, link, query and visually explore the heterogeneous datasets. We have extensively evaluated the usability of the entire platform. In this paper, we showcase three workflows depicting real-world scenarios in which domain users employ the LBDS to intuitively retrieve meaningful information from the integrated sources. We report the important lessons learned from the challenges we encountered and from our accumulated experience during the collaborative process, which should make it easier for LD practitioners to create such dataspaces in other domains. We also provide a concise set of generic recommendations for developing LD platforms useful for drug discovery.


Linked Data, Drug Discovery, SPARQL Federation, Visualization, Biomedical Research





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ali Hasnain
    • 1
  • Maulik R. Kamdar
    • 1
  • Panagiotis Hasapis
    • 2
  • Dimitris Zeginis
    • 3
    • 4
  • Claude N. Warren Jr.
    • 5
  • Helena F. Deus
    • 6
  • Dimitrios Ntalaperas
    • 2
  • Konstantinos Tarabanis
    • 3
    • 4
  • Muntazir Mehdi
    • 1
  • Stefan Decker
    • 1
  1. Insight Center for Data Analytics, National University of Ireland, Galway, Ireland
  2. UBITECH Research, Athens, Greece
  3. Centre for Research and Technology Hellas, Thessaloniki, Greece
  4. Information Systems Lab, University of Macedonia, Thessaloniki, Greece
  5. Xenei.com, USA
  6. Foundation Medicine Inc., Cambridge, USA
