Semantic Data Integration of Big Biomedical Data for Supporting Personalised Medicine

  • Maria-Esther Vidal (email author)
  • Kemele M. Endris
  • Samaneh Jozashoori
  • Farah Karim
  • Guillermo Palma
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 815)

Abstract

Big biomedical data has grown exponentially during the last decades, and a similar growth rate is expected in the coming years. Likewise, semantic web technologies have advanced in recent years, and a great variety of tools, e.g., ontologies and query languages, have been developed by different scientific communities and practitioners. Although a rich variety of tools and big data collections are available, many challenges need to be addressed in order to discover insights from which decisions can be taken. For instance, different interoperability conflicts can exist among data collections, data may be incomplete, and entities may be dispersed across different datasets. These issues hinder knowledge exploration and discovery; data integration is therefore required in order to unveil meaningful outcomes. In this chapter, we address these challenges and devise a knowledge-driven framework that relies on semantic web technologies to enable knowledge exploration and discovery. The framework receives big data sources and integrates them into a knowledge graph. Semantic data integration methods are utilized for identifying equivalent entities, i.e., entities that correspond to the same real-world elements. Fusion policies enable the merging of equivalent entities inside the knowledge graph, as well as with entities in other knowledge graphs, e.g., DBpedia and Bio2RDF. Knowledge discovery allows for the exploration of knowledge graphs in order to uncover novel patterns and relations. As a proof of concept, we report on the results of applying the knowledge-driven framework in the EU-funded project iASiS (http://project-iasis.eu/) in order to transform big data into actionable knowledge, thus paving the way for personalised medicine.
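
The two integration steps named in the abstract, identifying equivalent entities and merging them under a fusion policy, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the chapter's actual method: entities are modelled as sets of (property, value) pairs, Jaccard similarity with a fixed threshold stands in for the semantic similarity measure, a simple union of properties stands in for the fusion policy, and all identifiers and values are made up.

```python
from itertools import combinations

# Toy entities: an identifier mapped to a set of (property, value) pairs.
# Identifiers and values are hypothetical, chosen only for illustration.
ENTITIES = {
    "src1:drug42": {("label", "aspirin"), ("type", "Drug"), ("target", "PTGS1")},
    "src2:asa": {("label", "aspirin"), ("type", "Drug"), ("target", "PTGS2")},
    "src2:gene7": {("label", "PTGS1"), ("type", "Gene")},
}


def jaccard(a, b):
    """Jaccard similarity between two sets of (property, value) pairs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_equivalent(entities, threshold):
    """Return pairs of entity ids whose similarity reaches the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(entities), 2)
        if jaccard(entities[x], entities[y]) >= threshold
    ]


def fuse_union(entities, pairs):
    """Union fusion policy (assumed): merge property sets of equivalent entities."""
    canonical = {eid: eid for eid in entities}

    def find(eid):
        # Follow links until reaching the representative of the group.
        while canonical[eid] != eid:
            eid = canonical[eid]
        return eid

    for x, y in pairs:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            canonical[root_y] = root_x

    fused = {}
    for eid, props in entities.items():
        fused.setdefault(find(eid), set()).update(props)
    return fused


pairs = find_equivalent(ENTITIES, threshold=0.5)
fused = fuse_union(ENTITIES, pairs)
for eid in sorted(fused):
    print(eid, sorted(fused[eid]))
```

In this toy setting the two drug descriptions share two of four distinct property-value pairs, so they meet the 0.5 threshold and are merged into one entity carrying both targets, while the gene entity stays separate. A real pipeline would replace the similarity function and the fusion policy with the measures and policies described in the chapter.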

Notes

Acknowledgements

This work has been partially funded by the European Union’s Horizon 2020 research and innovation programme project iASiS under grant agreement No. 727658. Kemele Endris has been sponsored by the EU Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 642795 (WDAqua). Farah Karim has been supported by a scholarship of the German Academic Exchange Service (DAAD).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Maria-Esther Vidal (1, 2), email author
  • Kemele M. Endris (1, 2)
  • Samaneh Jozashoori (1, 2)
  • Farah Karim (1, 2)
  • Guillermo Palma (1)

  1. TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
  2. L3S Institute, Leibniz University of Hannover, Hannover, Germany
