Transforming Heterogeneous Data into Knowledge for Personalized Treatments—A Use Case

Vidal, Maria-Esther; Endris, Kemele M.; Jazashoori, Samaneh; Sakor, Ahmad; Rivas, Ariam

doi:10.1007/s13222-019-00312-z

Schwerpunktbeitrag
Published: 15 April 2019

Volume 19, pages 95–106, (2019)

Datenbank-Spektrum

Maria-Esther Vidal ORCID: orcid.org/0000-0003-1160-8727¹,
Kemele M. Endris¹,
Samaneh Jazashoori¹,
Ahmad Sakor¹ &
…
Ariam Rivas¹

Abstract

Big data has grown exponentially over the last decade and is expected to grow even faster in the coming years as technologies for data generation and ingestion advance. In the biomedical domain, for instance, data are collected with a wide variety of methods, e.g., liquid biopsies and medical imaging, and are represented in myriad formats, e.g., FASTQ and NIfTI. To extract and manage valuable knowledge and insights from big data, the problem of integrating structured and unstructured data must be solved effectively. In this paper, we devise a knowledge-driven framework that transforms disparate data into actionable knowledge. The framework relies on computational extraction methods to mine knowledge from data sources, e.g., clinical notes, images, or scientific publications. Controlled vocabularies are used to annotate the extracted entities, and a unified schema describes the meaning of these entities in a knowledge graph; entity linking methods discover links to existing knowledge graphs, e.g., DBpedia and Bio2RDF. A federated query engine enables the exploration of the linked knowledge graphs, while knowledge discovery methods uncover patterns in them. The framework is used in the EU H2020 funded project iASiS, which aims to pave the way for accurate diagnostics and personalized treatments.
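
The pipeline outlined in the abstract (annotate entities with a controlled vocabulary, describe them under a unified schema in a knowledge graph, link them to external graphs, and query across graphs) can be illustrated with a toy sketch. This is not the iASiS implementation: the vocabulary, the `ex:` and `ext:` identifiers, the predicate names, and the helper functions `annotate`, `note_to_triples`, and `match` are all hypothetical, and simple substring matching stands in for real entity extraction and linking.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical controlled vocabulary: surface form -> concept identifier.
# The identifiers are invented for illustration; they are not real codes.
VOCABULARY = {
    "lung cancer": "ex:C001",
    "cisplatin": "ex:C002",
}


def annotate(text: str) -> List[str]:
    """Return the concept identifiers whose surface form occurs in the text."""
    lowered = text.lower()
    return sorted(cid for term, cid in VOCABULARY.items() if term in lowered)


def note_to_triples(note_id: str, text: str) -> List[Triple]:
    """Describe a note and its annotations under one toy schema."""
    triples: List[Triple] = [(note_id, "rdf:type", "ex:ClinicalNote")]
    triples += [(note_id, "ex:mentions", cid) for cid in annotate(text)]
    return triples


def match(
    graphs: Iterable[List[Triple]],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Evaluate one triple pattern over several graphs (None is a wildcard)."""
    return [
        t
        for graph in graphs
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Local knowledge graph built from one (made-up) clinical note.
local_graph = note_to_triples(
    "ex:note1", "Patient with lung cancer started on Cisplatin."
)
# Stand-in for an external knowledge graph such as DBpedia or Bio2RDF;
# the link itself is an assumed output of an entity linking step.
external_graph = [("ex:C002", "ex:sameAs", "ext:Cisplatin")]

# The same pattern is evaluated over both graphs, mimicking federation.
mentions = match([local_graph, external_graph], p="ex:mentions")
links = match([local_graph, external_graph], p="ex:sameAs")
```

A real deployment would replace the in-memory lists with RDF stores behind SPARQL endpoints and let a federated engine decompose queries across them; the sketch only shows how annotation, a shared schema, and links between graphs fit together.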

Acknowledgements

This work has been partially funded by the EU H2020 Project No. 727658 (iASiS).

Author information

Authors and Affiliations

TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hannover, Germany
Maria-Esther Vidal, Kemele M. Endris, Samaneh Jazashoori, Ahmad Sakor & Ariam Rivas

Corresponding author

Correspondence to Maria-Esther Vidal.

About this article

Cite this article

Vidal, ME., Endris, K.M., Jazashoori, S. et al. Transforming Heterogeneous Data into Knowledge for Personalized Treatments—A Use Case. Datenbank Spektrum 19, 95–106 (2019). https://doi.org/10.1007/s13222-019-00312-z

Received: 27 February 2019
Accepted: 30 March 2019
Published: 15 April 2019
Issue Date: 01 July 2019
DOI: https://doi.org/10.1007/s13222-019-00312-z
