Abstract
Ontology-based mappings in knowledge graphs are a widely discussed topic in biomedical research. Contextual information is widely considered in NLP and knowledge discovery in the life sciences, since it strongly influences the exact meaning of natural language. The scientific challenge is not only to extract such context data but also to store it for further querying and discovery. Classical approaches use RDF triple stores, which have serious limitations. We introduce the graph-theoretic foundation for a general context concept within semantic networks, in which context is a more general concept than annotations. As a proof of concept based on biomedical literature and text mining, we present a multi-step knowledge graph approach that uses labeled property graphs built on polyglot persistence systems to exploit context data for context mining, graph queries, knowledge discovery and knowledge extraction. Our test system contains a knowledge graph derived from the entirety of PubMed and SCAIView data, enriched with text mining data and domain-specific language data encoded in BEL. Storing and querying a knowledge graph of this size as a labeled property graph remains a technological challenge. We demonstrate how our data model supports the understanding and interpretation of biomedical data, and we present several real-world use cases that utilize this massive knowledge graph derived from PubMed and enriched with additional contextual data. Finally, we show a working example on biologically relevant information using SCAIView.
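The following minimal sketch makes the idea of context as a first-class part of the graph more concrete. It is a toy in-memory labeled property graph in Python. Documents link both to the entities they mention and to context nodes, which here are MeSH-like disease terms. A query can then restrict entity co-occurrence to a given context. The class `PropertyGraph`, the node labels (`Document`, `Entity`, `Context`), the edge types (`MENTIONS`, `HAS_CONTEXT`) and the example genes and documents are hypothetical illustrations. They are not the authors' data model or the SCAIView schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # hypothetical labels: "Document", "Entity", "Context"
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy labeled property graph (illustrative only, not a real graph database)."""

    def __init__(self):
        self.nodes = {}
        # adjacency list: source id -> list of (edge type, target id, edge properties)
        self.out_edges = defaultdict(list)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, src, rel, dst, **props):
        self.out_edges[src].append((rel, dst, props))

    def neighbors(self, src, rel):
        """Targets of all outgoing edges of type `rel` from node `src`."""
        return [dst for r, dst, _ in self.out_edges[src] if r == rel]

    def co_occurring(self, entity_id, context_id):
        """Entities mentioned together with `entity_id` in documents carrying `context_id`."""
        result = set()
        for doc_id, node in self.nodes.items():
            if node.label != "Document":
                continue
            mentions = self.neighbors(doc_id, "MENTIONS")
            if entity_id in mentions and context_id in self.neighbors(doc_id, "HAS_CONTEXT"):
                result.update(m for m in mentions if m != entity_id)
        return result


def build_example_graph():
    """Two hypothetical documents that share the gene APP but differ in disease context."""
    g = PropertyGraph()
    g.add_node("ctx:AD", "Context", source="MeSH", name="Alzheimer Disease")
    g.add_node("ctx:PD", "Context", source="MeSH", name="Parkinson Disease")
    for gene in ["APP", "APOE", "SNCA"]:
        g.add_node(gene, "Entity", type="Gene")
    g.add_node("doc1", "Document")
    g.add_edge("doc1", "MENTIONS", "APP")
    g.add_edge("doc1", "MENTIONS", "APOE")
    g.add_edge("doc1", "HAS_CONTEXT", "ctx:AD")
    g.add_node("doc2", "Document")
    g.add_edge("doc2", "MENTIONS", "APP")
    g.add_edge("doc2", "MENTIONS", "SNCA")
    g.add_edge("doc2", "HAS_CONTEXT", "ctx:PD")
    return g


graph = build_example_graph()
print(graph.co_occurring("APP", "ctx:AD"))  # {'APOE'}
```

The context node is what separates this query from plain co-occurrence. `APP` co-occurs with both `APOE` and `SNCA`, but only `APOE` appears alongside it in a document annotated with the Alzheimer context. A system at PubMed scale would hold such data in a dedicated graph database rather than in Python dictionaries.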
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dörpinghaus, J., Düing, C., Stefan, A. (2022). Biomedical Knowledge Graphs: Context, Queries and Complexity. In: Dörpinghaus, J., Weil, V., Schaaf, S., Apke, A. (eds) Computational Life Sciences. Studies in Big Data, vol 112. Springer, Cham. https://doi.org/10.1007/978-3-031-08411-9_20
DOI: https://doi.org/10.1007/978-3-031-08411-9_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08410-2
Online ISBN: 978-3-031-08411-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)