Abstract
The size of massive knowledge graphs (KGs) and the lack of prior information about the schemas, ontologies and vocabularies they use frequently make them hard to understand and visualize. Graph summarization techniques can help by abstracting away details of the original graph to produce a reduced summary that is easier to explore. Identifiers often carry latent information that can be used to classify the entities they represent. In particular, IRI namespaces can be used to classify RDF resources. Namespaces, used in some RDF serialization formats as a shortening mechanism for resource IRIs, have no role in the semantics of RDF. Nevertheless, there is often a hidden meaning behind the decision to group resources under a common prefix and assign an alias to it. We improve on previous work on a namespace-based approach to KG summarization, which classifies resources using their namespaces. Producing the summary graph is fast, light on computing resources and requires no prior domain knowledge. The summary graph can be used to analyze the namespace inter-dependencies of the original graph. We also present chilon, a tool for calculating namespace-based KG summaries. Namespaces are gathered from explicit declarations in the graph serialization, community contributions or resource IRI prefix analysis. We applied chilon to publicly available KGs, used it to generate interactive visualizations of the summaries, and discuss the results obtained.
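The idea of classifying resources by namespace and summarizing the links between namespaces can be illustrated with a minimal Python sketch. This is not chilon's implementation: the prefix table, the function names `classify` and `summarize`, the longest-prefix matching rule and the choice to group triples by (subject, predicate, object) namespace are all assumptions made for illustration, and literals and blank nodes are ignored.

```python
from collections import Counter

# Hypothetical prefix table (alias -> namespace IRI). In practice such a table
# might come from prefix declarations in the graph serialization.
NAMESPACES = {
    "ex": "http://example.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def classify(iri, namespaces=NAMESPACES):
    """Return the alias of the longest namespace that prefixes `iri`, or None."""
    best = None
    for alias, ns in namespaces.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(namespaces[best])):
            best = alias
    return best


def summarize(triples, namespaces=NAMESPACES):
    """Count triples per (subject ns, predicate ns, object ns) group.

    Assumes every term is an IRI given as a plain string.
    """
    summary = Counter()
    for s, p, o in triples:
        key = (classify(s, namespaces), classify(p, namespaces), classify(o, namespaces))
        summary[key] += 1
    return summary


# Toy input graph (illustrative data only).
triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/knows", "http://example.org/alice"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/homepage", "http://unknown.test/a"),
]
summary = summarize(triples)
```

Each key of `summary` can be read as one edge of a summary graph whose nodes are namespaces, with the count as its weight. Resources that match no known namespace are grouped under `None` in this sketch.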
Notes
- 1. Named after Chilon of Sparta, one of the Seven Sages of Greece, who coined the ancient proverb “less is more” or “brevity is a way of philosophy”.
- 2.
- 3. For example, http://example.org/ → (http://example.org/foo/, http://example.org/bar/).
- 4. Ontology available at https://andrefs.github.io/chilon_rs/ns-graph-summ.ttl.
Acknowledgments
This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project LA/P/0063/2020. André Fernandes dos Santos is supported by Ph.D. grant SFRH/BD/129225/2017 from FCT, Portugal.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
dos Santos, A.F., Leal, J.P. (2023). Summarization of Massive RDF Graphs Using Identifier Classification. In: Ojeda-Aciego, M., Sauerwald, K., Jäschke, R. (eds) Graph-Based Representation and Reasoning. ICCS 2023. Lecture Notes in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-031-40960-8_8
DOI: https://doi.org/10.1007/978-3-031-40960-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40959-2
Online ISBN: 978-3-031-40960-8
eBook Packages: Computer Science, Computer Science (R0)