A Software Framework and Datasets for the Analysis of Graph Measures on RDF Graphs

  • Matthäus Zloch
  • Maribel Acosta
  • Daniel Hienert
  • Stefan Dietze
  • Stefan Conrad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11503)


As the availability and the inter-connectivity of RDF datasets grow, so does the necessity to understand the structure of the data. Understanding the topology of RDF graphs can guide and inform the development of, e.g., synthetic dataset generators, sampling methods, index structures, or query optimizers. In this work, we propose two resources: (i) a software framework able to acquire, prepare, and perform a graph-based analysis on the topology of large RDF graphs, and (ii) results on a graph-based analysis of 280 datasets from the LOD Cloud with values for 28 graph measures computed with the framework. We present a preliminary analysis based on the proposed resources and point out implications for synthetic dataset generators. Finally, we identify a set of measures that can be used to characterize graphs in the Semantic Web.
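To make the idea of a graph-based analysis of RDF topology concrete, the following is a minimal sketch and not the framework's actual code or API. It assumes each RDF triple (subject, predicate, object) is read as a directed edge from subject to object. The function name `graph_measures`, the choice of measures, and the density formula are illustrative assumptions.

```python
from collections import Counter


def graph_measures(triples):
    """Compute a few basic topology measures from (subject, predicate, object) triples.

    Assumption: every triple is one directed edge subject -> object;
    the predicate is treated as an edge label and ignored here.
    """
    edges = [(s, o) for s, _p, o in triples]
    vertices = {v for edge in edges for v in edge}
    n, m = len(vertices), len(edges)

    in_deg = Counter(o for _s, o in edges)
    out_deg = Counter(s for s, _o in edges)
    total_deg = [in_deg[v] + out_deg[v] for v in vertices]

    # h-index over vertex degrees: the largest h such that at least
    # h vertices have a total degree of at least h.
    ranked = sorted(total_deg, reverse=True)
    h_index = sum(1 for rank, d in enumerate(ranked, start=1) if d >= rank)

    return {
        "vertices": n,
        "edges": m,
        "max_in_degree": max(in_deg.values(), default=0),
        "max_out_degree": max(out_deg.values(), default=0),
        # Mean in-degree equals mean out-degree in a directed graph: m / n.
        "mean_degree": m / n if n else 0.0,
        # One common density definition for directed graphs without self-loops.
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "h_index": h_index,
    }


# Hypothetical toy data, not taken from the LOD Cloud.
example_triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "rdf:type", "foaf:Person"),
]

print(graph_measures(example_triples))
```

A real analysis of large RDF graphs would likely rely on a dedicated graph library and a more careful treatment of literals and blank nodes; the sketch only shows how triples map to a graph on which such measures are defined.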



Copyright information

© Springer Nature Switzerland AG 2019

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. GESIS - Leibniz-Institute for the Social Sciences, Mannheim, Germany
  2. Institute AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany
  3. Institute for Computer Science, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
