Generating Synthetic RDF Data with Connected Blank Nodes for Benchmarking

  • Christina Lantzaki
  • Thanos Yannakis
  • Yannis Tzitzikas
  • Anastasia Analyti
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8465)

Abstract

Generators for synthetic RDF datasets are very important for testing and benchmarking various semantic data management tasks (e.g. querying, storage, update, comparison, integration). However, current generators either insufficiently support or entirely ignore blank node connectivity. Blank nodes are used for various purposes (e.g. for describing complex attributes), and a significant percentage of resources is currently represented with blank nodes. Moreover, several semantic data management tasks, such as isomorphism checking (useful for checking equivalence) and blank node matching (useful in comparison, versioning, synchronization, and semantic similarity functions), not only have to deal with blank nodes, but their complexity and optimality depend on the connectivity of blank nodes. To enable the comparative evaluation of the various techniques for carrying out these tasks, in this paper we present the design and implementation of a generator, called BGen, which allows building datasets containing blank nodes with the desired complexity, controllable through various features (morphology, size, diameter, density and clustering coefficient). Finally, the paper reports experimental results concerning the efficiency of the generator, as well as results from using the generated datasets that demonstrate the value of the generator.
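
To make the connectivity features named in the abstract concrete, the following minimal sketch measures density, diameter and average clustering coefficient on the subgraph formed by blank nodes. It is not BGen's code: the helper names, the choice to treat triples between blank nodes as undirected edges, and the convention that blank node labels start with `_:` are all assumptions made for illustration, and the paper's exact definitions may differ.

```python
from collections import deque
from itertools import combinations


def blank_node_graph(triples):
    """Build an undirected adjacency map over blank nodes only.

    Assumption: a term is a blank node if its label starts with '_:'.
    """
    adj = {}
    for s, _p, o in triples:
        for term in (s, o):
            if term.startswith("_:"):
                adj.setdefault(term, set())
        if s.startswith("_:") and o.startswith("_:") and s != o:
            adj[s].add(o)
            adj[o].add(s)
    return adj


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def diameter(adj):
    """Longest shortest path inside any connected component (BFS per node)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


# Hypothetical toy dataset: a triangle of blank nodes plus one pendant node.
triples = [
    ("_:a", "ex:p", "_:b"),
    ("_:b", "ex:p", "_:c"),
    ("_:c", "ex:p", "_:a"),
    ("_:c", "ex:p", "_:d"),
    ("_:d", "ex:name", '"x"'),
    ("ex:s", "ex:p", "_:a"),
]
adj = blank_node_graph(triples)
print(density(adj), diameter(adj), avg_clustering(adj))
```

A generator that exposes these features as parameters could use measurements like these to check that a produced dataset actually has the requested connectivity.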

Keywords

#eswc2014Lantzaki 

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Christina Lantzaki (1, 2)
  • Thanos Yannakis (1, 2)
  • Yannis Tzitzikas (1, 2)
  • Anastasia Analyti (1, 2)
  1. Computer Science Department, University of Crete, Greece
  2. Institute of Computer Science, FORTH-ICS, Greece
