Generating Synthetic RDF Data with Connected Blank Nodes for Benchmarking
Generators of synthetic RDF datasets are essential for testing and benchmarking semantic data management tasks (e.g. querying, storage, update, comparison, integration). However, current generators either ignore blank node connectivity or support it insufficiently. Blank nodes are used for various purposes (e.g. for describing complex attributes), and a significant percentage of resources is currently represented with blank nodes. Moreover, several semantic data management tasks, such as isomorphism checking (useful for checking equivalence) and blank node matching (useful in comparison, versioning, synchronization, and semantic similarity functions), not only have to deal with blank nodes, but their complexity and optimality also depend on the connectivity of those blank nodes. To enable the comparative evaluation of techniques for carrying out these tasks, in this paper we present the design and implementation of a generator, called BGen, which builds datasets containing blank nodes with a desired complexity, controllable through various features (morphology, size, diameter, density and clustering coefficient). Finally, the paper reports experimental results on the efficiency of the generator, as well as results from using the generated datasets that demonstrate its value.
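The abstract names density, diameter and clustering coefficient as controllable features but does not define them. As a rough illustration only, the sketch below computes these quantities with their standard graph-theoretic definitions on a tiny, made-up graph of blank nodes. It is not BGen's implementation: the function names, the blank node labels, and the choice to treat links between blank nodes as undirected edges without self-loops are all assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # each edge is counted twice
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def eccentricity(adj, source):
    """Longest shortest-path distance from source, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Largest eccentricity; meaningful when the graph is connected."""
    return max(eccentricity(adj, s) for s in adj)


def avg_clustering(adj):
    """Average local clustering coefficient over all nodes."""
    total = 0.0
    for u, nb in adj.items():
        k = len(nb)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical blank nodes: a triangle (b1, b2, b3) with a pendant node b4.
edges = [("_:b1", "_:b2"), ("_:b2", "_:b3"), ("_:b1", "_:b3"), ("_:b3", "_:b4")]
g = build_graph(edges)

print("size (nodes):", len(g))
print("density:", density(g))
print("diameter:", diameter(g))
print("avg clustering:", avg_clustering(g))
```

In this toy graph the triangle raises the clustering coefficient, while the pendant node stretches the diameter to 2 and lowers the density. That is the kind of trade-off a generator with tunable connectivity would need to expose.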