A Collection of Benchmark Datasets for Systematic Evaluations of Machine Learning on the Semantic Web

  • Petar Ristoski
  • Gerben Klaas Dirk de Vries
  • Heiko Paulheim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9982)


In recent years, several approaches for machine learning on the Semantic Web have been proposed. However, no extensive comparisons between those approaches have been undertaken, in particular due to a lack of publicly available, acknowledged benchmark datasets. In this paper, we present a collection of 22 benchmark datasets of different sizes. This collection can be used to conduct quantitative performance testing and systematic comparisons of approaches.


Keywords: Linked Open Data, Machine learning, Datasets, Benchmarking



The work presented in this paper has been partly funded by the German Research Foundation (DFG) under grant number PA 2373/1-1 (Mine@LOD), and the Dutch national program COMMIT.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Petar Ristoski (1)
  • Gerben Klaas Dirk de Vries (2)
  • Heiko Paulheim (1)

  1. Research Group Data and Web Science, University of Mannheim, Mannheim, Germany
  2. WizeNoze, Amsterdam, The Netherlands
