LODStats: The Data Web Census Dataset

  • Ivan Ermilov
  • Jens Lehmann
  • Michael Martin
  • Sören Auer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9982)


Over the past years, the size of the Data Web has increased significantly, which makes obtaining general insights into its growth and structure both more challenging and more desirable. The lack of such insights hinders important data management tasks such as quality, privacy and coverage analysis. In this paper, we present the LODStats dataset, which provides a comprehensive picture of the current state of a significant part of the Data Web. LODStats is based on RDF datasets from several data catalogs and, at the time of writing, lists over 9000 RDF datasets. For each RDF dataset, LODStats collects comprehensive statistics and makes these available in a form adhering to the LDSO vocabulary. This analysis has been regularly published and enhanced over the past five years on a public platform. We give a comprehensive overview of the resulting dataset.


Keywords: Data Catalog, Semantic Technology, SPARQL Endpoint, Usage Count, Machine Agent



This work was partly supported by the German Federal Ministry of Education and Research (BMBF) for the LEDS Project (GA no. 03WKCG11C) and by a grant from the European Union’s Horizon 2020 research and innovation programme for the project Big Data Europe (GA no. 644564).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Ivan Ermilov (1)
  • Jens Lehmann (2)
  • Michael Martin (1)
  • Sören Auer (2)
  1. AKSW, Institute of Computer Science, University of Leipzig, Leipzig, Germany
  2. University of Bonn and Fraunhofer IAIS, Bonn, Germany
