LODStats – An Extensible Framework for High-Performance Dataset Analytics

  • Sören Auer
  • Jan Demter
  • Michael Martin
  • Jens Lehmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7603)


One of the major obstacles to wider use of web data is the difficulty of obtaining a clear picture of the available datasets. To reuse, link, revise or query a dataset published on the Web, it is important to know the structure, coverage and coherence of the data. To obtain such information we developed LODStats, a statement-stream-based approach for gathering comprehensive statistics about datasets adhering to the Resource Description Framework (RDF). LODStats is based on the declarative description of statistical dataset characteristics. Its main advantages over other approaches are a smaller memory footprint and significantly better performance and scalability. We integrated LODStats with the CKAN dataset metadata registry and obtained a comprehensive picture of the current state of a significant part of the Data Web.
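To make the statement-stream idea concrete, the following is a minimal sketch and not the LODStats implementation or its API. It assumes the input is N-Triples text, uses a deliberately simplified line pattern instead of a full RDF parser, and describes each statistic as a hypothetical (filter, key) pair. All names (`statements`, `CRITERIA`, `gather`) and the sample data are illustrative assumptions.

```python
import re
from collections import Counter

# Simplified N-Triples line pattern (assumption: one well-formed triple per line).
LINE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')
RDF_TYPE = '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'


def statements(lines):
    """Yield (subject, predicate, object) one statement at a time."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match:
            yield match.groups()


# Hypothetical declarative criteria: name -> (filter, key to count).
CRITERIA = {
    "property_usage": (lambda s, p, o: True, lambda s, p, o: p),
    "class_usage": (lambda s, p, o: p == RDF_TYPE, lambda s, p, o: o),
    "literals": (lambda s, p, o: o.startswith('"'), lambda s, p, o: "literal"),
}


def gather(lines, criteria=CRITERIA):
    """Single pass over the stream; only the counters are kept in memory."""
    counts = {name: Counter() for name in criteria}
    total = 0
    for s, p, o in statements(lines):
        total += 1
        for name, (keep, key) in criteria.items():
            if keep(s, p, o):
                counts[name][key(s, p, o)] += 1
    return total, counts


SAMPLE = [
    '<http://example.org/a> ' + RDF_TYPE + ' <http://example.org/Person> .',
    '<http://example.org/a> <http://example.org/name> "Alice" .',
    '<http://example.org/b> ' + RDF_TYPE + ' <http://example.org/Person> .',
    '<http://example.org/b> <http://example.org/age> '
    '"42"^^<http://www.w3.org/2001/XMLSchema#integer> .',
]

total, counts = gather(SAMPLE)
```

Because `gather` accepts any iterable of lines, an open file handle can be passed in place of `SAMPLE`. In this sketch, memory use grows with the number of distinct counted keys (properties, classes) rather than with the number of triples.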


SPARQL Query, Triple Pattern, SPARQL Endpoint, Property Datatype, Link Open Data Cloud
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sören Auer
    • 1
  • Jan Demter
    • 1
  • Michael Martin
    • 1
  • Jens Lehmann
    • 1
  1. AKSW/BIS, Universität Leipzig, Germany
