LD-LEx: Linked Dataset Link Extractor (Short Paper)

  • Ciro Baron Neto
  • Dimitris Kontokostas
  • Gustavo Publio
  • Kay Müller
  • Sebastian Hellmann
  • Eduardo Moletta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10033)


With the steady growth of linked datasets available on the web, efficient approaches for analyzing, searching, and discovering links between RDF datasets are increasingly necessary. In this paper, we describe LD-LEx, an architecture that makes it possible to index RDF datasets using GridFS documents and probabilistic data structures called Bloom filters. Our lightweight approach thereby provides metadata about the quantity and quality of links between datasets. Moreover, we explored these concepts by indexing more than 2 billion triples from over a thousand datasets, providing insights into the behavior of Bloom filters w.r.t. performance and memory footprint.
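To make the idea of Bloom-filter-based link detection concrete, the following is a minimal sketch and not the LD-LEx implementation. It assumes that a "link" is an object IRI in a source dataset that also occurs as a subject IRI in a target dataset. The class `BloomFilter`, the function `count_link_candidates`, and the example IRIs are all hypothetical names introduced for illustration; storage in GridFS is omitted. The filter is sized with the standard Bloom filter formulas for a given expected number of items and target false-positive probability.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter (illustrative; not the LD-LEx code)."""

    def __init__(self, expected_items: int, fpp: float = 0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits,
        # k = (m / n) * ln 2 hash functions.
        self.size = max(8, math.ceil(-expected_items * math.log(fpp) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def count_link_candidates(source_objects, target_filter: BloomFilter) -> int:
    """Count source object IRIs that may occur as subjects in the target.

    The result can overestimate the true number of links (false positives),
    but it never misses an IRI that was added to the filter.
    """
    return sum(1 for iri in source_objects if iri in target_filter)


# Hypothetical example data: subjects of a target dataset and objects of a source dataset.
target_subjects = ["http://example.org/b/1", "http://example.org/b/2"]
source_objects = ["http://example.org/b/1", "http://example.org/c/9"]

target_filter = BloomFilter(expected_items=1000, fpp=0.01)
for subject in target_subjects:
    target_filter.add(subject)

candidates = count_link_candidates(source_objects, target_filter)
print(f"{candidates} link candidate(s), filter uses {len(target_filter.bits)} bytes")
```

Because a Bloom filter stores only bits, its memory footprint depends on the expected item count and the chosen false-positive probability rather than on the length of the IRIs, which is the trade-off between memory and accuracy that this kind of indexing relies on.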


RDF, Bloom filter, Linksets, Linked Open Data



This paper's research activities were funded by grants from the FP7 & H2020 EU projects ALIGNED (GA-644055), LIDER (GA-610782), and FREME (GA-644771), from the Federal Ministry of Education and Research (BMBF) project Smart Data Web (GA-01MD15010B), and by a scholarship from the CAPES foundation (Ministry of Education of Brazil) (13204/13-0).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Ciro Baron Neto
    • 1
  • Dimitris Kontokostas
    • 1
  • Gustavo Publio
    • 1
  • Kay Müller
    • 1
  • Sebastian Hellmann
    • 1
  • Eduardo Moletta
    • 2
  1. AKSW, Department of Computer Science, University of Leipzig, Leipzig, Germany
  2. Department of Computer Science, Federal University of Technology, Curitiba, Brazil
