A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles

  • Besnik Fetahu
  • Stefan Dietze
  • Bernardo Pereira Nunes
  • Marco Antonio Casanova
  • Davide Taibi
  • Wolfgang Nejdl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8465)


The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse are hindered by the lack of descriptive information about the nature of the data, such as their topic coverage, dynamics or evolution. To address this issue, we propose an approach for creating linked dataset profiles. A profile consists of structured dataset metadata describing topics and their relevance. Profiles are generated by configuring techniques for sampling resources from datasets, extracting topics from reference datasets, and ranking those topics with graphical models. To enable a good trade-off between the scalability and the accuracy of the generated profiles, appropriate parameters are determined experimentally. Our evaluation considers topic profiles for all accessible datasets from the Linked Open Data cloud. The results show that our approach generates accurate profiles even with comparatively small sample sizes (10%) and outperforms established topic modelling approaches.
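
The three steps named in the abstract (sample resources, extract topics, rank them on a graph) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the function names `sample_resources` and `rank_topics`, the toy resource-to-topic mapping, uniform random sampling, and plain PageRank over a resource-topic graph as a stand-in for the ranking step. The abstract does not specify the sampling strategies, topic extraction or ranking models actually used, so this should not be read as the authors' method.

```python
import random
from collections import defaultdict


def sample_resources(resources, fraction=0.1, seed=0):
    """Draw a uniform random sample of resources (hypothetical strategy)."""
    rng = random.Random(seed)
    k = max(1, round(len(resources) * fraction))
    return rng.sample(resources, k)


def rank_topics(resource_topics, damping=0.85, iterations=50):
    """Rank topics with PageRank on an undirected resource-topic graph.

    resource_topics maps a resource id to the topics extracted for it.
    This is a stand-in for graph-based ranking, not the paper's model.
    """
    # Each resource links to its topics, and each topic links back.
    neighbours = defaultdict(set)
    for resource, topics in resource_topics.items():
        for topic in topics:
            neighbours[("r", resource)].add(("t", topic))
            neighbours[("t", topic)].add(("r", resource))

    nodes = list(neighbours)
    if not nodes:
        return {}

    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            incoming = sum(score[m] / len(neighbours[m]) for m in neighbours[n])
            new_score[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score

    # Keep only topic nodes and normalise so the scores sum to 1.
    topic_scores = {name: s for (kind, name), s in score.items() if kind == "t"}
    total = sum(topic_scores.values())
    return {t: s / total for t, s in topic_scores.items()}


# Toy data (invented): topics already extracted for three resources.
toy = {
    "res1": ["Biology", "Genetics"],
    "res2": ["Biology"],
    "res3": ["Biology", "Chemistry"],
}
sampled = sample_resources(sorted(toy), fraction=1.0)
profile = rank_topics({r: toy[r] for r in sampled})
```

In this toy run `profile` maps each topic to a relevance score, and the topic attached to the most sampled resources receives the highest score. Lowering `fraction` trades accuracy for speed, which is the trade-off the abstract says is tuned experimentally.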


Keywords: Profiling, Metadata, Vocabulary of Links, Linked Data





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Besnik Fetahu (1)
  • Stefan Dietze (1)
  • Bernardo Pereira Nunes (2)
  • Marco Antonio Casanova (2)
  • Davide Taibi (3)
  • Wolfgang Nejdl (1)

  1. L3S Research Center, Leibniz Universität Hannover, Germany
  2. Department of Informatics, PUC-Rio, Rio de Janeiro, Brazil
  3. Institute for Educational Technologies, CNR, Palermo, Italy
