TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from the Linked Open Data Cloud

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)

Abstract

Deciding which RDF vocabulary terms to use when modeling data as Linked Open Data (LOD) is far from trivial. In this paper, we propose TermPicker, a novel approach that enables vocabulary reuse by recommending vocabulary terms based on various features of a term. These features include the term’s popularity, whether it is from an already used vocabulary, and the so-called schema-level pattern (SLP) feature, which exploits the terms that other data providers on the LOD cloud use to describe their data. We apply Learning to Rank to establish a ranking model for vocabulary terms based on these features. The results show that using the SLP-feature improves the recommendation quality by 29–36% in terms of the Mean Average Precision and the Mean Reciprocal Rank at the first five positions, compared to recommendations based solely on the term’s popularity and whether it is from an already used vocabulary.
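The abstract describes two technical steps: ranking candidate terms with a model over three features, and evaluating the ranking by Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) at the first five positions. The Python sketch below illustrates both under stated assumptions. The term IRIs (`ex:termA` and so on), the feature values, and the fixed `WEIGHTS` are hypothetical stand-ins for a model that TermPicker learns with Learning to Rank. The `slp_score` field is only a placeholder and does not reproduce how the paper computes the SLP-feature. Normalising AP@k by min(|relevant|, k) is one common convention and may differ from the paper's definition.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple


@dataclass
class Candidate:
    """A candidate vocabulary term with three illustrative feature values."""
    term: str
    popularity: float  # assumed: normalised usage of the term on the LOD cloud
    same_vocab: float  # assumed: 1.0 if the term's vocabulary is already in use, else 0.0
    slp_score: float   # assumed placeholder for a score derived from schema-level patterns


# Hand-picked weights standing in for a model learned via Learning to Rank (hypothetical).
WEIGHTS: Tuple[float, float, float] = (0.3, 0.2, 0.5)


def score(c: Candidate, weights: Tuple[float, float, float] = WEIGHTS) -> float:
    """Linear combination of the three features."""
    w_pop, w_voc, w_slp = weights
    return w_pop * c.popularity + w_voc * c.same_vocab + w_slp * c.slp_score


def rank(candidates: Sequence[Candidate],
         weights: Tuple[float, float, float] = WEIGHTS) -> List[str]:
    """Return candidate terms sorted by descending score."""
    ordered = sorted(candidates, key=lambda c: score(c, weights), reverse=True)
    return [c.term for c in ordered]


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """AP@k: sum of precision@i over ranks i <= k holding a relevant term,
    normalised by min(|relevant|, k)."""
    hits = 0
    precision_sum = 0.0
    for i, term in enumerate(ranked[:k], start=1):
        if term in relevant:
            hits += 1
            precision_sum += hits / i
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """RR@k: 1/rank of the first relevant term within the top k, else 0."""
    for i, term in enumerate(ranked[:k], start=1):
        if term in relevant:
            return 1.0 / i
    return 0.0


Metric = Callable[[Sequence[str], Set[str], int], float]


def mean_at_k(metric: Metric, runs: Sequence[Tuple[Sequence[str], Set[str]]],
              k: int = 5) -> float:
    """MAP@k or MRR@k: the per-query metric averaged over all queries."""
    if not runs:
        return 0.0
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


if __name__ == "__main__":
    # Made-up feature values for four hypothetical terms.
    candidates = [
        Candidate("ex:termA", 0.9, 1.0, 0.2),
        Candidate("ex:termB", 0.8, 0.0, 0.9),
        Candidate("ex:termC", 1.0, 0.0, 0.1),
        Candidate("ex:termD", 0.5, 0.0, 0.6),
    ]
    ranked = rank(candidates)
    print(ranked)  # ['ex:termB', 'ex:termA', 'ex:termD', 'ex:termC']
    print(average_precision_at_k(ranked, {"ex:termA"}))  # 0.5
    print(reciprocal_rank_at_k(ranked, {"ex:termA"}))    # 0.5
```

The linear `score` function is used here only because it is the simplest ranking model to read; a model trained with Learning to Rank would replace it, while the evaluation functions stay the same.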

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Johann Schaible (1)
  • Thomas Gottron (2)
  • Ansgar Scherp (3)
  1. GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany
  2. Innovation Lab, SCHUFA Holding AG, Wiesbaden, Germany
  3. ZBW – Leibniz Information Center for Economics, Kiel University, Kiel, Germany