Two Approaches to the Dataset Interlinking Recommendation Problem

  • Conference paper
Web Information Systems Engineering – WISE 2014 (WISE 2014)

Abstract

Whenever a dataset t is published on the Web of Data, an exploratory search over existing datasets must be performed to identify those datasets that are potential candidates to be interlinked with t. This paper introduces and compares two approaches to address the dataset interlinking recommendation problem, respectively based on Bayesian classifiers and on Social Network Analysis techniques. Both approaches define rank score functions that explore the vocabularies, classes and properties that the datasets use, in addition to the known dataset links. After extensive experiments using real-world datasets, the results show that the rank score functions achieve a mean average precision of around 60%. Intuitively, this means that the exploratory search for datasets to be interlinked with t might be limited to just the top-ranked datasets, reducing the cost of the dataset interlinking process.
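
The mean average precision figure quoted above can be made concrete with a small sketch. It uses the standard information-retrieval definition of the metric, not code from the paper: for each target dataset, the ranked list of candidates is compared against the set of datasets it is actually linked to. The dataset names, the `rankings` and `relevant` dictionaries, and the function names are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, dataset in enumerate(ranking, start=1):
        if dataset in relevant:
            hits += 1
            # Precision measured at each rank where a relevant dataset appears.
            precision_sum += hits / position
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]]
) -> float:
    """Mean of the per-target average precision values."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, relevant.get(target, set()))
        for target, ranked in rankings.items()
    )
    return total / len(rankings)


# Hypothetical toy data: two target datasets, each with a ranked candidate list.
rankings = {
    "t1": ["a", "b", "c", "d"],
    "t2": ["x", "y", "z"],
}
relevant = {
    "t1": {"a", "c"},  # datasets t1 is actually linked to (assumed)
    "t2": {"y"},
}

map_score = mean_average_precision(rankings, relevant)
print(round(map_score, 3))  # 0.667
```

In this toy case the relevant datasets sit near the top of each list, so inspecting only the first few candidates would already find most of them, which is the intuition the abstract gives for limiting the search to top-ranked datasets.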

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rabello Lopes, G., Paes Leme, L.A.P., Pereira Nunes, B., Casanova, M.A., Dietze, S. (2014). Two Approaches to the Dataset Interlinking Recommendation Problem. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_25

  • DOI: https://doi.org/10.1007/978-3-319-11749-2_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11748-5

  • Online ISBN: 978-3-319-11749-2

  • eBook Packages: Computer Science, Computer Science (R0)
