Defining Key Semantics for the RDF Datasets: Experiments and Evaluations

  • Manuel Atencia
  • Michel Chein
  • Madalina Croitoru
  • Jérôme David
  • Michel Leclère
  • Nathalie Pernelle
  • Fatiha Saïs
  • Francois Scharffe
  • Danai Symeonidou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8577)

Abstract

Many techniques have recently been proposed to automate the linkage of RDF datasets. Predicate selection is the step of the linkage process that consists of selecting the smallest set of relevant predicates needed to enable instance comparison. We call such a set of predicates a key, by analogy with the notion of keys in relational databases. We formally explain the different assumptions behind two existing key semantics. We then evaluate the keys experimentally by studying how discovered keys could help dataset interlinking or cleaning. We discuss the experimental results and show that the two semantics lead to comparable results on the studied datasets.
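To make the notion of a key concrete, here is a minimal Python sketch. The abstract does not spell out the two semantics, so the two readings below are assumptions chosen for illustration: under the first, a set of predicates is a key if no two distinct instances share at least one value for every predicate; under the second, it is a key if no two distinct instances have identical, non-empty value sets for every predicate. The function names, the triple representation, and the `ex:` identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def index_values(triples):
    """Map each subject to {predicate: set of objects}."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def is_key_shared_value(triples, predicates):
    """Assumed reading 1: key if no two distinct subjects share
    at least one value for every predicate in the set."""
    index = index_values(triples)
    for a, b in combinations(index, 2):
        if all(index[a].get(p, set()) & index[b].get(p, set())
               for p in predicates):
            return False
    return True


def is_key_equal_sets(triples, predicates):
    """Assumed reading 2: key if no two distinct subjects have
    identical, non-empty value sets for every predicate in the set."""
    index = index_values(triples)
    for a, b in combinations(index, 2):
        if all(index[a].get(p) and index[a].get(p) == index[b].get(p)
               for p in predicates):
            return False
    return True


# Hypothetical toy dataset: two persons, one shared phone number.
triples = [
    ("ex:p1", "ex:name", "Ann"),
    ("ex:p1", "ex:phone", "111"),
    ("ex:p1", "ex:phone", "222"),
    ("ex:p2", "ex:name", "Bob"),
    ("ex:p2", "ex:phone", "222"),
]

name_is_key = is_key_shared_value(triples, ["ex:name"])
phone_shared = is_key_shared_value(triples, ["ex:phone"])
phone_equal = is_key_equal_sets(triples, ["ex:phone"])
```

On this toy data `ex:name` is a key under both readings, while `ex:phone` is a key only under the second: the two persons share the value "222", but their full sets of phone numbers differ. The pairwise comparison is quadratic in the number of instances and is meant only to illustrate the definitions, not to stand in for the discovery algorithms evaluated in the paper.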

Keywords

Relational Database; Object Property; Class Instance; Predicate Selection; Extended Fact
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Manuel Atencia (2, 4)
  • Michel Chein (1, 4)
  • Madalina Croitoru (1, 4)
  • Jérôme David (2, 4)
  • Michel Leclère (1, 4)
  • Nathalie Pernelle (3)
  • Fatiha Saïs (3)
  • Francois Scharffe (1)
  • Danai Symeonidou (3)

  1. LIRMM, Univ. Montpellier 2, Montpellier Cedex 5, France
  2. LIG, Univ. Grenoble Alpes, Grenoble, France
  3. LRI, Univ. Paris Sud, Orsay, France
  4. Inria, Rennes Cedex, France
