Defining Key Semantics for the RDF Datasets: Experiments and Evaluations

Atencia, Manuel; Chein, Michel; Croitoru, Madalina; David, Jérôme; Leclère, Michel; Pernelle, Nathalie; Saïs, Fatiha; Scharffe, Francois; Symeonidou, Danai

doi:10.1007/978-3-319-08389-6_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8577)

Abstract

Many techniques have recently been proposed to automate the linkage of RDF datasets. Predicate selection is the step of the linkage process that consists of selecting the smallest set of relevant predicates needed to enable instance comparison. We call such a set of predicates a key, by analogy with the notion of keys in relational databases. We formally explain the different assumptions behind two existing key semantics. We then evaluate the keys experimentally by studying how discovered keys could help dataset interlinking or cleaning. We discuss the experimental results and show that the two semantics lead to comparable results on the studied datasets.
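
To make the notion of a key over RDF data more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not spell out the two key semantics, so the two agreement tests used here (`agree_some`: two instances share at least one value for every predicate; `agree_all`: two instances have identical value sets for every predicate) are illustrative assumptions, as are the toy triples and all function names.

```python
from collections import defaultdict
from itertools import combinations

# Toy RDF-like dataset of (subject, predicate, object) triples.
# All identifiers and values are hypothetical.
TRIPLES = [
    ("p1", "name", "Ann"),
    ("p1", "phone", "111"),
    ("p1", "phone", "222"),
    ("p2", "name", "Bob"),
    ("p2", "phone", "222"),
    ("p3", "name", "Ann"),
    ("p3", "phone", "333"),
]


def values_by_subject(triples):
    """Index the triples as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[s][p].add(o)
    return index


def agree_some(a, b, predicates):
    """Assumed semantics 1: a and b share at least one value per predicate."""
    return all(a.get(p, set()) & b.get(p, set()) for p in predicates)


def agree_all(a, b, predicates):
    """Assumed semantics 2: a and b have identical, non-empty value sets."""
    return all(a.get(p) and a.get(p) == b.get(p) for p in predicates)


def is_key(triples, predicates, agree):
    """A predicate set is a key if no two distinct subjects agree on it."""
    index = values_by_subject(triples)
    return not any(
        agree(index[x], index[y], predicates)
        for x, y in combinations(sorted(index), 2)
    )


def smallest_keys(triples, agree):
    """Return all keys of minimum size, found by exhaustive search."""
    predicates = sorted({p for _, p, _ in triples})
    for size in range(1, len(predicates) + 1):
        found = [c for c in combinations(predicates, size)
                 if is_key(triples, c, agree)]
        if found:
            return found
    return []
```

On this toy data the two assumed semantics disagree: `phone` alone is a key under `agree_all` but not under `agree_some`, because `p1` and `p2` share the value `222` without having the same set of phone numbers. The exhaustive search is exponential in the number of predicates and is only meant to illustrate the definition.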

Author information

Authors and Affiliations

LIRMM, Univ. Montpellier 2, Montpellier Cedex 5, France
Michel Chein, Madalina Croitoru, Michel Leclère & Francois Scharffe
LIG, Univ. Grenoble Alpes, Grenoble, France
Manuel Atencia & Jérôme David
LRI, Univ. Paris Sud, Orsay, France
Nathalie Pernelle, Fatiha Saïs & Danai Symeonidou
Inria, Rennes Cedex, France
Manuel Atencia, Michel Chein, Madalina Croitoru, Jérôme David & Michel Leclère

Corresponding author

Correspondence to Manuel Atencia.

Editor information

Editors and Affiliations

Université Toulouse le Mirail, Toulouse, France
Nathalie Hernandez
L3S Research Center, Hannover, Germany
Robert Jäschke
LIRMM, Montpellier, France
Madalina Croitoru

About this paper

Cite this paper

Atencia, M. et al. (2014). Defining Key Semantics for the RDF Datasets: Experiments and Evaluations. In: Hernandez, N., Jäschke, R., Croitoru, M. (eds) Graph-Based Representation and Reasoning. ICCS 2014. Lecture Notes in Computer Science, vol 8577. Springer, Cham. https://doi.org/10.1007/978-3-319-08389-6_7

DOI: https://doi.org/10.1007/978-3-319-08389-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08388-9
Online ISBN: 978-3-319-08389-6
eBook Packages: Computer Science, Computer Science (R0)