Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking

  • Manuel Atencia
  • Jérôme David
  • François Scharffe
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7603)


This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. To better handle web data of variable quality, the notion of a pseudo-key is introduced. An RDF vocabulary for representing keys is also provided, and an algorithm to discover keys and pseudo-keys is described. Experimental results show that even for a large dataset such as DBpedia, the runtime of the algorithm remains reasonable. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) dataset interlinking.
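The abstract adapts the relational notion of a key to RDF and relaxes it into a pseudo-key, but does not spell out the definitions. The following is a minimal Python sketch under stated assumptions, not the authors' algorithm. Instances of one class are held in memory as a hypothetical subject → property → value-set mapping; a property set counts as a key when no two instances that have all those properties share the same values; `support` and `discriminability`, with the thresholds `min_support` and `min_disc`, are assumed measures for a pseudo-key; and discovery is a plain levelwise enumeration that keeps only minimal property sets. The hypothetical helper `collisions` illustrates how pseudo-key violations could point at errors or duplicates.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical in-memory view of one class: subject -> property -> set of values.
Instances = Dict[str, Dict[str, Set[str]]]
Row = Tuple[FrozenSet[str], ...]


def value_tuples(instances: Instances, props: Tuple[str, ...]) -> List[Row]:
    """Value combinations of the instances described by every property in props.

    Assumption: a multi-valued property is compared as a whole value set, and
    instances missing any property are ignored rather than treated as null.
    """
    return [
        tuple(frozenset(desc[p]) for p in props)
        for desc in instances.values()
        if all(desc.get(p) for p in props)
    ]


def support(instances: Instances, props: Tuple[str, ...]) -> float:
    """Share of all instances that have a value for every property in props."""
    return len(value_tuples(instances, props)) / len(instances) if instances else 0.0


def discriminability(instances: Instances, props: Tuple[str, ...]) -> float:
    """Distinct value combinations divided by described instances (1.0 = no clash)."""
    rows = value_tuples(instances, props)
    return len(set(rows)) / len(rows) if rows else 0.0


def is_key(instances: Instances, props: Tuple[str, ...]) -> bool:
    """Strict key: no two described instances share all values on props."""
    return discriminability(instances, props) == 1.0


def is_pseudo_key(instances: Instances, props: Tuple[str, ...],
                  min_disc: float = 0.99, min_support: float = 0.0) -> bool:
    """Relaxed key: tolerates a few clashes (min_disc) and gaps (min_support)."""
    return (support(instances, props) >= min_support
            and discriminability(instances, props) >= min_disc)


def minimal_pseudo_keys(instances: Instances, properties: Iterable[str],
                        min_disc: float = 1.0, min_support: float = 0.0,
                        max_size: int = 3) -> List[FrozenSet[str]]:
    """Levelwise search: test small property sets first, skip supersets of hits."""
    ordered = sorted(properties)  # materialize once so generators also work
    found: List[FrozenSet[str]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(ordered, size):
            if any(k <= frozenset(combo) for k in found):
                continue  # a subset already qualifies, so combo is not minimal
            if is_pseudo_key(instances, combo, min_disc, min_support):
                found.append(frozenset(combo))
    return found


def collisions(instances: Instances, props: Tuple[str, ...]) -> List[List[str]]:
    """Groups of subjects sharing the same values on props (candidate errors or duplicates)."""
    groups: Dict[Row, List[str]] = defaultdict(list)
    for subject, desc in instances.items():
        if all(desc.get(p) for p in props):
            groups[tuple(frozenset(desc[p]) for p in props)].append(subject)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Toy data, invented for illustration (not from the paper's experiments).
people: Instances = {
    "ex:alice": {"name": {"Alice"}, "birth": {"1980"}, "email": {"a@x.org"}},
    "ex:bob":   {"name": {"Bob"},   "birth": {"1980"}, "email": {"b@x.org"}},
    "ex:bob2":  {"name": {"Bob"},   "birth": {"1975"}},
    "ex:carol": {"name": {"Carol"}, "birth": {"1975"}, "email": {"c@x.org"}},
}

if __name__ == "__main__":
    props = {"name", "birth", "email"}
    print(minimal_pseudo_keys(people, props))
    print(minimal_pseudo_keys(people, props, min_support=0.9))
    print(is_pseudo_key(people, ("name",), min_disc=0.7), collisions(people, ("name",)))
```

On this toy data, `email` alone qualifies as a key when gaps are tolerated (only three of the four people have an email), while requiring `min_support=0.9` leaves `{birth, name}` as the only minimal key. `name` alone is only a pseudo-key (discriminability 0.75), and its single clash groups `ex:bob` with `ex:bob2`, the kind of pair that could signal a data error or a duplicate worth linking. The exhaustive enumeration here is chosen for readability; data at DBpedia scale would likely need a more scalable strategy.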







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Manuel Atencia (1, 2)
  • Jérôme David (1, 3)
  • François Scharffe (4)

  1. INRIA & LIG, France
  2. Université de Grenoble 1, France
  3. Université de Grenoble 2, France
  4. Université de Montpellier 2 & LIRMM, France
