
Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7603)

Abstract

This paper introduces a method for analyzing web datasets based on key dependencies. The classical notion of a key in relational databases is adapted to RDF datasets. To better deal with web data of variable quality, the notion of a pseudo-key is also defined. An RDF vocabulary for representing keys is provided, and an algorithm to discover keys and pseudo-keys is described. Experimental results show that even for a large dataset such as DBpedia, the runtime of the algorithm remains reasonable. Two applications are further discussed: (i) detection of errors in RDF datasets, and (ii) dataset interlinking.
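To make the key and pseudo-key notions concrete, here is a minimal Python sketch. It assumes a toy in-memory list of (subject, predicate, object) triples and treats a property set as a key when no two described subjects share the same values for every property in the set. How close a set comes to being a key is measured by a discriminability ratio (distinct value combinations divided by described subjects), alongside a support ratio (described subjects divided by all subjects). These definitions, the function names, and the brute-force level-wise search are illustrative assumptions, not the exact definitions or the algorithm from the paper. Subjects that collide under a pseudo-key are returned as candidates for the error-detection and interlinking applications the abstract mentions.

```python
from collections import defaultdict
from itertools import combinations


def index_by_subject(triples):
    """Group (subject, predicate, object) triples into subject -> predicate -> set of objects."""
    data = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        data[s][p].add(o)
    return data


def signature_groups(data, props):
    """Map each value signature over props to the subjects that share it.

    Only subjects with at least one value for every property in props are
    considered. Comparing whole value sets for multi-valued properties is an
    assumption made for this sketch.
    """
    groups = defaultdict(list)
    for s, desc in data.items():
        if all(desc.get(p) for p in props):
            sig = tuple(frozenset(desc[p]) for p in props)
            groups[sig].append(s)
    return groups


def discriminability(data, props):
    """Distinct signatures / described subjects; 1.0 means props is a key (assumed measure)."""
    groups = signature_groups(data, props)
    described = sum(len(v) for v in groups.values())
    return len(groups) / described if described else 0.0


def support(data, props):
    """Fraction of all subjects that have values for every property in props."""
    described = sum(len(v) for v in signature_groups(data, props).values())
    return described / len(data) if data else 0.0


def minimal_pseudo_keys(data, threshold=1.0, min_support=0.0, max_size=3):
    """Naive level-wise search for the smallest property sets reaching the threshold.

    threshold=1.0 yields exact keys; lower values yield pseudo-keys.
    """
    props = sorted({p for desc in data.values() for p in desc})
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(props, size):
            if any(set(k) <= set(combo) for k in found):
                continue  # skip supersets of an already-found (pseudo-)key
            if (support(data, combo) >= min_support
                    and discriminability(data, combo) >= threshold):
                found.append(combo)
    return found


def suspicious_groups(data, props):
    """Subjects sharing one signature under props: candidate errors or duplicates."""
    return [sorted(v) for v in signature_groups(data, props).values() if len(v) > 1]


# Hypothetical toy data; ex:a and ex:c share a name and a city.
sample_triples = [
    ("ex:a", "ex:name", "Alice"), ("ex:a", "ex:born", "1980"), ("ex:a", "ex:city", "Paris"),
    ("ex:b", "ex:name", "Bob"), ("ex:b", "ex:born", "1980"), ("ex:b", "ex:city", "Lyon"),
    ("ex:c", "ex:name", "Alice"), ("ex:c", "ex:born", "1975"), ("ex:c", "ex:city", "Paris"),
    ("ex:d", "ex:name", "Dan"), ("ex:d", "ex:born", "1990"), ("ex:d", "ex:city", "Paris"),
]

if __name__ == "__main__":
    data = index_by_subject(sample_triples)
    print(minimal_pseudo_keys(data))                  # [('ex:born', 'ex:city'), ('ex:born', 'ex:name')]
    print(minimal_pseudo_keys(data, threshold=0.75))  # [('ex:born',), ('ex:name',)]
    print(suspicious_groups(data, ("ex:name",)))      # [['ex:a', 'ex:c']]
```

With an exact threshold, no single property identifies every subject, so only property pairs come back as keys. Relaxing the threshold admits `ex:name` as a pseudo-key, and the subjects it fails to separate are the ones worth inspecting as possible data errors or as duplicate resources to interlink. The exhaustive enumeration grows exponentially with the number of properties, which is why practical key discovery needs pruning of the kind discussed in the functional-dependency literature.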



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Atencia, M., David, J., Scharffe, F. (2012). Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking. In: ten Teije, A., et al. Knowledge Engineering and Knowledge Management. EKAW 2012. Lecture Notes in Computer Science, vol 7603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33876-2_14


  • DOI: https://doi.org/10.1007/978-3-642-33876-2_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33875-5

  • Online ISBN: 978-3-642-33876-2

  • eBook Packages: Computer Science, Computer Science (R0)
