International Semantic Web Conference

ISWC 2014: The Semantic Web – ISWC 2014, pp 33–49

SAKey: Scalable Almost Key Discovery in RDF Data

  • Danai Symeonidou,
  • Vincent Armant,
  • Nathalie Pernelle &
  • Fatiha Saïs

Part of the Lecture Notes in Computer Science book series (LNISA, volume 8796)

Abstract

Exploiting identity links among RDF resources allows applications to integrate data efficiently. Keys can be very useful for discovering these identity links. A set of properties is considered a key when its values uniquely identify resources. However, such keys are usually not available. Approaches that attempt to discover keys automatically can easily be overwhelmed by the size of the data, and they require clean data. We present SAKey, an approach that discovers keys in RDF data efficiently. To prune the search space, SAKey exploits characteristics of the data that are detected dynamically during the process. Furthermore, our approach can discover keys in datasets that contain erroneous data or duplicates (i.e., almost keys). The approach has been evaluated on different synthetic and real datasets. The results show both the relevance of almost keys and the efficiency of discovering them.
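To make the notions of key and almost key concrete, the following is a minimal Python sketch on a toy dataset. It is not SAKey's algorithm: it uses a naive pairwise comparison, whereas the abstract describes a pruned search. The data, the function names `exceptions` and `is_almost_key`, and the tolerance parameter `n` are all hypothetical. The sketch also assumes that an exception is a resource sharing at least one value with some other resource on every property of the candidate set, and that a set of properties is an almost key when it has at most `n` exceptions.

```python
from itertools import combinations

# Hypothetical toy data: resource -> property -> set of values.
DATA = {
    "p1": {"name": {"Ann"}, "city": {"Paris"}, "phone": {"111"}},
    "p2": {"name": {"Ann"}, "city": {"Cork"}, "phone": {"222"}},
    "p3": {"name": {"Bob"}, "city": {"Paris"}, "phone": {"333"}},
    # p4 repeats p3, standing in for a duplicate or erroneous entry.
    "p4": {"name": {"Bob"}, "city": {"Paris"}, "phone": {"333"}},
}


def exceptions(data, props):
    """Return the resources that share a value with another resource
    on every property in props (assumed definition of an exception)."""
    found = set()
    for a, b in combinations(sorted(data), 2):
        shares_all = all(
            data[a].get(p, set()) & data[b].get(p, set()) for p in props
        )
        if shares_all:
            found.update((a, b))
    return found


def is_almost_key(data, props, n=0):
    """True if props has at most n exceptions; n=0 means an exact key."""
    return len(exceptions(data, props)) <= n


for candidate in (["name"], ["phone"], ["name", "city"]):
    print(candidate, sorted(exceptions(DATA, candidate)))
```

With this toy data, `["phone"]` is not an exact key because of the duplicated resource, but it is accepted once two exceptions are tolerated (`is_almost_key(DATA, ["phone"], n=2)`). The pairwise loop is quadratic in the number of resources, which is the kind of cost a scalable approach has to avoid.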

Keywords

  • Keys
  • Identity Links
  • Data Linking
  • RDF
  • OWL2

Author information

Authors and Affiliations

  1. Laboratoire de Recherche en Informatique, University Paris Sud, France

    Danai Symeonidou, Nathalie Pernelle & Fatiha Saïs

  2. Insight Center for Data Analytics, University College Cork, Ireland

    Vincent Armant

Editor information

Editors and Affiliations

  1. Yahoo Labs, Diagonal 177, 08018, Barcelona, Spain

    Peter Mika

  2. Stanford University, 1265 Welch Road, 94305, Stanford, CA, USA

    Tania Tudorache

  3. University of Zurich, DDIS, Zurich, Switzerland

    Abraham Bernstein

  4. IBM Research, Yorktown Heights, NY, USA

    Chris Welty

  5. Information Sciences Institute and Department of Computer Science, University of Southern California, Los Angeles, CA, USA

    Craig Knoblock

  6. Google, USA

    Denny Vrandečić & Natasha Noy

  7. VU University Amsterdam, The Netherlands

    Paul Groth

  8. University of California, Santa Barbara, CA, USA

    Krzysztof Janowicz

  9. School of Computer Science, The University of Manchester, Manchester, UK

    Carole Goble

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Symeonidou, D., Armant, V., Pernelle, N., Saïs, F. (2014). SAKey: Scalable Almost Key Discovery in RDF Data. In: Mika, P., et al. The Semantic Web – ISWC 2014. ISWC 2014. Lecture Notes in Computer Science, vol 8796. Springer, Cham. https://doi.org/10.1007/978-3-319-11964-9_3

  • DOI: https://doi.org/10.1007/978-3-319-11964-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11963-2

  • Online ISBN: 978-3-319-11964-9

  • eBook Packages: Computer Science, Computer Science (R0)
