The VLDB Journal

, Volume 25, Issue 4, pp 571–596 | Cite as

Possible and certain keys for SQL

  • Henning Köhler
  • Uwe Leck
  • Sebastian Link
  • Xiaofang Zhou
Regular Paper

Abstract

Driven by the dominance of the relational model and the requirements of modern applications, we revisit the fundamental notion of a key in relational databases with NULL. In SQL, primary key columns are NOT NULL, and UNIQUE constraints guarantee uniqueness only for tuples without NULL. We investigate the notions of possible and certain keys, which are keys that hold in some or all possible worlds that originate from an SQL table, respectively. Possible keys coincide with UNIQUE, thus providing a semantics for their syntactic definition in the SQL standard. Certain keys extend primary keys to include NULL columns and can uniquely identify entities whenever feasible, while primary keys may not. In addition to basic characterization, axiomatization, discovery, and extremal combinatorics problems, we investigate the existence and construction of Armstrong tables, and describe an indexing scheme for enforcing certain keys. Our experiments show that certain keys with NULLs occur in real-world data, and related computational problems can be solved efficiently. Certain keys are therefore semantically well founded and able to meet Codd’s entity integrity rule while handling high volumes of incomplete data from different formats.

Keywords

Armstrong database Axiomatization Data profiling Discovery Extremal combinatorics Implication problem Index Key Null marker SQL 

Supplementary material

778_2016_430_MOESM1_ESM.pdf (176 kb)
Supplementary material 1 (pdf 175 KB)

References

  1. 1.
    Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015)CrossRefGoogle Scholar
  2. 2.
    Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)MATHGoogle Scholar
  3. 3.
    Aleksic, S., Celikovic, M., Link, S., Lukovic, I., Moginm, P.: Faceoff: Surrogate vs. natural keys. In: ADBIS, pp. 543–546 (2010)Google Scholar
  4. 4.
    Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the structure of Armstrong relations for functional dependencies. J. ACM 31(1), 30–46 (1984)MathSciNetCrossRefMATHGoogle Scholar
  5. 5.
    Biskup, J.: Security in Computing Systems - Challenges, Approaches and Solutions. Springer, New York (2009)MATHGoogle Scholar
  6. 6.
    Brown, P., Link, S.: Probabilistic keys for data quality management. In: CAiSE, pp. 118–132 (2015)Google Scholar
  7. 7.
    Calì, A., Calvanese, D., Lenzerini, M.: Data integration under integrity constraints. In: Seminal Contributions to Information, Systems Engineering, pp. 335–352 (2013)Google Scholar
  8. 8.
    Calvanese, D., Fischl, W., Pichler, R., Sallinger, E., Simkus, M.: Capturing relational schemas and functional dependencies in RDFS. In: AAAI, pp. 1003–1011 (2014)Google Scholar
  9. 9.
    Codd, E.F.: Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4(4), 397–434 (1979)CrossRefGoogle Scholar
  10. 10.
    Codd, E.F.: The Relational Model for Database Management, Version 2. Addison-Wesley, Boston (1990)MATHGoogle Scholar
  11. 11.
    David, J., Lhote, L., Mary, A., Rioult, F.: An average study of hypergraphs and their minimal transversals. Theor. Comput. Sci. 596, 124–141 (2015)MathSciNetCrossRefMATHGoogle Scholar
  12. 12.
  13. 13.
    Eiter, T., Gottlob, G.: Identifying the minimal transversals of a hypergraph and related problems. SIAM J. Comput. 24(6), 1278–1304 (1995)MathSciNetCrossRefMATHGoogle Scholar
  14. 14.
    Engel, K.: Sperner Theory. Cambridge University Press, Cambridge (1997)CrossRefMATHGoogle Scholar
  15. 15.
    Fagin, R.: A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst. 6(3), 387–415 (1981)CrossRefMATHGoogle Scholar
  16. 16.
    Fagin, R.: Horn clauses and database dependencies. J. ACM 29(4), 952–985 (1982)MathSciNetCrossRefMATHGoogle Scholar
  17. 17.
    Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)MathSciNetCrossRefMATHGoogle Scholar
  18. 18.
    Fan, W., Geerts, F., Jia, X.: A revival of constraints for data cleaning. PVLDB 1(2), 1522–1523 (2008)Google Scholar
  19. 19.
    Hannula, M., Kontinen, J., Link, S.: On independence atoms and keys. In: CIKM, pp. 1229–1238 (2014)Google Scholar
  20. 20.
    Hannula, M., Kontinen, J., Link, S.: On the finite and general implication problems of independence atoms and keys. J. Comput. Syst. Sci. 82(5), 856–877 (2016)MathSciNetCrossRefMATHGoogle Scholar
  21. 21.
    Hartmann, S., Kirchberg, M., Link, S.: Design by example for SQL table definitions with functional dependencies. VLDB J. 21(1), 121–144 (2012)CrossRefGoogle Scholar
  22. 22.
    Hartmann, S., Leck, U., Link, S.: On Codd families of keys over incomplete relations. Comput. J. 54(7), 1166–1180 (2011)CrossRefGoogle Scholar
  23. 23.
    Hartmann, S., Link, S.: Efficient reasoning about a robust XML key fragment. ACM Trans. Database Syst. 34(2), 10 (2009)CrossRefGoogle Scholar
  24. 24.
    Hartmann, S., Link, S.: The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst. 37(2), 13 (2012)CrossRefGoogle Scholar
  25. 25.
    Heise, A., Quiane-Ruiz, J.A., Abedjan, Z., Jentzsch, A., Naumann, F.: Scalable discovery of unique column combinations. PVLDB 7(4), 301–312 (2013)Google Scholar
  26. 26.
    Ileana, I., Cautis, B., Deutsch, A., Katsis, Y.; Complete yet practical search for minimal query reformulations under constraints. In: SIGMOD, pp. 1015–1026 (2014)Google Scholar
  27. 27.
    Imielinski, T., Jr, W.L.: Incomplete information in relational databases. J. ACM 31(4), 761–791 (1984)MathSciNetCrossRefMATHGoogle Scholar
  28. 28.
    Jha, A.K., Rastogi, V., Suciu, D.: Query evaluation with soft keys. In: PODS, pp. 119–128 (2008)Google Scholar
  29. 29.
    Köhler, H., Leck, U., Link, S., Prade, H.: Logical foundations of possibilistic keys. In: JELIA, pp. 181–195 (2014)Google Scholar
  30. 30.
    Köhler, H., Link, S.: Inclusion dependencies reloaded. In: CIKM, pp. 1361–1370 (2015)Google Scholar
  31. 31.
    Köhler, H., Link, S.: SQL schema design. In: SIGMOD (2016). doi:10.1145/2882903.2915239
  32. 32.
    Köhler, H., Link, S., Zhou, X.: Possible and certain SQL keys. PVLDB 8(11), 1118–1129 (2015)Google Scholar
  33. 33.
    Koutris, P., Wijsen, J.: The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In: PODS, pp. 17–29 (2015)Google Scholar
  34. 34.
    Levene, M., Loizou, G.: Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci. 206(1–2), 283–300 (1998)MathSciNetCrossRefMATHGoogle Scholar
  35. 35.
    Levene, M., Loizou, G.: A generalisation of entity and referential integrity. ITA 35(2), 113–127 (2001)MathSciNetMATHGoogle Scholar
  36. 36.
    Lien, Y.E.: On the equivalence of database models. J. ACM 29(2), 333–362 (1982)MathSciNetCrossRefMATHGoogle Scholar
  37. 37.
    Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data—a review. IEEE TKDE 24(2), 251–264 (2012)Google Scholar
  38. 38.
    Mannila, H., Räihä, K.-J.: Design of Relational Databases. Addison-Wesley, Boston (1992)MATHGoogle Scholar
  39. 39.
    Marchi, F.D., Petit, J.: Semantic sampling of existing databases through informative armstrong databases. Inf. Syst. 32(3), 446–457 (2007)CrossRefGoogle Scholar
  40. 40.
    Melton, J.: ISO/IEC 9075-2: 2003 (SQL/foundation). ISO standard (2003)Google Scholar
  41. 41.
    Memari, M., Link, S.: Index design for enforcing partial referential integrity efficiently. In: EDBT, pp. 217–228 (2015)Google Scholar
  42. 42.
    Memari, M., Link, S., Dobbie, G.: SQL data profiling of foreign keys. In: ER, pp. 229–243 (2015)Google Scholar
  43. 43.
    Naumann, F.: Data profiling revisited. SIGMOD Rec. 42(4), 40–49 (2013)CrossRefGoogle Scholar
  44. 44.
  45. 45.
    Pochampally, R., Sarma, A.D., Dong, X.L.: A. Meliou, and D. Srivastava. Fusing data with correlations. In: SIGMOD, pp. 433–444 (2014)Google Scholar
  46. 46.
    Ross, K. A., Srivastava, D., Sudarshan, S.: Materialized view maintenance and integrity constraint checking: Trading space for time. In: SIGMOD, pp. 447–458 (1996)Google Scholar
  47. 47.
    Saha, B., Srivastava, D.: Data quality: the other face of big data. In: ICDE, pp. 1294–1297 (2014)Google Scholar
  48. 48.
    Schaefer, T.J.: The complexity of satisfiability problems. In: STOC, pp. 216–226 (1978)Google Scholar
  49. 49.
    Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: Efficient and scalable discovery of composite keys. In: VLDB, pp. 691–702 (2006)Google Scholar
  50. 50.
    Sperner, E.: Ein Satz über Untermengen einer endlichen Menge. Math. Z. 27, 544–548 (1928)MathSciNetCrossRefMATHGoogle Scholar
  51. 51.
    Thalheim, B.: On semantic issues connected with keys in relational databases permitting null values. Elektr. Inform. Kybern. 25(1/2), 11–20 (1989)MathSciNetGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Henning Köhler
    • 1
  • Uwe Leck
    • 2
  • Sebastian Link
    • 3
  • Xiaofang Zhou
    • 4
    • 5
  1. 1.School of Engineering and Advanced TechnologyMassey UniversityPalmerston NorthNew Zealand
  2. 2.Department of MathematicsThe University of FlensburgFlensburgGermany
  3. 3.Department of Computer ScienceThe University of AucklandAucklandNew Zealand
  4. 4.The University of QueenslandBrisbaneAustralia
  5. 5.Soochow UniversitySuzhouChina

Personalised recommendations