Discovery of Keys from SQL Tables

  • Van Bao Tran Le
  • Sebastian Link
  • Mozhgan Memari
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7238)


Keys play a fundamental role in all data models. They allow database systems to uniquely identify data items, and therefore promote efficient data processing in most applications. Because of this role, support is required for discovering keys, including keys that are semantically meaningful for the application domain and keys that are satisfied by a given database instance. Here, we study the discovery of keys from SQL tables. We investigate structural and computational properties of Armstrong tables for sets of SQL keys that are currently perceived as semantically meaningful. Inspecting Armstrong tables enables data engineers to consolidate their understanding of the semantics of the application domain, and to communicate this understanding to other stakeholders of the database, e.g., domain experts or managers. The stakeholders may want to make changes to the tables or provide entirely different tables in order to communicate their expert views to the data engineers. For this purpose, we propose data mining algorithms that discover keys from a given SQL table. Finally, we define formal measures to assess the distance between sets of SQL keys. The measures can be applied to empirically validate the usefulness of Armstrong tables, and to automate the marking of, and feedback on, non-multiple-choice questions in database courses.
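To make the idea of discovering keys from a given table concrete, the following is a minimal sketch and not the algorithms proposed in the paper. It enumerates column sets by brute force and keeps the minimal ones that the table satisfies. It assumes SQL UNIQUE-style semantics, under which a row with a NULL (modelled here as `None`) in any key column is not compared with other rows; the function names, the sample table, and its column names are hypothetical.

```python
from itertools import combinations


def satisfies_key(rows, columns):
    """Check whether no two rows agree on all `columns` with non-NULL values.

    Assumption: a row with NULL (None) in any of the columns is skipped,
    mirroring the behaviour of an SQL UNIQUE constraint.
    """
    seen = set()
    for row in rows:
        values = tuple(row[c] for c in columns)
        if any(v is None for v in values):
            continue  # NULL-containing rows never cause a violation here
        if values in seen:
            return False
        seen.add(values)
    return True


def discover_minimal_keys(rows, attributes):
    """Brute-force search for the minimal column sets satisfied as keys."""
    minimal = []
    # Candidates are visited by increasing size, so any superset of an
    # already found key can be skipped as non-minimal.
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(attributes, size):
            if any(set(key) <= set(candidate) for key in minimal):
                continue
            if satisfies_key(rows, candidate):
                minimal.append(candidate)
    return minimal


# Hypothetical sample table; None stands for an SQL NULL.
rows = [
    {"emp": "Ann", "dept": "Sales", "room": 1},
    {"emp": "Bob", "dept": "Sales", "room": 2},
    {"emp": "Ann", "dept": "IT", "room": None},
]
attributes = ["emp", "dept", "room"]
keys = discover_minimal_keys(rows, attributes)
print(keys)
```

The enumeration is exponential in the number of columns, so it is only suitable for illustration on small tables. Note that keys found this way are merely satisfied by the given instance; whether they are semantically meaningful for the application domain still has to be judged by the data engineers and domain experts.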


Keywords: Domain Expert, Uniqueness Constraint, Data Engineer, Query Optimization, Computational Property
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Van Bao Tran Le (1)
  • Sebastian Link (2)
  • Mozhgan Memari (1)
  1. School of Information Management, Victoria University of Wellington, New Zealand
  2. Department of Computer Science, University of Auckland, New Zealand
