The average length of keys and functional dependencies in (random) databases

  • J. Demetrovics
  • G. O. H. Katona
  • D. Miklos
  • O. Seleznjev
  • B. Thalheim
Contributed Papers Probabilistic Methods
Part of the Lecture Notes in Computer Science book series (LNCS, volume 893)

Abstract

Practical database applications engender the impression that sets of constraints are rather small and that large sets are unusual and caused by bad design decisions. Theoretical investigations show, however, that minimal constraint sets are potentially very large. Their size can be estimated to be exponential in terms of the number of attributes. The gap between belief and theory causes non-acceptance of theoretical results. However, beliefs are related to average cases.

The theory known so far has considered worst-case complexity. This paper aims at developing a theory of average-case complexity. Several statistical models and the asymptotics of the corresponding probabilities are investigated for random databases. We show that exponential complexity of independent key sets and independent sets of functional dependencies is rather unusual. Almost all minimal keys have a length that is determined mainly by the size of the relation. The number of minimal keys of any other length is exponentially small compared with the number of minimal keys of this derived length. Further, if a key is valid in a relation, then it is probably a minimal key. The same results hold for functional dependencies.
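The claim about key lengths can be explored empirically on small instances. The following is a minimal sketch, not the paper's method: it assumes a simple random model in which each attribute value is drawn uniformly and independently from a domain of fixed size (the paper investigates several statistical models, which are not reproduced here), and it finds minimal keys by brute force. All function names and parameters are hypothetical and chosen for illustration only.

```python
import random
from itertools import combinations


def is_key(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    projections = {tuple(row[a] for a in attrs) for row in rows}
    return len(projections) == len(rows)


def minimal_keys(rows, n_attrs):
    """Enumerate all minimal keys by brute force, smallest sets first.

    Exponential in n_attrs, so only suitable for small experiments.
    """
    found = []
    for size in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            candidate = set(attrs)
            # A superset of a known key is a key, but not a minimal one.
            if any(key <= candidate for key in found):
                continue
            if is_key(rows, attrs):
                found.append(frozenset(attrs))
    return found


def random_relation(n_rows, n_attrs, domain_size, rng):
    """Draw n_rows distinct tuples uniformly at random (assumed model)."""
    if domain_size ** n_attrs < n_rows:
        raise ValueError("domain too small for that many distinct rows")
    rows = set()
    while len(rows) < n_rows:
        rows.add(tuple(rng.randrange(domain_size) for _ in range(n_attrs)))
    return sorted(rows)


def key_length_histogram(n_rows, n_attrs, domain_size, seed=0):
    """Map each minimal-key length to the number of minimal keys of that length."""
    rng = random.Random(seed)
    rows = random_relation(n_rows, n_attrs, domain_size, rng)
    histogram = {}
    for key in minimal_keys(rows, n_attrs):
        histogram[len(key)] = histogram.get(len(key), 0) + 1
    return histogram


if __name__ == "__main__":
    # Hypothetical parameters: 30 rows, 10 attributes, binary domain.
    print(key_length_histogram(n_rows=30, n_attrs=10, domain_size=2))
```

Running the script for different values of `n_rows` while keeping `n_attrs` fixed gives a rough picture of how the lengths of minimal keys shift with the size of the relation under this assumed model.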

Copyright information

© Springer-Verlag 1995

Authors and Affiliations

  • J. Demetrovics (1)
  • G. O. H. Katona (2)
  • D. Miklos (2)
  • O. Seleznjev (3)
  • B. Thalheim (4)

  1. Comp. & Autom. Inst., Hungarian Academy, Budapest
  2. Mathematical Inst., Hungarian Academy, Budapest
  3. Dept. of Mathematics and Mechanics, Moscow State University, Moscow
  4. Computer Science Inst., Cottbus Technical University, Cottbus
