Database Normalization as a By-product of Minimum Message Length Inference

  • David L. Dowe
  • Nayyar Abbas Zaidi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6464)

Abstract

Database normalization is a central part of database design, in which the stored data are reorganized so that as few anomalies as possible occur upon insertions, deletions and/or modifications. Successive normalizations of a database to higher normal forms continue to reduce the potential for such anomalies. We show here that database normalization follows as a consequence (or special case, or by-product) of the Minimum Message Length (MML) principle of machine learning and inductive inference. In other words, someone previously oblivious to database normalization but well-versed in MML could examine a database and, using MML considerations alone, normalize it, and even discover the notion of attribute inheritance.
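
As a rough illustration of the idea, and not the encoding used in the paper, the following sketch compares how many bits are needed to state a small table before and after a decomposition. The table contents, the names `table_bits`, `lecturer_of`, `wide` and `course_lecturer`, and the uniform per-column code are all assumptions made for this example. The cost of stating the schema itself (the first part of a two-part MML message) is ignored here, which is a simplification.

```python
import math

def table_bits(rows, domains):
    """Bits to state every cell, using a uniform code over each column's domain.

    Assumption: the receiver already knows the schema and the domain sizes.
    """
    bits_per_row = sum(math.log2(size) for size in domains)
    return len(rows) * bits_per_row

# Hypothetical data: each course has exactly one lecturer (course -> lecturer).
lecturer_of = {"DB": "Ann", "ML": "Bob", "OS": "Cat", "AI": "Dan"}
students = [f"s{i}" for i in range(16)]
enrolments = [(s, c) for s in students for c in lecturer_of]  # 64 rows

n_students, n_courses, n_lecturers = 16, 4, 4

# Single wide table: the lecturer is restated on every enrolment row.
wide = [(s, c, lecturer_of[c]) for s, c in enrolments]
wide_bits = table_bits(wide, [n_students, n_courses, n_lecturers])

# Decomposed form: enrolment(student, course) + course_lecturer(course, lecturer).
course_lecturer = list(lecturer_of.items())
split_bits = (table_bits(enrolments, [n_students, n_courses])
              + table_bits(course_lecturer, [n_courses, n_lecturers]))

# The decomposition loses nothing: joining the two tables rebuilds the wide one.
lookup = dict(course_lecturer)
rejoined = [(s, c, lookup[c]) for s, c in enrolments]

print(wide_bits, split_bits)  # 512.0 400.0
```

Under these assumptions the decomposed form is shorter because the dependency of lecturer on course is stated once per course rather than once per enrolment, which is the kind of redundancy that normalization removes.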

Keywords

Minimum Message Length; MML; Database Normalization; Machine Learning; Data Mining; Intelligent Databases

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • David L. Dowe
    • 1
  • Nayyar Abbas Zaidi
    • 1
  1. Clayton School of I.T., Monash University, Clayton, Australia
