Database Normalization as a By-product of Minimum Message Length Inference
Database normalization is a central part of database design: the stored data is progressively re-organised so that as few anomalies as possible can occur upon insertions, deletions and modifications. Successive normalizations of a database to higher normal forms further reduce the potential for such anomalies. We show here that database normalization follows as a consequence (or special case, or by-product) of the Minimum Message Length (MML) principle of machine learning and inductive inference. In other words, someone previously oblivious to database normalization but well-versed in MML could examine a database and, using MML considerations alone, normalise it, and even discover the notion of attribute inheritance.
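The intuition can be sketched with a toy calculation (an illustrative simplification, not the paper's actual encoding scheme): if each cell of a table costs roughly log2(domain size) bits to transmit, then decomposing a table whose third attribute functionally depends on its second yields a shorter two-part message. The table name, column names and cardinalities below are hypothetical.

```python
import math

def table_bits(n_rows, domain_sizes):
    """Crude cost of transmitting a table: log2(domain size) bits per cell."""
    return n_rows * sum(math.log2(d) for d in domain_sizes)

# enrolment(student_id, course_id, course_name), where course_name is
# functionally dependent on course_id (a redundancy normalization removes).
n_rows, n_students, n_courses, n_names = 10_000, 5_000, 50, 50

# Flat (denormalized) table: course_name repeated on every enrolment row.
flat = table_bits(n_rows, [n_students, n_courses, n_names])

# Normalized: enrolment(student_id, course_id) plus course(course_id, course_name).
normalized = (table_bits(n_rows, [n_students, n_courses])
              + table_bits(n_courses, [n_courses, n_names]))

print(flat > normalized)  # → True: the decomposition is the shorter message
```

Under these assumed cardinalities the saving is about n_rows × log2(n_names) bits, growing with the number of rows, which mirrors the claim that an MML-guided agent would prefer the normalized schema.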
Keywords: Minimum Message Length, MML, Database Normalization, Machine Learning, Data Mining, Intelligent Databases