Introducing SQL and Relational Databases

Normalization

Your browser needs to be JavaScript capable to view this video

Try reloading this page, or reviewing your browser settings

This video segment explains how to avoid anomalies by normalizing tables.

Keywords

  • anomalies
  • normal forms
  • constraints
  • keys
  • domains

About this video

Author(s)
Allen Taylor
First online
12 January 2019
DOI
https://doi.org/10.1007/978-1-4842-3841-7_9
Online ISBN
978-1-4842-3841-7
Publisher
Apress
Copyright information
© Allen Taylor 2019

Video Transcript

Speaker: Anomalies such as those covered in the previous segment can ruin the reliability of a database becoming more and more serious over time. As we’ve seen satisfying the requirements for being a relation is not sufficient to protect the database table from anomalies. Given how important this issue is, it’s been extensively studied. As various possible anomalies were discovered, additional constraints were placed on the relations, each one to prevent a specific type of anomaly. These constraints put the table in successively higher normal forms. A table that satisfies the definition of a relation is said to be in First Normal Form, abbreviated as 1NF. Over a period of years, a succession of normal forms was defined to address a succession of newly discovered sources of anomalies. These new normal forms are: Second Normal Form, Third Normal Form, Boyce-Codd Normal Form, Fourth Normal Form, and Fifth Normal Form.

That’s an odd set of names. Boyce-Codd Normal Form seems out of place. This is due to a historical accident. Relations in Second Normal Form must first be in First Normal Form. And they must satisfy and additional constraint, which is that every non-key attribute must depend on the entire key. This is automatically satisfied if the relation has a single attribute key. For a relation with a composite key, all non-key attributes must be dependent on the entire key, not just part of it. To be in Third Normal Form, a relation must be in Second Normal Form. And in addition, it must have no transitive dependencies. A transitive dependency is when A determines B, which in turn determines C. Ted Codd, the inventor of the relational model, went onto defined two additional normal forms that have even more stringent constraints: Fourth Normal Form and Fifth Normal Form. This was fine, but then working with Raymond Boyce, Codd discovered a new normal form that was a little more stringent than Third Normal Form, but not as stringent as Fourth Normal Form. Boyce-Codd Normal Form had to be stuck into the hierarchy of normal forms between third and fourth. Any relation in one normal form is automatically in the next lower normal form. So any relation in 2NF is automatically in 1NF. Any relation in 3NF is automatically in 2NF and so on. Fifth Normal Form is sufficient to protect a relation from any source of anomalies that we can think of, but that is not sufficient for database experts. There might be a problem that we have not yet thought of that creates anomalies in relations in Fifth Normal Form. After decades of trying, nobody has been able to develop a mathematical proof that relations in Fifth Normal Form guarantee that the relation is anomaly free. This led to Domain/Key Normal Form (DKNF). A relation in DKNF is proven to be anomaly free. The requirements for Domain/Key Normal Form are easy to state. A relation is in Domain/Key Normal Form if every constraint on the relation is a logical consequence of the definitions of keys and domains. A constraint is a rule that governs the static values of attributes. You must be able to tell whether the constraint is satisfied. Static values are those that do not vary with time. A key is a unique identifier of a tuple. A domain is the set of values that an attribute can assume.

Let’s look at the location table that we talked about in the previous segment. Is it in DKNF? That depends on constraints, keys and domains. It is reasonable to apply the following constraint: The location ID attribute may not contain any duplicates. The keys are the integers in the location ID field. The domains are, for a location ID, the set of integers. For a location, the set of locations where Acme has operations. For positions, the set of job categories of Acme employees. So is every constraint on the relation a logical consequence of the definitions of keys and domains? Let’s look. There’s only one constraint on the relation and it is a logical consequence of the fact that location ID is a key. Thus the condition is satisfied and the location table is in DKNF. If you can assure that any relation you create is in DKNF, you can be confident that you won’t have to worry about possible anomalies. However, as we’ve seen, assuring that a table is in DKNF is not always easy. A good practice that will keep you safe from anomalies almost all of the time is to examine the table in question and see if it deals with one and only one main idea or set of facts. If it does, in all probability, you will not have to worry about data corruption due to the structure of your database.

In this video, we’ve discussed the relational model for databases. It’s a highly theoretical topic and we’ve only scratched the surface of it. However, you now know some of the important terms, what they mean, and you have an appreciation for how the structure of a model impacts the reliability of the database derived from that model. Probably the biggest mistake you can make is to start coding a database without designing an accurate and appropriate relational model first.