Skip to content

Database Normalization

Database normalization is the process of structuring a database, usually a relational database, in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity.

Definitions

  • Candidate key (CK): any set of columns that have a unique combination of values in each row satisfying that removing any column would possibly produce duplicate rows.
  • Primary key (PK): specific choice of a minimal set of attributes (columns) that uniquely specify a tuple (row). "Which attributes identify a record."

Normal forms

Note: Primary Keys in bold.

UNF

  • Unique PK.

1NF

Each column of a table must have a single value.

Example

Convert this:

Customer Customer phone Product
Bob 00000000 [apple, orange]
Alice 11111111 orange

Into this:

Customer Customer phone Product
Bob 00000000 apple
Alice 11111111 orange
Bob 00000000 orange

2NF

Each column value should only depend on the candidate key.

Example

Convert this:

Product name Box weight Quality Price
Apple 1 kg 1 2
Apple 1 kg 2 1.5
Orange 5 kg 1 4
Orange 5 kg 2 2

To this:

Product name Box weight
Apple 1 kg
Orange 5 kg
Product name Quality Price
Apple 1 2
Apple 2 1.5
Orange 1 4
Orange 2 2

In the original table, the Box weight depended only on the candidate key Product name, but the price depended also on the quality. That's why it should be separated into two tables to avoid duplication.

3NF

There should be no transitive functional dependencies. If a column can be determined from another (functional dependency) the dependent column should be in another table.

Example

Convert this:

Customer Customer phone Product Price
Bob 00000000 apple 1
Alice 11111111 orange 2
Bob 00000000 orange 2

To this:

Product ID Name Price
1 apple 1
2 orange 2
Customer ID Name Phone
1 Bob 1
2 Alice 2
Customer ID Product ID
1 1
2 2
1 2