Saturday, February 27, 2010

Database Normalization and Normal Forms

Database can be designed with following approaches:

1. Top-down approach

The Entity-Relationship (E/R) modelling technique is the top-down approach. It involves identifying entities, relationships and attributes, drawing E/R diagram and mapping the diagram to tables.
Generally tables derived from E/R diagram conforms to at least first normal form. This approach is the best for validation of existing designs.

2. Bottom-up approach

Normalizing the database is the bottom-up approach. Normalization is a step by step decomposition of complex tables into simple tables. It reduces redundancy using the principle of non-loss decomposition. Non-loss decomposition is a process of breaking large/complex tables into smaller/simple tables without loss of information.
Redundancy is the unnecessary repetition of data that can lead to inconsistencies in data and update anomalies.
Generally, normalization follows the process of drawing E/R diagrams.

Note: A fully normalized record consist set of attributes that describe the entity and a primary key that identifies the entity.

Functional Dependency

Normalization theory is based on the theory of functional dependency.
Functional dependency can be explained like –

Given a relation R, attribute A is functionally dependent on B if each value of A in R is associated with precisely one value of B.

OR

Given a relation R, attribute A is functionally dependent on B only if whenever two tuples of R agree on their B value, they must agree in their A value as well.

For Example –
There is an entity named CUSTOMER with following attributes: CODE, NAME, ADDRESS and PHONE_NO.

Given a particular value of CODE, there is precisely one corresponding value of NAME, ADDRESS and PHONE_NO. So, the attributes NAME, ADDRESS and PHONE_NO are functionally dependent on the attribute CODE.

Functional dependency does not require an attribute to be the key in order to functionally determine other attributes.

For Example-
In the SALES entity that has attributes: CUSTOMER_CODE, CITY, PRODUCT_CODE, and QUANTITY.

CUSTOMER_CODE functionally determines CITY because city to which the shipment is made is also where the customer is located.

Note: Functional dependency represents many-to-many relationships.

Normal Forms

Normalization results in the formation of tables that satisfied certain specified constraints, and represent certain normal forms. Normal forms are table structures with minimum redundancy.

Some of the most important and widely used normal forms are:

1.First Normal Form (1st NF)
2.Second Normal Form (2nd NF)
3.Third Normal Form (3rd NF)
4.Boyce-Codd Normal Form (BCNF)

Note: A relation that is in the first normal form may also be in second normal form or third formal form.

First Normal Form

A table is said to be in the first normal form (1 NF) when each cell of the table contains precisely one value. To put data in 1 NF, all repeating information has to be removed.

Second Normal Form

A table is said to be in 2 NF when it is in the 1 NF and every attribute in the record is functionally dependent upon the whole key, and not just part of the key.

Example- A table has following attributes: EMP_CODE, DEPT, PROJ_CODE, and HOURS.

The primary key here is composite(EMP_CODE+PROJ_CODE). But the attributes of this table depend upon only part of the primary key.

• EMP_CODE+PROJ_CODE functionally determines HOURS
• EMP_CODE functionally determines DEPT but it has no dependency on PROJ_CODE.

Solution: Break this table into 2 tables:
Table A with attributes EMP_CODE and DEPT
Table B with attributes EMP_CODE, PROJ_CODE and HOURS.

Third Normal Form

A table is said to be in 3 NF when it is in 2 NF and every non-key attribute is functionally dependent on just the primary key.

Example- A table has following attributes: EMP_CODE, DEPT and DEPT_HEAD_CODE

The primary key here is EMP_CODE. The attribute DEPT is dependent on EMP_CODE and the attribute DEPT_HEAD_CODE is dependent on DEPT. So there is an indirect dependency on primary key.

Solution: Break the table into 2 tables:
Table A with attributes EMP_CODE and DEPT
Table B with attributes DEPT and DEPT_HEAD_CODE
Now each non-key attribute depends on the key, the whole key and nothing but the key

Boyce-Codd Normal Form

A table is said to be in BCNF if every determinate is a candidate key.

The definition of 3 NF is inadequate for the tables with multiple candidate keys where they are composite keys or they are overlapped (has at least one attribute in common)

Example- Table A has following attributes: EMP_CODE, EMP_EMAIL, PROJ_CODE and HOURS
Table B, which is master for table A has following attributes: EMP_CODE, EMP_EMAIL, EMP_PHONE, EMP_FAX etc.

Observations:
• Table A has multiple candidate keys- attributes EMP_CODE and EMP_EMAIL are candidate keys.
• Table B has composite candidate keys- EMP_CODE and PROJ_CODE functionally determine HOURS.
• And EMP_EMAIL and PROJ_CODE functionally determine HOURS as well.
• Composite keys are overlapped – PROJ_CODE is common.

Solution: Break table into 2 tables:
Table B1 with attributes EMP_CODE and EMP_EMAIL
Table B2 with attributes EMP_CODE, PROJ_CODE and HOURS