DbSchema | SQL Server Database Normalization Techniques
SQL Server: How to Apply Database Normalization Techniques in sqlcmd and DbSchema
Table of Contents
- Introduction
- Prerequisites
- What is Database Normalization?
- Purpose of Using Database Normalization
- Restrictions and Permissions for Database Normalization
- Advantages and Limitations of Database Normalization
- Principles of Database Normalization
- Understanding the Normal Forms
- Applying Normal Forms in sqlcmd and DbSchema
- Conclusion
- References
Introduction
With the exponential growth of data, organizing it in a meaningful and efficient manner has become extremely important. This is where Database Normalization comes into play, an important concept for any database developer. This article provides a comprehensive guide to understanding and applying Database Normalization techniques
in SQL Server using tools like sqlcmd and DbSchema. We will take a deep dive into different Normal Forms and also discuss how to use them in real-world scenarios.
Prerequisites
- Basic understanding of SQL Server
- Familiarity with SQL commands and syntax
- Installed instances of SQL Server and DbSchema
- Basic understanding of Database Management Systems
For installation and establishing connection you can read our article SQL Server-How to create a database?
What is Database Normalization?
Database Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into a tabular form, removing duplicated data from the relation tables. The end result is a set of tables that are well-structured and reliable for the long run.
Purpose of Using Database Normalization
The primary purpose of Database Normalization is to:
- Eliminate redundant data
- Ensure data dependencies make sense
- Improve data consistency
- Protect the integrity of databases
Restrictions and Permissions for Database Normalization
To perform database normalization in SQL Server, you need certain permissions. These include:
- The CREATE TABLE permission in the database
- The ALTER permission on the schema
However, there are restrictions such as:
- Tables must not contain sparse columns
- Tables should be on the same filegroup
Advantages and Limitations of Database Normalization
Advantages
- Reduces data redundancy and improves data integrity
- Makes database schema flexible
- Facilitates efficient data retrieval with quicker queries
Limitations
- Over-normalization can lead to excess complexity and impact performance
- It might require more joins, which can be costly in terms of resources
Principles of Database Normalization
Normalization involves arranging attributes in relations based on dependencies between attributes. The key principles of normalization include:
- Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. This refers to the concept of functional dependency.
- The values in an attribute should be atomic.
- There should be no redundant data.
Understanding the Normal Forms
First Normal Form (1NF)
This form imposes the following rules:
- Each table should have atomic (indivisible) values.
- Each record needs to be unique.
Second Normal Form (2NF)
- All requirements for 1st NF must be met.
- Redundant data across multiple rows of a table should be moved to a separate table.
Third Normal Form (3NF)
- All requirements for 2nd NF must be met.
- There should be no transitive dependencies for non-prime attributes.
Boyce-Codd Normal Form (BCNF or 4NF)
- All requirements for 3rd NF must be met.
- For any dependency A → B, A should be a super key.
Fifth Normal Form (5NF)
- All requirements for 4th NF must be met.
- There should be no possibility of generating more information than contained in the concatenated keys and the related columns.
Sixth Normal Form (6NF)
It is used for temporal databases like time-series databases.
Applying Normal Forms in sqlcmd and DbSchema
For this tutorial, let’s consider a sample database Bookstore. For the sake of simplicity, we will use a single Books table, and we will normalize it step-by-step.
1NF
Firstly, we must ensure that each cell of our table contains atomic (indivisible) values. For example, in our Books
table, we should separate the Authors column if a book can have multiple authors.
In sqlcmd:
To do this in sqlcmd, you would run:
CREATE TABLE Authors(
AuthorId INT PRIMARY KEY,
Name NVARCHAR(100)
);
CREATE TABLE Books(
BookId INT PRIMARY KEY,
Title NVARCHAR(100),
AuthorId INT FOREIGN KEY REFERENCES Authors(AuthorId)
);
To know more about creating a table you can read our article SQL Server-How to create a table?
In DbSchema:
In DbSchema, you would follow a similar process, but with a visual interface. Right-click in the layout area, select Create new table, then define the columns as above.
2NF
In this stage, we want to remove data redundancy by moving the data to separate tables and establish a relationship between them using foreign keys.
In sqlcmd:
CREATE TABLE Publishers(
PublisherId INT PRIMARY KEY,
Name NVARCHAR(100)
);
CREATE TABLE Books(
BookId INT PRIMARY KEY,
Title NVARCHAR(100),
AuthorId INT FOREIGN KEY REFERENCES Authors(AuthorId),
PublisherId INT FOREIGN KEY REFERENCES Publishers(PublisherId)
);
In DbSchema:
You would follow similar steps in DbSchema to create the Publishers table and define the relationship with Books.
3NF
In the 3rd NF, we remove transitive dependencies. Assume there’s a column PublisherAddress in the Books table, which depends on PublisherId, not BookId. We remove it from Books and move to Publishers.
Boyce-Codd Normal Form (BCNF)
BCNF addresses a scenario where a table has more than one candidate key and there are dependencies between those candidate keys. This can lead to anomalies in the database.
For instance, let’s assume there is a table ClassRoom which contains columns ClassRoomID, Subject, Teacher. Both ClassRoomID and Subject could be considered candidate keys for this table, because a classroom will have only one subject and a subject is taught by one teacher only.
This relation violates BCNF rules because Teacher is dependent on the candidate key Subject (non-prime attribute). To normalize this relation to BCNF, we need to break this table up into two tables: ClassRoom and Subject.
In sqlcmd:
CREATE TABLE ClassRoom(
ClassRoomID INT PRIMARY KEY,
Subject NVARCHAR(100),
);
CREATE TABLE Subject(
Subject NVARCHAR(100) PRIMARY KEY,
Teacher NVARCHAR(100)
);
In DbSchema:
In DbSchema, create the tables using the GUI as previously explained.
Fourth Normal Form (4NF)
4NF aims to handle multivalued dependencies. Assume a Students table in our Bookstore database, which records the books each student has borrowed and their respective return dates. Each student can borrow multiple books and each book can have multiple return dates.
This table could potentially have redundant data and lead to update anomalies. To normalize it to the 4NF, we need to split it into two tables.
In sqlcmd:
CREATE TABLE StudentBooks(
StudentId INT PRIMARY KEY,
BookId INT FOREIGN KEY REFERENCES Books(BookId),
);
CREATE TABLE BookReturnDates(
BookId INT PRIMARY KEY REFERENCES Books(BookId),
ReturnDate DATE
);
In DbSchema:
Again, in DbSchema, these tables are created through the GUI.
Fifth Normal Form (5NF)
5NF or Project-Join Normal Form (PJNF) deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy.
For example, assume we have a BookAuthorPublisher table in our Bookstore database with BookId, AuthorId, and PublisherId as composite key. However, a book can be written by multiple authors and can be published by multiple publishers.
To resolve this, we will decompose this table into three tables: Books, Authors, and Publishers.
In sqlcmd:
CREATE TABLE BookAuthors(
BookId INT PRIMARY KEY REFERENCES Books(BookId),
AuthorId INT FOREIGN KEY REFERENCES Authors(AuthorId)
);
CREATE TABLE BookPublishers(
BookId INT PRIMARY KEY REFERENCES Books(BookId),
PublisherId INT FOREIGN KEY REFERENCES Publishers(PublisherId)
);
In DbSchema:
As before, use DbSchema’s interface to create these tables.
Note:
It’s important to note that normalizing a database to 4NF or 5NF is quite rare as most of the real-world databases are normalized up to 3NF or BCNF. The higher normal forms are more theoretical and are less likely to be found in practical application, but they’re useful for understanding how data can be efficiently organized.
Results from the Query:
First Normal Form (1NF)
After executing 1NF, your Authors table might look like this:
AuthorId | _Name_ |
---|---|
1 |
John Smith |
2 |
Mary Johnson |
3 |
James Williams |
And your Books table might look like this:
BookId | _Title_ | AuthorId |
---|---|---|
1 |
Programming 101 | 1 |
2 |
Design Patterns | 2 |
3 |
Learn SQL | 1 |
Second Normal Form (2NF)
After executing 2NF, you might have a new Publishers table:
PublisherId | **Name** |
---|---|
1 |
TechBooks Publishing |
2 |
Educational Reads |
And your Books table might now look like this:
BookId | Title | AuthorId | PublisherId |
---|---|---|---|
1 |
Programming 101 | 1 | 1 |
2 |
Design Patterns | 2 | 2 |
3 |
Learn SQL | 1 | 1 |
Third Normal Form (3NF)
After executing 3NF, your Publishers table might now look like this:
PublisherId | Name | PublisherAddress |
---|---|---|
1 |
TechBooks Publishing | 123 Tech St, NY |
2 |
Educational Reads | 456 Education Lane, LA |
Fourth Normal Form (4NF)
After executing 4NF, your StudentBooks table might look like this:
StudentId | BookId |
---|---|
1 |
1 |
2 |
1 |
2 |
2 |
And you might have a new BookReturnDates table:
BookId | ReturnDate |
---|---|
1 |
2023-07-20 |
1 |
2023-07-21 |
2 |
2023-07-22 |
Fifth Normal Form (5NF)
After executing 5NF, your BookAuthors table might look like this:
BookId | AuthorId |
---|---|
1 |
1 |
2 |
2 |
3 |
1 |
1 |
2 |
And your BookPublishers table might look like this:
BookId | PublisherId |
---|---|
1 |
1 |
2 |
2 |
3 |
1 |
1 |
2 |
Conclusion
Database normalization is a crucial process to create reliable and efficient databases. While the process can seem complex, tools like sqlcmd and DbSchema, along with a solid understanding of the Normal Forms, can make it manageable. Despite its limitations, normalization is a best practice for database design, and it is an essential skill for any database professional.
References
- Microsoft Docs: sqlcmd
- DbSchema: Interactive Diagrams
- Database Normalization: Database Normalization