Database design refers to constructing a suitable database schema for a specific application according to the characteristics of the database system, and establishing the database and the corresponding applications, so that the whole system can effectively collect, store, process and manage data to meet the usage requirements of various users.

This chapter introduces the relevant concepts, overall objectives and problems to be solved in database design, and details the stage-by-stage work of requirement analysis, conceptual design, logical design and physical design according to the New Orleans design methodology. Finally, the concrete means of implementing a database design are introduced through relevant cases.

Through this chapter, the reader will be able to describe the characteristics and uses of data models, enumerate the types of data models, describe the criteria of a third normal form (3NF) data model, describe the common concepts in the logical model, distinguish the corresponding concepts in the logical and physical models, and enumerate the common means of denormalization in the physical design process.

7.1 Database Design Overview

Database design refers to the construction of an optimized database logical model and physical structure for a given application environment, and the establishment of the database and its application system accordingly, so that it can effectively store and manage data to meet the application needs of various users. It is worth noting that there is no universal "optimal" standard for database design; different designs and optimizations are needed for different applications. OLTP and OLAP scenarios are very different, and the methods and optimization tools for database design differ accordingly.

Readers need to first understand the most common methods and techniques, and then apply them in different practical scenarios.

7.1.1 Difficulties of Database Design

In practical applications, database design encounters many difficulties, mainly the following.

  (1) Technical staff who are familiar with the database often lack business and industry knowledge.

    Database design needs to be flexibly adjusted for different applications, which requires the people involved to have a good understanding of the application scenarios and business background, while technical staff familiar with the database often lack such business and industry knowledge.

  (2) People who are familiar with the business often lack understanding of database products.

    Conversely, people familiar with the business knowledge and business processes often lack understanding of database products and are not familiar with the database design process. Therefore, during data model design the two parties need to communicate fully with each other in order to do a good job of database design.

  (3) The scope of the database system's business requirements cannot be fully clarified at the initial stage.

    In the initial stage of a project, the application business is not particularly clear and the users' requirements are not yet established. The database system is improved gradually along with the users' requirements, which is also a difficult point in database design.

  (4) User requirements are constantly adjusted and modified during the design process; even after the database model has landed, new requirements will appear and affect the existing database structure.

    Because of this uncertainty of requirements, database adjustments are frequent, which causes trouble for database design. Database design is therefore a spiral, forward-moving process that needs to be constantly adjusted, improved and optimized to better meet the needs of the application.

7.1.2 Goal of Database Design

Database design is the technology of establishing a database and its application system, and it is the core technology in the development and construction of information systems. The goal of database design is to provide an information infrastructure and an efficient operation environment for users and various application systems. "Efficient operation environment" means achieving high efficiency in data access, storage space utilization, and system operation and management. The goal of database design must set a time range and a target boundary; a design goal without restrictive conditions will fail because its scope is too large. Setting reasonable goals for a database system is very difficult: goals that are too large or too high will be unachievable, while goals that are too small will be unacceptable to the customer. Therefore, the goals should be planned reasonably in stages and levels so as to form sustainable solutions for the construction process, ultimately meeting the needs of the users and achieving the goals.

7.1.3 Methods of Database Design

In October 1978, database experts from more than 30 countries gathered in New Orleans, USA, to discuss database design methods. Applying the ideas and methods of software engineering, they proposed a database design specification, the famous New Orleans design methodology, which is currently recognized as one of the more complete and authoritative database design specifications. The New Orleans design methodology divides database design into four phases, as shown in Fig. 7.1.

Fig. 7.1 The four phases in the New Orleans design methodology

These four phases are requirement analysis, conceptual design, logical design, and physical design. The requirement analysis phase mainly analyzes user requirements and produces requirement statements; the conceptual design phase mainly analyzes and defines information and produces conceptual models; the logical design phase mainly designs based on entity connections and produces logical models; and the physical design phase mainly designs physical structures based on physical characteristics of database products and produces physical models.

In addition to the New Orleans design methodology, there are also database design methods based on E-R diagrams and design methods based on 3NF. These are specific techniques used in different phases of database design, and they will be described in detail in later chapters.

7.2 Requirements Analysis

7.2.1 Significance of Requirement Analysis

In real life, a building without a good foundation will be crooked. Experience has proven that poor requirement analysis directly leads to incorrect design. If problems are not discovered until the system testing stage and the team then has to go back and correct them, the cost is high, so the requirement analysis stage must be given high priority.

The requirement analysis phase mainly collects information, then analyzes and organizes it to provide sufficient input for the subsequent phases. This stage is the most difficult and time-consuming, but it is also the basis of the whole database design. If the requirement analysis is not done well, the whole database design may have to be reworked.

The following points should be done in the requirement analysis phase.

  (1) Understand the operation of the existing system, such as the business it carries, its business processes and its deficiencies.

  (2) Determine the functional requirements of the new system, that is, understand the end users' ideas, functional requirements and desired results.

  (3) Collect the basic data and related business processes that can achieve the objectives, so as to prepare for a better understanding of the business processes and user requirements.

7.2.2 Tasks of the Requirement Analysis Stage

The main task of the requirement analysis phase is first to investigate user business behaviors and processes, then to conduct system research, collect and analyze requirements, and determine the scope of system development, and finally to prepare a requirement analysis report.

Investigating user business behaviors and processes requires understanding the users' expectations and goals for the new system and the main problems of the existing system. In the stage of system research, requirements are collected and analyzed and the scope of system development is determined, with the main tasks divided into the following three parts.

  (1) Information research. It is necessary to determine all the information to be used in the designed database system and to clarify the sources, methods, data formats and contents of the information. The main goal of the requirement analysis phase is to clarify what data is to be stored in the designed database, what data needs to be processed, and what data needs to be used by the next system.

  (2) Processing requirements. Translate the user's business functional requirements into a requirement statement that defines the functional points of the database system to be designed. That is, convert the requirements described by users in business language into design requirements that can be understood by computer systems or developers. It is necessary to describe the operational functions of data processing, the sequence of operations, the frequency and occasion of their execution, and the connection between operations and data, as well as to specify the response time and processing methods required by users. These contents form a necessary part of the user requirement specification.

  (3) Understand and record user requirements in terms of security and integrity. Writing the requirement analysis report requires going through the process of system research, collection and processing. Generally, the output of this stage is the requirement analysis report, including the user requirement specification and the data dictionary. The data dictionary here is a summary document of the data items and data of the existing business, not the data dictionary inside a database product.

7.2.3 Methods of Requirement Analysis

The focus of requirement analysis is to sort out the users' "information flow" and "business flow". The "business flow" refers to the current status of the business, including business policies, organization and business processes. The "information flow" refers to the data flow, including the source, flow and focus of data, the process and frequency of data generation and modification, and the relation between data and business processing. External requirements should also be clarified during the requirement analysis phase, including but not limited to data confidentiality requirements, query response time requirements and output report requirements.

According to the actual situation and the support available from users, the requirement investigation can combine several means, for example, reviewing the design documents and reports of existing systems, talking with business personnel, and questionnaire surveys. If conditions permit, sample data from the existing business systems should also be collected as part of the design process to verify some business rules and understand the quality of the data.

During the requirement analysis process, do not make assumptions or guesses about the user's ideas; always check with the user about any assumptions or unclear areas.

7.2.4 Data Dictionary

The data dictionary is the result obtained after requirement analysis, data collection and data analysis. Unlike the data dictionary in a database, the data dictionary here mainly refers to the description of the data, not the data itself, and includes the following contents.

  (1) Data items: These mainly include the data item name, meaning, data type, length, value range, unit and logical relations with other data items, which are the basis of model optimization in the logical design stage.

  (2) Data structure: The data structure reflects the combination relations between data items; a data structure can be composed of several data items and other data structures.

  (3) Data flow: The data dictionary is required to represent the data flow, that is, the transmission path of data in the system, including data source, flow direction, average flow, peak flow, etc.

  (4) Data storage: This includes data access frequency, retention duration, and data access methods.

  (5) Processing process: This includes the function of each data processing process and its processing requirements. The function refers to what the processing process is used to do, and the requirements include how many transactions are processed per unit of time, how much data volume is involved, time response requirements, etc.

There is no fixed document specification for the format of the data dictionary; in practice, it can refer to the content items above and can be reflected through different descriptive documents or in the model file. The data dictionary is thus a concept at the abstract level, a collection of documents. In the requirement analysis phase, the most important output is the user requirement specification, where the data dictionary often exists as an annex or appendix to provide a reference for the model designers in their subsequent work.
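Since the format is not fixed, a data dictionary entry can be kept in any structured document. The sketch below records one data item and one data structure as plain Python dictionaries; the field names and the "student_no" item are illustrative assumptions, not a prescribed standard.

```python
# One data-item entry of a data dictionary, holding the contents listed
# above: name, meaning, data type, length, value range, unit, and the
# logical relations with other data items.
data_item = {
    "name": "student_no",
    "meaning": "unique number assigned to a student at enrollment",
    "data_type": "CHAR",
    "length": 10,
    "value_range": "10 digits; first 4 digits are the enrollment year",
    "unit": None,
    "related_items": ["class_no"],  # logical relation with other data items
}

# A data structure groups several data items (and possibly other structures).
data_structure = {
    "name": "student",
    "items": ["student_no", "name", "gender", "class_no"],
}

print(data_item["name"], "->", data_item["data_type"])
```

In a real project these entries usually live in a requirements document or a modeling tool rather than in code; the point is only that each item carries the descriptive fields enumerated above.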

7.3 Conceptual Design

7.3.1 Conceptual Design and Conceptual Model

The task of the conceptual design phase is to analyze the requirements proposed by the users, and to synthesize, summarize and abstract them into a conceptual-level abstract model independent of any concrete DBMS, i.e., the conceptual data model (hereinafter referred to as the conceptual model). The conceptual model is a high-level abstraction: it is independent of any specific database product and is not bound by the characteristics or physical attributes of any particular database product.

The conceptual model has the following four main features.

  (1) It can truly and fully reflect the real world, including the connections between things, serving as a true model of the real world.

  (2) It is easy to understand, enabling discussion with users who are not familiar with databases.

  (3) It is easy to change: when the application environment and application requirements change, the conceptual model can be modified and expanded.

  (4) It is easy to convert into a relational data model.

The latter two are the basic conditions for the smooth progress of the next stage of work.

7.3.2 E-R Approach

The conceptual model is a conceptual-level abstract model, independent of any concrete database management system, generated by analyzing the requirements proposed by users and synthesizing, summarizing and abstracting them. One could organize the real world directly according to a concrete data model, but then many factors must be considered at the same time, and the design work becomes complicated with unsatisfactory results, so an approach is needed to describe the information structure of the real world.

In 1976, the E-R (Entity-Relation) approach was proposed. Because of its simplicity and practicality, it quickly became one of the commonly used methods for conceptual modeling, and it is now a common approach to describing information structures. The tool used in the E-R approach is called the E-R diagram, which consists of three main elements: entity, attribute and linkage. The E-R diagram is widely used in the conceptual design stage, and the database concepts it represents are very intuitive and easy for users to understand.

An entity is a collection of real-world objects that have common attributes and can be distinguished from each other. For example, teachers, students, and courses are all entities, as shown in Fig. 7.2. In an E-R diagram, specific entities are generally represented by rectangular boxes. Each specific record value in an entity, such as each specific student in the student entity, is called an instance of the entity.

Fig. 7.2


Attributes are data items that describe the nature or characteristics of an entity, and all instances belonging to the same entity have the same attributes. For example, the student number, name and gender shown in Fig. 7.3 are all attributes. In the conceptual model, attributes are generally represented by rectangular boxes with rounded corners.

In practice, the conceptual model need not be designed down to the attribute level; designing to the entity level is acceptable. Planning out all the attributes in detail in the conceptual model increases the workload. In a practical application project, the E-R diagram of the conceptual model should delineate and express the linkages between entities clearly, so it is sufficient for a general conceptual model to reach the level that reflects the linkages between entities.

Fig. 7.3


The linkages within and between entities are usually represented by diamond-shaped boxes. In most cases, the data model is concerned with the linkages between entities, which are usually divided into three categories.

  (1) One-to-one linkage (1:1): Each instance in entity A has at most one instance linked to it in entity B, and vice versa. For example, a class has one class advisor; this linkage is recorded as 1:1.

  (2) One-to-many linkage (1:n): Each instance in entity A has n instances linked to it in entity B, while each instance in entity B has at most one instance linked to it in entity A; this is recorded as 1:n. For example, a class contains n students.

  (3) Many-to-many linkage (m:n): Each instance in entity A has n instances linked to it in entity B, while each instance in entity B has m instances linked to it in entity A; this is recorded as m:n. Take the linkage between students and elective courses: a student can take more than one course, and a course can be taken by more than one student.
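The three linkage types eventually map onto table structures. A minimal sketch using SQLite in memory shows one common mapping: a UNIQUE foreign key for 1:1, a plain foreign key on the "many" side for 1:n, and a junction table for m:n. The table and column names (class, student, course, enrollment) are illustrative assumptions drawn from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1:1 - a class has one advisor; UNIQUE on the foreign key enforces "at most one".
cur.execute("CREATE TABLE class (class_id INTEGER PRIMARY KEY, advisor_id INTEGER UNIQUE)")

# 1:n - a class contains many students; the foreign key sits on the "many" side.
cur.execute("""CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    class_id   INTEGER REFERENCES class(class_id))""")

# m:n - students and courses; a junction table holds one row per (student, course) pair.
cur.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY)")
cur.execute("""CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id))""")

cur.execute("INSERT INTO class VALUES (1, 10)")
cur.executemany("INSERT INTO student VALUES (?, ?)", [(1, 1), (2, 1)])
cur.executemany("INSERT INTO course VALUES (?)", [(101,), (102,)])
# Student 1 takes two courses; course 101 is taken by two students.
cur.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 101), (1, 102), (2, 101)])

cur.execute("SELECT COUNT(*) FROM enrollment WHERE course_id = 101")
print(cur.fetchone()[0])  # 2
```

The junction table is what makes the m:n linkage expressible in a relational schema, since a single foreign key column can only carry a 1:n relation.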

Simply put, conceptual design is the conversion of realistic conceptual abstractions and linkages into the form of an E-R diagram, as shown in Fig. 7.4.

Fig. 7.4


7.4 Logical Design

7.4.1 Logical Design and Logical Models

Logical design is the process of converting the conceptual model into a concrete data model. According to the basic E-R diagram established in the conceptual design phase, the model is converted into the selected target data model (hierarchical, network, relational, or object-oriented), and the result is the logical data model (hereinafter referred to as the logical model). For relational databases, this conversion must conform to the principles of the relational data model.

The most important work in the logical design phase is to determine the attributes and primary keys of the logical model. The primary key uniquely identifies a row in the table and is also known as a code. A primary key can consist of a single field or multiple fields. A common way to carry out logical design is to use an E-R design tool and the IDEF1X method to build the logical model. Commonly used E-R diagram notations include IDEF1X, the Crow's Foot notation of IE models, and Unified Modeling Language (UML) class diagrams.

7.4.2 IDEF1X Method

The logical model in this book adopts the IDEF1X (Integration DEFinition for Information Modeling) method. IDEF, which stands for Integration DEFinition, originated in the US Air Force ICAM (Integrated Computer Aided Manufacturing) project. Three methods were initially developed: functional modeling (IDEF0), information modeling (IDEF1), and dynamic modeling (IDEF2). Later, as information systems developed, further IDEF family methods were introduced, such as the data modeling method (IDEF1X), process description capture method (IDEF3), object-oriented design method (IDEF4), OO design method using C++ (IDEF4C++), ontology description capture method (IDEF5), design rationale capture method (IDEF6), human-system interaction design method (IDEF8), business constraint discovery method (IDEF9), and network design method (IDEF14). IDEF1X is an extension of IDEF1 that adds some rules to the E-R method to make the semantics richer.

The IDEF1X method has several features when used for logical modeling.

  (1) It supports the semantic constructs necessary for developing conceptual and logical models, and has good extensibility.

  (2) Its representation of semantic concepts is concise and consistent.

  (3) It is easy to understand, enabling business personnel, IT technicians, database administrators and designers to communicate in the same language.

  (4) It lends itself to automation: commercial modeling software supports the IDEF1X design methodology and can quickly convert between models at all levels.

7.4.3 Entities and Attributes in the Logical Model

According to their characteristics, entities can be divided into two categories.

  (1) Independent entity, usually represented by a rectangular box with right-angled corners. An independent entity exists independently and does not depend on other entities.

  (2) Dependent entity, usually represented by a rectangular box with rounded corners. A dependent entity must depend on other entities, and its primary key must include part or all of the primary key of an independent entity.

The primary key of the independent entity appears in, and becomes part of, the primary key of the dependent entity. As shown in Fig. 7.5, the chapter entity depends on the book entity. For example, many books have a Chapter 2. If there were no book ID as part of the primary key to distinguish the Chapter 2 of different books, only one record for Chapter 2 could appear in the chapter entity. But in fact, the title, page count and word count of Chapter 2 differ from book to book, so the chapter entity must depend on the book entity in order to function.

Fig. 7.5 Entity categories
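The book/chapter dependency can be sketched in SQL: the chapter table's primary key is composite, containing the book's primary key, so "Chapter 2" of different books coexist as distinct records. Column names here are illustrative assumptions; SQLite is used only as a convenient in-memory engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("""CREATE TABLE chapter (
    book_id    INTEGER REFERENCES book(book_id),  -- parent's key reappears here
    chapter_no INTEGER,
    title      TEXT,
    PRIMARY KEY (book_id, chapter_no))            -- composite key: chapter depends on book""")

cur.executemany("INSERT INTO book VALUES (?, ?)",
                [(1, "SQL Basics"), (2, "Data Modeling")])
# Two different "Chapter 2" records, one per book, both allowed by the composite key.
cur.executemany("INSERT INTO chapter VALUES (?, ?, ?)",
                [(1, 2, "Queries"), (2, 2, "Entities")])

cur.execute("SELECT COUNT(*) FROM chapter WHERE chapter_no = 2")
print(cur.fetchone()[0])  # 2
```

If chapter_no alone were the primary key, the second insert would collide with the first, which is exactly the overwrite problem the text describes for an independent chapter entity.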

Attributes are the characteristics of an entity, and the following types should be noted.

  (1) Primary key. The primary key is an attribute or group of attributes that uniquely identifies an entity instance. For example, the name of a student entity cannot be used as a primary key because names may be duplicated. The student number or ID number uniquely identifies the student and can therefore be used as a primary key.

  (2) Optional key. An attribute or group of attributes, other than the primary key, that can also identify an entity instance.

  (3) Foreign key. When two entities are linked, the foreign key of one entity is the primary key of the other entity. The entity holding the primary key can be called the parent entity, and the entity holding the foreign key the child entity.

  (4) Non-key attribute. An attribute inside an entity other than the primary key and foreign key attributes.

  (5) Derived attribute. An attribute that can be calculated or derived from other attributes.

The primary key of the book entity shown in Fig. 7.6 is the book ID, while other attributes are non-key attributes. The primary key of the chapter is the book ID plus the chapter number, while other attributes are non-key attributes. The book ID in the chapter entity is a foreign key.

Fig. 7.6 Attributes in entities

How are primary keys, foreign keys and indexes related and distinguished? A primary key uniquely identifies an instance, has no duplicate values, is non-null, and should not be updated. Its role is to determine the uniqueness of a record and ensure data integrity, so an entity can have only one primary key.

A foreign key is generally the primary key of another entity; within its own entity it may be duplicated or null. Its role is to establish referential consistency through the relation between the two entities, so an entity can have more than one foreign key. For example, suppose attribute A is a foreign key in table X; it may be duplicated in table X. Because it is a foreign key, it must be the primary key of another table, say table Y, and in table Y attribute A is not allowed to be duplicated.

Indexes are physical objects of the database and can be divided by uniqueness into unique indexes and non-unique indexes. A unique index is an object built on a table that allows no duplicate values but can contain a null value. A non-unique index is an object built on a table that can contain nulls and duplicate values. The purpose of an index is to improve query efficiency and thus speed up queries. The relations between primary keys, foreign keys and indexes are shown in Table 7.1.

Primary keys and foreign keys are logical concepts in the logical model, while indexes are physical objects. Many databases create the primary key when building a table, in which case the primary key attributes are backed by a unique non-null index.

Table 7.1 Relation between primary key, foreign key and index
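The difference between a unique and a non-unique index can be sketched directly in SQLite. The table and index names are illustrative assumptions; the unique index rejects duplicate values, while the non-unique index accepts them and exists only to speed up lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, id_card TEXT, name TEXT)")
cur.execute("CREATE UNIQUE INDEX ux_student_id_card ON student(id_card)")  # no duplicates allowed
cur.execute("CREATE INDEX ix_student_name ON student(name)")               # duplicates allowed

cur.execute("INSERT INTO student VALUES (1, 'A001', 'Li')")
cur.execute("INSERT INTO student VALUES (2, 'A002', 'Li')")  # duplicate name: fine

rejected = False
try:
    cur.execute("INSERT INTO student VALUES (3, 'A001', 'Wang')")  # duplicate id_card
except sqlite3.IntegrityError:
    rejected = True  # the unique index refused the duplicate value

print(rejected)  # True
```

This mirrors the table above: the primary key behaves like a unique non-null index, a foreign key column would typically carry a non-unique index, and plain indexes impose no uniqueness at all.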

After determining the entities and important attributes, you also need to understand the relations between the entities. Relations describe how entities are related to each other. For example, if a book "includes" several chapters, "includes" is the relation between these two entities. The relation is directional: the book "includes" the chapter rather than the chapter "includes" the book, so the relation from the chapter to the book is "belongs to".

Cardinality is a business rule that reflects the relation between two or more entities, and the relation cardinality is used to express the concept of "linkage" in the E-R method.

Figure 7.7 shows the cardinality notation in IDEF1X. Understanding the meaning of these labels helps to quickly clarify the relation between entities when reading a model structure. From left to right:

  • The first symbol represents a one-to-many relation in which the cardinality of the "many" party is 0, 1, or n.

  • The P symbol represents a one-to-many relation in which the cardinality of the "many" party is 1 or n. The difference between these two relations lies in the presence or absence of 0: if 0 is allowed, the relation is optional, meaning it may exist, expressed in English as "may"; otherwise it is mandatory, meaning it must exist, expressed in English as "must".

  • The Z symbol indicates that the cardinality of the "many" party is 0 or 1.

  • A number n indicates that there are exactly n related instances. For example, a rectangle has exactly four right angles, so rectangle and right angle are in a 1:4 relation.

  • The n-m symbol represents a range. For example, a month contains 28 to 31 days depending on the month and on leap years, so month and day are in a 1:(28-31) relation.

  • The {n} symbol is used when the cardinality cannot be given as a simple number and an annotation is needed to state the value range of n. In practical projects such annotations capture business rules, for example the relation between a month and securities trading days: how many valid trading days a month contains depends on the dates on which the stock exchange allows listed trading that month, which varies with annual policy changes and needs to be stated separately.

Fig. 7.7 Cardinality symbols in IDEF1X

In summary, the cardinality symbols reflect an important point: different cardinalities reflect different relations, and such relations are likely to capture important business rules or constraints.

A cardinality of 0 to n is the expression of "may", an optional requirement.

A cardinality of 1 to n is the expression of "must", a mandatory requirement.

In practice, a cardinality of 0 means that a null value (NULL) may occur when the two tables are joined.

The significance of cardinality is that it reflects the relation, as shown in Fig. 7.8. Both the left and right examples are "includes" relations, and on the book side the relation is 1:1, meaning a chapter must belong to one and only one book. In the example on the left, the values 0 to n express the optional requirement that a book may contain one or more chapters; a cardinality of 0 means the book is not divided into chapters, and in practice null values may then appear when the two tables are joined. The example on the right takes the values 1 to n, a mandatory requirement: the cardinality cannot be 0, so a book must contain one or more chapters.

Fig. 7.8 Different cardinalities reflect different relations
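The NULL behavior of an optional (cardinality 0) relation can be sketched with an outer join: a book with no chapters still appears in the result, with NULL in the chapter columns. Names are illustrative assumptions; SQLite is used as the in-memory engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("CREATE TABLE chapter (book_id INTEGER, chapter_no INTEGER, title TEXT)")

cur.executemany("INSERT INTO book VALUES (?, ?)",
                [(1, "With Chapters"), (2, "No Chapters Yet")])
cur.execute("INSERT INTO chapter VALUES (1, 1, 'Intro')")

# LEFT JOIN keeps every book; a book with cardinality 0 on the chapter
# side comes back with NULL (None in Python) for the chapter columns.
cur.execute("""SELECT b.title, c.chapter_no
               FROM book b LEFT JOIN chapter c ON b.book_id = c.book_id
               ORDER BY b.book_id""")
for title, chapter_no in cur.fetchall():
    print(title, chapter_no)
```

With a mandatory (1 to n) relation, the design guarantees at least one matching chapter row per book, so the same join would never produce NULL.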

An identifying relation occurs between an independent entity and a dependent entity: the unique identification of a child-entity instance requires the parent entity, and the primary key attributes of the parent entity become part of the primary key of the child entity. As shown in Fig. 7.6, the primary key book ID of the parent entity book becomes part of the primary key of the chapter entity.

A non-identifying relation means that the child entity does not need the relation with the parent entity to determine the uniqueness of its instances; the two entities are then independent entities with no dependencies. In Fig. 7.6, if the chapter entity did not depend on the book entity and became independent, then each chapter number could have only one record, and the same chapter numbers of different books would overwrite each other, which is a design problem. In this case, the solution is to change the non-identifying relation into an identifying relation. To summarize: the entity holding the foreign key is the child entity, and the entity whose primary key is referenced is the parent entity. The location of the foreign key determines whether the relation between parent and child is identifying or non-identifying: if the foreign key is part of the primary key of the child entity, the relation is identifying; if the foreign key is a non-key attribute of the child entity, the relation is non-identifying.

A recursive relation means that the parent entity and the child entity are the same entity, forming a recursive or nested relation: the primary key of the entity also becomes its own foreign key. A recursive relation occurs when the entity's instances form a hierarchy. In practical applications, such recursive entities are very common. For example, an organization structure includes superior and subordinate departments: a department may have one or more subordinate departments, the lowest-level department has no subordinates, and the top-level department has no superior, as shown in Fig. 7.9.

Fig. 7.9 Recursive relation
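The department hierarchy can be sketched as a table whose foreign key points back at its own primary key; a recursive common table expression then walks the hierarchy. Table and column names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The entity's own primary key reappears as its foreign key (self-reference).
cur.execute("""CREATE TABLE department (
    dept_id        INTEGER PRIMARY KEY,
    name           TEXT,
    parent_dept_id INTEGER REFERENCES department(dept_id))""")

cur.executemany("INSERT INTO department VALUES (?, ?, ?)",
                [(1, "Head Office", None),   # top department: no superior
                 (2, "R&D", 1),
                 (3, "Database Team", 2)])   # lowest level: no subordinates

# Walk up from the Database Team to the top of the hierarchy.
cur.execute("""WITH RECURSIVE chain(dept_id, name, parent_dept_id) AS (
    SELECT dept_id, name, parent_dept_id FROM department WHERE dept_id = 3
    UNION ALL
    SELECT d.dept_id, d.name, d.parent_dept_id
    FROM department d JOIN chain c ON d.dept_id = c.parent_dept_id)
    SELECT name FROM chain""")
print([row[0] for row in cur.fetchall()])  # ['Database Team', 'R&D', 'Head Office']
```

The NULL parent of the top department is what terminates the recursion, matching the rule that the top department has no superior.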

A subtype relation is the relation between a subclass entity and the parent entity to which it belongs. There are two kinds. One is the complete subtype relation, also called complete classification, where every instance of the parent entity is associated with an instance in the subtype group; all instances can be classified, with no exception. The other is the incomplete subtype relation, also called incomplete classification, where an instance of the parent entity is not necessarily associated with an instance in the subclass group; only some instances can be classified, and some cannot be classified or their classification is not of interest. Remember that in practice you must not force a catch-all "other" subclass just to pursue complete classification, as this will bring uncertainty to future business development.

The concepts of the logical model are summarized as follows.

  1. (1)

An entity is the metadata that describes a business object.

  2. (2)

    The primary key is an attribute or group of attributes that identifies the uniqueness of an entity instance.

  3. (3)

    Relations exist between entities only if there are foreign keys, and no relation can be established without foreign keys.

  4. (4)

The cardinality of a relation reflects the business rules between the entities.

Examples of business rules expressed by relation cardinalities in the logical model are as follows.

  • A customer can have only one type of savings account.

  • A customer can have more than one type of savings account.

  • An order can correspond to only one shipping order.

  • A product includes multiple parts.

7.4.4 NF Theory

According to the specific business requirements, database design must determine how to construct a schema that meets those requirements: how many entities are needed, which attributes make up each entity, and what the relations between entities are. Strictly speaking, these are the questions addressed in the logical design stage of a relational database. The relational model is based on strict mathematical theory, so designing it according to the normalization theory of relational databases yields a reasonable relational model. In the database logical design phase, the process of placing attributes in the correct entities is called normalization. Different NFs satisfy different levels of requirements.

Between 1971 and 1972, Dr. E.F. Codd systematically proposed the concepts of 1NF through 3NF, fully discussing model normalization. Others later deepened the theory and proposed higher-level NF standards, but for relational databases it is sufficient in practical applications to achieve 3NF.

A relational data model designed by following normalization theory brings the following benefits.

  1. 1.

    It can avoid the generation of redundant data.

  2. 2.

    The risk of data inconsistency can be reduced.

  3. 3.

    The model has good scalability.

  4. 4.

    It can be flexibly adjusted to reflect changing service rules.

In contrast to normalization during logical model design, building the physical model often involves denormalization, i.e., deliberately violating some normalization rules and adjusting physical attributes in order to improve the performance of the database in use.

When determining entity attributes, the question often faced is: which attributes belong to which entities? This is the question NF theory addresses. For example, a bank has many dealings with an individual: the same person may make deposits, spend on credit cards, buy wealth-management products, and take out car or housing loans. Within the bank, these businesses are run by different departments and systems. If you spend with a credit card, you have a credit card number and a customer number in the credit card system; if you do wealth management, you open a financial account; if you make a deposit, you open a savings account. Yet the individual the bank faces is one person. When building the model, how are these identities merged into a single customer entity? When counting a customer's assets, should there be three entities or one? For a customer who currently has no loan relation with the bank, what should the model anticipate in case a loan relation arises later? These are all questions addressed in logical design, and the theoretical basis for answering them is the NF model.

The relation pattern satisfying the minimum requirements is called the first NF (1NF); one that further satisfies additional requirements on top of 1NF is the second NF (2NF), and so on. A lower-level NF relation pattern can be transformed, through schema decomposition, into a collection of higher-level NF relation patterns. This process is called normalization, as shown in Fig. 7.10.

Fig. 7.10

Relations between NFs

A domain is the set of legal values of an attribute; it defines the attribute's valid range of values. Values inside the domain are legal data, and the domain reflects the relevant business rules.

For example, the domain of the employee ID shown in Fig. 7.11 is the set of integers greater than 0, so 0 and −10 fall outside the domain. Likewise, if cell phone numbers are 11-digit integers, then 12345678910 is formally legal data; considering the actual situation, however, it may still be invalid, because different operators own different number segments.

Fig. 7.11


A relation (table or entity) conforms to 1NF if and only if each attribute contains only atomic values (values that cannot be split further), and each attribute holds a single value from its domain (not a subset of it).

The rules satisfying the 1NF contain the following features.

  1. (1)

The attribute value is atomic (cannot be split further).

  2. (2)

Each attribute holds a single value, which cannot be a subset of the value domain.

  3. (3)

    A primary key is required to ensure that there are no duplicate records in the database.

  4. (4)

The attributes in the entity contain no repeating groups, because repeating groups tend to produce null values and an unstable structure; that is, ordinary business growth can destabilize the model, and repeating groups also lead to ambiguity in use.

For example, in the phone number column shown in Table 7.2, the phone number attribute has serious problems: the value format is not uniform and contains non-numeric characters. Worse, two people each have more than one phone number, and those number lists are subsets of the phone number domain, which violates the rule that "each attribute holds a single value, which cannot be a subset of the value domain". This kind of table structure is common in many practical scenarios. Take the follower list of an account in a social application as an example: this dynamic data is often designed as a single field holding a comma-separated series of account IDs.

Table 7.2 Customer information table (1)

If the two phone numbers are split into two fields, they are shown in Table 7.3.

Table 7.3 Customer information table (2)

This seems to solve the atomicity problem, but the repeating group problem arises. A repeating group is technically atomic in each value but conceptually repeats the same attribute. The reason to avoid repeating groups is that they introduce the following anomalies.

  1. (1)

    Some records produce null values. For example, some customers have only one phone number, without the second phone number, which would result in a null value in the Phone Number 2 field.

  2. (2)

The structure may be unstable. For example, some people have three or more phone numbers, requiring frequent changes to the table structure to adapt to new situations; business development thus destabilizes the model structure.

  3. (3)

Ambiguity arises when the data is used. Which number should be placed first? Which second? By what rule? Which number prevails when contacting the customer? All these questions lead to semantic confusion and ambiguity in business use of the data.

The solution to the above problems is to turn the repeating group into a tall table, putting all phone numbers into the same attribute. This conforms to 1NF, as shown in Table 7.4.

Table 7.4 Customer information table (3)
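The tall-table design can be sketched as follows; this SQLite example (table and column names are hypothetical) stores one phone number per row, so a customer may have any number of phones without null columns or schema changes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# 1NF-compliant "tall" design: one phone number per row.
con.executescript("""
CREATE TABLE customer_phone (
    customer_id INTEGER,
    phone       TEXT,
    PRIMARY KEY (customer_id, phone)
);
""")
con.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                [(1, "13800000001"),
                 (1, "13800000002"),
                 (2, "13900000003")])
# Customer 1 has two numbers: no NULLs, no Phone Number 2 column,
# and a third number would be just another row.
n = con.execute(
    "SELECT COUNT(*) FROM customer_phone WHERE customer_id = 1"
).fetchone()[0]
print(n)  # 2
```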

Atomicity means indivisibility. But to what degree should values be split? Many people misunderstand the concept of atomicity in practice. Generally speaking, codes with coding rules are composite codes, divisible according to those rules. For example, ID numbers and cell phone numbers can both be split into data of finer granularity, such as birth year and gender in the case of ID numbers. From the field perspective, however, an ID number field is legal as long as it conforms to the coding rules; it is atomic data and needs no further splitting.

2NF means that each table must have a primary key, with the other data elements corresponding to the primary key one to one. This relation is often referred to as functional dependence: all other data elements in the table depend on the primary key, i.e., each data element is uniquely identified by the primary key. 2NF emphasizes full functional dependence; simply put, every non-primary-key field depends on the primary key as a whole, not on part of it.

There are two necessary conditions for 2NF: first, 1NF must be satisfied; second, every non-primary attribute must be fully functionally dependent on every candidate key. This can be simply understood as: all non-primary-key fields depend on the whole primary key, not a part of it. Table 7.5 does not satisfy 2NF because the order date depends only on the order number and has nothing to do with the part number, so the table holds a lot of redundant data as the order number repeats.

A simple tip: if an entity's primary key consists of a single field, the entity basically satisfies 2NF.

Table 7.5 Order and part table 1

Modify Table 7.5 by moving the order date, together with the order number it depends on, into a separate entity; both entities then satisfy 2NF. This is normalization: a lower-level NF pattern is converted through schema decomposition into a collection of higher-level NF relational patterns, as shown in Tables 7.6 and 7.7.

Table 7.6 Order and part table (2)
Table 7.7 Order number table
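The decomposition can be sketched like this (SQLite for illustration; table names such as order_header and order_part are hypothetical stand-ins for Tables 7.6 and 7.7): the order date, which depends only on the order number, moves into an entity keyed by the order number alone, so it is stored once per order no matter how many part lines the order has.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- 2NF decomposition: order_date depends only on order_no (part of
-- the composite key), so it moves to its own entity.
CREATE TABLE order_header (
    order_no   INTEGER PRIMARY KEY,
    order_date TEXT
);
CREATE TABLE order_part (
    order_no INTEGER REFERENCES order_header(order_no),
    part_no  INTEGER,
    quantity INTEGER,
    PRIMARY KEY (order_no, part_no)
);
""")
con.execute("INSERT INTO order_header VALUES (1001, '2023-05-01')")
con.executemany("INSERT INTO order_part VALUES (?, ?, ?)",
                [(1001, 1, 10), (1001, 2, 5)])
# The date is stored exactly once even though the order has two lines:
date_rows = con.execute(
    "SELECT COUNT(*) FROM order_header WHERE order_no = 1001").fetchone()[0]
print(date_rows)  # 1
```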

3NF requires that every non-primary-key field depend on the whole primary key and not on other non-primary-key attributes. There are two necessary conditions: first, 2NF must be satisfied; second, no non-primary attribute may be transitively dependent on the primary key. In other words, under 3NF every non-primary-key field depends on the whole primary key rather than on a non-primary-key attribute. In Table 7.8 the customer name depends on the non-primary-key attribute customer ID, so 3NF is not satisfied.

Table 7.8 Order and customer table

3NF mainly constrains field redundancy: a table may not contain derived fields. If a table has redundant fields, updating data becomes less efficient because the redundant copies must be maintained, and data inconsistency easily follows. The solution is to split the table into two tables linked by a primary-foreign key relation, as shown in Tables 7.9 and 7.10.

Table 7.9 Order table
Table 7.10 Customer table
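The 3NF split can be sketched as follows (SQLite for illustration; names are hypothetical stand-ins for Tables 7.9 and 7.10). Once the customer name lives only in the customer table, renaming the customer is a single-row update and the orders can never disagree about the name:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- 3NF split: customer_name depends on customer_id (a non-key
-- attribute of the order), so it moves into its own table.
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT
);
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id)
);
""")
con.execute("INSERT INTO customer VALUES (7, 'Alice')")
con.executemany("INSERT INTO orders VALUES (?, 7)", [(1,), (2,)])
# Renaming the customer is now one update; every order sees it via a join:
con.execute("UPDATE customer SET customer_name = 'Alice Ltd' WHERE customer_id = 7")
name = con.execute("""
    SELECT c.customer_name FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
    WHERE o.order_no = 2
""").fetchone()[0]
print(name)  # Alice Ltd
```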

In 1970, Dr. E.F. Codd, an IBM researcher, published the paper that introduced the relational model and laid its theoretical foundation. In the early 1970s he went on to define the concepts of 1NF, 2NF and 3NF. In practical applications it is sufficient for a relational model to satisfy 3NF.

  • The KEY—1st Normal Form (1NF)

  • The WHOLE Key—2nd Normal Form (2NF)

  • AND NOTHING BUT the Key—3rd Normal Form (3NF)—E.F. Codd

In practice, database design satisfies at most 3NF. It is generally believed that although higher NFs constrain data relations better, they also make database I/O busier because of the increased number of relation tables, so real projects basically never go beyond 3NF.

In a data warehouse, the application layer often uses a star-shaped or snowflake-shaped model. The snowflake model is a structure commonly used in business intelligence (BI) and reporting systems, so named because the fact table's dimensions, once expanded, resemble a snowflake, as shown in Fig. 7.12. This model basically meets the requirements of 3NF, or at least 2NF in many scenarios.

Fig. 7.12

Snowflake-shaped model

7.4.5 Logic Design Considerations

When designing a logical model, several matters of principle deserve attention. The first is establishing naming rules. As in other development work, it is advisable to establish naming rules before logical modeling and follow them throughout. The main purpose is to unify thinking, facilitate communication and achieve standardized development. For example, under unified naming, the amount attribute is named amount, abbreviated amt, with the physical type DECIMAL(9,2), accurate to two decimal places in calculations. If naming is inconsistent, however, for example some people defining the customer ID as cid and others as customer_id, it becomes doubtful whether the two attributes refer to the same object, and different roles form different understandings of the same model.

The naming suggestions for entities and attributes are as follows.

  1. (1)

    Entity name: capitalize the type domain + entity descriptor (full name, initial capitalization).

  2. (2)

Attribute name: use full words with initial capitals; agreed conventional abbreviations may follow, separated by spaces.

  3. (3)

    Avoid mixing English and Chinese Pinyin.

  4. (4)

If abbreviations are used, they must be abbreviations of English words; avoid acronyms formed from Pinyin.

Also design the logical model according to the design process, determining entities and attributes: define each entity's primary key (PK), define the non-key attributes, define non-unique attribute groups, and add the corresponding comments.

Finally, it is necessary to determine the relations between entities: use foreign keys to determine whether a relation is identifying, and determine whether its cardinality is 1:1, 1:n or n:m. When adding non-key attributes to an entity, consider whether they conform to 3NF; if an added attribute violates 3NF, split the entity and determine the relation between the new entity and the original one. Comments are generally a textual description of the business meaning, code values, etc.

7.5 Physical Design

7.5.1 Physical Design and Physical Models

Physical design is the adjustment of the physical attributes of the model, based on the logical model, to optimize database performance and improve the efficiency of business operation and application. The physical design should be adjusted in line with the physical attributes of the target database product, with the ultimate goal of generating deployable DDL for the target database.

The main contents include but are not limited to the following.

  1. (1)

Denormalization (non-regularized processing) of entities.

  2. (2)

    Physical naming of tables and fields.

  3. (3)

    Determining the type of fields, including attributes such as length, precision, and case sensitivity.

  4. (4)

    Adding physical objects that do not exist in the logical model, such as indexes, constraints, partitions, etc.

Table 7.11 shows the designations of the same concept at different stages. For example, relations in relational theory are called entities in the logical model and tables in the physical model. A tuple in relational theory is an instance in the logical model and a row in the physical model. Attributes in relational theory are called attributes in the logical model and fields of a table in the physical model.

Table 7.11 Names of the same concept at different stages

In the comparison of the logical and physical models shown in Table 7.12, the logical model contains entities and attributes, which correspond to tables and fields in the physical model. As for key values, the physical model generally does not use primary keys, relying instead on unique constraints and not-null constraints, because a primary key constraint imposes overly strict data quality demands; the constraint requirement is therefore relaxed in the physical implementation, and the primary key remains mainly a logical concept. In naming, the logical model follows business rules and real-world naming conventions, while the physical model must respect the limitations of the database product: no illegal characters, no database keywords, no over-length names. In terms of normalization, the logical model should aim for 3NF and be normalized; the physical model pursues high performance and may have to be denormalized, i.e., non-regularized.

Table 7.12 Comparison of logical and physical models

7.5.2 Denormalization of the Physical Model

Denormalization, also called non-regularization processing, is the process and set of techniques opposite to normalization, for example downgrading a model from 3NF to 2NF or 1NF. Physical model design must take into account the physical limitations of the database alongside performance and application requirements. Theoretically, if hardware were unlimited, with unlimited CPU speed, memory, storage and bandwidth, there would be no need to denormalize. It is precisely because resources and hardware are limited that the physical model requires denormalization; and denormalization must be applied in moderation, to limit the resulting data redundancy and the attendant risk of data inconsistency.

Frequent join operations can be avoided by adding redundant columns, as shown in Tables 7.13, 7.14 and 7.15. There is a primary-foreign key relation between the order table and the customer table; if a report can display only the customer number, it is very inconvenient for users, so a join is needed to display the customer name as well. Joins consume resources, however, and in practice it is common for a single query to join more than a dozen code tables. Without redundancy, a lot of real-time computing resources are spent on joins, hurting query efficiency. Adding redundant columns, i.e., pre-joining, therefore improves query efficiency.

Table 7.13 Order table
Table 7.14 Customer table
Table 7.15 Order and customer table
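A minimal sketch of the pre-joined design (SQLite for illustration; the orders_denorm table and its data are hypothetical): the customer name is copied into the order table, so the report query touches a single table and needs no join at all.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Denormalized order table: customer_name is copied in
-- ("pre-joined") so reports need not join the customer table.
CREATE TABLE orders_denorm (
    order_no      INTEGER PRIMARY KEY,
    customer_id   INTEGER,
    customer_name TEXT          -- redundant column
);
""")
con.execute("INSERT INTO orders_denorm VALUES (1, 7, 'Alice')")
row = con.execute(
    "SELECT order_no, customer_name FROM orders_denorm WHERE order_no = 1"
).fetchone()
print(row)  # (1, 'Alice') -- single-table query, no join needed
```

The price, as the text notes, is redundancy: if the customer is renamed, every matching order row must be updated too.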

The complexity of SQL can be reduced by adding redundant columns and using repeating groups, as shown in Tables 7.16 and 7.17. This example converts the tall table above into the wide table below, a technique often used for front-end report queries; it suits fixed-format reports whose style is determined in advance.

Table 7.16 Sales monthly report of a department
Table 7.17 Customer table
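The tall-to-wide conversion is a pivot, commonly written with conditional aggregation. A sketch (SQLite for illustration; the monthly_sales table and its values are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_sales (dept TEXT, month INTEGER, amount REAL)")
con.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)",
                [("toys", 1, 100.0), ("toys", 2, 150.0), ("toys", 3, 80.0)])
# Pivot the tall table into one wide row per department, one column
# per month: the repeating group deliberately reintroduced for reporting.
wide = con.execute("""
    SELECT dept,
           SUM(CASE WHEN month = 1 THEN amount END) AS m1,
           SUM(CASE WHEN month = 2 THEN amount END) AS m2,
           SUM(CASE WHEN month = 3 THEN amount END) AS m3
    FROM monthly_sales GROUP BY dept
""").fetchone()
print(wide)  # ('toys', 100.0, 150.0, 80.0)
```

In practice the pivoted result would be materialized as the wide report table, so the front end reads it with a trivial single-row SELECT.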

Tables 7.18 and 7.19 show how adding derived columns reduces function computation, a very common scenario: extracting a customer's age from the ID card number; classifying users as VIP, platinum or ordinary customers based on spending amounts; or flagging suspicious transactions and accounts in an AML system. This method is typical of customer relationship management projects. In Table 7.19, users are divided by age into elderly, middle-aged and young groups.

Table 7.18 Original customer table
Table 7.19 Derived customer table
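The derived-column idea can be sketched as follows (SQLite for illustration; the age thresholds and the age_group column are hypothetical): the classification is computed once and stored, so later queries filter on the stored value instead of re-evaluating the expression every time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE customer (customer_id INTEGER, age INTEGER, age_group TEXT)")
con.executemany("INSERT INTO customer (customer_id, age) VALUES (?, ?)",
                [(1, 23), (2, 45), (3, 71)])
# Populate the derived column once; queries then use age_group
# directly instead of recomputing the CASE expression.
con.execute("""
    UPDATE customer SET age_group = CASE
        WHEN age < 35 THEN 'young'
        WHEN age < 60 THEN 'middle-aged'
        ELSE 'elderly'
    END
""")
groups = [r[0] for r in con.execute(
    "SELECT age_group FROM customer ORDER BY customer_id")]
print(groups)  # ['young', 'middle-aged', 'elderly']
```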

Denormalization is commonly handled by the following means.

  1. (1)

    Adding duplicate groups.

  2. (2)

Performing pre-joins.

  3. (3)

    Adding derived fields.

  4. (4)

    Creating summary tables or temporary tables.

  5. (5)

    Horizontally or vertically splitting tables.

Denormalization is common in OLTP systems, where it is generally used to improve performance in high-concurrency, transaction-heavy scenarios; its negative impact, however, weighs more heavily in OLAP systems, where it must be considered more carefully for the following reasons.

  1. (1)

    Denormalization does not bring performance improvement to all processing processes, and the negative impact needs to be balanced.

  2. (2)

    Denormalization may sacrifice the flexibility of data models.

  3. (3)

    Denormalization poses the risk of data inconsistency.

7.5.3 Maintaining Data Integrity

Denormalization brings the increase of redundant data, which requires certain management measures to maintain data integrity. There are three common processing methods.

  1. (1)

Maintain by batch processing. After a certain period of time, a batch job or stored procedure is executed to refresh the replicated or derived columns. This approach can be used only where the real-time requirement is not strict.

  2. (2)

In the application, add, delete and modify all involved tables within the same transaction. Pay close attention to data quality, however: when requirements change frequently, this maintenance is easy to neglect, leading to data quality problems.

  3. (3)

Use triggers. Triggers give good real-time behavior: after the application updates the data of Table A, the database automatically triggers the update of Table B. The cost is pressure on the database; in practice triggers carry a significant performance penalty, so they are used in fewer and fewer scenarios.

7.5.4 Establishing a Physicalized Naming Convention

A naming convention should be established for physicalization. Naming should respect the physical characteristics of the database: avoid illegal characters and the database's reserved keywords, and prefer meaningful, memorable, descriptive, short and unique English words; Chinese Pinyin is not recommended. The convention should be unified and strictly observed within the project team, and name abbreviations should be agreed upon. Physical characteristics generally means case sensitivity and length limits; for example, GaussDB(DWS) specifies that a name cannot exceed 63 characters.

Using database reserved keywords may pass at the syntax level but brings uncontrollable risk to subsequent operations and maintenance, other automated management work and future system upgrades. Database object names are generally case-insensitive at the physical level, so do not force case differentiation with double quotes.

Table prefixes can be unified as t, view prefixes as v, and index prefixes as ix. When naming, add the appropriate prefix followed by a meaningful specific name, with the whole name in lowercase, as shown in Table 7.20. The examples here are for reference only and are not a mandatory convention.

Table 7.20 Object naming convention

7.5.5 Physicalizing Tables and Fields

The table-level physicalization operations listed here are only part of the work, not covering all table-level physicalization work.

There are several methods for table physicalization as follows.

  1. (1)

    Perform the denormalization operation using the methods described earlier.

  2. (2)

Decide whether to partition. Partitioning a large table reduces the I/O scanning workload and narrows the scope of queries, but finer partitioning is not always better: if you only query or summarize by month, partition by month rather than by day.

  3. (3)

Decide whether to split history tables from current tables. A history table holds cold data used at low frequency and can live on low-speed storage; a current table holds hot, frequently queried data and can use high-speed storage. History tables can also be compressed to reduce the storage space occupied.

For field-level physicalization, first prefer data types with short fields. Shorter data types not only reduce the size of data files and improve I/O performance, but also reduce memory consumption during related calculations and improve computational performance. For example, for integer data, prefer SMALLINT over INT, and INT over BIGINT, where the range allows. Second, use consistent data types: columns involved in joins should share the same data type, otherwise the database must dynamically convert them to a common type for comparison, incurring performance overhead. Last, use efficient data types. Generally speaking, integer operations (the conventional comparisons =, >, <, ≥, ≤, ≠, as well as GROUP BY) are more efficient than operations on strings and floating-point numbers.

The premise of using efficient data is that the data type must meet the service requirements of the value field. For example, the service context is the amount field with decimals, then you cannot force the use of integers in pursuit of high efficiency.

Integer data is an efficient type compared to strings. TINYINT occupies only 1 byte, with values in the range 0 to 255, and is the most efficient; but it is a GaussDB-specific type, and at present the GaussDB ODBC driver is the open-source ODBC driver, whose compatibility with TINYINT is poor. SMALLINT occupies 2 bytes, with values from −32 768 to +32 767, but such a field can hold only numbers and cannot later be extended with characters such as a, b, c. INT occupies 4 bytes, ranging from −2 147 483 648 to +2 147 483 647; BIGINT occupies 8 bytes, from −9 223 372 036 854 775 808 to +9 223 372 036 854 775 807. CHAR(1) occupies 1 byte, less efficient than an integer but supporting the characters 0-9 and A-Z; VARCHAR(1) occupies at least 3 bytes because of its length overhead. So there is no absolute standard; decide according to the actual scenario.


Consider: a certain flag field takes only the values 0 and 1. Which data type is appropriate for it?

Common field-level constraints are DEFAULT constraints, not-null constraints, unique constraints, primary key constraints and check constraints. If the field value can be supplied at the business level, DEFAULT constraints are not recommended, to avoid unintended results during data loading. A not-null constraint is recommended on any field where a null value clearly cannot occur. A primary key constraint equals a unique constraint plus a not-null constraint, so it should be added where conditions allow. A check constraint is a data quality requirement: inserting data that violates it causes the SQL statement to fail.
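These constraints can be sketched in DDL as follows (SQLite for illustration; the account table is hypothetical). The flag column's CHECK constraint rejects any value other than 0 or 1, which is exactly the failure mode described above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,           -- unique + not null
        owner      TEXT NOT NULL,                 -- null clearly impossible
        flag       INTEGER CHECK (flag IN (0, 1)) -- data quality rule
    )
""")
con.execute("INSERT INTO account VALUES (1, 'Alice', 0)")
try:
    con.execute("INSERT INTO account VALUES (2, 'Bob', 5)")  # violates CHECK
    failed = False
except sqlite3.IntegrityError:
    failed = True
print(failed)  # True: the offending row is rejected by the constraint
```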

Check constraints have little overall impact on GaussDB (for MySQL): in an OLTP system, if some rows cannot be inserted because they fail a constraint, the failures can be logged and handled separately. In GaussDB(DWS), however, they affect the whole job processing system and their impact is relatively larger: in an OLAP system, a large-volume operation may fail entirely because a few records violate a check constraint, disrupting the warehouse's overall data processing. Therefore, in OLAP systems, enforce local data quality in the data application as much as possible, and do not add such constraints on physical tables and fields.

For the creation and use of indexes, the following lists cases in which indexes may be added; these are not mandatory requirements. Even when an index exists, the database system makes its own optimization judgment about whether to use it: the index is used when it makes execution cheaper and faster, and ignored when its cost is higher and the efficiency gain insignificant.

The common index use scenarios are as follows.

  1. (1)

    Create indexes on columns that are frequently required to be searched and queried, which can speed up the search and query.

  2. (2)

Create an index on the column used as the primary key, which enforces the uniqueness of the column and organizes the arrangement of the data in the table.

  3. (3)

    Create indexes on columns that often use joins. These columns are mainly foreign keys, so the speed of association can be accelerated.

  4. (4)

    Create indexes on columns that often need to be searched based on ranges, because the indexes are already sorted and their specified ranges are contiguous.

  5. (5)

    Create indexes on columns that often need to be sorted, also because the indexes are already sorted, and these queries can shorten the query time of sorting by index sorting.

  6. (6)

    Create indexes on the columns that often use the WHERE clause to speed up the judgment of the condition.

The above scenarios allow the use of indexes but do not require it; whether an added index is actually used is determined by the database system itself.

However, creating more indexes has negative effects: indexes require additional space, and inserts into the base table become slower because the corresponding index entries must be inserted at the same time. Invalid indexes should therefore be dropped promptly to avoid wasting space.
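Both points, that the optimizer decides whether to use an index and that unused indexes should be dropped, can be observed directly. The sketch below uses SQLite's EXPLAIN QUERY PLAN as an illustrative stand-in for the target database's plan inspection tools (table, data and index name are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_no INTEGER, customer_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, i % 100) for i in range(1000)])
# Index the frequently searched/joined column...
con.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
# The optimizer itself chose the index for this equality search:
uses_index = any("ix_orders_customer" in row[-1] for row in plan)
print(uses_index)
# ...and an index that turns out to be unused is dropped to save space:
con.execute("DROP INDEX ix_orders_customer")
```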

Other physicalization means are judged to be used according to the situation, for example, whether to further compress the data, whether to encrypt or desensitize the data, etc.

7.5.6 Using Modeling Software

During the design process, we typically use modeling software for both logical and physical modeling. Such software brings many benefits, such as forward generation of DDL and reverse engineering of an existing database, and supports the various needs that arise in modeling, so that the work can proceed efficiently.

Advantages of using modeling software for logical modeling and physical modeling are as follows.

  1. (1)

    Powerful and rich.

  2. (2)

    Forward DDL generation and reverse analysis of database.

  3. (3)

Free switching of views between the logical model and the physical model.

  4. (4)

    Comprehensive satisfaction of various requirements in modeling for efficient modeling.

The following are some of the commonly used modeling software.

  1. (1)

    ERwin's full name is ERwin Data Modeler, a data modeling tool from CA, which supports all major database systems.

  2. (2)

PowerDesigner is SAP's enterprise modeling and design solution. It uses a model-driven approach to align business and IT, helps deploy effective enterprise architecture, and provides powerful analysis and design techniques for R&D lifecycle management. PowerDesigner uniquely integrates multiple standard data modeling techniques (UML, business process modeling, and market-leading data modeling) with leading development platforms such as .NET, WorkSpace, PowerBuilder, Java and Eclipse, providing business analysis and standardized database design solutions for traditional software development lifecycle management.

  3. (3)

    ER/Studio is a set of model-driven data structure management and database design products that help companies discover, reuse and document data assets. It empowers data structures with the ability to fully analyze existing data sources through regressive database support, and design and implement high-quality database structures based on service requirements. The easy-to-read visual data structure facilitates communication between service analysts and developers at work. ER/Studio Enterprise enables enterprises and task teams to collaborate through a central repository.

  4. (4)

    dbeaver is a free, open source, universal database tool for developers and database administrators.

  5. (5)

    pgModeler is a dedicated modeling tool for PostgreSQL databases, developed using Qt and supporting Windows, Linux operating systems and OS X platforms, which uses the classic entity linkage diagram.

7.5.7 Physical Model Products

The products that should be output during the physical model design phase include the following:

  1. The physical data model itself, usually a project file of the modeling software used;

  2. The physical model naming convention, a standard that everyone on the project should follow;

  3. The design specification of the physical data model;

  4. The DDL table-creation statements for the target database.

7.6 Database Design Case

7.6.1 Scenario Description

This scenario involves a customer placing an order to purchase equipment. A sample order form is shown in Fig. 7.13. After purchasing the equipment, the customer needs to fill in the relevant information on the order form.

Fig. 7.13 Order form for a customer to purchase equipment

The current demand is to design the underlying database model according to this order form, taking into account the following three requirements.

  1. Record the relevant data information in the database.

  2. Enable querying of order information through the database system.

  3. Support statistical reports on sales volume.

7.6.2 Normalization

The entities and attributes that can be extracted from the order shown in Fig. 7.13 are: order number, order date, customer ID, customer name, contact information, ID number, customer address, part number, part description, part unit price, part quantity, part total price, and order total price.

If this information is placed directly into a single entity, so that the resulting design is one table covering all the information, then the part number, part description, part unit price, part quantity, and part total price form a repeating attribute group that appears multiple times in the entity, as shown in Fig. 7.14: Part Number 1, Part Description 1, Part Unit Price 1, Part Quantity 1, Part Number 2, Part Description 2, Part Unit Price 2, and so on. This situation does not satisfy the 1NF.

Fig. 7.14 List of extractable attributes

To resolve the repeating group of part information, the part-related attributes are extracted to form a separate entity; since each order contains several parts, the primary key of the new entity is the order number plus the part number, as shown in Fig. 7.15.


After eliminating the repeating group, which NF does the model now conform to?

Fig. 7.15 Order and order-part entities

The current model still contains partial dependencies in the part information, so normalization should continue in order to resolve them. Extract the attributes that depend only on the part number to form a new entity, the part entity, as shown in Fig. 7.16.


After eliminating the partial dependencies, which NF does the model now conform to?

Fig. 7.16 Elimination of partial dependencies

The problem with the current model is that the customer information depends on the customer ID, and the customer ID in turn depends on the order number. Such a dependency is transitive rather than direct, so a conversion from the 2NF to the 3NF has to be carried out to eliminate this transitive dependency.

To eliminate the transitive dependency, the customer information is extracted into a separate entity, the customer entity, as shown in Fig. 7.17.


After eliminating the transitive dependency, which NF does the model now conform to?

Fig. 7.17 Separate customer information

At this point, the logical model is essentially complete. Note, however, that the order total price and the part total price are derived fields, which strictly speaking do not meet the requirements of the 3NF, so they should be removed.

After normalization is completed, the entities of the 3NF model are obtained, with the primary keys and foreign keys marked in the two-dimensional tables, as shown in Tables 7.21 and 7.22.

Table 7.21 Order table
Table 7.22 Order-part table

Since the part total price attribute has been removed from the order-part table, obtaining the part total price now requires an operation on the order-part table, multiplying the part quantity by the sale price. The pseudo-SQL is as follows.

SELECT order_number, part_number, (sale_price * part_quantity) AS part_total_price FROM order_part;

If you now want to get the order total price, the pseudo SQL code is as follows.

SELECT order_number, SUM(sale_price * part_quantity) AS order_total_price FROM order_part GROUP BY order_number;
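These two statements can be exercised end to end, for example in SQLite; the English table and column names below are illustrative translations of the order-part table's attributes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative stand-in for the order-part table of the 3NF model.
conn.execute("""CREATE TABLE order_part (
    order_number  INTEGER,
    part_number   INTEGER,
    sale_price    REAL,
    part_quantity INTEGER,
    PRIMARY KEY (order_number, part_number))""")
conn.executemany(
    "INSERT INTO order_part VALUES (?, ?, ?, ?)",
    [(1, 101, 10.0, 2), (1, 102, 5.0, 4), (2, 101, 10.0, 1)])

# Part total price, computed per order line.
rows = conn.execute("""SELECT order_number, part_number,
                              sale_price * part_quantity AS part_total_price
                       FROM order_part
                       ORDER BY order_number, part_number""").fetchall()
print(rows)    # [(1, 101, 20.0), (1, 102, 20.0), (2, 101, 10.0)]

# Order total price: SUM() needs the accompanying GROUP BY.
totals = conn.execute("""SELECT order_number,
                                SUM(sale_price * part_quantity) AS order_total_price
                         FROM order_part
                         GROUP BY order_number
                         ORDER BY order_number""").fetchall()
print(totals)  # [(1, 40.0), (2, 10.0)]
```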

7.6.3 Data Types and Length

After the logical model design is complete, the physical model design begins. First, name the tables and fields according to an agreed convention, avoiding database keywords and applying consistent case rules. Then determine the data type of each field; for character fields, set the upper limit of the defined length according to the possible value range of the actual data. Finally, decide whether each field needs a non-null constraint, a unique constraint, or other constraints, as shown in Tables 7.23, 7.24, 7.25, and 7.26.

The samples in the above tables are only examples; adjust them as the actual situation requires in practice.


If the data type of the sale price is DECIMAL(5,2), what is its value range?

Table 7.23 ORDER table
Table 7.24 CUSTOMER table
Table 7.25 ORDER_ITEM table
Table 7.26 ITEM table
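As a sketch of how such decisions surface in DDL, the snippet below renders a hypothetical version of the CUSTOMER table; the column names, types, lengths, and constraints are assumptions for illustration, checked here with SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical DDL in the spirit of Table 7.24: explicit types and
# lengths, with NOT NULL and UNIQUE constraints chosen per field.
conn.execute("""CREATE TABLE CUSTOMER (
    Customer_ID   VARCHAR(20)  NOT NULL PRIMARY KEY,
    Customer_Name VARCHAR(100) NOT NULL,
    Phone_Number  VARCHAR(20),
    ID_Number     CHAR(18)     UNIQUE)""")

conn.execute("INSERT INTO CUSTOMER VALUES "
             "('C001', 'Alice', NULL, '110101199001010011')")

# Violating a NOT NULL constraint is rejected by the database.
try:
    conn.execute("INSERT INTO CUSTOMER VALUES ('C002', NULL, NULL, NULL)")
except sqlite3.IntegrityError as e:
    print(e)  # NOT NULL constraint failed: CUSTOMER.Customer_Name
```

Declaring the constraints in the DDL pushes data-quality checks down to the database itself, rather than leaving them to every application.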

7.6.4 Denormalization

The denormalization shown in Tables 7.27 and 7.28 solves some business problems by adding derived fields. For example, Total_Price stores the total price of a particular order, and Item_Total indicates the sales amount of a part within an order.

Table 7.27 Order table
Table 7.28 Order detail table

Whether to add further derived fields or perform other pre-join operations depends on the business problems to be solved, the computational complexity involved, and whether denormalization actually speeds up the queries concerned.


What is the average monthly sales for Q1? What are the top three parts by sales? You can further refine the derived fields on your own based on business questions like these.
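A minimal sketch of maintaining a derived field at write time (SQLite; the table and column names are illustrative, echoing Table 7.28):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized order-detail table: Item_Total is a derived field
# (Sale_Price * Item_Quantity) stored at write time.
conn.execute("""CREATE TABLE ORDER_ITEM (
    Order_ID INTEGER, Item_ID INTEGER,
    Sale_Price REAL, Item_Quantity INTEGER,
    Item_Total REAL)""")

def add_line(order_id, item_id, price, qty):
    # The application computes the derived value once on insert...
    conn.execute("INSERT INTO ORDER_ITEM VALUES (?, ?, ?, ?, ?)",
                 (order_id, item_id, price, qty, price * qty))

add_line(1, 101, 10.0, 2)
add_line(1, 102, 5.0, 4)

# ...so report queries read it directly instead of recomputing it.
rows = conn.execute(
    "SELECT Item_ID, Item_Total FROM ORDER_ITEM ORDER BY Item_ID").fetchall()
print(rows)  # [(101, 20.0), (102, 20.0)]
```

The cost is the risk that the stored value drifts out of step with its inputs, which is why denormalization is applied selectively rather than by default.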

7.6.5 Index Selection

Taking Tables 7.23 and 7.25 as examples, the results of adding indexes are shown in Tables 7.29 and 7.30, where some partition indexes and query indexes are added. There is no standard answer for index selection; it must be judged according to the actual scenario and data volume. For OLTP, every table needs a primary key, and if there is no natural primary key, a sequence-generated field can serve as a surrogate primary key. For an OLAP distributed database, each table additionally needs its distribution key chosen with care.

Table 7.29 Index selection (1)
Table 7.30 Index selection (2)
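The surrogate-key idea can be sketched as follows, with SQLite's AUTOINCREMENT standing in for the sequence object other systems provide; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A log-style table with no natural primary key: a sequence-like
# surrogate key (Log_ID) uniquely identifies each row instead.
conn.execute("""CREATE TABLE ORDER_LOG (
    Log_ID     INTEGER PRIMARY KEY AUTOINCREMENT,
    Order_ID   INTEGER NOT NULL,
    Event_Text VARCHAR(200))""")

# The database assigns the surrogate values automatically.
conn.execute("INSERT INTO ORDER_LOG (Order_ID, Event_Text) VALUES (1, 'created')")
conn.execute("INSERT INTO ORDER_LOG (Order_ID, Event_Text) VALUES (1, 'shipped')")

rows = conn.execute("SELECT Log_ID, Event_Text FROM ORDER_LOG").fetchall()
print(rows)  # [(1, 'created'), (2, 'shipped')]
```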

7.7 Summary

This chapter applies the New Orleans design methodology to database modeling and explains the four phases of requirement analysis, conceptual design, logical design, and physical design, with the tasks of each phase clearly stated. The significance of the requirement analysis stage is expounded, and the E-R approach is introduced for the conceptual design stage. For logical design, the important basic concepts and the 3NF are expounded, with each NF explained in depth through examples. For physical design, the denormalization means and the key working points are emphasized. The chapter concludes with a small practical case illustrating the main elements of logical and physical modeling.

7.8 Exercises

  1. [Single Choice] The next phase after the logical design phase in the New Orleans design methodology is ( ).
     A. Requirement analysis
     B. Physical design
     C. Conceptual design
     D. Logical design

  2. [Multiple Choice] In which of the following aspects is the efficiency of the database operating environment reflected? ( )
     A. Data access efficiency
     B. Time cycle of data storage
     C. Storage space utilization
     D. Efficiency of database system operation and management

  3. [Multiple Choice] Which of the following methods can be used in the process of requirement investigation? ( )
     A. Questionnaire survey
     B. Interviews with business personnel
     C. Sample data collection and data analysis
     D. Review of the User Requirement Specification

  4. [Multiple Choice] Which of the following options are included in the three elements of the E-R diagram in model design? ( )
     A.
     B.
     C.
     D.

  5. [Multiple Choice] The linkages between entities are ( ).
     A. One-to-one linkage (1:1)
     B. One-to-null linkage (1:0)
     C. One-to-many linkage (1:n)
     D. Many-to-many linkage (m:n)

  6. [True or False] An entity is a collection of real-world objects that have common attributes and can be distinguished from each other. For example, teachers, students, and courses are all entities, as shown in Fig. 7.2. ( )
     A. True
     B. False

  7. [Multiple Choice] The significance of normalized modeling in the logical model design process includes ( ).
     A. Improving the efficiency of database use
     B. Reducing redundant data
     C. Giving the model good scalability
     D. Reducing the possibility of data inconsistency

  8. [True or False] A model that satisfies the 3NF must satisfy the 2NF. ( )
     A. True
     B. False

  9. [Multiple Choice] Compared with the logical model, the physical model ( ).
     A. Strictly observes the 3NF
     B. Can contain redundant data
     C. Is mainly for database administrators and developers
     D. Can contain derived data

  10. [Multiple Choice] Which of the following are means of data denormalization? ( )
      A. Adding derived fields
      B. Creating a summary or temporary table
      C. Performing pre-joins
      D. Adding repeating groups

  11. [Multiple Choice] The effects of using indexes include ( ).
      A. Indexes take up more physical storage space
      B. With an index in effect, query efficiency can be greatly improved
      C. The efficiency of inserting into base tables is reduced
      D. Once an index is created, the database optimizer will definitely use it in queries

  12. [True or False] Because partitioning can reduce the I/O scan overhead during data queries, the more partitions created during the physicalization process, the better. ( )
      A. True
      B. False

  13. [True or False] The foreign key is the unique identifier that identifies each instance in an entity. ( )
      A. True
      B. False

  14. [True or False] The atomicity required by the 1NF means that each attribute is split to the smallest granularity and cannot be split further. ( )
      A. True
      B. False

  15. [True or False] A relation between entities exists only if a foreign key exists, and a relation between two entities cannot be established without a foreign key. ( )
      A. True
      B. False

  16. [Multiple Choice] In the process of building a logical model, which of the following fall within the scope of determining the attributes in an entity? ( )
      A. Defining the primary key of the entity
      B. Defining some of the non-key attributes
      C. Defining non-unique attribute groups
      D. Defining constraints on attributes

  17. [True or False] The data dictionary in the requirement analysis phase of the New Orleans design methodology has the same meaning as the data dictionary in a database product. ( )
      A. True
      B. False