Relational database
A
relational database is simply a
database that conforms to the
relational model. The term refers to the
data, and the structure of that data. The software used to create a relational database is called the
Relational Database Management System (RDBMS), but sometimes that software is mistakenly called the relational database.
The term was originally defined and coined by
E.F. Codd.
Codd's definition is now not the only usage of the term, as many modern DBMS manufacturers have adopted a more relaxed usage of the term.
As mentioned above, a relational database is a database that conforms to the relational model. A relational database could also be defined as a
set of relations or
a database built in an RDBMS.An RDBMS is sometimes incorrectly called a relational database. But, strictly speaking,
Oracle,
Microsoft SQL Server, and
MySQL are
not relational databases. Under popular usage of the term, these software packages are called "Relational Database Management Systems" (RDBMS), and as such they can be used to create relational databases. There is some disagreement as to whether or not they can be considered "relational", because they do not fully conform to the relational model. For a more full explanation on the requirements for a DBMS to be considered "relational", see
RDBMS#Current Usage.
Strictly speaking, a relational database is merely a collection of
relations (frequently called
tables). Other items are frequently considered part of the database, as they help to organize and structure the data, in addition to forcing the database to conform to a set of requirements.
Relations or Tables
A
relation is defined as a set of
tuples that all have the same
attributes. This is usually represented by a
table, which is data organized in
rows and
columns. In a relational database, all of the data stored in a column should be in the same
domain (i.e.
data type). In the relational model, the tuples should not have any ordering. This means both that there should be no order to the tuples, and that the tuples should not impose an order of the attributes. In other words, neither the rows nor the columns should have an order.
While this is the desired result, it is not universally achieved. The SQL standard requires columns to have a defined order. All data stored in a computer has to have an order, as the memory of a computer is linear. Also, when the data is returned, there must be an order in which the data is returned (because all transfer protocols are linear, and coincidentally enough, humans read in a linear fashion). The point here is that this order must never make a logical difference in the system. Frequently orders are imposed which impact performance, but they should never change the result of a query on the database. In practice, several of the DBMSs that are considered "relational" impose an order that makes a logical difference.
Constraints
Constraints are a way of providing restrictions on the kinds of data that can be stored in the relations. These are usually defined (formally) in the form of expressions that result in a
boolean value, indicating whether or not the constraint holds. Constraints are a way of implementing
business rules into the database.
Under the strictest sense, constraints are not considered part of the relational database, but because of the integral role which they play in organizing data, they are usually considered part of the database.
Keys
A tuple usually represents some object, and the data associated with that object, whether that object is a physical object, or a concept. A
key is a kind of constraint which requires that the object, or critical information about the object, isn't duplicated. For example, a family might like to have a constraint such that no two people in the immediate family have the same name. If information about family members were stored in a database, a key could be placed over the family member's name. In a University, they have no such luxury. Each student is typically assigned a Student ID, which is then made the key. Keys can have more than one column, for example, a nation may impose a restriction that a province can't have two cities by the same name. So, when cities are stored in a relation, there would be a key defined over province and city name. This would allow two different provinces to have a town called
Springfield (because their province would be different), but not two cities with the same name in the same province. A key over more than one attribute is called a
compound key. Theoretically, a key can even be over zero attributes. This would enforce that there cannot be more than one tuple in the relation.
Most relations have at least one key defined on it. Because a relation is defined in the relational model as being a set, it can't have duplicate rows. Some DBMSs don't enforce this. If a DBMS does enforce this, it means that there is always at least one key on each relation, namely the key involving all of the attributes of the relation.
A key could be defined formally by requiring that the
cardinality of the relation should be equal to the cardinality of the relation
projected over the columns of the key.
A key, in this context, refers to any set of attributes which uniquely span the relation. In particular, this is called a
superkey. A
candidate key is a minimal superkey, meaning that, none of the attributes in the key could be removed from the key, and still have that attribute set be a key. Many DBMSs have a concept of a
primary key. The primary key (usually a candidate key) is the key most often used to identify a tuple. In some RDBMSs, the primary key of a base relvar is the
storage key (sometimes
clustered key), meaning that that is how the data is stored physically. If the value of the primary key is actual interesting data with logical ties to the data (like a name) for the tuple, it is called a
natural key. If the key is generated and doesn't have any logical connection to the rest of the data in the tuple, it is called a
surrogate key. Other candidate keys that were not chosen as the primary key are called
alternate keys.
Foreign Keys
A
foreign key is not a key by the previous definition. Rather, a foreign key is a
reference to a key in another table. Meaning that the referencing tuple has, as part of its attributes, the values of a key in the referenced tuple that corresponds to the relationship.
For example, a company has several departments, and each employee is a member of one department. This is enforced in the database by a foreign key. Imagine there is a relation for departments. Each department has a department name, and a
surrogate key for the department called "DepartmentID". The employee relation would have "DepartmentID" as an attribute, with a foreign key reference to the aforementioned surrogate key of the department relation. The DBMS would then enforce that each employee could not be created without specifiying a valid department tuple, and department tuples could not be deleted if there are employees referencing that department.
A foreign key could be described formally as "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes".
Transition Constraints
A transition constraint is a way of enforcing that the data doesn't enter an impossible state because of a previous state. For example, it shouldn't be possible for a person to change from being "married" to being "single, never married". The only valid states after "married" might be "divorced", "widowed", or "deceased".
Other constraints
Other constraints of various different kinds can be created to enforce various kinds of business rules. They can be as simple as "the number of cars an individual owns must be non-negative" or complex patterns like "If the work that an employee performs is 'Hazardous Materials Transport' then that employee's age must be 18 years of age, and the employee's certifications must include 'Hazmat endorsement', and company insurance for that employee must include life insurance."
Other Relvars
A relvar is a "relation variable". In a relational database, all data is stored and accessed via relations. The data that is actually stored in the database are stored as relations. These relations are sometimes called "base relvars". This is equivalent to a "table". Other relvars do not have their data stored in them, but are a result of applying
relational operations, to other relvars. These relvars are sometimes called "derived relvars", meaning that their information is derived from other sources. These are equivalent to "
views" or "queries". Derived relvars are convenient in that though they may grab information from several relvars, it is presented externally as a single relvar for a simpler perspective. Also, it can be used as an
abstraction layer.
Derived relvars are not always considered part of a relational database, partially because they are not essential to the functioning of the database.
Stored Procedures
A stored procedure is executable
code that is associated with the database. Stored procedures usually store how to perform common operations, like inserting a tuple into a relation, or gathering statistical information about usage patterns. Frequently they are used as an
application programming interface for security or simplicity. These are usually written as
Imperative programming code extending the
Data Definition Language and/or the
Data Manipulation Language for the DBMS.
Stored procedures are not always considered part of a relational database, partially becacuse they are not essential to the functioning of the database.
Indices
An index is a way of providing quicker access to the data in a relational database. Indices can be created on any combination of attributes on a relation. Then when tuples in a relation need to be looked up, similar to how a book's index works, the index can be accessed. Rather than having to check all of the tuples, the index tells the DBMS where the tuple is. Indices are usually implemented via
B+ trees.
Indices are usually not considered part of the database, as they are considered an implementation detail, though indices are usually maintained by the same group that maintains the other parts of the database.
Queries made against the relational database, and the derived relvars in the database are expressed in a
relational calculus or a
relational algebra. Some relvars merely
restrict the tuples that are returned,
rename the attributes, or
remove some of the attributes (often:
project) from the result set. These three operations are called
unary operations. Other operations that can be performed involve combining more than one relvar together. Examples of these are
set union,
set intersection,
cartesian product (often called a "cross product"), and various different kinds of
joins. These are all
binary operations.
Normalization is a process of altering the structure of the database to make the database conform to one or more
best practices, to assist in performance and ease of data manipulation. By far, the most common forms of normalization applied to databases are called the
normal forms. The most common normal forms that databases are normalized to are:
*First normal form:Ensures that the RDBMS can access all of the data using relational means.
*Second normal form
*Third normal form
*Boyce-Codd normal form
*Fourth normal form
*Fifth normal form:All reduce
functional dependencies, and therefore, various kinds of data duplication, and therefore, various difficulties of correctly modifying the data.
International Data Corporation reports that
as of 2005 the worldwide market for RDBMSs is 9.4% over the prior year at $14.6 billion.
[2005 RDBMS Market] Even though relational systems are popular, there are other options.
Older models
Flat file databases are simple. This prevents them from storing complicated information easily, but in flat file systems can provide better performance. Databases based on the
hierarchical model and the
network model are older, and not as robust as the relational model.
Objects
Object databases allow for even more complicated data structures, and are easier to integrate with
Object-oriented programming tools. These features increase the cost of development.
The relational model has a strong basis in
predicate logic and
set theory, so RDBMSs can find optimization patterns that most humans typically can't. In object databases, programmers can implement complicated access patterns themselves, and might be able to get better performance than an RDBMS, but with substantially increased query writing time.
Relational
Most commercial RDBMSs do not strictly conform to the relational model. In recent years, there has been siginificant growth on DBMSs that more faithfully adhere to the relational model.
Dataphor,
Duro, and
Rel are three such implementations. They believe that being stricter adherents to
Codd's 12 rules will make their systems easier to use.
Hugh Darwen in presentations mentions that DBMSs that fail to adhere to the relational model are making logical differences, and each of these logical differences are big differences.
[Slideshow for his presentation The Askew Wall]Application
*
Database*
Database normalization*
Database table*
Database tuple*
Dimensional database*
Flat file database*
Temporal databaseImplementation
*
Relational database management system (RDBMS)
* (SQL)
Theory
*
Relation*
Relation construction**
Relation composition**
Relation reduction*
Cartesian product*
Projection*
Relational algebra*
Relational modelHistory
*
De Morgan, Augustus (1806â€"1871)
*
Boole, George (1815â€"1864)
*
Peirce, Charles Sanders (1839â€"1914)
*
Codd, Edgar Frank (1923â€"2003)
*
Ullman, Jeffrey D. (1942â€" )
*
Date, Christopher J.*
Darwen, Hugh*
Cantor, Georg*
A Relational Model of Data for Large Shared Data Banks: By
Edgar F. CoddRelational Database Principles by
Colin Ritchie. ISBN 0826457134
Database in Depth by
Chris Date. ISBN 0596100124. A new book (2005) by perhaps the best-known author in this field.
*
Limitations of the relational model*
Introduction to Relational Database Design*
15 Seconds: Introduction to Relational Databases*
Introduction to Relational Databases*
Database Normalization