A database is a collection of data, or information, organised in such a way as to make it readily accessible and useful. A database management system (DBMS) is a system that provides all the functionality needed to create, modify and maintain a database or collection of databases. DBMSs were developed in the late 50's and early 60's in order to help manage the increasingly compnlex and voluminous information demanded in certain corporate environments. In a database client/server architecture, a client application usually requests data and data-related services (such as sorting/filtering) from a database server. The database server, also known as the sQL engine, responds to the client's requests and provides secured access to shared data. A client application can, with a single SQL statement, retrieve and modify a set of server database records. The SQL database engine can filter the query result sets, resulting in considerable data communication savings.
Relational database systems are the most common DBMS today. These relational DBMSs organise data into seperate structures called tables,which can be linked via common information to make data storage more efficient. A DBMS is like a traditional filing system in that it stores individual groups and pieces of information. Like a filing system, a DBMS consists of seperate components, like the cabinet, drawers and folders. A relational DBMS has the following basic components:
The relational model of database management was developed at IBM's San Jose Research Lab in the early 1970s by E.F. Codd. Structured Query Language (SQL) is the most popular (and standard) language for interacting with databases.
Note: For studying this chapter you should use the SQL Query Tool. This is a HTML form front-end to the course database which you can use for practicing your SQL skills. When performing queries on, for example, the CUSTOMERS, CARS and SALES tables below, please use the prefix 'EE_ADMIN'. E.g. SELECT * FROM EE_ADMIN.CUSTOMERS
This just has to do with putting the tables in another schema to prevent users mistakenly deleting the sample tables and has no other effect.
Previous databases structures, such as network databases and hierarchical databases were used previous to relational databases. While we will not go into detail regarding these structures, it will suffice to say that each had it's own set of disadvantages. In fact it was these disadvantages, that led to intense interest in the new relational data model when it was first described by Dr. Codd in 1970. The relational model proposed by Dr. Codd was an attempt to simplify database structure and indeed, by the mid-80s, the relational model was clearly emerging as the "new wave" in data management. Today, neither hierarchical or network databases have any major role to play in the database market.
In response to requests of definition of "what is a relational database?", Dr. Codd wrote an article in 1985 setting forth 12 rules to be followed by any database that called itself "truly relational". These twelve rules have since been accepted as the definition of a truly relational DBMS. However, rather than going in depth into Codd's 12 Rules it's easier to use a more informal definition:
Unlike the columns, the rows in a table do not have any particular order. In fact, if you use two consecutive database queries to display the contents of a table, there is no guarantee that the rows will be listed in the same order twice. You can however, SORT the data using SQL but we'll come to that shortly. The outputted order, whether sorted or not have no relation to the actual arrangement of the rows within the table. It is possible also for a table to have 0 rows. This is called an empty table. The ANSI/ISO standard does not limit the number of rows in a table, and most database implementations will allow a table to grow until it exhausts the available disk space on the computer. If a maximum is set by the vendor, it is always a very generous (billions!).
Because the rows of a relational table are unordered, you cannot select a specific row by its position in the table. There is no "first row", "last row" or "thirteenth row" of a table. So how are individual, specific rows specified?
In a well-designed relational database, every table has some column or combination of columns whose values uniquely identify each row in the table. This column (or columns) is called the primary key of the table. In practice "ID numbers", such as employee numbers, student numbers, staff usernames etc. are often chosen as primary keys. Considering again our CUSTOMERS table there is only one option for primary key, which is the ID column. One might imagine that we could use SURNAME as a primary key, since all of the entries are indeed unique. However, if our customer-base expands and another "Smith" becomes a customer, then this primary key definition breaks down.
Sometimes, multiple columns are used as primary keys, where the unique key is defined by a combination of columns. As an example of this consider the following table:
Anyone familiar with the Irish car registration system, will recognise the structure of this table. Each registration consists of the year (last two digits only), a code which represents the region/county (Dublin=D, Cork=C, Meath=MH) and finally a number. The numbers are not completely unique, but are unique within an individual year for an individual county. Although the year is normally defined as '00' or '02', we could just as easily represent them as an integer in the database using '0' or '2' in these cases. As can be seen in the table, none of the columns on their own are unique, but a combination of the first three columns (YEAR, COUNTYCODE and CNUMBER) provides the primary key. The primary key has a different unique value for each row in a table, so no two rows of a table with a primary key are exact duplicates of one another. Naturally, the entire registration could be kept in one column - however this would mean that you could not perform queries relating to the year of the vehicle or the registration region. A table where every row is different from all other rows is called a relation in mathematical terms. The name "relational database" comes from this term, because relations (tables with distinct rows) are at the heart of a relational database.
A column in one table whose value matches the primary key in some other table is called a foreign key. First, consider the two tables we have already introduced, the CUSTOMERS table and the CARS table; currently there is no real relationship between the two tables. We have a table of our company customers and we have a table containing all of the cars that have come through our business. Let us introduce a third table 'SALES' through which we will also introduce the concept of foreign keys:
Now, let us consider this SALES table and the CUSTOMERS table - we have formed an effective relationship between these two tables. The PURCHASER column of the SALES table represents the ID of the customer who has purchased this particular car from our business. The values that this column contains are actually matched by the ID details in our CUSTOMERS table. The diagram below shows the relationship between the two example tables:
In this example, we can see that Joe Murphy (our customer) has purchased two cars from us. The PURCHASER column is a foreign key for the CUSTOMERS table.
Together a primary key and a foreign key create a parent/child relationship between the tables that contain them. Just as a combination of columns can server as the primary key of a table, a foreign key can also be a combination of columns. In fact, the foreign key will always be a compound (multi-column) key when it references a table with a compound primary key. Obviously, the number of columsn and the data types of the columns in the foreign key and the primary key must be identical to one another. Additionally, a table can contain more than one foreign key if it is related to more than one other table. Again, let us consider the 'CARS' table and introduce it to the example above:
The primary key for the CARS table is the combination of the YEAR, COUNTYCODE and CNUMBER columns. Now, the SALES table has two foreign keys which are:
The example tables above show how multiple parent/child relationships can exist created by the foreign keys. Foreign keys are a fundamental part of the relational model because they create relationships among tables in the database. Unfortunately, as with primary keys, foreign key support was missing from earlier database implementations, although they now appear in all commercial database products.
SQL, which is now an ISO standard, is a powerful language consisting of a few commands. It was developed by also by IBM in the mid-1970s to serve as an "English-like" front-end query language to the "System R" relational database prototype. Even though the SQL language is English-like, it is firmly rooted in the solid mathematical foundation of set theory and predicate calculas. The main body of the SQL language consists of about 40 statements, the most important of which are displayed in a table below. SQL is a declarative language by which a user or application can specify what data they want. It does not dictate to the database engine how to retrieve the data, nor does it contain any information about how to navigate through the database. The details of how the database has organised the data are hidden from the user, who simply declares what data he or she is interested in manipulating and the desired result of the operation.
As stated above, the following table lists the major SQL statements and a brief description:
When you specify a table name in an SQL statement, SQL assumes that you are referring to one of your own tables (ie. one that you created). Usually, table names are chosen that are both short and descriptive. Most DBMS brands allow different users to create tables with the same name, so for example user Jack and user Jill can both create a table named VARIOUS. The DBMS uses the appropriate table, depending on who is requesting data from VARIOUS (ie. when Jack queries the VARIOUS table, he queries his own, and likewise with Jill). With the proper permission, you can refer to tables owned by other users, by using a qualified table name. A qualified table name specifies both the name of the table's owner and the name of the table, seperated by a "." So if Jack wishes to access Jill's VARIOUS table, he can simply useJILL.VARIOUS. A qualified table name can generally be used in an SQL statement wherever a table name can appear. The ANSI/ISO SQL2 standard generalises the notion of a qualified table even further. Schemas are covered a few sections below.
When you specify a column name in an SQL statement, SQL can normally determine from the context which column you intend. However, if the statement involves two different columns with the same name from two different tables, you must use a qualified column name to identify the column you intend. Similar to table names it can be done by referencing VARIOUS.COLNAME1 (an example using the encountered tables we could use CUSTOMERS.EMAIL). Again, it is possible to reference a column from another users table by using JILL.VARIOUS.COLNAME2.
The ANSI/ISO SQL standard specifies the various types of data that can be stored in a SQL-based database and manipulated by the SQL language. These include fixed-length character strings, variable length character strings, integers, floating point numbers, booleans, Dates and a range of other types.
Data Manipulation Language
This section delves into the actual uses and symantics of the most important SQL statements. Students are expected to understand, know how to use and combine the various statements.
The Select Statement
The SELECT statement retrieves data from a database and returns it in the form of query results. The SELECT statement can contain a number of additional clauses, describing the information which should be obtained. In any SELECT statement the SELECT and FROM clauses of the statement are required, with any further being optional. The SELECT clauses lists the data items to be retrieved by the SELECT statement. The items may be columns from the database, or columns to be calculated by SQL as it performs the query. The FROM clauses lists the tables that contain the data to be retrieved by the query (can be multiple tables). These tables are called the source tables of the query, because they are the source of all of the data in the query results. The WHERE clauses tells SQL to include only certain rows of data in the query results. A search condition is used to specify the desired rows. So for example, taking the data from the table above: (repeated here)
Table 6.5. CUSTOMERS Table
ID
|
SURNAME
|
FIRSTNAME
|
EMAIL
|
COUNTRY
|
PHONE
|
2174
|
Smith
|
Peter
|
smithp@ lucent.com
|
Ireland
|
+353 1 4382321
|
2177
|
Murphy
|
Joe
|
joe.murphy@ microsoft.com
|
Ireland
|
+353 97 34312
|
2099
|
Jones
|
Mary
|
mary.jones@ hotmail.com
|
United States
|
+555 33 4444444
|
2200
|
McFadden
|
Jack
|
jack.mcfadden@ dcu.ie
|
Scotland
|
+666 11 222222
|
2311
|
Corcoran
|
Sheila
|
scorco@ ireland.com
|
Ireland
|
+353 1 8903242
|
1983
|
Ryan
|
Ann
|
ann.ryan@ yahoo.com
|
England
|
+777 1 3234343
|
The statement:
SELECT SURNAME
FROM CUSTOMERS
Will return:
SURNAME
-------
Smith
Murphy
Jones
McFadden
Corcoran
Ryan
To retrieve multiple columns, they should be seperated by commas:
SELECT FIRSTNAME,SURNAME
FROM CUSTOMERS
Will return:
FIRSTNAME SURNAME
--------- -------
Peter Smith
Joe Murpy
Mary Jones
Jack McFadden
Sheila Corcoran
Ann Ryan
Now we wish to be more specific in the results we wish to return, by using a WHERE clause:
SELECT ID,SURNAME
FROM CUSTOMERS
WHERE COUNTRY='Ireland'
Will return:
ID SURNAME
--------- -------
2174 Smith
2177 Murpy
2311 Corcoran
Using the ORDER BY clauses we can sort the query results based on the data in one or more columns. It is is omitted, the query results are not sorted. So for example, adding to the last statement:
SELECT ID,SURNAME
FROM CUSTOMERS
WHERE COUNTRY='Ireland'
ORDER BY SURNAME
Will return:
ID SURNAME
--------- -------
2311 Corcoran
2177 Murpy
2174 Smith
Two additional clauses GROUP BY and HAVING can also be used within SELECT statements. They are covered briefly in the later notes.
The result of an SQL query is always a table of data, just like the tables in the database. Usually, the query results will be a table with several columns and several rows. As a rule of shorthand, rather than having to always list every column you wish to retrieve, it is possible to use * to indicate all columns. So for example:
SELECT *
FROM CUSTOMERS
WHERE COUNTRY='Scotland'
Will return:
ID SURNAME FIRSTNAME EMAIL COUNTRY PHONE
----- ------ --------- ----- ------- -----
2200 McFadden Jack jack.mcfadden@dcu.ie Scotland +666 11 222222
The INSERT Statement
SQL is a complete data manipulation language that is used not only for database queries, but also to modify and update data in the database. Compared to the complexity of the SELECT statement, which supports SQL queries, the SQL statements that modify and create database contents are somewhat simple. However, database updates pose some challenges for a DBMS beyond those presented by database queries. The DBMS must protect the integrity of the stored data during changes, ensuring that only valid data is introducted into the database. The DBMS must also coordinate simultaneous updates by multiple users, ensuring that the users and their changes do not interfere with one another.
In short the INSERT statement adds new rows of data to a table.
A new row of data is typically added to a relational database when a new entity represented by the row "appears in the outside world". So for example, in the table presented in the SELECT example, if a new "customer" began purchasing from our company, we must add a new row to the CUSTOMER table to store the customer's information. If they were to make an order from our range of supplies, we might add a new row to the ORDERS table to store a record that this order was placed. In either case, the new row is added to maintain the database as an accurate model of the real world. The smallest unit of data that can be added to a relational database is a single row. In general, an SQL-based DBMS provides three ways of adding new rows of data to a database:
-
A single-row INSERT statement adds a single new row of data to a table.
-
A multi-row INSERT statement extracts rows of data from another part of the database and adds them to a table. It is commonly used in end-of-month or end-of-year processing when "old" rows of a table are moved to an inactive table.
-
A bulk load utility adds data to a table from a file that is outside of the database. It is commonly used to initially load the database or to incorporate data downloaded from another computer system or collected from many sites.
Single-Row INSERT Statement
The single-row INSERT statement adds a new row to a table. The INTO clause specifies the table that receives the new row (the target table) and the VALUES clause specifies the actual data values that the new row will contain. The column list indicates which data value goes into which column of the new row.
Yet again, consider the CUSTOMERS table (rather than displaying the table here again, just scroll/read back up). If we wanted to add an individual row of data (a new customer!) to the table, we would use the following INSERT statement:
INSERT INTO CUSTOMERS (ID, SURNAME, FIRSTNAME, EMAIL, COUNTRY, PHONE)
VALUES (2312,'Kinsella','George','kinselg@intel.com','Ireland','+35313442103')
Note that the 2312 is not surrounded by ' symbols - these are used when dealing with character types. The ID in this situation is stored as an integer type. Conceptually, the INSERT statement bulds a single row of data that matches the column structure of the table, fills it with the data from the VALUES clause, and then adds the new row to the table. The rows of the table are unordered, so there is no notion of inserting the row "at the top" or "at the bottom" or "between two rows" of the table. After the INSERT statement, the new row is simply a part of the table. A SELECT statement performed on the table later, may include the new row, but it may appear anywhere among the rows of query results (unless ORDER BY is used).
Single-entry INSERTS are most commonly performed through interactive SQL on tables that don't grow very often. In practice however, data about a new customer, order, etc. are almost always added to a database through a forms-oriented data entry program. When the data entry is complete and the form submitted, the application layer will generate the SQL required to perform the database elements. More on this later.
The purpose of the column list in the INSERT statement is to match the data values in the VALUES clause with the columns that are to receive them. The list of values and the list of columns must both contain the same number of items, and the data type of each value must be compatible with the data type of the corresponding column, or an error will occur. As a convenience, SQL allows you to omit the column list from the INSERT statement, in the situation where all columns are being inserted. When the column list is omitted, SQL automatically generates a column list consisting of all columns of the table, in left-to-right sequence. Using this shortcut, the previous INSERT statement could be rewritten equivalently as:
INSERT INTO CUSTOMERS
VALUES (2312,'Kinsella','George','kinselg@intel.com','Ireland','+35313442103')
One important factor here is that when you omit the column list, the NULL keyword must be used in the values list to explicitly assign NULL values to columns, as shown in the example. In addition, the sequence of data values must correspond exactly to the sequence of columns in the table. When the column names are explicitly defined in the INSERT statement, it automatically assigns a NULL value to any column whose name is missing from the column list in the INSERT statement. So therefore, the following are both correctly syntaxed SQL which perform the same effect:
INSERT INTO CUSTOMERS
VALUES (2312, 'Kinsella', 'George', NULL, 'Ireland', NULL)
INSERT INTO CUSTOMERS (ID, SURNAME, FIRSTNAME, COUNTRY)
VALUES (2312, 'Kinsella', 'George', 'Ireland')
Most database systems, during the generation of the table structure allow certain columns to be set as NOT NULL, which means that any attempt to omit data from that column when inserting rows, will result in an SQL error.
Multi-Row INSERT Statement
This form of the INSERT statement adds multiple rows of data to its target table. In this form of the INSERT statement, the data values for the new rows are not explicitly specified within the statement text. Instead, the source of new rows is a database query, specified in the statement. This can be seen in the multi-row INSERT statement syntax diagram below:
Figure 6.8. Multi-row INSERT Syntax

As an example, let us consider the situation where we wish to split up our CUSTOMERS table into regionalised tables. So if we wanted to populate a table called IRISHCUSTOMERS we could copy the information in the following way:
INSERT INTO IRISHCUSTOMERS (ID, SURNAME, FIRSTNAME, EMAIL, COUNTRY, PHONE)
SELECT ID, SURNAME, FIRSTNAME, EMAIL, COUNTRY, PHONE
FROM CUSTOMERS
WHERE COUNTRY='Ireland'
Using the shorthand discussed into the single-row discussion above, we can write this SQL statement in a shorter form:
INSERT INTO IRISHCUSTOMERS
SELECT *
FROM CUSTOMERS
WHERE COUNTRY='Ireland'
The INSERT statement identifies the table to receive the new rows (IRISHCUSTOMERS) and the columns to receive the data, just like the single-row INSERT statement. The remainder of the statement is a query that retrieves data from the CUSTOMERS table. Conceptually, SQL first performs the query against the ORDERS table and then inserts the query results, row by row, into the IRISHCUSTOMERS table.
Bulk Load Inserting
Data to be inserted into a database is often downloaded from another computer system, or collected from other sites and stored in a sequential file. To load the data into a table, you could write a program with a loop that reads each record of the file and uses the single-row INSERT statement to add the row to the table. However, the overhead of having the DBMS repeatedly execute single-row INSERT statements can be quite high. For this reason, all commercial DBMS products include a bulk load feature that loads data from a file into a table at high speed. The ANSI/ISO standard does not address this function, and it is usually provided as a standalone utility program rather than as part of the SQL language. Each vendor's utility provides a different set of features, functions and commands.
The DELETE Statement
A row of data is typically deleted from a database when the entity represented by the row "disappears from the outside world". For example, is a customer was to cancel an order, we might delete the respective entry from the ORDERS table, or if the CUSTOMER was to leave the company, you might delete their record from the CUSTOMERS table. In every DELETE case, the row is deleted to maintain the database as an accurate model of the real world. The smallest unit of data that can be deleted from a relational database is a single row (ie. you cannot delete individual fields ... you can update them but we come to that later!).
The DELETE statement syntax can be seen below:
Figure 6.9. DELETE Statement Syntax

In its simplest form, we can use the DELETE statement to delete all rows of a specific table. For example:
DELETE FROM CUSTOMERS
This statement would delete all entries from the CUSTOMERS table without exception. It does not delete the actual table, but rather all of the data within. The definition of the CUSTOMERS table and its columns are still stored in the database. Naturally, it is more common that individual or groups of rows are deleted and in order to do this, we use the WHERE clause. In the following case, the WHERE clause selects several rows of the CUSTOMERS table, and removes all of the selected rows from the table. For this example, we might be DELETING the Irish customers from the CUSTOMERS table, after we have moved them to the IRISHCUSTOMERS table previously:
DELETE FROM CUSTOMERS
WHERE COUNTRY='Ireland'
Likewise, we could equally just delete a single row of data by selecting the unique parameter:
DELETE FROM CUSTOMERS
WHERE ID=2099
which would just remove the "Mary Jones" entry.
Conceptually, SQL applies the WHERE clauses to each row of the CUSTOMERS table, deleting those where the search condition yields a TRUE result and retaining those where the search condition yields a FALSE or NULL result. Because this type of DELETE statement searches through a table for the rows to be deleted, it is sometimes called a SEARCH DELETE statement. Because of the potential damage from DELETE statements, it is important to always specify a search condition, and to be careful that it actually selects the rows you want. When using interactive SQL, it is a good idea first to use the WHERE clause in a SELECT statement to display the selected rows, make sure that they are the ones you want to delete, and only then use the WHERE clause in a DELETE statement.
The UPDATE Statement
Typically, the values of data items stored in a database are modified when corresponding changes occur in the outside world. For example, in our CUSTOMERS table, if one of the customers submits a new email address we need to update their particular record. Data values in the database are updated to maintain the database as an accurate model of the real world. The smallest unit of data that can be modified in a database is a single column of a single row.
The UPDATE statement modifies the values of one or more columns in selected rows of a single table. The target table to be updated is named in the statement, and you must have the required permission to update the table. The WHERE clause selects the rows of the table to be modified. The SET clause specifies which columns are to be updated and calculates the new values for them. So for example:
UPDATE CUSTOMERS
SET EMAIL = 'Peter.Smith@newcompany.com', COUNTRY='Germany'
WHERE ID = 2174
In this example, the WHERE clause identifies a single row of the CUSTOMERS table, and the SET clause assigns new values to two of the columns in that row. The WHERE clauses is exactly the same one you would use in a SELECT or DELETE statement to identify a row. Like the DELETE statement, the UPDATE statement can update several rows at once with the proper search condition, such as:
UPDATE CUSTOMERS
SET COUNTRY = 'Eire'
WHERE COUNTRY = 'Ireland'
In this case, the WHERE clause selects several rows of the CUSTOMERS table, and the value of the COUNTRY columns are modified in all of them. Conceptually, SQL processes the UPDATE statement by going through the CUSTOMERS table row by row, updating those rows for which the search condition yields a TRUE result and skipping over those for which the search condition yields a FALSE or NULL results. Because it searches the table, this form of the UPDATE statement is sometimes called a searched UPDATE statement.
Similar to the DELETE statement, the UPDATE statement in its simplest form can be used to update all rows in the target table, by omitting the WHERE clause. For example:
UPDATE CUSTOMERS
SET COUNTRY = 'Eire'
will modify all existing rows of the CUSTOMERS table by setting COUNTRY to be 'Eire' in every case. Unlike the DELETE statement, in which the WHERE clause is almost never omitted, the UPDATE statement without a WHERE clause performs a useful function. It basically performs a bulk update of the entire table.
More Advanced Queries
In order to try and withhold the database section from becoming the giant it has already become, this section will be covered briefly and to the point through example. Students are expected to know how these SQL statements are used, but not vast levels of detail in the inner workings of the statements. To demonstrate the examples below, we use the following set of tables, which we introduced earlier in the foreign key section:
Figure 6.11. Three Example Tables
Duplicate Rows
If a query includes the primary key of a table in its select list, then every row of query results will be unique. If the primary key is not included in the query results, duplicate results can occur. Eg.
SELECT CC
FROM SALES
Would result in multiple identical county code rows ('D', 'MH' etc.). If we wished this query to return each duplicate only once (let's say we wished to see what car counties were represented by our sales) we could do the following:
SELECT DISTINCT COUNTYCODE
FROM SALES
Range Test (BETWEEN)
The range test checks whether a data value lies between two specified values. It is possible to use NOT as a prefix to provide the opposite. Examples:
SELECT * FROM SALES
WHERE VALUE BETWEEN 0 AND 10000
SELECt * FROM SALES
WHERE VALUE NOT BETWEEN 0 AND 10000
Set Membership Test
The set membership test (IN) tests whether a data value matches one of a list of target values. So for example:
SELECT * FROM CUSTOMERS
WHERE COUNTRY IN ('Ireland','England')
SELECT * FROM CARS
WHERE COUNTYCODE IN ('D','MH','C','L')
Pattern Matching Test
A simple comparison test can be used to retrieve rows where the contents of a text column match some particular text. However, you can use the LIKE pattern matching test to retrieve the data based on a partial match of the customers name. The pattern is a string that may include one or more wildcard characters. So for example using the percent (%) sign as a wildcard character to match any sequence of zero or more characters:
SELECT * FROM CUSTOMERS
WHERE COUNTRY LIKE '%land'
SELECT * FROM CUSTOMERS
WHERE PHONE LIKE '+353 1 %'
Null Value Test
Sometimes it is useful to check explicitly for NULL values in a search condition and handle them directly. SQL provides a special NULL value test (IS NULL) to handle this task. So to obtain CUSTOMERS with unknown emails:
SELECT * FROM CUSTOMERS
WHERE EMAIL IS NULL
Compound Search Conditions (AND, OR, and NOT)
Simple search conditions can return a value of TRUE, FALSE or NULL. Using the rules of logic, you can combine these simple SQL search conditions to form more complex ones. As examples:
SELECT * FROM CUSTOMERS
WHERE COUNTRY='Ireland'
AND SURNAME='Smith'
SELECT * FROM CUSTOMERS
WHERE COUNTRY='Ireland'
OR EMAIL LIKE '%dcu.ie'
SELECT * FROM SALES
WHERE NOT VALUE < 10000
Combining Query Results (UNION)
Occasionally, it's convenient to combine the results of two or more queries into a single table of query results. For example, consider the situation where we have three seperate but identically structured tables, representing different users in a system (for ease of maintainance). However, we might want to obtain all of the users in the system, which we could do by:
SELECT * FROM STAFFUSERS
UNION
SELECT * FROM STUDENTUSERS
UNION
SELECT * FROM RESEARCHUSERS
These statements can be far more complicated, each containing query searches of their own. There are however, three restrictions on the use of the UNION operation:
-
The two tables must contain the same number of columns
-
The data type of each column must match the data types in the corresponding table in the UNION
-
Neither table can be sorted by the ORDER BY clause. However, the combined query results can be sorted.
Note: although this example shows the use of UNIONS, if you were making regular queries using these three tables, it would make more sense to create a View. More on these later!
Column Functions
SQL column functions take an entire column of data as its argument and procudes a single data item that summarizes the column. The following column functions are available in the standard: SUM(), AVG(), MIN(), MAX(), COUNT() and COUNT(*). The first four of these are essentially obvious. COUNT() counts the number of values in a column, and COUNT(*) counts the rows of query results (which is more commonly used). Therefore, as examples:
SELECT AVG(VALUE)
FROM SALES
SELECT COUNT(*)
FROM CUSTOMERS
WHERE COUNTRY='Ireland'
The first of these would return the average of the VALUE column in the SALES table. The second example would return the number of customers from Ireland - ie. the number of query result rows.
Grouped Queries and Group Search Conditions
Typically, column functions return a single row of data, such as an average, sum etc. So we consider/imagine a standard sales environment with salespeople and their sales for the year. Performing a simply average would result in the following:
SELECT AVG(SALES_TOTAL)
FROM COMPANYSALES
AVG(SALES_TOTAL)
----------------
€ 8532.39
However, if we wished to get the average sales for the year for each salesperson. We could use the following:
SELECT SALESPERSON_ID, AVG(SALES_TOTAL)
FROM COMPANYSALES
GROUP BY SALESPERSON_ID
SALESPERSON_ID AVG(SALES_TOTAL)
-------------- ----------------
A32 € 6512.31
A36 € 10334.92
A23 € 3932.11
B14 € 11048.19
... ....
This second query produces several summary rows - one row for each group, summarizing the orders taken by a single salesperson.
Just as the WHERE clause can be used to select and reject the individual rows that participate in a query, the HAVING clause can be used to select and reject row groups. This can be best explained by extending the first example above:
SELECT SALESPERSON_ID, AVG(SALES_TOTAL)
FROM COMPANYSALES
GROUP BY SALESPERSON_ID
HAVING SUM(SALES_TOTAL) > 100000
This works identically to the normal GROUP situation, except the additional HAVING claus eliminates any groups that don't meet its criteria. Therefore, only any groups with an total sales of greater than 100,000 euro would be displayed by this statement.
Multi-Table Queries (Joins)
Many useful queries request data from two or more tables in a database. The process of forming pairs of rows by matching the contents of related columns is called joining the tables. The resulting table (containing data from two of the original tables) is called a join between the two tables. Joins are the foundation of multi-table query processing in SQL. Let us say for example that we wish to obtain a table of information showing the registration of cars sold, the firstname and surname of the customer who purchased each car and the value of that purchase. Graphically what we want to achieve is the following:
Figure 6.12. Two Table Query Example
In order to do this, we use the following SQL:
SELECT YEAR,CC,NUM,SURNAME,FIRSTNAME,VALUE
FROM SALES, CUSTOMERS
WHERE PURCHASER=ID
The syntax of this statement is similar to those in standard select statements with two differences: firstly, the FROM clause lists two tables instead of just one. Secondly, the search condition (WHERE PURCHASER=ID) compares columns from two different tables. We call these two columns the matching columns for the two tables.
The most common multi-table queries involve two tables that have a natural parent/child relationship. In this example, the parent class in the CUSTOMERS table, with the ID as primary key and the child class is the SALES table with the PURCHASER as the foreign key.
It is possible to further add row selection criteria into multi-table joins. For example, let us consider the case where we are only interested in the above table for Cork (or 'C') registered cars. Therefore, we can add the additional criteria in the following way:
SELECT YEAR,CC,NUM,SURNAME,FIRSTNAME,VALUE
FROM SALES, CUSTOMERS
WHERE PURCHASER=ID
AND CC='C'
When using joins it is frequently important to use table aliases. Consider the situation where we are joining from multiple tables, with identical column names (legal!), or accessing identically-named tables from different user schemas. Therefore, we use table aliases, which we show below in both their longer and shorter forms:
SELECT STAFF.SURNAME, STAFF.FIRSTNAME,
ANOTHERUSER.MODULES.CODE, ANOTHERUSER.MODULES.NAME
FROM STAFF,ANOTHERUSER.MODULES
WHERE STAFF.ID=ANOTHERUSER.MODULES.OWNERID
A much more effective way of writing this is:
SELECT S.SURNAME, S.FIRSTNAME, M.CODE, M.NAME
FROM STAFF S,ANOTHERUSER.MODULES M
WHERE S.ID=M.OWNERID
These types of joins are otherwise known as inner joins. A second type of join, known as an outer join exists, but is not covered in this module for the sake of berevity. For those of you who are interested, essentially it is a way of handling unmatched rows resulting from a join, and is known as an "information-preserving" join. However, only inner joins will be covered in this module as there is already copious SQL to keep you busy!
Data Integrity
The term data integrity refers to the correctness and completeness of the data in a database. When the contents of a database are modified with the INSERT, DELETE or UPDATE statements, the integrity of the stored data can be lost in many different ways. As examples: an item could be ordered from a company, but the available stock might not be decremented or a power failure could cause changes to a database.
To preserve the consistancy and correctness of its stored data, a relational DBMS typically imposes one or more data integrity constraints. These constraints restrict the data values that can be inserted into the database or created by a database update. Several different types of data integrity constraints are commonly found in relational databases, including:
-
Required data - some columns in a database must contain a valid data value in every row, they are not allowed to contain missing or NULL values.
-
Entity Checking - the primary key of a table must contain a unique value in each row, which is different from the values in all other rows. Duplicate values are illegal, because they wouldn't allow the database to distinguish one product from another. The DBMS can be asked to enforce this unique values constraint.
-
Referential Integrity - A foreign key in a relational database links each row in the child table containing the foreign key to the row of the parent table containing the matching primary key value. The DBMS can be asked to enforce this foreign key/primary key constraint.
-
Consistency - Frequently real-world transactions require multiple updates to the database. Considering the previous example, if an item is ordered an entry should be added to the ORDERS table, and the level of stock decreased in the STOCK table. In this example, both the INSERT and UPDATE must take place in order for the database to remain in a consistent and correct state. The DBMS can be asked to enforce this type of consistency rule or to support applications that implement such rules.
The ANSI/ISO SQL standard specifies some of the simpler data integrity constraints. For example, the required data constraint is supported by the ANSI/ISO standard and implemented in a uniform way across almost all commerical SQL products. More complex constraints, such as business rule constraints, are not specified by the ANSI/ISO standard, and there is a wide variation in the techniques and SQL syntax used to support them.
Required Data
The simplest data integrity constraint requires that a column contain a non-NULL value. The ANSI/ISO standard and most commercial SQL products support this constraint by allowing you to declare that a column is NOT NULL when the table containing the column is first created. When a column is declared NOT NULL, the DBMS enforces the constraint by ensuring the following:
-
Every INSERT statement that adds a new row or rows to the table must specify a non-NULL data value for the column. An attempt to insert a row containing a NULL value results in an error.
-
Every UPDATE statement that updates the column must assign it a non-NULL data value.
One disadvantage of the NOT NULL constraint is that it must usually be specified when a table is first created. Usually this disadvantage is not a major one, because it's obvious when the table is first created which columns should allow NULLs and which should not.
Entity Integrity
A table's primary key must have a unique value for each row of the table, or the database will lose its integrity as a model of the outside world. For this reason the requirement that primary keys have unique values is called the entity integrity constraint. When a primary key is specified for a table, the DBMS automatically checks the uniqueness of the primary key values for every INSERT and UPDATE statement performed on the table. An attempt to insert a row with a duplicate primary key, or change a row to have an identical primary key to another, will fail with an error message.
It is sometimes appropriate to require a column that is not the primary key of a table to contain a unique value in every row. For example, let's suppose that we want to require all customers to have an individual email address - typically it would not make sense that two seperate customers would have the same email address. This can be achieved by imposing a uniqueness constraint on the EMAIL column. The DBMS enforces the uniqueness constraint in the same way that it enforces the primary key constraint. Any attempt to insert or update a row in the table that violates the uniqueness constraint will fail.
One issue arises with NULL values when they occur in the primary key or unique column. Because of the NULL value, the DBMS cannot conclusively decide whether the primary key does or does not duplicate one that is already in the table. For this reason, the SQL standard requires that every column that is a part of a primary key must be declared NOT NULL (hence there is typically no need to specify this). The same restrictions applies for every column that is named in a uniqueness constraint.
Referential Integrity
In order to explain the concept of referential integrity we consider the example provided in the foreign key section. In this example, we considered two tables, SALES and CUSTOMERS. The PURCHASER column is a foreign key for the CUSTOMERS table. Rather than force you to search back through the notes, below is the diagram showing the two tables and the parent/child relationship:
Figure 6.13. Foreign Key Example Revisited
If we tried to insert a new data entry into the SALES table using:
INSERT INTO SALES (SALE_ID,YEAR,CC,NUM,VALUE,SALESPERSON,PURCHASER)
VALUES (742737, 98, "D", "8432", 11200, "Alan Malone", 2167)
Technically, there is nothing wrong with this INSERT statement (ie. it is correctly syntaxed). However, where the problem occurs, is that there is no entry in the CUSTOMERS table with ID = 2167. The newly inserted row clearly "breaks" the parent/child relationship between the SALES and CUSTOMERS table. In fact, in this example, it is more likely that the data entry was performed incorrectly and the intention was to have ID = 2177.
Therefore, because of the existance of the foreign key relationship, every legal value in the PURCHASER column in SALES should be forced to match some value that appears in the ID column of the CUSTOMERS table. This rule is known as a referential integrity constraint. It ensures the integrity of the parent/child relationships created by foreign keys and primary keys.
There are four types of database updates that can corrupt the referential integrity of the parent/child relationships in a database.
-
Inserting a new child row - when an INSERT statement adds a new row to the child table, its foreign key values must match one of the primary key values in the parent table. If the foreign key value does not match any primary key, inserting the row would corrupt the database, because there would be a child without a parent (an "orphan"). As per the example above.
-
Updating the foreign key in a child row - Slightly different form, which relates to the case where the foreign key is modified by an UPDATE statement, then the new value must match a primary key value in the parent table.
-
Deleting a parent row - if a row of the parent table (eg. CUSTOMERS) that has one or more children (eg. in the SALES table) is deleted, the child rows will become orphans. The foreign key values in these rows will no longer match any primary key value in the parent table. Note, that deleting a row from a child table never poses a problem, the parent of this row simply has one less child after the deletion.
-
Updating the primary key in a parent row - Slightly different form, which relates to when the primary key of a row in the parent table is modified, all the current children of that row become orphans because their foreign keys no longer match a primary key value.
The referential integrity features of ANSI/ISO SQL standard handle each of these four situations. In order to handle the first two situations, the values of the foreign key columns are checked before the INSERT or UPDATE statements are permitted. The SQL2 standard expanded a set of rules known as delete and update rules that cover both deleting of parent rows and updating of primary keys.
Delete and Update Rules
The SQL2 standard allows you to specify an associated delete rule and an associated update rule. The delete rule tells the DBMS what to do when the user tries to delete a row of the parent table. These four delete rules can be specified:
-
RESTRICT / NO ACTION - this delete rule prevents you from deleting a row from the parent table if the row has any children. The SQL2 standard somewhat confusingly, actually calls the RESTRICT rule, NO ACTION.
-
CASCADE - tells the DBMS that when a parent row is deleted, all of its child rows should also be automatically deleted from the child table.
-
SET NULL - tells the DBMS that when a parent row is deleted, the foreign key values in all of its child rows should automatically be set to NULL (unknown).
-
SET DEFAULT - tells the DBMS that when a parent row is deleted, the foreign key values in all of its child rows should automatically be set to the default value for that particular column.
Just as the delete rule tells the DBMS what to do when a user tries to delete a row of the parent table, the update rule tells the DBMS what to do when a user tries to update the value of one of the primary key columns in the parent table. Again there are four possibilities:
-
RESTRICT / NO ACTION - this rule prevents you from updating the primary key of a row in the parent table if that row has any The SQL2 standard somewhat confusingly, actually calls the RESTRICT rule, NO ACTION.
-
CASCADE - tells the DBMS that when a primary key value is changed in a parent row, the corresponding foreign key value in all of its child rows should automatically be changed in the child table, to match the new primary key.
-
SET NULL - tells the DBMS that when a primary key value in a parent row is updated, the foreign key values in all of its child rows should automatically be set to NULL (unknown).
-
SET DEFAULT - tells the DBMS that when a primary key value in a parent row is updated, the foreign key values in all of its child rows should automatically be set to the default value for that particular column.
The cascade rule, can be a "multi-level" rule. It is possible that when deleting a parent row, it results in the deletion of one or more rows in the child table. However, this child table could in turn be a parent of its own child table. In this situation the deletion gets cascaded down the this child table in turn. This would be an example of two levels of cascade rules, but is not limited to the number of levels it could cascade.
Transactions (Consistency)
The last form of data integrity constraint we are dealing with is that of consistency. Many real-world transactions cause multiple updates to a database and all of these updates must occur in order for the database to remain in a consistent and correct state. DBMS systems handle these situations using transactions. A transaction is a sequence of one or more SQL statements that together form a logical unit of work. The sQL statements that form the transaction are typically closely related and perform interdependent actions. Each statement in the transaction performs some part of the task, but all of them are required to complete the task. Grouping the statements as a single transaction tells the DBMS that the entire statement sequence should be executed atomically - all of these statements must be completed for th database to be in a consistent state.
SQL supports database transactions through two SQL transaction processing statements:
-
COMMIT - This statement signals the successful end of a transaction. It tells the DBMS that the transaction is now complete.
-
ROLLBACK - This statement signals the unsuccessful end of a transaction. It tells the DBMS that the user does not want to complete the transaction;instead the DBMS should back out any changes made to the database during the transaction. In effect, the DBMS restores the database to its state before the transaction began.
As an example, let us consider the following table structure:
Figure 6.15. Example Sales System Tables
If an order is placed, we would want to:
- Add the order entry to the ORDERS table
- Decrement the relevant STOCK entry by the quantity ordered and finally
- Increment the sales total for the sales representative in the SALESREPS table
So now, consider the following diagram which shows some of the possible situations which may occur:
Figure 6.16. Transactions Example
As may be seen, all three SQL statements must be completed for the transaction to be deemed succesful and for database consistency to be retained. If a failure occurs at any stage, then the database must be "rolled back" to its initial state before the transaction begun.
The SQL2 standard defines the SQL statement syntaxes as being COMMIT WORK and ROLLBACK WORK or simply COMMIT and ROLLBACK.Other popular DBMSs have their own syntax such as BEGIN TRANSACTION,COMMIT TRANSACTION and ROLLBACK TRANSACTIONS. Each DBMS regardless of the syntax operates transactions in much the same manner.
Data Definition Language
Most database users do not have to worry about creating a database; they use interactive SQL to access a database of corporate information or access a database that has been created by someone else. In a typical corporate database, for example, the database administrator may give you permission to retrieve and perhaps to update the stored data. However, the administrator will not allow you to create new databases or to modify the structure of the existing tables.
Unfortunately for you, you will also be expected to know how to create databases and define their structure!
The SELECT, INSERT, DELETE, UPDATE, COMMIT and ROLLBACK statements (we will deal with the last two in the transactions section) are all concerned with manipulating the data in a database. These DML statements can modify the data stored in the database, but they cannot change its structure. None of these statements creates or deletes tables or columns for example. Changes to the structure of the database are handled by a different set of SQL statements, usually called the SQL Data Definition Language, or DDL. Using DDL statements it is possible to:
-
Define and create a new table
-
Remove a table that is no longer needed
-
Change the definition of an existing table
-
Define a virtual table (view) of data
-
Establish security controls for a database
-
Build an index to make table access faster
-
Control the physical storage of data by the DBMS
The core of the Data Definition Language is based on three SQL verbs:
-
CREATE, which defines and creates a database object
-
DROP, which removes an existing database object
-
ALTER, which changes the definition of a database object
In all major SQL-based DBMS products, these three DDL verbs can be used while the DBMS is running - the database structure is dynamic. The DBMS can be creating, dropping or changing the definition of tables in the database, for example, while it is simultaneously providing access to the database for its users. This is a major advantage of SQL and relational databases over earlier systems, where the DBMS had to be stopped before one could change the structure of the database. It means that a relational database can grow and change easily over time. Production use of a database can continue while new tables and applications are added.
CREATE TABLE Statement
The CREATE TABLE statement, defines a new table in the database and prepares it to accept data. The various clauses of the statement specify the elements of the table definition. In practice, creating a table is relatively straightforward. When you execute a CREATE TABLE statement, you become the owner of the newly created table, which is given the name specified in the statement. The table name must be a legal SQL name, and it must not conflict with the name of one of your existing tables (for that user). The newly created table is empty, but the DBMS prepares it to accept data added with the INSERT statement.
The columns of the newly created table are defined in the body of the CREATE TABLE statement. The column definitions appear in a comma-seperated list enclosed in parentheses. The order of the column definitions determines the left-to-right order of the columns in the table. Each column definition specifies:
-
The column name, which is used to refer to the column in SQL statements. Every column in the table must have a unique name, but the names may duplicate those of columns in other tables.
-
The data type of the column, identifying the kind of data that the column stores. Some data types, such as VARCHAR and DECIMAL, require additional information, such as the length or number of decimal places in the data. This additional information is enclosed in parentheses following the keyword that specifies the data type.
-
Whether the column contains required data. The NOT NULL clause prevents NULL values from appearing in the column; otherwise, NULL values are allowed.
-
An optional default value for the column. The DBMS uses this value when an INSERT statement for the table does not specify a value for the column.
The CREATE TABLE statement for a given table can vary slightly from one DBMS brand to another, because each DBMS supports its own set of data types and uses its own keywords to identify them in the column definitions. Here is the sample CREATE TABLE statement used to create the CUSTOMERS table:
CREATE TABLE CUSTOMERS
(ID INTEGER NOT NULL UNIQUE,
SURNAME VARCHAR(30) NOT NULL,
FIRSTNAME VARCHAR(30) NOT NULL,
EMAIL VARCHAR(80),
COUNTRY VARCHAR(30) DEFAULT 'Ireland',
PHONE VARCHAR(25))
Let us also consider the example of our SALES table (from the previous section).
CREATE TABLE SALES
(SALE_ID INTEGER,
YEAR INTEGER NOT NULL,
CC VARCHAR(2) NOT NULL,
NUM INTEGER NOT NULL,
VALUE INTEGER NOT NULL,
SALESPERSON VARCHAR(100),
PURCHASER INTEGER NOT NULL)
The definition of each column within a table tells the DBMS whether or not the data for the column is allowed to be "missing" - that is, whether the column is allowed to have a NULL value. In most DBMS brands, and in the SQL standard, the default is to allow missing data for a column. If a column must contain a legal data value for every row of a table, then its definition must include the NOT NULL clause. Sybase and Microsoft SQL Server use the opposite convention, assuming that NULL values are not allowed unless the column is explicitly declared NULL or the default "null mode" defined for the database is set to allow NULLs by default.
As seen in the first CREATE TABLE example above, it is possible to define "default" values for columns. If a column has a default value, it is specified within the column definition. Even if a column is defined as NOT NULL but has a default set, its omission in an INSERT statement would not cause an issue, since it would simply default to the chosen value.
In addition to defining the columns of a table, the CREATE TABLE statement identifies the table's primary key and the table's relationships to other tables in the database. The PRIMARY KEY and FOREIGN KEY clauses handle these functions. The PRIMARY KEY clause specifies the column or columns that form the primary key for the table. The FOREIGN KEY clause specifies a foreign key in the table and the relationship that it creates to another (parent) table in the database. The FOREIGN KEY clause specifies:
-
The column or columns that form the primary key, all of which are columns of the table being created.
-
The table that is referenced by the foreign key.
-
An optional name for the relationship.
-
How the DBMS should treat a NULL value in one or more columns of the foreign key.
-
An optional delete rule for the relationship
-
An optional update rule for the relationship
So now to more accurately define our example tables we need to define our primary and foreign key definitions for the two tables. The copy of the diagram should help in understanding the definition:
Figure 6.18. Foreign Key Example Revisited
So now our definition is more precisely:
CREATE TABLE CUSTOMERS
(ID INTEGER,
SURNAME VARCHAR(30) NOT NULL,
FIRSTNAME VARCHAR(30) NOT NULL,
EMAIL VARCHAR(80),
COUNTRY VARCHAR(30) DEFAULT 'Ireland',
PHONE VARCHAR(25),
PRIMARY KEY (ID))
CREATE TABLE SALES
(SALE_ID INTEGER,
YEAR VALID_YEAR NOT NULL,
CC VARCHAR(2) NOT NULL,
NUM INTEGER NOT NULL,
VALUE INTEGER NOT NULL,
SALESPERSON VARCHAR(100),
PURCHASER INTEGER,
PRIMARY KEY (SALE_ID),
CONSTRAINT CONSTNAME1
FOREIGN KEY (PURCHASER)
REFERENCES CUSTOMERS
ON DELETE SET NULL)
So now, we notice that the primary key is easily set in the CUSTOMERS table by simply adding in a single extra line. The ID column need not be exlusively set as being UNIQUE now, as it being the primary key ensures this. In the SALES table, the primary key is defined as being our SALE_ID and defined as such. Additionally, we define a Foreign Key constraint (CONSTNAME1 is our defined name for this constraint) on the PURCHASER column which references the CUSTOMERS table. We have additionally defined the delete rule to SET NULL, which means that if the CUSTOMER row is deleted, then any relevant entries in the PURCHASER column of the SALES table will be set to be NULL (or unknown). It is probably a good time now to view the syntax diagram for CREATE table shown above - it should be somewhat more understandable at this stage - hopefully!
Note: To define a multiple column primary key, such as that for the CARS table, you would use "PRIMARY KEY (YEAR,COUNTYCODE,CNUMBER)".
Full Working Example of Customers,Cars and Sales
If anyone is interested in recreating the main data tables we have used to date, this code will generate the appropriate tables, with their correct relationships and populate them with appropriate data.
CREATE TABLE CUSTOMERS
(ID INTEGER,
SURNAME VARCHAR(30) NOT NULL,
FIRSTNAME VARCHAR(30) NOT NULL,
EMAIL VARCHAR(80),
COUNTRY VARCHAR(30) DEFAULT 'Ireland',
PHONE VARCHAR(25),
PRIMARY KEY (ID))
insert into customers values (2174,'Smith','Peter','smithp@lucent.com','Ireland', '+353 1 4382321');
insert into customers values (2177,'Murphy','Joe','joe.murphy@microsoft.com','Ireland', '+353 97 34312');
insert into customers values (2099,'Jones','Mary','mary.jones@hotmail.com','United States', '+555 33 4444444');
insert into customers values (2200,'McFadden','Jack','jack.mcfadden@dcu.ie','Scotland', '+666 11 222222');
insert into customers values (2311,'Corcoran','Sheila','scorco@ireland.com','Ireland', '+353 1 8903242');
insert into customers values (1983,'Ryan','Ann','ann.ryan@yahoo.com','England', '+777 1 3234343');
CREATE TABLE CARS
(YEAR INTEGER,
COUNTYCODE VARCHAR(2),
CNUMBER INTEGER,
MODEL VARCHAR(80),
COLOUR VARCHAR(30),
TRADEPRICE INTEGER,
PRIMARY KEY (YEAR,COUNTYCODE,CNUMBER))
insert into cars values (0,'D',7733,'Honda Civic','Silver',10000);
insert into cars values (0,'L',1733,'Volkswagon Bora','Blue',9400);
insert into cars values (99,'MH',90430,'Toyota Avensis','Green',9600);
insert into cars values (90,'C', 15432, 'Toyota Corolla', 'Green', 5600);
insert into cars values (95,'KY', 1061, 'Ford Fiesta', 'Grey', 3900);
insert into cars values (0,'D', 32909, 'Ford Focus', 'Black', 10300);
insert into cars values (2,'DL', 6328, 'Peugot 306', 'Red', 12000);
insert into cars values (2,'C', 4423, 'Honda Civic', 'Silver', 14500);
insert into cars values (3, 'MH', 7733, 'Suzuki Swift', 'White', 8100);
CREATE TABLE SALES
(SALE_ID INTEGER,
YEAR INTEGER,
CC VARCHAR(2) NOT NULL,
NUM INTEGER NOT NULL,
VALUE INTEGER NOT NULL,
SALESPERSON VARCHAR(100),
PURCHASER INTEGER,
PRIMARY KEY (SALE_ID),
CONSTRAINT CONSTNAME1
FOREIGN KEY (PURCHASER)
REFERENCES CUSTOMERS
ON DELETE SET NULL,
CONSTRAINT CONSTNAME2
FOREIGN KEY (YEAR,CC,NUM)
REFERENCES CARS
ON DELETE SET NULL)
insert into sales values (844122, 90, 'C', 15432, 6100, 'Alan Malone', 1983);
insert into sales values (584329, 0, 'D', 7733, 12400, 'Lisa Smith', 2200);
insert into sales values (993321, 3, 'MH', 7733, 9500, 'Alan Malone', 2099);
insert into sales values (666481, 0, 'D', 32909, 12500, 'Roger Barry', 2177);
insert into sales values (598020, 2, 'C', 4423, 15800, 'Alan Malone', 2174);
insert into sales values (384739, 99, 'MH', 90430, 11200, 'Roger Barry', 2311);
insert into sales values (348993, 0, 'L', 1733, 10000, 'Lisa Smith', 2177);
insert into sales values (667781, 2, 'DL', 6328, 12400, 'Roger Barry', 1983);
CREATE TABLE Example 2
As another example, let us again consider the three tables introduced in the triggers section. The diagram is shown below. We could define the three tables in the following way:
Figure 6.19. Example Sales System Tables
CREATE TABLE STOCK
(ID VARCHAR(10),
NAME VARCHAR(1000) NOT NULL,
QTY INTEGER DEFAULT 0,
PRICE INTEGER NOT NULL,
PRIMARY KEY (ID))
CREATE TABLE SALESREPS
(IDENT INTEGER,
NAME VARCHAR(50) NOT NULL,
SALES INTEGER NOT NULL,
PRIMARY KEY (IDENT),
UNIQUE (NAME))
CREATE TABLE ORDERS
(ORDERID INTEGER,
ITEMID VARCHAR(10) NOT NULL,
QTY INTEGER DEFAULT 1,
AMOUNT INTEGER NOT NULL,
REP_ID INTEGER,
PRIMARY KEY (ORDERID),
CONSTRAINT CNST1
FOREIGN KEY (ITEMID)
REFERENCES STOCK
ON DELETE CASCADE,
CONSTRAINT CNST2
FOREIGN KEY (REP_ID)
REFERENCES SALESREPS
ON DELETE SET NULL)
In this example, we can see how the ORDERS table has two seperate foreign keys constraints. The first of these results in any entries in the ORDERS table being deleted if the respective ITEMID is removed from the STOCK table. The latter results in the REP_ID being set to NULL if the respective entry is deleted from the SALESREPS table.
In the example, we also show the usage of the remaining two constraint types: a uniqueness constraint and a check-constraint. The uniqueness constraint is defined by UNIQUE (NAME) but can generally be rewritten as follows:
NAME VARCHAR(50) NOT NULL UNIQUE,
The check constraint we defined with the name VALID_QTY for the STOCK table. The DBMS simply ensures that this constraint is met, whenever data is later INSERTed or UPDATEd.
Database Schema Diagrams:
A database model provides a data model for the structure of a database and how the information is organised. These are commonly used to give an overview regarding what data is stored in various tables and the relationships between the various tables. For example, we can show the following diagram for the primary tables in this chapter.
As can be seen in the diagram, we can determine exactly what information is stored and can see the highlighting of the primary keys and the relationships between the tables.