Chapter 1: Oracle Server Technologies and the Relational Paradigm

The relational paradigm is highly efficient in many respects for many types of data, but it is not appropriate for all applications. As a general rule, a relational analysis should be the first approach taken when modeling a system. Only if it proves inappropriate should one resort to nonrelational structures. Applications where the relational model has proven highly effective include virtually all Online Transaction Processing (OLTP) systems and Decision Support Systems (DSS).

The relational paradigm can be demanding in its hardware requirements and in the skill needed to develop applications around it, but if the data fits, it has proved to be the most versatile model. There can be, for example, problems caused by the need to maintain the indexes that support the links between tables, and by the space needed to store the indexed data more than once: in the indexes themselves as well as in the tables in which the columns reside. Nonetheless, relational design is in most circumstances the optimal model.

A number of software publishers have produced database management systems that conform (with varying degrees of accuracy) to the relational paradigm; Oracle is only one. IBM was perhaps the first company to commit major resources to it, but its product (which later developed into DB2) was not ported to non-IBM platforms for many years. Microsoft’s SQL Server is another relational database that has been limited by the platforms on which it runs. Oracle databases, by contrast, have been ported to every major platform since the first release. This may be what gave Oracle the edge in the RDBMS marketplace.

A note on terminology: confusion can arise when discussing relational databases with people used to working with Microsoft products. SQL is a language and SQL Server is a database, but in the Microsoft world, the term SQL is often used to refer to either.
Data Normalization

The process of modeling data into relational tables is known as normalization, and it can be studied at university level for years. There are commonly said to be three levels of normalization: the first, second, and third normal forms. There are higher levels of normalization: fourth and fifth normal forms are well defined, but any normal data analyst (and certainly any normal human being) will not need to be concerned with them. It is possible for a SQL application to address un-normalized data, but this will usually be inefficient, as that is not what the language is designed to do. In most cases, data stored in a relational database and accessed with SQL should be normalized to the third normal form.

There are often several possible normalized models for an application. It is important to use the most appropriate one: if the systems analyst gets this wrong, the implications can be serious for performance, storage needs, and development effort.

SCENARIO & SOLUTION
Your organization is designing a new application. Who should be involved?
Everyone! The project team must involve business analysts (who model the business processes), systems analysts (who model the data), system designers (who decide how to implement the models), developers (you), database administrators, system administrators, and (most importantly) end users.
It is possible that relational structures may not be suitable for a particular application. How can this be determined, and what should be done next? Can Oracle help?
Attempt to normalize the data into two-dimensional tables, linked with one-to-many relationships. If this really cannot be done, consider other paradigms. Oracle may well be able to help. For instance, maps and other geographical data really don’t work relationally. Neither does text data (such as word processing documents). But the Spatial and Text database options can be used for these purposes. There is also the possibility of using user-defined objects to store nontabular data.

As an example of normalization, consider an un-normalized table called BOOKS that stores details of books, authors, and publishers, using the ISBN as the primary key. A primary key is the attribute (or combination of attributes) that uniquely identifies a record. Here are two entries:

ISBN   Title                                         Authors                        Publisher
12345  Oracle 11g OCP SQL Fundamentals 1 Exam Guide  John Watson, Roopesh Ramklass  McGraw-Hill, Spear Street, San Francisco, CA 94105
67890  Oracle 11g New Features Exam Guide            Sam Alapati                    McGraw-Hill, Spear Street, San Francisco, CA 94105

Storing the data in this table gives rise to several anomalies. First, there is the insertion anomaly: it is impossible to enter details of authors who are not yet published, because there will be no ISBN under which to store them. Second, a book cannot be deleted without losing the details of the publisher: a deletion anomaly. Third, if a publisher’s address changes, it will be necessary to update the rows for every book it has published: an update anomaly. Furthermore, it will be very difficult to identify every book written by one author. The fact that a book may have several authors means that the “author” field must be multivalued, and a search will have to examine all the values. Related to this is the problem of having to restructure the table if a book comes along with more authors than the original design can handle. Also, storage is very inefficient because the address details are replicated across rows, and the likelihood of error as this data is repeatedly entered is high. Normalization should solve all these issues.
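The anomalies just described can be reproduced in a few lines. The following is a minimal sketch, not Oracle-specific: it uses Python’s built-in sqlite3 module as a stand-in for an Oracle database, lowercase table and column names, and two hypothetical values introduced only for illustration (an author called “Unpublished Author” and a new address on “New Street”).

```python
import sqlite3

# SQLite stands in for Oracle so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books (
        isbn      TEXT NOT NULL PRIMARY KEY,
        title     TEXT,
        authors   TEXT,  -- multivalued: 'name, name, ...'
        publisher TEXT   -- publisher name and full address in one column
    )
""")
ADDR = "McGraw-Hill, Spear Street, San Francisco, CA 94105"
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [("12345", "Oracle 11g OCP SQL Fundamentals 1 Exam Guide",
      "John Watson, Roopesh Ramklass", ADDR),
     ("67890", "Oracle 11g New Features Exam Guide", "Sam Alapati", ADDR)],
)

# Insertion anomaly: an unpublished author has no ISBN to be stored under.
try:
    conn.execute("INSERT INTO books (authors) VALUES ('Unpublished Author')")
    insert_ok = True
except sqlite3.IntegrityError:
    insert_ok = False

# Retrieval problem: one author's books need a substring search
# over the multivalued column.
watson_books = [r[0] for r in conn.execute(
    "SELECT isbn FROM books WHERE authors LIKE ?", ("%John Watson%",))]

# Update anomaly: one change of address must be applied to every book row.
NEW_ADDR = "McGraw-Hill, New Street, San Francisco, CA 94105"  # hypothetical
rows_updated = conn.execute(
    "UPDATE books SET publisher = ? WHERE publisher = ?", (NEW_ADDR, ADDR)
).rowcount

# Deletion anomaly: deleting the books also deletes the publisher's details.
conn.execute("DELETE FROM books")
publisher_rows = conn.execute(
    "SELECT COUNT(*) FROM books WHERE publisher LIKE 'McGraw-Hill%'"
).fetchone()[0]

print(insert_ok, watson_books, rows_updated, publisher_rows)
```

Running it should print False ['12345'] 2 0: the author cannot be inserted, the author search relies on pattern matching, one address change rewrites two rows, and once the books are gone nothing records the publisher at all.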
The first normal form is achieved by removing the repeating groups, in this case the multiple authors: pull them out into a separate table called AUTHORS. The data structures will now look like the following.

Two rows in the BOOKS table:

ISBN   TITLE                                         PUBLISHER
12345  Oracle 11g OCP SQL Fundamentals 1 Exam Guide  McGraw-Hill, Spear Street, San Francisco, California
67890  Oracle 11g New Features Exam Guide            McGraw-Hill, Spear Street, San Francisco, California

And three rows in the AUTHORS table:

NAME              ISBN
John Watson       12345
Roopesh Ramklass  12345
Sam Alapati       67890

The first row in the BOOKS table (ISBN 12345) is now linked to two rows in the AUTHORS table. This solves the insertion anomaly (there is no reason not to insert as many unpublished authors as necessary), the retrieval problem of identifying all the books by one author (one can search the AUTHORS table on just one name), and the problem of a fixed maximum number of authors for any one book (simply insert as many or as few AUTHORS as are needed).

This is the first normal form: no repeating groups. The second normal form removes columns from the table that are not dependent on the primary key. In this example, that means the publisher’s address details: these are dependent on the publisher, not the ISBN. The BOOKS table and a new PUBLISHERS table will then look like this:

BOOKS
ISBN   TITLE                                         PUBLISHER
12345  Oracle 11g OCP SQL Fundamentals 1 Exam Guide  McGraw-Hill
67890  Oracle 11g New Features Exam Guide            McGraw-Hill

PUBLISHERS
PUBLISHER    STREET        CITY           STATE
McGraw-Hill  Spear Street  San Francisco  California

All the books published by one publisher will now point to a single record in PUBLISHERS. This solves the problem of storing the address many times, and also solves the consequent update anomalies and the data consistency errors caused by inaccurate multiple entries.

Third normal form removes all columns that are interdependent.
In the PUBLISHERS table, this means the address columns: the street exists in only one city, and the city can be in only one state; one column should do, not three. This could be achieved by adding an address code, pointing to a separate address table:

PUBLISHERS
PUBLISHER    ADDRESS CODE
McGraw-Hill  123

ADDRESSES
ADDRESS CODE  STREET        CITY           STATE
123           Spear Street  San Francisco  California

One characteristic of normalized data that should be emphasized now is the use of primary keys and foreign keys. A primary key is the unique identifier of a row in a table, either one column or a concatenation of several columns (known as a composite key). Every table should have a primary key defined. This is a requirement of the relational paradigm. Note that the Oracle database deviates from this standard: it is possible to define tables without a primary key (though it is usually not a good idea, and some other RDBMSs do not permit this).

A foreign key is a column (or a concatenation of several columns) that can be used to identify a related row in another table. A foreign key in one table will match a primary key in another table. This is the basis of the many-to-one relationship: a connection between two tables, where many rows in one table refer to a single row in another table. This is sometimes called a parent-child relationship: one parent can have many children. In the BOOKS example so far, the keys are as follows:

TABLE       KEYS
BOOKS       Primary key: ISBN
            Foreign key: Publisher
AUTHORS     Primary key: Name + ISBN
            Foreign key: ISBN
PUBLISHERS  Primary key: Publisher
            Foreign key: Address code
ADDRESSES   Primary key: Address code

These keys define relationships such as the fact that one book can have several authors.

There are various standards for documenting normalized data structures, developed by different organizations as structured formal methods.
Generally speaking, it really doesn’t matter which method one uses as long as everyone reading the documents understands it. Part of the documentation will always include a listing of the attributes that make up each entity (also known as the columns that make up each table) and an entity-relationship diagram representing graphically the foreign-to-primary-key connections. A widely used standard is as follows:

■ Primary key columns identified with a hash (#)
■ Foreign key columns identified with a backslash (\)
■ Mandatory columns (those that cannot be left empty) identified with an asterisk (*)
■ Optional columns identified with a lowercase “o”

The BOOKS example tables can now be described as follows:

Table BOOKS
#*   ISBN          Primary key, required
o    Title         Optional
\*   Publisher     Foreign key, link to the PUBLISHERS table

Table AUTHORS
#*   Name          Together with the ISBN, the primary key
#\o  ISBN          Part of the primary key, and a foreign key to the BOOKS table. Optional, because some authors may not yet be published.

Table PUBLISHERS
#*   Publisher     Primary key
\o   Address code  Foreign key, link to the ADDRESSES table

Table ADDRESSES
#*   Address code  Primary key
o    Street
o    City
o    State

The second necessary part of documenting the normalized data model is the entity-relationship diagram. This represents the connections between the tables graphically. There are different standards for these; Figure 1-3 shows the entity-relationship diagram for the BOOKS example using a very simple notation limited to showing the direction of the one-to-many relationships, using what are often called crow’s feet to indicate which sides of the relationship are the many and the one. It can be seen that one book can have multiple authors, and one publisher can publish many books. Note that the diagram also states that both AUTHORS and PUBLISHERS have exactly one ADDRESS.
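The column notation maps fairly directly onto SQL constraints: # onto PRIMARY KEY, \ onto a foreign key (REFERENCES), * onto NOT NULL, and o onto a nullable column. The following sketch creates the four normalized tables that way and checks that the anomalies have gone. It again uses Python’s sqlite3 module as a stand-in for Oracle, so the column types and the PRAGMA foreign_keys switch are SQLite specifics rather than Oracle syntax; the lowercase names, the “New Street” update, and the rejected book (“Orphan”, from “No Such Press”) are hypothetical values for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Notation -> DDL: '#' PRIMARY KEY, '\' REFERENCES, '*' NOT NULL, 'o' nullable.
# (SQLite, unlike Oracle, tolerates NULLs in a non-INTEGER primary key column.)
conn.executescript(r"""
    CREATE TABLE addresses (
        address_code INTEGER NOT NULL PRIMARY KEY,     -- #*
        street TEXT, city TEXT, state TEXT             -- o
    );
    CREATE TABLE publishers (
        publisher    TEXT NOT NULL PRIMARY KEY,        -- #*
        address_code INTEGER REFERENCES addresses      -- \o
    );
    CREATE TABLE books (
        isbn      TEXT NOT NULL PRIMARY KEY,           -- #*
        title     TEXT,                                -- o
        publisher TEXT NOT NULL REFERENCES publishers  -- \*
    );
    CREATE TABLE authors (
        name TEXT NOT NULL,                            -- #*
        isbn TEXT REFERENCES books,                    -- #\o
        PRIMARY KEY (name, isbn)
    );
""")

conn.execute("INSERT INTO addresses VALUES "
             "(123, 'Spear Street', 'San Francisco', 'California')")
conn.execute("INSERT INTO publishers VALUES ('McGraw-Hill', 123)")
conn.executemany("INSERT INTO books VALUES (?, ?, 'McGraw-Hill')", [
    ("12345", "Oracle 11g OCP SQL Fundamentals 1 Exam Guide"),
    ("67890", "Oracle 11g New Features Exam Guide"),
])
conn.executemany("INSERT INTO authors VALUES (?, ?)", [
    ("John Watson", "12345"),
    ("Roopesh Ramklass", "12345"),
    ("Sam Alapati", "67890"),
])

# One author's books: a plain equality search on AUTHORS.
watson = [r[0] for r in conn.execute(
    "SELECT isbn FROM authors WHERE name = ?", ("John Watson",))]

# An address change is now a single-row update.
moved = conn.execute(
    "UPDATE addresses SET street = 'New Street' WHERE address_code = 123"
).rowcount

# A book reaches its address by following the many-to-one links.
city_of_12345 = conn.execute("""
    SELECT a.city
    FROM books b
    JOIN publishers p ON p.publisher = b.publisher
    JOIN addresses a ON a.address_code = p.address_code
    WHERE b.isbn = '12345'
""").fetchone()[0]

# The foreign key rejects a book whose publisher does not exist.
try:
    conn.execute("INSERT INTO books VALUES ('99999', 'Orphan', 'No Such Press')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(watson, moved, city_of_12345, fk_enforced)
```

This should print ['12345'] 1 San Francisco True. In Oracle the same constraints would be written with Oracle data types such as VARCHAR2 and NUMBER, and no pragma is needed.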
More complex notations can be used to show whether the link is required or optional, information which will match that given in the table columns listed previously.

FIGURE 1-3 An entity-relationship diagram (entities: AUTHORS, BOOKS, PUBLISHERS, ADDRESSES)

This is a very simple example of normalization, and is not in fact complete. If one author were to write several books, this would require multiple values in the ISBN column of the AUTHORS table. That would be a repeating group, which would have to be removed because repeating groups break the rule for first normal form. A major exercise with data normalization is ensuring that the structures can handle all possibilities.

A table in a real-world application may have hundreds of columns and dozens of foreign keys. The standards for notation vary across organizations; the example given is very basic. Entity-relationship diagrams for applications with hundreds or thousands of entities can be challenging to interpret.

EXERCISE 1-2
Perform an Extended Relational Analysis
This is a paper-based exercise, with no specific solution. Consider the situation where one author can write many books, and one book can have many authors. This is a many-to-many relationship, which cannot be implemented directly in the relational model. Sketch out data structures that demonstrate the problem, and develop another structure that would solve it.

Following is a possible solution.
The un-normalized table of books with many authors could look like this:

BOOKS
#*   Title
\*   Authors

There could be two rows in this table:

Title                            Authors
11g SQL Fundamentals Exam Guide  John Watson, Roopesh Ramklass
10g DBA Exam Guide               John Watson, Damir Bersinic

And the table of authors could look like this:

AUTHORS
#*   Name
\*   Books

There could be three rows in this table:

Name              Books
John Watson       11g SQL Fundamentals Exam Guide, 10g DBA Exam Guide
Roopesh Ramklass  11g SQL Fundamentals Exam Guide
Damir Bersinic    10g DBA Exam Guide

This many-to-many relationship needs to be resolved into many-to-one relationships by taking the repeating groups out of the two tables and storing them in a separate books-per-author table. It will also become necessary to introduce some codes, such as ISBNs to identify books and social security numbers to identify authors. This is a possible normalized structure:

BOOKS
#*   ISBN
o    Title

AUTHORS
#*   SSNO
o    Name

BOOKAUTHORS
#\*  ISBN  Part of the primary key and a foreign key to BOOKS
#\*  SSNO  Part of the primary key and a foreign key to AUTHORS

The rows in these normalized tables would be as follows:

BOOKS
ISBN   Title
12345  11g SQL Fundamentals Exam Guide
67890  10g DBA Exam Guide

AUTHORS
SSNO   Name
11111  John Watson
22222  Damir Bersinic
33333  Roopesh Ramklass

BOOKAUTHORS
ISBN   SSNO
12345  11111
12345  33333
67890  11111
67890  22222

Figure 1-4 shows the entity-relationship diagram for the original un-normalized structure, followed by the normalized structure.

As a further exercise, consider the possibility that one publisher could have offices at several addresses, and one address could have offices for several companies. Authors will also have addresses, and this connection too needs to be defined. These enhancements can be added to the example worked through previously.
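The resolved structure can be exercised in the same way. This sketch (Python’s sqlite3 module again standing in for Oracle; the helper functions books_by and authors_of are hypothetical names) builds BOOKS, AUTHORS, and the BOOKAUTHORS junction table, pairs books and authors as in the un-normalized rows above (the SQL Fundamentals guide by John Watson and Roopesh Ramklass, the DBA guide by John Watson and Damir Bersinic), and then navigates the many-to-many relationship in both directions through two many-to-one joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
    CREATE TABLE books   (isbn TEXT NOT NULL PRIMARY KEY, title TEXT);
    CREATE TABLE authors (ssno TEXT NOT NULL PRIMARY KEY, name TEXT);
    -- Junction table: one row per (book, author) pair.
    CREATE TABLE bookauthors (
        isbn TEXT NOT NULL REFERENCES books,
        ssno TEXT NOT NULL REFERENCES authors,
        PRIMARY KEY (isbn, ssno)
    );
""")
conn.executemany("INSERT INTO books VALUES (?, ?)", [
    ("12345", "11g SQL Fundamentals Exam Guide"),
    ("67890", "10g DBA Exam Guide"),
])
conn.executemany("INSERT INTO authors VALUES (?, ?)", [
    ("11111", "John Watson"),
    ("22222", "Damir Bersinic"),
    ("33333", "Roopesh Ramklass"),
])
conn.executemany("INSERT INTO bookauthors VALUES (?, ?)", [
    ("12345", "11111"), ("12345", "33333"),
    ("67890", "11111"), ("67890", "22222"),
])

def books_by(name):
    """Titles written by one author, found through the junction table."""
    return [r[0] for r in conn.execute("""
        SELECT b.title
        FROM books b
        JOIN bookauthors ba ON ba.isbn = b.isbn
        JOIN authors a ON a.ssno = ba.ssno
        WHERE a.name = ?
        ORDER BY b.isbn""", (name,))]

def authors_of(isbn):
    """Author names for one book, found through the junction table."""
    return [r[0] for r in conn.execute("""
        SELECT a.name
        FROM authors a
        JOIN bookauthors ba ON ba.ssno = a.ssno
        WHERE ba.isbn = ?
        ORDER BY a.name""", (isbn,))]

# The composite primary key stops the same pairing being stored twice.
try:
    conn.execute("INSERT INTO bookauthors VALUES ('12345', '11111')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(books_by("John Watson"))
print(authors_of("12345"))
```

Because the composite primary key on BOOKAUTHORS covers both columns, the same pairing cannot be recorded twice, while either column on its own may repeat freely; that is what turns one many-to-many link into two many-to-one links.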
FIGURE 1-4 Un-normalized and normalized data models (first, an un-normalized many-to-many relationship between BOOKS and AUTHORS; then the many-to-many relationship resolved by interposing another entity, BOOKAUTHORS, between BOOKS and AUTHORS)

CERTIFICATION OBJECTIVE 1.03
Summarize the SQL Language

SQL is defined, developed, and controlled by international bodies. Oracle Corporation does not have to conform to the SQL standard but chooses to do so. The language itself can be thought of as very simple (there are only 16 commands), but in practice SQL coding can be phenomenally complicated. That is why a whole book is needed to cover the bare fundamentals.

SQL Standards

Structured Query Language (SQL) was invented by an IBM research group in the 1970s, but Oracle Corporation (then trading as Relational Software, Inc.) claims to have beaten IBM to market by a few weeks with the first commercial implementation: Oracle 2, released in 1979. Since then the language has evolved enormously and is no longer driven by any one organization. SQL is now an international standard. It is managed by committees from ISO and ANSI. ISO is the Organisation Internationale de Normalisation, based in Geneva; ANSI is the American National Standards Institute, based in Washington, DC. The two bodies cooperate, and their SQL standards are identical.

Earlier releases of the Oracle database used an implementation of SQL that had some significant deviations from the standard. This was not because Oracle was being deliberately different: it was usually because Oracle implemented features that were ahead of the standard, and when the standard caught up, it used different syntax. An example is the outer join (detailed in Chapter 8), which Oracle implemented long before standard SQL; when standard SQL introduced an outer join, Oracle added support for the new join syntax while retaining support for its own proprietary syntax.
Oracle Corporation ensures future compliance by placing its personnel on the various ISO and ANSI committees and is now helping to drive the SQL standard forward.

SQL Commands

These are the 16 SQL commands, separated into commonly used groups: