Beginning Database Design - P10

Document information
Pages: 20
Size: 757.17 KB

Content

It makes sense to begin by demonstrating denormalization from the highest Normal Form downward.

Denormalizing Beyond 3NF

Figure 6-1 shows the reversal of the normalization processing applied in Figure 4-28, Figure 4-29, and Figure 4-30. Moving nullable fields into separate tables is a common way of saving space, particularly in databases with fixed record lengths. Many modern databases, however, allow variable record lengths. If variable-length records are allowed, removing NULL-valued fields is pointless, because the space saved is either zero or negligible.

Figure 6-1: Denormalizing NULL valued table fields.
[Figure 6-1 diagram: EDITION, RANK, and INGRAM tables (fields include ISBN, publisher_id, publication_id, print_date, pages, list_price, format, rank, and ingram_units), annotated "Potentially NULL values were separated out" and "Multiple NULL values can be separated into multiple tables"; arrow labeled "Denormalize beyond 3rd NF".]

153 Advanced Relational Database Modeling 11_574906 ch06.qxd 10/28/05 11:39 PM Page 153

A fixed-length record is one in which every record always occupies the same number of characters. In other words, all string and number values have a fixed length. For example, a 30-character string value can never be NULL; instead, it contains 30 space characters. It follows that a 30-character string holding a 10-character name contains the name followed by 20 trailing space characters.

Figure 6-2 shows the reversal of the normalization processing applied in Figure 4-31: a particularly upsetting application of BCNF in which every candidate key is separated into its own table. A candidate key is any field that could potentially be used as a primary key (unique identifier) for the original entity, in this case a customer (the CUSTOMER table).
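To show what the Figure 6-2 split costs at query time, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. It assumes a cut-down subset of the Figure 6-2 fields (name, phone, email, balance); the customer_denorm table name and the sample row are hypothetical. Reading one complete customer from the BCNF design needs one join per separated candidate key, while the denormalized table returns the same row with no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- BCNF split: each candidate key lives in its own table (subset of Figure 6-2).
    CREATE TABLE customer       (customer_id INTEGER PRIMARY KEY, balance_outstanding REAL);
    CREATE TABLE customer_name  (customer_id INTEGER PRIMARY KEY REFERENCES customer,
                                 customer_name TEXT UNIQUE);
    CREATE TABLE customer_phone (customer_id INTEGER PRIMARY KEY REFERENCES customer,
                                 phone TEXT UNIQUE);
    CREATE TABLE customer_email (customer_id INTEGER PRIMARY KEY REFERENCES customer,
                                 email TEXT UNIQUE);

    INSERT INTO customer       VALUES (1, 250.0);
    INSERT INTO customer_name  VALUES (1, 'Acme Books');
    INSERT INTO customer_phone VALUES (1, '555-0100');
    INSERT INTO customer_email VALUES (1, 'orders@acme.example');
""")

# One join per separated candidate key, just to read a single customer.
SPLIT_QUERY = """
    SELECT c.customer_id AS customer_id, n.customer_name AS customer_name,
           p.phone AS phone, e.email AS email,
           c.balance_outstanding AS balance_outstanding
    FROM customer c
    LEFT JOIN customer_name  n ON n.customer_id = c.customer_id
    LEFT JOIN customer_phone p ON p.customer_id = c.customer_id
    LEFT JOIN customer_email e ON e.customer_id = c.customer_id
"""
split_rows = conn.execute(SPLIT_QUERY).fetchall()

# Denormalize: push the candidate keys back into a single CUSTOMER-style table.
conn.execute(f"CREATE TABLE customer_denorm AS {SPLIT_QUERY}")
denorm_rows = conn.execute(
    "SELECT customer_id, customer_name, phone, email, balance_outstanding "
    "FROM customer_denorm"
).fetchall()

print(split_rows)
print(denorm_rows == split_rows)  # same data, no joins needed to read it
```

In a real denormalized design the UNIQUE constraints would move onto the columns of the single table, so each candidate key keeps its uniqueness guarantee without a table of its own.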
Applying this type of normalization in a commercial environment would result in incredibly poor performance; it is more a mathematical nicety than a commercial necessity.

Figure 6-2: Denormalizing separation of candidate keys into separate tables.
[Figure 6-2 diagram: the single CUSTOMER table (customer_id, customer_name, address, phone, fax, email, exchange, ticker, balance_outstanding, last_date_activity, days_credit) versus its BCNF split into CUSTOMER (customer_id, balance_outstanding, last_activity, days_credit), CUSTOMER_STOCK_TICKER (customer_id (FK), exchange, ticker), CUSTOMER_PHONE, CUSTOMER_ADDRESS, CUSTOMER_EMAIL, CUSTOMER_FAX, and CUSTOMER_NAME, each keyed by customer_id (FK); arrow labeled "Denormalize BCNF".]

Once again, Figure 6-3 shows another reversal of BCNF. Applying BCNF in Figure 4-32 and Figure 4-33 created three tables, each holding unique combinations of values from the PROJECTS table on the left. Unique records in the PROJECTS table can be accessed more effectively through application coding, without the downside of creating too much granularity in the table structures.

Figure 6-3: Denormalization of candidate keys, created for the sake of uniqueness in tables.

Figure 4-34, Figure 4-35, Figure 4-36, and Figure 4-38 show a typical 4NF transformation, in which multiple-valued lists in individual fields are separated out into separate tables. This type of denormalization is shown in Figure 6-4.

Figure 6-4: Denormalization of multiple valued lists.

The problem with denormalizing the structure shown in Figure 6-4 is that the relationships between the EMPLOYEE and SKILL tables, and between the EMPLOYEE and CERTIFICATIONS tables, are many-to-many, not one-to-many. Even in a denormalized state, each EMPLOYEE record must have some kind of collection of SKILLS and CERTIFICATIONS values.
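One way to give each EMPLOYEE record such a collection is the combination of collection arrays and 2NF static tables shown in Figure 6-5. SQLite has no collection array type, so this sketch (Python's built-in sqlite3 and json modules) stores each collection as a JSON text column, which is an assumption standing in for an object-relational array. The add_employee() and employee_profile() helpers, the lookup IDs, and the sample names are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 2NF static lookup tables (Figure 6-5).
    CREATE TABLE skill         (skill_id INTEGER PRIMARY KEY, skill TEXT NOT NULL);
    CREATE TABLE certification (certification_id INTEGER PRIMARY KEY,
                                certification TEXT NOT NULL);
    -- EMPLOYEE keeps its collections inline; JSON text stands in for a collection array.
    CREATE TABLE employee (employee TEXT PRIMARY KEY, skills TEXT, certifications TEXT);

    INSERT INTO skill VALUES (1, 'SQL'), (2, 'Data modeling'), (3, 'Java');
    INSERT INTO certification VALUES (10, 'Certified DBA'), (20, 'Certified Architect');
""")


def add_employee(name, skill_ids, certification_ids):
    """Store one employee with both collections in the same row."""
    conn.execute(
        "INSERT INTO employee VALUES (?, ?, ?)",
        (name, json.dumps(skill_ids), json.dumps(certification_ids)),
    )


def employee_profile(name):
    """Read one EMPLOYEE row and resolve its collections against the static tables."""
    skills_json, certs_json = conn.execute(
        "SELECT skills, certifications FROM employee WHERE employee = ?", (name,)
    ).fetchone()
    skill_lookup = dict(conn.execute("SELECT skill_id, skill FROM skill"))
    cert_lookup = dict(conn.execute(
        "SELECT certification_id, certification FROM certification"))
    return {
        "skills": [skill_lookup[i] for i in json.loads(skills_json)],
        "certifications": [cert_lookup[i] for i in json.loads(certs_json)],
    }


add_employee("Jane", [1, 2], [10])
print(employee_profile("Jane"))
```

The trade-off is that nothing in the database enforces that the IDs inside the JSON text exist in the static tables; that check moves into application code, which is the kind of layering the chapter argues applications can often handle.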
A better solution might be a combination of collection arrays in the EMPLOYEE table and 2NF static tables for skills and certifications, as shown in Figure 6-5.

[Figure 6-4 diagram: EMPLOYEE (employee, skills, certification) versus EMPLOYEE_SKILL (employee (FK), skill) and EMPLOYEE_CERTIFICATION (employee (FK), certification); arrow labeled "Denormalize 4th NF".]

It is interesting to note that the relational database modeling tool ERWin would not allow the MANAGER table to have more than the MANAGER field in its primary key. For 5NF, the MANAGER table could contain either the PROJECT or the EMPLOYEE field as a subset of the primary key. Perhaps ERWin “thinks” that 5NF in this case is excessive, useless, or invalid.

[Figure 6-3 diagram: PROJECTS (project, manager, employee) versus MANAGER (manager), EMPLOYEE (employee, manager (FK)), and PROJECT (project, manager (FK)); arrow labeled "Denormalize BCNF".]

Figure 6-5: Denormalization of multiple valued lists using collections and 2NF.

Figure 4-39 to Figure 4-42 show a 5NF transformation. As already noted, ERWin does not appear to allow construction of 5NF table structures of this nature. The reason is suspect! Once again, as shown in Figure 6-6, applying this type of normalization is overkill. It is better to place this type of layering into application code, leaving the EMPLOYEE table as it is, as shown in the upper left of the diagram in Figure 6-6.

Figure 6-6: Denormalization of 5NF cyclic dependencies.
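The cost of the 5NF split in Figure 6-6 can be sketched with Python's built-in sqlite3 module. The table names follow the figure (EMPLOYEE, PROJECT_MANAGER, MANAGER_EMPLOYEE, PROJECT_EMPLOYEE); the sample rows are hypothetical and were chosen so that the join dependency holds, which is the only case in which the three projections rejoin to exactly the original rows. Every query that needs the full project/employee/manager fact pays for a three-way join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized form: one EMPLOYEE table holding the three-way relationship.
    CREATE TABLE employee (project TEXT, employee TEXT, manager TEXT,
                           PRIMARY KEY (project, employee, manager));
    INSERT INTO employee VALUES
        ('P1', 'Ann', 'Max'), ('P1', 'Bob', 'Max'), ('P2', 'Ann', 'Sue');

    -- 5NF form: three binary projections of the same data.
    CREATE TABLE project_manager  AS SELECT DISTINCT project, manager  FROM employee;
    CREATE TABLE manager_employee AS SELECT DISTINCT manager, employee FROM employee;
    CREATE TABLE project_employee AS SELECT DISTINCT project, employee FROM employee;
""")

# Reassembling the original facts from the 5NF tables takes a three-way join.
rejoined = conn.execute("""
    SELECT pm.project, me.employee, pm.manager
    FROM project_manager pm
    JOIN manager_employee me ON me.manager = pm.manager
    JOIN project_employee pe ON pe.project = pm.project
                            AND pe.employee = me.employee
    ORDER BY 1, 2, 3
""").fetchall()

original = conn.execute(
    "SELECT project, employee, manager FROM employee ORDER BY 1, 2, 3"
).fetchall()

print(rejoined == original)  # True for this sample data
```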
[Figure 6-6 diagram: EMPLOYEE (project, employee, manager) versus the 5NF tables PROJECT_MANAGER (project, manager), MANAGER_EMPLOYEE (manager, employee), and PROJECT_EMPLOYEE (project, employee); arrow labeled "Denormalize 5th NF".]
[Figure 6-5 diagram: EMPLOYEE (employee, skills, certification) transformed ("2nd NF plus transform + collection array") into static SKILL (skill_id, skill) and CERTIFICATION (certification_id, certification) tables, plus EMPLOYEE (employee, skills, certifications) holding object-relational database collection arrays of SKILLS and CERTIFICATION.]

Denormalizing 3NF

The role of 3NF is to eliminate what are called transitive dependencies. A transitive dependency exists where a field is determined by the primary key not directly, but indirectly, through another field. Most of the Normal Form layers beyond 3NF are impractical in a commercial environment, because applications can often do a better job at that level. In reality, 3NF occupies a gray area between what should not be done in the database model (beyond 3NF) and what should be done in the database model (1NF and 2NF).

There are a number of different ways of interpreting 3NF, as shown in Figure 4-21, Figure 4-23, Figure 4-24, and Figure 4-25. All of these example interpretations of 3NF are completely different.

Figure 6-7 shows the denormalization of a many-to-many join resolution table. As a general rule, applications usually require a many-to-many join resolution table when it can be given a specific, meaningful name, as with the ASSIGNMENT table shown in Figure 6-7. If it would be nonsensical to call the new table ASSIGNMENT, and it was instead called something such as EMPLOYEE_TASK, chances are that the extra table is unnecessary. Quite often these types of tables are created without forethought as to application requirements. If a table like this is not essential to application requirements, it is probably unnecessary.
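The following sketch (Python's built-in sqlite3 module) builds the Figure 6-7 shape: EMPLOYEE, TASK, and an ASSIGNMENT resolution table. The sample employees and tasks are hypothetical. It shows that listing who works on what through the resolution table costs a three-table join, and that ASSIGNMENT alone already holds every pairing, so the extra joins only pay off when the application genuinely needs the parent tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee TEXT PRIMARY KEY);
    CREATE TABLE task     (task TEXT PRIMARY KEY);
    -- Many-to-many resolution table: one row per (employee, task) pairing.
    CREATE TABLE assignment (
        employee TEXT REFERENCES employee,
        task     TEXT REFERENCES task,
        PRIMARY KEY (employee, task)
    );
    INSERT INTO employee VALUES ('Ann'), ('Bob');
    INSERT INTO task VALUES ('Audit'), ('Backup'), ('Report');
    INSERT INTO assignment VALUES ('Ann', 'Audit'), ('Ann', 'Report'), ('Bob', 'Audit');
""")

# Three tables in the join just to list employee/task pairs.
rows = conn.execute("""
    SELECT e.employee, t.task
    FROM employee e
    JOIN assignment a ON a.employee = e.employee
    JOIN task t       ON t.task = a.task
    ORDER BY e.employee, t.task
""").fetchall()

# The resolution table on its own already answers the same question.
direct = conn.execute(
    "SELECT employee, task FROM assignment ORDER BY employee, task"
).fetchall()

print(rows)
print(rows == direct)
```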
The result of too many new tables is more tables in joins, and slower queries.

Figure 6-7: Denormalize unnecessary 3NF many-to-many join resolution tables.
[Figure 6-7 diagram: EMPLOYEE (employee) and TASK (task) linked through ASSIGNMENT (employee (FK), task (FK)), denormalized back to EMPLOYEE and TASK; arrow labeled "Denormalize 3rd NF".]

Figure 6-8 shows another version of 3NF, in which common fields are extracted into a new table. Once again, this type of normalization is quite often more for mathematical precision and clarity, and quite contrary to commercial performance requirements. Of course, there is still a transitive dependency in the new FOREIGN_EXCHANGE link table itself, because EXCHANGE_RATE depends on CURRENCY, which in turn depends on CURRENCY_CODE. Normalizing further would complicate the design even more than it already is.

Figure 6-8: Denormalization of 3NF amalgamated fields into an extra table.

Figure 6-9 shows a classic 3NF transitive dependency resolution, that is, the creation of a new table. The 3NF transformation provides mathematical precision; however, its practical commercial value is dubious, because the new table potentially contains a very small number of fields and records. The benefit will very likely be severely outweighed by the loss in performance resulting from bigger joins in queries.

[Figure 6-8 diagram: CUSTOMER (customer, currency_code, currency, exchange_rate, address) and SUPPLIER (supplier, currency_code, currency, exchange_rate, address) versus CUSTOMER (customer_id, currency_code (FK), address), SUPPLIER (customer_id, currency_code (FK), address), and FOREIGN_EXCHANGE (currency_code, currency, exchange_rate), annotated "Currency data common to both"; arrow labeled "Denormalize 3rd NF".]

Figure 6-9: Denormalization of 3NF transitive dependence resolution table.
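The Figure 6-9 transitive dependency (city depends on department, department depends on employee) can be sketched with Python's built-in sqlite3 module. The employee_denorm table name and the sample data are hypothetical. The sketch denormalizes by pushing CITY back into the employee rows, then shows the price paid: each department still maps to exactly one city, so the city is repeated per employee, and moving a department's city now means updating several rows instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF form: CITY moved out because it depends on DEPARTMENT, not on EMPLOYEE.
    CREATE TABLE department (department TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE employee   (employee TEXT PRIMARY KEY,
                             department TEXT REFERENCES department);
    INSERT INTO department VALUES ('Sales', 'Chicago'), ('Research', 'Boston');
    INSERT INTO employee VALUES ('Ann', 'Sales'), ('Bob', 'Sales'), ('Cy', 'Research');

    -- Denormalized form: CITY pushed back into the employee rows, removing the join.
    CREATE TABLE employee_denorm AS
        SELECT e.employee AS employee, e.department AS department, d.city AS city
        FROM employee e JOIN department d ON d.department = e.department;
""")

rows = conn.execute(
    "SELECT employee, department, city FROM employee_denorm ORDER BY employee"
).fetchall()

# The transitive dependency is still present: one city per department.
cities_per_department = conn.execute(
    "SELECT department, COUNT(DISTINCT city) FROM employee_denorm "
    "GROUP BY department ORDER BY department"
).fetchall()

# Relocating a department touches every employee row in it.
moved = conn.execute(
    "UPDATE employee_denorm SET city = 'Denver' WHERE department = 'Sales'"
).rowcount

print(rows)
print(cities_per_department)
print(moved)  # rows rewritten for one logical change
```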
Figure 6-10 shows a 3NF transformation that removes a calculated total value field from a table. The value of storing the total amount on each record, alongside the elements of the expression that produce it, depends on how much the total value is used at the application level. If the constituents of the totaling expression are not required, perhaps only the total value should be stored. Again, this is a matter to be decided only from the perspective of application requirements.

Figure 6-10: Denormalization of 3NF calculated fields.
[Figure 6-10 diagram: STOCK (stock, description, min, max, qtyonhand, price, total value) versus STOCK without the total value field; annotation: "TOTALVALUE dependent on QTYONHAND and PRICE"; arrow labeled "Denormalize 3rd NF".]
[Figure 6-9 diagram: EMPLOYEE (employee, department, city) versus EMPLOYEE (employee, department (FK)) and DEPARTMENT (department, city); annotations: "1. City depends on department. 2. Department depends on employee. 3. Thus city is indirectly, or transitively, dependent on employee."; arrow labeled "Denormalize 3rd NF".]

Denormalizing 2NF

The role of 2NF is to separate static data into separate tables, removing repeated static values from transactional tables. Figure 6-11 shows an example of over-application of 2NF. The lower right of the diagram shows an extreme of four tables, created from what is essentially a more-than-adequately normalized COMPANY table at the upper left of the diagram.

Figure 6-11: Denormalization of 2NF into a single static table.
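Returning to the calculated field in Figure 6-10, this sketch (Python's built-in sqlite3 module) keeps a stored TOTALVALUE column next to QTYONHAND and PRICE. The update_quantity() helper and the sample stock rows are hypothetical. It illustrates the obligation a stored total creates: every change to a constituent must also rewrite the total, whereas the 3NF alternative computes qtyonhand * price at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized STOCK: TOTALVALUE is stored although it depends on QTYONHAND and PRICE.
    CREATE TABLE stock (
        stock TEXT PRIMARY KEY, description TEXT,
        qtyonhand INTEGER, price REAL, totalvalue REAL
    );
    INSERT INTO stock VALUES
        ('S1', 'Widget', 10, 2.5, 25.0),
        ('S2', 'Gadget', 4, 10.0, 40.0);
""")


def update_quantity(stock_id, qty):
    # A stored total must be recalculated whenever one of its constituents changes.
    conn.execute(
        "UPDATE stock SET qtyonhand = ?, totalvalue = ? * price WHERE stock = ?",
        (qty, qty, stock_id),
    )


update_quantity("S1", 12)

# 3NF alternative: derive the total at query time instead of storing it.
computed = conn.execute(
    "SELECT stock, qtyonhand * price FROM stock ORDER BY stock"
).fetchall()
stored = conn.execute(
    "SELECT stock, totalvalue FROM stock ORDER BY stock"
).fetchall()

print(computed == stored)  # stored totals kept consistent by update_quantity()
```

If the application reads totals far more often than it changes quantities or prices, storing the total saves a calculation per read; if not, computing it avoids the maintenance burden shown in update_quantity().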
[Figure 6-11 diagram: COMPANY (company, address, phone, fax, email, classification, exchange, ticker), annotated "Company is static data - too much normalization", versus an "Over normalization" version with COMPANY (company, listing (FK), address, phone, fax, email) and LISTING (listing, classification, exchange, ticker), and an "Insanely over normalized" version with COMPANY, LISTING (listing, exchange (FK), ticker), EXCHANGE (exchange, classification (FK)), and CLASSIFICATION (classification).]

Denormalizing 1NF

Just don’t do it! Data warehouse fact tables can be interpreted as being in 0th Normal Form, but the connections to dimensions are 2NF. So, denormalization of 1NF is not advisable.

Try It Out: Denormalize to 2NF

Figure 6-12 shows a highly normalized table structure representing bands, their released CDs, tracks on the CDs, ranks of tracks, charts the tracks are listed on, plus the genres and regions of the country those charts are located in.

1. The RANK and TRACK tables are one-to-one related (TRACK to RANK: one-to-zero or one). This implies a BCNF or 4NF transformation, where zero or one means that a track does not have to be ranked; thus, a track’s rank can be NULL valued. Push the RANK column back into the TRACK table and remove the RANK table.
2. The three tables BAND_ADDRESS, BAND_PHONE, and BAND_EMAIL were created because each prospective band attribute was a candidate primary key in itself. Reverse the BCNF transformation, pushing address, phone, and email details back into the BAND table.
3. The CHART, GENRE, and REGION tables are an absurd application of multiple layers of 2NF transformation, separating static information from what is effectively parent static information. Chart, genre, and region details can all be pushed back into the TRACK table.

Figure 6-12: Normalized chart toppers.
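Step 1 of the Try It Out can be sketched with Python's built-in sqlite3 module. Only the TRACK and RANK fragment of Figure 6-12 is modeled; the track_denorm table name and the sample tracks are hypothetical. Because TRACK to RANK is one-to-zero-or-one, the denormalizing query uses a LEFT JOIN, so an unranked track survives with a NULL rank (None in Python), and the RANK table is then dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized fragment of Figure 6-12: RANK is one-to-zero-or-one with TRACK.
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, cd_id INTEGER,
                        track TEXT, length INTEGER);
    CREATE TABLE rank  (track_id INTEGER PRIMARY KEY REFERENCES track, rank INTEGER);
    INSERT INTO track VALUES (1, 100, 'Opening', 215), (2, 100, 'Closing', 190);
    INSERT INTO rank  VALUES (1, 3);   -- track 2 is not ranked

    -- Step 1: push RANK back into TRACK; LEFT JOIN keeps unranked tracks (rank is NULL).
    CREATE TABLE track_denorm AS
        SELECT t.track_id AS track_id, t.cd_id AS cd_id, t.track AS track,
               t.length AS length, r.rank AS rank
        FROM track t LEFT JOIN rank r ON r.track_id = t.track_id;
    DROP TABLE rank;
""")

rows = conn.execute(
    "SELECT track_id, track, rank FROM track_denorm ORDER BY track_id"
).fetchall()
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()

print(rows)
print(tables)
```

Steps 2 and 3 follow the same pattern: join BAND to its address, phone, and email tables, and TRACK to CHART, GENRE, and REGION, then drop the tables that were pushed back.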
[Figure 6-12 diagram: BAND (band_id, name); BAND_ADDRESS, BAND_PHONE, and BAND_EMAIL (each band_id (FK) plus address, phone, or email); CD; TRACK (track_id, chart (FK), cd_id (FK), track, length); RANK (track_id (FK), rank); CHART (chart, genre (FK)); GENRE (genre, region (FK)); REGION (region).]

How It Works

Figure 6-13 shows what the tables should look like in 2NF.

Figure 6-13: Denormalized chart toppers.

Denormalization Using Specialized Database Objects

Many databases have specialized database objects for certain types of tasks. Some specialized objects allow for physical copies of data, copying data into a denormalized form.

❑ Materialized views: Materialized views are available in many larger relational databases. These objects are commonly used in data warehouses for pre-calculated aggregation queries. Queries can be automatically switched to read directly from materialized views, which results in less I/O activity, because aggregated data is read directly from the materialized view. Typically, aggregated materialized views contain far fewer records than the underlying tables, reducing I/O activity and thus increasing performance. Views are not the same thing as materialized views: views are overlays, not duplications of data, and they interfere with the underlying source tables. Views often cause far more performance problems than the application design issues they might ease.
❑ Clusters: These objects allow physical copies of heavily accessed fields and tables in join queries, allowing faster access to data with more precise I/O.
❑ Index-organized tables: A table can be constructed including both index and data fields in the same physical space.
The table itself becomes both the index and the data, because the table is constructed as a sorted index (usually a BTree index), rather than just a heap or “pile” of unorganized “bits and pieces.”

[Figure 6-13 diagram: BAND (band_id, name, address, phone, email); CD (cd_id, band_id (FK), title, length, tracks); TRACK (track_id, cd_id (FK), track, length, rank, region, genre, chart).]

[...]

Understanding the Object Model

Many modern relational databases are, in fact, object-relational databases. An object-relational database is, by definition, a relational database allowing certain object characteristics. To get a good understanding of what an object-relational database is, you must have a basic understanding of the object model. The object database model is...

... briefly compared data warehouse modeling with both the relational database model and the object database model. In addition, a brief description covered star schemas. This chapter delves deeply into the details of the data warehouse database model. Expanding the relational database model to include the data warehouse database model may seem a little obtuse; however, in the modern, computerized, ...

... warehouse database installations as a whole. Data warehouse databases are usually physically much larger on average. Something larger is generally much more expensive, and likely just as important as OLTP databases, if not more so. A bigger database costs more money to build and maintain; therefore, data warehouse data modeling is just as important as relational database modeling for OLTP and transactional databases...
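SQLite has neither materialized views nor index-organized tables by those names, but the specialized objects listed above can be loosely approximated in it. In this sketch (Python's built-in sqlite3 module), a summary table rebuilt by a hypothetical refresh_sales_by_region() function stands in for an aggregated materialized view, and declaring it WITHOUT ROWID stores it as a B-tree keyed on its primary key, SQLite's closest analogue to an index-organized table. The SALE table and its data are hypothetical. Unlike a real materialized view, nothing here switches queries over automatically, and the summary goes stale until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sale (region, amount) VALUES
        ('East', 100.0), ('East', 50.0), ('West', 75.0), ('West', 25.0), ('West', 10.0);

    -- Stand-in for a materialized view: a physical copy of pre-aggregated data.
    -- WITHOUT ROWID keeps rows in a B-tree ordered by the primary key.
    CREATE TABLE sales_by_region (
        region TEXT PRIMARY KEY, total REAL, sale_count INTEGER
    ) WITHOUT ROWID;
""")


def refresh_sales_by_region():
    """Complete refresh: rebuild the summary from the detail table."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM sales_by_region")
        conn.execute("""
            INSERT INTO sales_by_region (region, total, sale_count)
            SELECT region, SUM(amount), COUNT(*) FROM sale GROUP BY region
        """)


refresh_sales_by_region()

# Reports read the small summary table instead of scanning every sale.
summary = conn.execute(
    "SELECT region, total, sale_count FROM sales_by_region ORDER BY region"
).fetchall()
print(summary)
```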
... Denormalization using specialized database objects such as materialized views
❑ Denormalizing each Normal Form, using the same examples from previous chapters
❑ The object database model and its significance
❑ The data warehouse database model, and its significance

This chapter has described some advanced database modeling topics, including denormalization, object database modeling, and data... is important to understanding the basics of object-relational databases. Denormalization is essential to understanding not only database performance but also the topic area of data warehouse database modeling. Data warehousing is very important to modern-day databases and is a large topic in itself. The next chapter covers data warehouse database modeling in detail.

Exercises

Use the ERD in Figure 6-12...

... warehouses
❑ How data warehouses require a specialized database model
❑ Star and snowflake schemas
❑ Facts and dimensions
❑ The fact-dimensional database model
❑ How to create a data warehouse database model
❑ The contents of a data warehouse database model

The Origin of Data Warehouses

Data warehouses were originally devised because existing databases were being subjected to conflicting requirements...

... relational database field.
❑ Method: A method is equivalent to a relational database stored procedure, except that it executes on the data contents of an object, within the bounds of that object.

In the relational database model, relationships are established using both table structures (metadata) and data values in fields, such as those between primary and foreign key values. On the contrary, in an object database, ...
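The Method entry above can be illustrated with a plain Python class standing in for an object database class; this is an analogy under stated assumptions, not an object database API. The Stock class borrows its fields from the Figure 6-10 STOCK table, and the receipts collection and receive() method are hypothetical. The total_value() method works only on the data held inside its own object, which is the distinction the text draws from a stored procedure, and it computes the total rather than storing it.

```python
from dataclasses import dataclass, field


@dataclass
class Stock:
    """Hypothetical STOCK object; attributes play the role of relational fields."""
    stock: str
    qtyonhand: int
    price: float
    receipts: list = field(default_factory=list)  # collection held inside the object

    def total_value(self) -> float:
        # Method: runs against this object's own data only. A stored procedure,
        # by contrast, is not bound to the contents of a single object.
        return self.qtyonhand * self.price

    def receive(self, qty: int) -> None:
        # Changes go through the object's methods, within the bounds of the object.
        if qty <= 0:
            raise ValueError("quantity received must be positive")
        self.qtyonhand += qty
        self.receipts.append(qty)


item = Stock("S1", qtyonhand=10, price=2.5)
item.receive(2)
print(item.qtyonhand, item.total_value(), item.receipts)  # 12 30.0 [2]
```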
... warehouse database model is essentially an animal unto itself, with very little relationship to either the relational or the object database models:
❑ Data warehouses and the relational model: The relational model is too granular. The relational model introduces granularity by removing duplication. The result is a database model nearly always highly effective for front-end application performance and OLTP databases...

... Warehouse Database Modeling

“Intuition becomes increasingly valuable in the new information society precisely because there is so much data.” (John Naisbitt)

Data warehouses need special treatment and a special type of approach to database modeling, simply because they can get so unmanageably large. Chapter 6 introduced the data warehouse database model, amongst other advanced relational database modeling...

... (however a company may decide to split its data)

Some data warehouses are built using 3NF table structures, or even combine normalized structures with fact-dimensional structures in the same database.

[Figure 6-17 diagram: AUTHOR, CUSTOMER, PUBLISHER, BOOK, SHIPPER, and SUBJECT tables connected by one-to-many relationships.]
Figure 6-17: A data warehouse database model star schema.

A book can obviously have several authors ...

... a relational database model table structure. If it does, you might be attempting to build a relational database structure into an object database model.
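A star schema like the one in Figure 6-17 can be sketched with Python's built-in sqlite3 module. The dimension names CUSTOMER, PUBLISHER, SHIPPER, and SUBJECT come from the figure residue, but the figure's exact layout is not recoverable from this text, so the fact table SALE, its quantity and amount measures, every column, and the sample data are hypothetical. The point is the shape: one central fact table holding a foreign key per dimension, so a report joins the fact table directly to only the dimensions it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions (names from the Figure 6-17 residue; columns are hypothetical).
    CREATE TABLE customer  (customer_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shipper   (shipper_id   INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject   (subject_id   INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical fact table: one row per sale, one foreign key per dimension.
    CREATE TABLE sale (
        customer_id  INTEGER REFERENCES customer,
        publisher_id INTEGER REFERENCES publisher,
        shipper_id   INTEGER REFERENCES shipper,
        subject_id   INTEGER REFERENCES subject,
        quantity INTEGER, amount REAL
    );

    INSERT INTO customer  VALUES (1, 'Library'), (2, 'School');
    INSERT INTO publisher VALUES (1, 'North Press'), (2, 'South House');
    INSERT INTO shipper   VALUES (1, 'Ground');
    INSERT INTO subject   VALUES (1, 'Databases'), (2, 'Networks');
    INSERT INTO sale VALUES
        (1, 1, 1, 1, 2, 60.0), (2, 1, 1, 2, 1, 35.0), (2, 2, 1, 1, 3, 90.0);
""")

# Star join: the fact table joins one level deep to each dimension it needs.
rows = conn.execute("""
    SELECT p.name, s.name, SUM(f.quantity), SUM(f.amount)
    FROM sale f
    JOIN publisher p ON p.publisher_id = f.publisher_id
    JOIN subject   s ON s.subject_id   = f.subject_id
    GROUP BY p.name, s.name
    ORDER BY p.name, s.name
""").fetchall()
print(rows)
```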

Posted: 03/07/2014, 01:20