
Beginning Database Design - P18


Tables shown in Figure 10-36:
History: fact_id, tertiary_id (FK), secondary_id (FK), location_id (FK), time_id (FK), buyer_id (FK), seller_id (FK), history_buyer, history_buyer_comment_date, history_buyer_comments, history_seller, history_seller_comment_date, history_seller_comments
Location: location_id, region_id (FK), country_id (FK), state_id (FK), city_id (FK)
City: city_id, state_id (FK), city
State: state_id, country_id (FK), state
Country: country_id, region_id (FK), country
Region: region_id, region
Time: time_id, year_id (FK), quarter_id (FK), month_id (FK)
Year: year_id, year
Quarter: quarter_id, year_id (FK), quarter
Month: month_id, quarter_id (FK), month
Category_Primary: primary_id, primary
Category_Secondary: secondary_id, primary_id (FK), secondary
Category_Tertiary: tertiary_id, secondary_id (FK), tertiary
Buyer: buyer_id, popularity_rating, join_date
Buyer_Name: buyer_id (FK), name
Buyer_Address: buyer_id (FK), address
Seller: seller_id, popularity_rating, join_date
Seller_Name: seller_id (FK), name
Seller_Address: seller_id (FK), address
Seller_Nulls: seller_id (FK), return_policy, international
Seller_Payment_Methods: seller_id (FK), payment_method

Figure 10-36: A data warehouse HISTORY fact table snowflake schema (a history data mart).

Figure 10-37: Musicians, bands, their online advertisements, and some other goodies.

How It Works

Figure 10-37 shows the analyzed data warehouse database model for online musician and band advertisements. The most significant requirement is ultimately to produce a single star schema, if a single star schema is possible. Also add any dimensional and fact information shown as additional in Figure 10-32. Figure 10-37 shows that the SHOW table is actually fact information, not dimensional. Examine Figure 10-37 once more. Think about the records in the tables. Yes, many advertisements are possible.
However, a simple search of the Internet on Web sites such as www.themode.com and www.ticketmaster.com will reveal the sheer volume of advertisements, musicians, bands, shows, discography (released CDs), and venues. Figure 10-38 takes another slant on this data warehouse database model by rolling all of these tables into a single fact table.

Tables shown in Figure 10-37:
Musician: musician_id, musician, phone, email, instruments, skills
Advertisement: advertisement_id, band_id (FK), musician_id (FK), ad_date, ad_text
Band: band_id, band, founding_date, genre
Discography: discography_id, band_id (FK), cd_name, release_date, price
Merchandise: merchandise_id, band_id (FK), type, price
Show: show_id, venue_id (FK), band_id (FK), venue, date, time
Venue: venue_id, venue, address, directions, phone

Figure 10-38: Denormalized musicians, bands, their online advertisements, and some other goodies.

Figure 10-38 is a partially complete data warehouse database model, with all the facts rolled into a single table. Figure 10-39 shows a finalized, much more sensible star schema, based purely on relative record numbers in various tables from Figure 10-39. Larger record numbers tend to warrant tables as being factual rather than dimensional in nature.

Tables shown in Figure 10-38:
Artists: artist_id, musician_id (FK), musician_name, musician_phone, musician_email, band_name, band_founding_date, discography_cd_name, discography_release_date, discography_price, show_date, show_time, venue_name, venue_address, venue_directions, venue_phone, advertisment_date, advertisement_text
Musician: musician_id, instruments, skills
Merchandise: merchandise_id, type, price
Band: genre

[Chapter 10: Creating and Refining Tables During the Design Phase]

Figure 10-39: Denormalized into a single star schema (musicians, bands, their online advertisements, and some other goodies).
It is not really possible to normalize the facts in the ARTISTS table, shown in Figure 10-39, into separate star schemas because all the separate elements (such as bands, advertisements, shows, and venues) are related to each other. Thus, a single star schema (a single fact table) is the most appropriate data warehouse database model design in this situation.

Tables shown in Figure 10-39:
Artists: artist_id, merchandise_id (FK), genre_id (FK), instrument_id (FK), musician_name, musician_phone, musician_email, band_name, band_founding_date, discography_cd_name, discography_release_date, discography_price, show_date, show_time, venue_name, venue_address, venue_directions, venue_phone, advertisment_date, advertisement_text
Instrument: instrument_id, section_id, instrument
Merchandise: merchandise_id, type, price
Genre: genre_id, parent_id (FK), genre

Summary

In this chapter, you learned about:
❑ How to expand and differentiate database model design from analysis
❑ The design process, as opposed to the analysis process of the previous chapter
❑ How to create and refine tables
❑ How to enforce and refine inter-table relationships and referential integrity
❑ Normalization (without going too far)
❑ Denormalization (without going too far)
❑ The folly of normalization beyond 3NF, for both OLTP and data warehouse databases
❑ Providing for application usability, flexibility, and performance in database modeling
❑ How to ensure applications translate into happy end-users (without happy end-users, there is no profit, and, thus, no company)

This chapter has primarily expanded on Chapter 9, moving from analysis (what to do) into design (how to solve it). Once again, the online auction house database model has been expanded on, and detailed further by the design process, as the continuing case study. Chapter 11 digs even further into the design process by describing and specifying fields within each table, along with datatypes and indexing.
The discussion on indexing is especially about alternate (secondary) indexing.

Exercises

Use the ERDs in Figure 10-32 and Figure 10-39 to help you answer these questions:
1. Create scripts to create tables for the OLTP database model shown in Figure 10-32. Create the tables in the proper order by understanding the relationships between the tables.
2. Create scripts to create tables for the data warehouse database model shown in Figure 10-39. Once again, create the tables in the proper order by understanding the relationships between the tables.

11 Filling in the Details with a Detailed Design

“Digging ever deeper gives clarity to definition, and definition of clarity.” (Gavin Powell)

The further you go the more you discover. This chapter provides the details on the internal structure of tables in terms of fields, field content, field formatting, and indexing on fields. This chapter digs a little deeper into the case study material presented in the previous two chapters. Chapter 9 introduced a database model in its infancy, by analyzing what needed to be done. Chapter 10 unearthed structural detail by describing how tables are built and how they are joined together. This chapter delves into the details of the tables themselves, by designing the precise content and structure of individual fields. Indexing is included at this stage because indexes are created against specific table fields. An index is not quite the same thing as a key, such as a primary key. A primary key is required to be unique across all records in a table; therefore, many database engines automatically create a unique index for that primary key (which helps performance when checking for uniqueness).
Foreign keys, on the other hand, do not have to be unique, and most relational database engines do not automatically create indexes on foreign keys. This is intentional, of course. If an index is required on a foreign key field (which it more often than not is), an index must be manually created for that foreign key. By the end of this chapter, you will have a good understanding of how best to structure fields, their datatype formats, and how, when, and where those formats apply. Also, you will have a better conceptual understanding of foreign key indexing and alternate (secondary) indexing.

In this chapter, you learn about the following:
❑ Refining field structure and content in tables
❑ Using datatypes
❑ The difference between simple datatypes, ANSI datatypes, Microsoft Access datatypes, and some specialized datatypes
❑ Using keys and indexes
❑ Using alternate (secondary) indexing

Case Study: Refining Field Structure

In this section, you refine the field content of tables for both the OLTP and data warehouse database models. You continue with the consistent case study development of database models for the online auction house.

The OLTP Database Model

Figure 11-1 shows the most recent version of the OLTP database model for the online auction house.

Figure 11-1: The online auction house OLTP database model.
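As a brief aside on the chapter introduction's point about keys and indexes, the following is a minimal sketch rather than a definitive implementation. It assumes SQLite (through Python's standard sqlite3 module) as the database engine, and it uses a cut-down CURRENCY and LISTING pair in which the listing_id field name and the fk_listing_ticker index name are hypothetical. In this engine a non-integer primary key gets an automatic unique index, while the foreign key field gets no index until one is created by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE currency (
        ticker   TEXT PRIMARY KEY,   -- SQLite builds a unique index for this key
        currency TEXT
    );
    CREATE TABLE listing (
        listing_id  INTEGER PRIMARY KEY,                 -- hypothetical field name
        ticker      TEXT REFERENCES currency (ticker),   -- foreign key: no index yet
        description TEXT
    );
""")

def index_names(table):
    # PRAGMA index_list returns one row per index; column 1 holds the index name.
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]

currency_indexes = index_names("currency")   # automatic primary key index
before = index_names("listing")              # nothing on the foreign key

# The foreign key index has to be created manually.
conn.execute("CREATE INDEX fk_listing_ticker ON listing (ticker)")
after = index_names("listing")

print(currency_indexes, before, after)
```

Other engines behave differently in the details, so the engine's own documentation should be checked before relying on any automatic index.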
Tables shown in Figure 11-1:
History: history_id, seller_id (FK), buyer_id (FK), comment_date, comments
Listing: listing#, category_id (FK), buyer_id (FK), seller_id (FK), ticker (FK), description, image, start_date, listing_days, starting_price, reserve_price, buy_now_price, number_of_bids, winning_price
Category: category_id, parent_id, category
Currency: ticker, currency, exchange_rate, decimals
Seller: seller_id, seller, popularity_rating, join_date, address, return_policy, international, payment_methods
Buyer: buyer_id, buyer, popularity_rating, join_date, address
Bid: listing# (FK), buyer_id (FK), bid_price, bid_date

Analysis and design are an ongoing process. Figure 11-1 shows two further examples of backtracking and refining:
❑ There is normalization of the CURRENCY table. Analytically, it is assumed that the online auction house is based in the U.S. and the U.S. dollar is the default currency. Currencies are separated because there can be a fair amount of complexity involved in currency exchange conversions.
❑ The relationships between the SELLER and HISTORY tables, and between the BUYER and HISTORY tables, should allow for histories with buyers or sellers. This is because the HISTORY table is a combination of buyer and seller histories. When a trader is only a buyer, that trader will have no history of activity as a seller; therefore, the relationship between BUYER and HISTORY tables is zero or one to zero, one or many. This means that for every HISTORY record, there does not necessarily have to be a SELLER record. This is because for every HISTORY record, there can be either a SELLER record, or a BUYER record.

Figure 11-2 shows a refined field structure for the online auction house OLTP database model shown in Figure 11-1.

Figure 11-2: Refining fields for the online auction house OLTP database model.
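The optional relationships between HISTORY and its SELLER and BUYER parents, as described for Figure 11-1, can be sketched as nullable foreign keys. This is an illustration only, assuming SQLite through Python's standard sqlite3 module. The CHECK constraint (every history row must name at least one trader) is an assumption added for the sketch and is not stated in the model, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE seller (seller_id INTEGER PRIMARY KEY, seller TEXT);
    CREATE TABLE buyer  (buyer_id  INTEGER PRIMARY KEY, buyer  TEXT);
    CREATE TABLE history (
        history_id   INTEGER PRIMARY KEY,
        seller_id    INTEGER REFERENCES seller (seller_id),  -- NULL allowed: zero or one seller
        buyer_id     INTEGER REFERENCES buyer (buyer_id),    -- NULL allowed: zero or one buyer
        comment_date TEXT,
        comments     TEXT,
        -- assumption for this sketch: a history row needs at least one trader
        CHECK (seller_id IS NOT NULL OR buyer_id IS NOT NULL)
    );
    INSERT INTO buyer (buyer_id, buyer) VALUES (1, 'example buyer');
""")

def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A buyer-only history row is valid: seller_id is simply left NULL.
buyer_only_ok = try_insert(
    "INSERT INTO history (buyer_id, comments) VALUES (1, 'prompt payment')")
# A reference to a seller that does not exist violates referential integrity.
orphan_ok = try_insert(
    "INSERT INTO history (seller_id, comments) VALUES (99, 'no such seller')")
# A row with neither trader is rejected by the CHECK constraint.
empty_ok = try_insert("INSERT INTO history (comments) VALUES ('nobody')")

row_count = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(buyer_only_ok, orphan_ok, empty_ok, row_count)
```

Leaving the foreign key fields nullable is what makes the parent side of each relationship "zero or one"; making either field NOT NULL would force every history row to have that parent.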
Tables shown in Figure 11-2:
History: history_id, seller_id (FK), buyer_id (FK), comment_date, feedback_positive, feedback_neutral, feedback_negative
Listing: listing#, category_id (FK), buyer_id (FK), seller_id (FK), ticker (FK), description, image, start_date, listing_days, starting_price, reserve_price, buy_now_price, number_of_bids, winning_price
Category: category_id, parent_id, category
Currency: ticker, currency, exchange_rate, decimals
Seller: seller_id, seller, company, company_url, popularity_rating, join_date, address_line_1, address_line_2, town, zip, postal_code, country, return_policy, international_shipping, payment_method_personal_check, payment_method_cashiers_check, payment_method_paypal, payment_method_western_union, payment_method_USPS_postal_order, payment_method_international_postal_order, payment_method_wire_transfer, payment_method_cash, payment_method_visa, payment_method_mastercard, payment_method_american_express
Buyer: buyer_id, buyer, popularity_rating, join_date, address_line_1, address_line_2, town, zip, postal_code, country
Bid: listing# (FK), buyer_id (FK), bid_price, proxy_bid, bid_date

Field additions and changes are refinements of both structure and application, as shown in Figure 11-2. An example of addition refinement is the addition of the INCREMENT field to the LISTING table. An example of structural refinement is a change to an existing field. Changing the ADDRESS field to six separate fields is a structural refinement. Field refinements are described as follows:
❑ The INCREMENT field is added to the LISTING table. Sellers can set a price increment for a listing. The application software will automatically apply bid increment values if INCREMENT is not set. The system may also override bid increments (based on all pricing factors) if an increment entered by the seller does not equate appropriately with all the pricing values set by the seller.
❑ A proxy bid (on the BID table) is where a bidder sets the maximum price that the bidder is prepared to bid up to. When a bidder enters a proxy bid, it permits the online auction site to act on behalf of the bidder, increasing the bidder's bid price up to the proxy bid value (the maximum the bidder is prepared to pay).
❑ Address fields on BUYER and SELLER tables are split into ADDRESS_LINE_1, ADDRESS_LINE_2, TOWN, ZIP, POSTAL_CODE, and COUNTRY fields. The ZIP field is used in the U.S. Postal codes are used in other countries. It is necessary to divide address details up in this way for two reasons:
  ❑ It allows easy input by buyers and sellers (all sensibly broken up into separate boxes).
  ❑ Subsequent analysis of data by the system (such as in reporting by location) is much more effective with information split into separate fields.
❑ Payment methods on the SELLER table have been split into separate fields (containing simple answers of TRUE or FALSE). This allows multiple selections for sellers, stored in one place (the SELLER table), as opposed to normalizing. The 4NF normalization, being a separate table, might make for less efficiency in joins. Additionally, this Boolean type division of multiple selectable options is best handled at the application level. It is simply too detailed for handling at the lower level of the database model. It might even be best to leave the PAYMENT_METHODS field in the SELLER table as a comma-delimited string of options, or even a comma-delimited string of TRUE and FALSE values. Applications would then dictate positions of TRUE and FALSE values (stored as T and F, or Y and N, or 1 and 0, or otherwise). Remember, this is an OLTP database model. OLTP databases must be tightly controlled by applications because of the immense computing power utilized to manage huge quantities of concurrent Internet users. Allowing ad hoc access to OLTP databases and applications will kill your system and result in no users, and thus no business.
If this is not the case, it is unlikely that you are building an OLTP database model.
❑ It might be possible to split up the RETURN_POLICY field in the same way that the PAYMENT_METHODS field is split, as shown in the previous option. This one is left to your imagination.
❑ The HISTORY table COMMENTS field could be split into multiple field options, perhaps helping to direct end-user comments (for example, COMMENTS_ABOUT_SELLER, COMMENTS_ABOUT_LISTING, COMMENTS_SERVICE_LEVEL, COMMENTS_BUYER_PROMPTNESS). There are many other possibilities. Comments could even be split into a field structure based on a pick list of preset answers (or answer categories), somewhat similar to the payment methods division in the SELLER table. The HISTORY table COMMENTS field has been divided into three general feedback fields containing options for positive, neutral, and negative feedback (perhaps even all three can be entered). When people using online Internet sites feel that they can comment, it makes them [...]

[...] online auction house data warehouse database model. These refined field changes are all duplicated from the OLTP database model to the data warehouse database model. Field refinements shown in Figure 11-4 are described as follows:
❑ It is prudent to compare the OLTP database models between Figure 11-1 and Figure 11-2, and make any field additions to the data warehouse database model already made to the OLTP database [...]

❑ XML Documents: Some databases allow structured storage of XML documents, where the XML Document Object Model (DOM) is actively available through the XML datatype field. What this means is that when accessing an XML document, you can access and manipulate the definitions and attributes of the XML document (its structure), and not just the XML data. Some relational databases can effectively mimic an XML native database [...]
[...] more to do with how applications will handle things. So far, the OLTP database model in this case study has been far more tightly controlled than the data warehouse model. At this point of field refinement for an OLTP database model, the OLTP database model may appear to become less mathematical. The important thing to remember is that the database model is good at doing certain things, and that application [...]

[...] dimension needs to be added to the data warehouse database model, as it was for the OLTP database model in Figure 11-2. One quite distinct difference between OLTP and data warehouse database models is that it is prudent to always use integer surrogate key fields as primary keys on dimensions. The TICKER field in the CURRENCY table for the data warehouse database model is no longer the primary key. The CURRENCY_ID [...]

[...] returned from a database (usually). Date fields allow only date values such as 04/31/2004, or Apr 4, 2004. A specific date format is generally predetermined in the database engine and can be overridden for individual dates.

ANSI (American National Standards Institute) Datatypes

There are many different database engines, each having its distinct set of datatypes. Datatypes across all different database engines [...] fulfill the same functions. Each database usually has its own specific naming conventions (for different datatypes), but not always. Many of the datatypes across different databases are often the same. ANSI datatypes attempt to establish a standard. Standards are formulated and documented in an attempt to maintain some form of consistency across different software tools, databases, and applications. Consider [...]
[...] relational database table.
❑ Pointers: These are special datatypes used to store simple address pointers to binary data stored outside the table structure of a relational database. The actual file (such as an image) is stored on disk, externally to the database. File pointers are usually the most efficient method of storing static binary data (such as JPG, BMP, and GIF images).
❑ XML Documents [...]

[...] its demonstrative function by showing what can be done, not necessarily what should be done. You are beginning to see that a database model should not only be mathematically driven, but also application driven. The needs of front-end applications can sometimes partially dictate database model design because database model and applications are dependent on each other in many respects. The ERD shown in Figure [...]

[...] you can use the details learned about different datatypes to refine the OLTP and data warehouse database models for the online auction house.

The OLTP Database Model

Figure 11-2 contains the most recent version of the OLTP database model for the online auction house. Figure 11-7 defines datatypes for the OLTP database model shown in Figure 11-2. [...]

[...] (such as Java) are good at doing other things. You don’t need to be a database expert or an experienced Java programmer to understand the basic precept of this change in approach. Essentially, the OLTP database model might become somewhat more end-user oriented at this stage, and perhaps a little less mathematically confusing to the database modeling uninitiated. So far in this book, the data warehouse [...]
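As one illustration of an application-driven choice, the chapter's earlier suggestion that PAYMENT_METHODS could be stored as a comma-delimited string of TRUE and FALSE values, with the application dictating positions, might look like the sketch below. The method names are taken from the payment method fields in Figure 11-2, but the subset chosen, their order, and the function names are assumptions made for this example.

```python
# Position of each flag in the stored string; this order is an assumption
# that the application, not the database, would have to own and keep stable.
PAYMENT_METHODS = (
    "personal_check",
    "cashiers_check",
    "paypal",
    "western_union",
    "visa",
    "mastercard",
)

def encode_methods(accepted):
    """Return a comma-delimited string of T/F flags, one per known method."""
    return ",".join("T" if method in accepted else "F" for method in PAYMENT_METHODS)

def decode_methods(flags):
    """Return the set of accepted methods from a stored T/F string."""
    parts = flags.split(",")
    if len(parts) != len(PAYMENT_METHODS):
        raise ValueError("flag count does not match the known payment methods")
    return {method for method, flag in zip(PAYMENT_METHODS, parts) if flag == "T"}

stored = encode_methods({"paypal", "visa"})   # value for SELLER.PAYMENT_METHODS
accepted = decode_methods(stored)
print(stored, sorted(accepted))
```

The trade-off is the one the chapter describes: the database stores a single opaque field, and any query that needs to filter by payment method depends on the application's knowledge of the flag positions.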

Date posted: 03/07/2014, 01:20
