Mastering Data Warehouse Design: Relational and Dimensional Techniques (Part 6)

Chapter 7: Modeling Hierarchies

Figure 7.8 Packaged goods product hierarchy. [Levels: Division, Brand, Product Group, Standard Product, and SKU. Divisions include Frozen Foods, Ice Cream, and Snacks; brands include Lazy Guy, Like-a-Chef, Bada-Binge, Good Golly Ms Dolly, Crunchies, and Munchies; product groups include Entree, Appetizers, Sides, Flavor, Nuts, and Diet; standard products include Fried Chicken and Meat Loaf; sample SKUs are 009208198 (24 Pc Legs & Thighs), 009202028 (Jumbo Party Pack), 009203293 (48 Pc Buffalo Wings), 010317298 (128 oz Hunka-Loaf), 010318418 (64 oz Mini Hunka-Loaf), and 010310598 (128 oz Hunka-Loaf w Cheese).]

Figure 7.9 Packaged goods customer hierarchy. [Levels: Customer HQ (B&Q Markets), Planning Group (B&Q NthEast, B&Q Central, B&Q SthEast), Sold-To (for example, B&Q NE Frozen, B&Q NE Juice, B&Q NE Pet), and Ship-To (Allegheny, Bronx, Freeport, and Hartford, each with a Frz and an Amb location, repeated under several sold-to customers).]

The lowest level of the customer’s distribution hierarchy is the ship-to customer location. These are grouped under sold-to customers, which represent the customer’s purchasing representatives. These are further grouped into planning customers, which is the lowest level at which sales plans are estimated. Figure 7.9 shows the full hierarchy.

Capacity planning is handled by another system, which allocates the sales plan to plants and distribution centers based on previous sales history. The planners wish to use the data warehouse to support various levels of analysis. They wish to monitor the plan against actual sales in order to adjust both the sales and capacity plans to reflect recent experience. So that the comparisons are meaningful, the reports must reflect the hierarchies that were in effect at the time the plan was published.

Both the product and customer hierarchies are simple hierarchies: each child has only one parent.

Analysis

There are a number of aspects to consider concerning how the hierarchies are to be used in this application. First, and most important, there is a need to compare values that are provided at different levels in the hierarchies: a sale, which is collected at the transaction level and is specific to a SKU, sold-to customer, and ship-to address; and the sales plan, which is specified at a higher level in both the product and customer hierarchies.
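The first requirement can be illustrated with a small Python sketch that summarizes transaction-level sales up both hierarchies to the grain of the plan and reports the variance. It assumes hypothetical lookup dictionaries (SKU to product group, sold-to customer to planning group); every code and quantity in it is invented for illustration and does not come from the case study. The ship-to customer is carried in the sales rows but ignored, since the plan is not stated at that level.

```python
from collections import defaultdict

# Hypothetical lookups from detail keys to the levels at which planning happens.
SKU_TO_PRODUCT_GROUP = {"SKU-A": "PG-1", "SKU-B": "PG-1", "SKU-C": "PG-2"}
SOLD_TO_TO_PLANNING_GROUP = {"SOLD-1": "PLAN-X", "SOLD-2": "PLAN-X", "SOLD-3": "PLAN-Y"}

# Hypothetical transaction-grain sales: (sku, sold_to, ship_to, cases).
SALES = [
    ("SKU-A", "SOLD-1", "SHIP-1", 10),
    ("SKU-B", "SOLD-2", "SHIP-1", 5),
    ("SKU-C", "SOLD-3", "SHIP-2", 7),
    ("SKU-A", "SOLD-3", "SHIP-2", 4),
]

# Hypothetical plan stated at the (product group, planning group) intersection.
PLAN = {("PG-1", "PLAN-X"): 20, ("PG-2", "PLAN-Y"): 5, ("PG-1", "PLAN-Y"): 4}


def actuals_at_plan_grain(sales):
    """Summarize detail sales up both hierarchies to the plan's grain."""
    totals = defaultdict(int)
    for sku, sold_to, _ship_to, cases in sales:  # ship-to is not a plan dimension
        key = (SKU_TO_PRODUCT_GROUP[sku], SOLD_TO_TO_PLANNING_GROUP[sold_to])
        totals[key] += cases
    return dict(totals)


def plan_vs_actual(plan, sales):
    """Return {plan_key: (planned, actual, actual - planned)} for every plan row."""
    actual = actuals_at_plan_grain(sales)
    return {key: (planned, actual.get(key, 0), actual.get(key, 0) - planned)
            for key, planned in plan.items()}


if __name__ == "__main__":
    for key, row in sorted(plan_vs_actual(PLAN, SALES).items()):
        print(key, row)
```

Because each child has exactly one parent, a plain dictionary lookup per level is enough to reach the plan grain; no sale is counted twice.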
Second, the data warehouse will most likely be called on to provide sales history to the capacity-planning system. The feed would be required to tie the detailed history to the higher-level hierarchy entities where the planning has taken place. It may even be necessary to calculate the allocation percentages. Third, while planning is done at an intersection of the customer and product hierarchies, different user groups require the numbers to be reported using one side of the hierarchy tree. In the case study, the sales group is interested in reporting based on the customer sales hierarchy. It can be presumed that other groups have similar requirements using other hierarchies.

Because sales and the sales plan exist at different levels of detail, there are two options available for reporting sales against the sales plan: either summarize the lower-level data (sales) to the level of the higher-level data (the sales plan), or allocate the sales plan down to the level of the sales data. The business managers feel that allocating the sales plan to such a low level of detail is of no real value. Instead, there is a need to summarize the detailed sales data to match the level of the sales plan. The simple aggregation of the sales data to the sales plan should occur as part of the delivery process, using the hierarchy to drive the aggregation. If, however, the decision is to allocate the sales plan to a lower level, you may wish to consider storing the allocations within the data warehouse.
This decision would be based on the complexity of the allocation, the amount of data involved, how frequently the allocation needs to be performed, and the number of data marts or other applications requiring this allocation. If the allocation results are used frequently but are relatively stable, you may reduce delivery effort by storing the allocation results in the data warehouse. If the sales plan or the basis for allocations changes almost as often as the need to deliver it, there would be little benefit in storing the results. Instead, handle the allocation in the delivery process.

Further examination of the hierarchy provided by the business shows that the ship-to customer may be a child of more than one sold-to customer. The reason for this is simple. The sold-to customer represents a buyer within the customer’s organization, while a ship-to customer is a physical location that receives the goods. In our example of B&Q Markets, buyers are organized by types of products that are shipped to many of the same locations, which are B&Q Markets’ distribution centers.

With additional investigation, it is found that this sold-to/ship-to relationship is primarily for operational control and is not necessary for analysis. In fact, when an order is processed, both the buyer (sold-to customer) and the shipping destination (ship-to customer) are recorded in the system, and both will be received in the transactional data feed. From this, we determine that the ship-to customer is not needed in the hierarchy to analyze sales. However, there is a desire to produce customer lists without transactional data using the full hierarchy. Later, we will show how to use different parts of the hierarchy for different classes of queries. Another issue is the mention of the preferred distribution center assigned to each ship-to customer address.
It is often the case that, from a business point of view, users may think of this as being part of the “hierarchy.” This is natural and should be expected. However, it is up to you, as the modeler, not to let this confuse your design. The assigned distribution center is an independent attribute of the ship-to customer, because distribution centers are assigned based on the geographic location of the delivery point, regardless of the customer hierarchy structure. It is important to identify such cases and avoid attempts to include such attributes in a hierarchical structure.

Later discussions with the operational system support staff revealed that the product hierarchy is maintained as a flattened structure, while the customer hierarchy is a recursive tree structure. In the next sections, we will discuss the most effective way to store these hierarchies. Another challenge we will examine is creating a data structure that will bridge data stored at different levels of summarization: in this case, tying detailed sales data to the sales plan.

Figure 7.10 shows the complete business model representation of these entities. In the remainder of this section, we will examine how this model is implemented and used within the data warehouse. We will first look at the product hierarchy data feed and the processing issues it presents.

Figure 7.10 Sales and planning business model.
[Entities, with primary key and foreign keys (FK):
Division: Division Identifier
Brand: Brand Identifier; Division Identifier (FK)
Product Group: Product Group Identifier; Brand Identifier (FK)
Standard Product: Standard Product Identifier; Product Group Identifier (FK)
Product: Product Identifier; Standard Product Identifier (FK)
Customer HQ: Customer HQ Identifier
Planning Group: Planning Group Identifier; Customer HQ Identifier (FK)
Sold To Customer: Sold To Customer Identifier; Planning Group Identifier (FK)
Ship Sold Customer: Sold To Customer Identifier (FK); Ship To Customer Identifier (FK)
Ship To Customer: Ship To Customer Identifier; Preferred Distribution Center Identifier (FK)
Distribution Center: Distribution Center Identifier]

The Product Hierarchy

In this section, we will look at the issues involved in receiving a flattened (nonrecursive) denormalized hierarchy from the operational system. Denormalized flat hierarchy structures, represented as a series of columns or a single column, do not inherently enforce any business rules that may exist. If the data is physically stored in this manner in the source system, it is up to the source system’s application logic, or to manual effort by those maintaining the data, to enforce whatever hierarchy rules exist. This can lead to incorrect data that can materially affect how the data warehouse should receive and store the data.

Storing the Product Hierarchy

The product hierarchy is received from the source system as a single column. The column’s value contains formatted text where portions of the text represent different parts of the hierarchy. For this example, the division is stored in positions 1 to 2, the brand in positions 3 to 5, the product group in positions 6 to 9, and the standard product code in positions 10 to 14. Receiving hierarchies in this manner from packaged software products is not unusual. Like data warehouse designers, application software developers have the option to represent hierarchies as recursive trees or flattened structures.
Those that implement them as flattened structures run into a problem with hierarchy depth. They wish to build flexibility into the application by allowing end users to define their own hierarchy, but how do you do that if the hierarchy must be of known depth to fit into the flat structure? Do you define a schema with some large number of columns, say 10, to allow a user to specify up to 10 levels in the hierarchy? What if the user needs 11? What if he or she only uses three? How does the application deal with the other columns?

A common solution is to simply provide one large text column. The user would then define the hierarchy levels and assign positions in this text field to hold the code value for each element of the hierarchy. This allows the end user to define as many levels as necessary, with the only restriction being the combined width of the code values. This is a good, workable solution for an application system, but a bad design for a data warehouse.

As a function of the extract, transform, and load process in the data warehouse, the column must be interpreted to derive meaning from it. This interpretation should be performed up front in the data load process to create a structure like the one shown in Figure 7.11. This, of course, fixes the structure to known entities and a known depth. By breaking the column into its components and defining entities for each component, you clarify the model and simplify the use of the data.

Figure 7.11 Transforming the flattened hierarchy column. [An incoming value such as 01160SIDE04392 is split into division 01 (Frozen Foods), brand 160 (Like-a-Chef), product group SIDE (Side Orders), and standard product 04392 (Potatoes). Each component is loaded into its own entity: Division (01 Frozen Foods, 02 Ice Cream, 05 Snacks, 18 Beverage), Brand (035 Lazy Guy, 152 Bada-Binge, 160 Like-a-Chef, 215 Crunchies, 237 Ms Dolly, 348 Munchies), Product Group (ENTR Entree, SIDE Side Orders, APPT Appetizers, FLAV Flavored, NUTS With Nuts, DIET Diet/Low Fat, COKI Cookies), and Standard Product (02319 Fried Chicken, 02323 Meat Loaf, 03401 Dumplings, 04392 Potatoes, and others).]

The resulting structure is in 2NF. This differs from the 3NF structure discussed earlier in that the 2NF structure does not enforce true hierarchical relationships. In fact, each attribute in the hierarchy is independent of the others. In the sample data, product group codes are shared across brands.
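A minimal Python sketch of this interpretation step, using the positions given earlier (1 to 2, 3 to 5, 6 to 9, 10 to 14) converted to 0-based slices. The `LAYOUT` table and function names are hypothetical; keeping the layout as data mirrors how the application lets users assign positions. The sample values are standard-product keys from Figure 7.12, which follow the same layout. The last helper shows why the 2NF result is not enough on its own: it exposes child codes that appear under more than one parent.

```python
# Positions from the text (1-based, inclusive) as 0-based Python slices:
# division 1-2, brand 3-5, product group 6-9, standard product 10-14.
LAYOUT = (
    ("division", 0, 2),
    ("brand", 2, 5),
    ("product_group", 5, 9),
    ("standard_product", 9, 14),
)

# Sample values: standard-product keys from Figure 7.12, which use this layout.
SAMPLE = ["01035ENTR02319", "01160ENTR02319", "01160SIDE04392", "01035SIDE03401"]


def parse_hierarchy_code(value, layout=LAYOUT):
    """Split one flattened hierarchy value into its component codes."""
    width = layout[-1][2]
    if len(value) != width:
        raise ValueError(f"expected {width} characters, got {len(value)}: {value!r}")
    return {level: value[start:end] for level, start, end in layout}


def to_2nf_entities(values, layout=LAYOUT):
    """Build one independent code set per level, which is all the 2NF structure keeps.

    No parent is stored with a code, so nothing stops a code from sitting
    under several parents.
    """
    entities = {level: set() for level, _, _ in layout}
    for value in values:
        for level, code in parse_hierarchy_code(value, layout).items():
            entities[level].add(code)
    return entities


def parents_by_code(values, child_level, parent_level, layout=LAYOUT):
    """Map each child code to the set of parent codes it appears under."""
    seen = {}
    for value in values:
        parts = parse_hierarchy_code(value, layout)
        seen.setdefault(parts[child_level], set()).add(parts[parent_level])
    return seen


if __name__ == "__main__":
    print(parse_hierarchy_code("01160SIDE04392"))
    by_group = parents_by_code(SAMPLE, "product_group", "brand")
    # Product group codes that appear under more than one brand.
    print({code: brands for code, brands in by_group.items() if len(brands) > 1})
```

With this sample, both ENTR and SIDE come back with two parent brands (035 and 160), which is exactly the situation the 2NF entity sets cannot detect on their own.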
Since the model does not enforce relationships between the entities, there is no guarantee that a child has one and only one parent. Without this guarantee, we cannot generate the types of reports required by the business. We need to perform additional transformations of the data to convert the complex hierarchy into a simple hierarchy and store it in 3NF.

Simplifying Complex Hierarchies

In this example, we have seen that the danger of receiving a denormalized flattened hierarchy is that nothing guarantees it is not a complex hierarchy. A complex hierarchy is one where a child may have multiple parents. As the product hierarchy in Figure 7.8 shows, common product groups are shared among different brands. For example, both the Lazy Guy and Like-A-Chef brands have an Entrée product group. But it is also true that the leaves of this hierarchy, the products themselves, have unique SKUs, and that the SKUs that are Lazy Guy entrées are mutually exclusive of the SKUs that are Like-A-Chef entrées. So what we have is an identity crisis, where two different product groups share the same identifier.

Retaining Ancestry

To avoid a complex hierarchy in this example, the Lazy Guy Entrée product group must have a different business key than the Like-A-Chef Entrée product group. Where such a key does not exist, it becomes the responsibility of the data warehouse process to create such a key. The easiest technique to do this is to prefix the code with the codes of its parents.
This becomes the business key. As Figure 7.12 shows, each child becomes framed within the context of its ancestry. Conceptually, this makes the children dependent (as opposed to independent) entities. As the figure shows, this is only necessary for the hierarchy entities. It is reasonable to expect that the SKU is unique and unambiguous, so it is not necessary to create a new key for the SKU.

Here are some tips for use when building the 3NF tables:

- Use a surrogate key as the primary key. This will reduce the size of the key, particularly in foreign key references. Use the concatenated business key as an alternate key.
- Retain the original code value if it has business meaning and is unique within a source system. This is necessary to update attributes, since the source system would provide such data using this value. Create an inversion (nonunique) index on this column, because duplicate values will exist. Updates should modify all rows with that value.

Interface Issues

A complex hierarchy raises specific issues that must be supported in the data feeds received by the data warehouse. The business case stated that most planning occurs by product group. Yet, if the code ENTR represents the Entrée product group, then the question becomes, “Which brand’s product group?” Retaining ancestry in the key, as described in the previous section, will resolve the ambiguity in the data warehouse. However, the data feed for the sales plan must also provide the same ancestry data so that it identifies the correct product group. Without the necessary ancestry information, it is not possible to properly associate the sales plan with sales. If, as is the case here, the planning system is separate from the operational system, you will need to verify that such information is available. How such information is provided will depend on the environment and may require some transformation before it can be loaded into the data warehouse.
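The key-building tips and the interface concern above can be sketched with Python's built-in `sqlite3` module. This is a hypothetical schema, not the book's: the table and column names are assumptions, and only the product group level is shown. It uses a surrogate primary key, the concatenated ancestry key as a unique alternate key, and a nonunique index on the original source code, so that a source update keyed by ENTR touches every brand's Entrée row while a plan reference must use the full ancestry key.

```python
import sqlite3

# Hypothetical 3NF table for the product group level only (names are illustrative).
DDL = """
CREATE TABLE product_group (
    product_group_sk INTEGER PRIMARY KEY,   -- surrogate primary key, assigned by SQLite
    ancestry_key     TEXT NOT NULL UNIQUE,  -- division + brand + group code (alternate key)
    group_code       TEXT NOT NULL,         -- original source code, e.g. 'ENTR'
    group_name       TEXT NOT NULL
);
-- Inversion (nonunique) index: the same source code occurs under several brands.
CREATE INDEX ix_product_group_code ON product_group (group_code);
"""


def ancestry_key(*codes):
    """Prefix a code with the codes of its parents, e.g. ('01', '160', 'SIDE') -> '01160SIDE'."""
    return "".join(codes)


def load_groups(conn, rows):
    """Insert (division, brand, group_code, group_name) tuples; a repeated ancestry key fails."""
    conn.executemany(
        "INSERT INTO product_group (ancestry_key, group_code, group_name) VALUES (?, ?, ?)",
        [(ancestry_key(div, brand, code), code, name) for div, brand, code, name in rows],
    )


def update_name_by_source_code(conn, group_code, new_name):
    """Apply a source-system update keyed by the original code to every matching row."""
    cur = conn.execute(
        "UPDATE product_group SET group_name = ? WHERE group_code = ?",
        (new_name, group_code),
    )
    return cur.rowcount


def resolve_plan_reference(conn, reference):
    """Map a plan feed's product-group reference to its surrogate key.

    Only a full ancestry key is accepted; a bare code such as 'ENTR' cannot
    say which brand it belongs to, so it is rejected.
    """
    row = conn.execute(
        "SELECT product_group_sk FROM product_group WHERE ancestry_key = ?", (reference,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no product group with ancestry key {reference!r}")
    return row[0]


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    load_groups(conn, [
        ("01", "035", "ENTR", "Entree"),       # Lazy Guy
        ("01", "160", "ENTR", "Entree"),       # Like-a-Chef
        ("01", "160", "SIDE", "Side Orders"),  # Like-a-Chef
    ])
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(update_name_by_source_code(conn, "ENTR", "Entrees"))  # rows touched
    print(resolve_plan_reference(conn, "01160SIDE"))
```

Rejecting bare codes at the interface, rather than guessing a brand, is one way to enforce that feeds reference the hierarchy unambiguously; a real load would more likely route such rows to an error table.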
The simplest case is to receive a clean concatenated key that matches the ancestry business key described earlier. Or you may receive a SKU for one of the products in the group; in such a case, you would use the hierarchy keys defined for that SKU. Regardless of how it is received, it is critical that data feeds that provide information related to a hierarchy reference that hierarchy unambiguously.

Figure 7.12 Adding ancestry to the business key. [Divisions keep their codes (01 Frozen Foods, 02 Ice Cream, 05 Snacks, 18 Beverage). Brands become 01035 Lazy Guy, 01160 Like-a-Chef, 02152 Bada-Binge, 02237 Ms Dolly, 05215 Crunchies, and 05348 Munchies. Product groups become 01035ENTR, 01035SIDE, 01035APPT (Lazy Guy); 01160ENTR, 01160SIDE, 01160APPT (Like-a-Chef); 02152FLAV, 02152NUTS (Bada-Binge); and 02237FLAV, 02237NUTS, 02237DIET (Ms Dolly). Standard products become 01035ENTR02319 Fried Chicken, 01035ENTR02323 Meat Loaf, 01035SIDE03401 Dumplings, 01035SIDE04392 Potatoes, 01160ENTR02319 Fried Chicken, 01160ENTR02323 Meat Loaf, and 01160ENTR04392 Potatoes. SKUs such as 009208198 and 010317298 keep their original keys.]

Bridging Levels

An aspect of the reporting requirements is to summarize detailed sales data to the same level as the sales plan numbers. A table that matches the detail-level keys with the hierarchy-level keys can best handle this. Figure 7.13 shows an example. In this example, there is a sales plan for the Side Orders product group of the Like-A-Chef brand. Sales, recorded at the SKU level, contain many rows for many different side-dish SKUs. The bridge table contains rows with pairs of keys: the column on the left contains the product group key, and the column on the right contains the SKU key. A query that joins the sales plan to actual sales through this bridge will naturally roll up actual sales to the same level of detail as the plan.
This allows such summarization to occur on the fly using simple SQL, without the need to perform a recursive traversal of a hierarchy tree.

This structure would be created in the data warehouse to aid in the delivery of combined sales and sales plan data. In this situation, building such a structure is optional. The existing relationships between Product Group, Standard Product, and Product (SKU), as depicted in the business model (see Figure 7.10), are sufficient to resolve the relationship. The advantage of this structure is that it shortens the join path required to associate the data. This may significantly reduce the time to deliver such information to the marts and other external applications.

However, one of the issues we face is that the sales plan is not always set by product group. In fact, a plan may be created at any level in the product hierarchy. How can a bridge be constructed so that detailed sales data can roll up to the sales plan, regardless of the level of the plan? A more generic solution is in order, and the solution lies in addressing two problems: keys and tables.

The issue with keys is that you have a number of different entities (product group, brand, SKU), which all have different key formats and content. Also, because of the way the hierarchy structure is received from the source system, there is no guarantee that a business key value is unique across entities. There is nothing, other than internal business rules, to stop a business key for a brand from being the same as a business key for a product group. In the bridge, we want one column for the parent key, and we want that key value to be unique. The solution here is to use surrogate keys. With a surrogate, all keys have the same format, and the data warehouse load process can control the value assignment, ensuring uniqueness.

Figure 7.13 Bridging a hierarchy.
[Products in the Like-a-Chef Side Orders group (01160SIDE): 012308198 LAC Creamed Corn, 012302028 LAC Broccoli w Cheese, 012303293 LAC Au Gratin Potatoes, 012317298 LAC Mashed Potatoes, 012318418 LAC Potatoes and SPAM, 012310598 LAC SPAM, SPAM & SPAM.
Sales Plan: 06/2002, 01160SIDE, 15,000 CS, $517,000.
Product Group Bridge: 01160SIDE paired with each of the six SKUs above.
Sales:
Date        Quantity  Amount   SKU
06/01/2002  100 CS    $3,500   012317298
06/01/2002   50 CS    $1,825   012303293
06/02/2002  125 CS    $5,030   012317298
06/02/2002   85 CS    $2,450   012310598
06/02/2002  329 CS    $10,200  012303293
06/03/2002   43 CS    $207     012310598]

[...]

[Figure residue: entries 1015 SG WDW - Petite, 1016 SG WDW - Standard, 1017 BY Joe Buyer, 1018 BY Jane Doe, and 1019 BY Jan Deaux; a Buyer Responsibility table (Store, Product, Buyer) assigning buyer 1017 to products 6102 through 6105 at stores 28, 35, 36, and 37; product keys 6102 to 6105 mapped to SKUs 192834001, 192732198, ...]

[...]

[Figure residue: the six Like-a-Chef side-dish SKUs and the other hierarchy rows numbered 100 through 126; a Product Hierarchy Bridge (Parent, Child) containing the pairs (100, 124), (106, 124), (114, 124), (120, 124), (124, 124), (100, 126), (106, 126), (113, 126), (119, 126), and (126, 126), with a surrogate primary key.]

Figure 7.14 A generic bridge structure.

Updates to the bridge need to be applied from the perspective...
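The generic bridge can be sketched with Python's `sqlite3` module. This is a minimal sketch under stated assumptions: the child-to-parent map, the surrogate sequence starting at 100, and the table names are hypothetical, and the standard-product level is left out for brevity. The SKUs, sales rows, and plan figures are the ones in Figure 7.13. Every node is paired with itself and with each of its ancestors, so a single join reaches SKU-level sales from a plan stated at any level.

```python
import sqlite3
from itertools import count

# Hypothetical child -> parent map keyed by business key (None marks the top).
# The standard-product level is omitted, so SKUs hang directly under the
# Like-a-Chef Side Orders product group, as the bridge in Figure 7.13 pairs them.
PARENT = {
    "01": None,                # Frozen Foods division
    "01160": "01",             # Like-a-Chef brand
    "01160SIDE": "01160",      # Side Orders product group
    "012308198": "01160SIDE",  # LAC Creamed Corn
    "012302028": "01160SIDE",  # LAC Broccoli w Cheese
    "012303293": "01160SIDE",  # LAC Au Gratin Potatoes
    "012317298": "01160SIDE",  # LAC Mashed Potatoes
    "012318418": "01160SIDE",  # LAC Potatoes and SPAM
    "012310598": "01160SIDE",  # LAC SPAM, SPAM & SPAM
}

# From Figure 7.13: sales as (date, cases, dollars, sku); plan as (period, node, cases, dollars).
SALES = [
    ("2002-06-01", 100, 3500, "012317298"),
    ("2002-06-01", 50, 1825, "012303293"),
    ("2002-06-02", 125, 5030, "012317298"),
    ("2002-06-02", 85, 2450, "012310598"),
    ("2002-06-02", 329, 10200, "012303293"),
    ("2002-06-03", 43, 207, "012310598"),
]
PLAN = [("2002-06", "01160SIDE", 15000, 517000)]


def assign_surrogates(business_keys, start=100):
    """Draw every node's key, whatever its level, from one integer sequence."""
    seq = count(start)
    return {bk: next(seq) for bk in sorted(business_keys)}


def bridge_rows(parent, sk):
    """Yield (parent_sk, child_sk) for each node paired with itself and every ancestor."""
    for node in parent:
        ancestor = node
        while ancestor is not None:
            yield sk[ancestor], sk[node]
            ancestor = parent[ancestor]


def build_db():
    sk = assign_surrogates(PARENT)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bridge (parent_sk INTEGER, child_sk INTEGER, PRIMARY KEY (parent_sk, child_sk));
        CREATE TABLE sales (sale_date TEXT, cases INTEGER, dollars INTEGER, sku_sk INTEGER);
        CREATE TABLE sales_plan (period TEXT, node_sk INTEGER, cases INTEGER, dollars INTEGER);
    """)
    conn.executemany("INSERT INTO bridge VALUES (?, ?)", bridge_rows(PARENT, sk))
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                     [(d, c, amt, sk[sku]) for d, c, amt, sku in SALES])
    conn.executemany("INSERT INTO sales_plan VALUES (?, ?, ?, ?)",
                     [(p, sk[node], c, amt) for p, node, c, amt in PLAN])
    return conn, sk


def plan_vs_actual(conn):
    """Join each plan row to SKU-level sales through the bridge; no recursion needed."""
    return conn.execute("""
        SELECT p.period, p.cases, p.dollars, SUM(s.cases), SUM(s.dollars)
        FROM sales_plan AS p
        JOIN bridge AS b ON b.parent_sk = p.node_sk
        JOIN sales AS s ON s.sku_sk = b.child_sk
                       AND substr(s.sale_date, 1, 7) = p.period
        GROUP BY p.period, p.node_sk, p.cases, p.dollars
    """).fetchall()


def actuals_for(conn, sk, node):
    """Total sales under any node (division, brand, group, or a single SKU)."""
    return conn.execute("""
        SELECT COALESCE(SUM(s.cases), 0), COALESCE(SUM(s.dollars), 0)
        FROM bridge AS b JOIN sales AS s ON s.sku_sk = b.child_sk
        WHERE b.parent_sk = ?
    """, (sk[node],)).fetchone()


if __name__ == "__main__":
    conn, sk = build_db()
    print(plan_vs_actual(conn))
    print(actuals_for(conn, sk, "01160"))
```

Against the Figure 7.13 data, the plan row for 01160SIDE lines up with 732 cases and $23,212 of actual sales, and the same query shape works unchanged for a brand- or division-level plan because the self-pairs and ancestor pairs are already in the bridge.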
[...] ...weight and size breakdown so that storage and transportation costs can be properly charged to the correct brand. And the Lazy Guy brand manager requires that revenue be distributed across the product groups (appetizer, entrée, and side dishes). The marketing group and sales have countered that significant effort will be expended to launch and promote the new product. They need to see all the sales and revenue...

[...] ...in the data warehouse and advantageous to use flattened structures in the data marts, the data warehouse should be positioned to transform these structures when necessary. In this section, we will examine additional techniques to transform hierarchical structures.

Making a Recursive Tree

If data is received from the source system as a flattened hierarchy, you may wish to consider storing the data in...

[...] ...code to implement and would be very difficult, if not impossible, to implement using an off-the-shelf query tool. The solution requires that the problem be broken into smaller pieces. First, you must recognize that the hierarchy stored in the data warehouse and the hierarchy stored in the data mart are there for different reasons. The data warehouse structures should be designed to best collect and maintain the...

[...] ...simple tree data structure identifiers that identify Customer HQ, Planning Group, Sold-To, and Ship-To roles. Let’s first take a look at the tree data structure and then examine how it is best implemented in the data warehouse.

The Recursive Hierarchy Tree

A recursive tree data structure is the most elegant and flexible data structure for storing a hierarchy. Any type of hierarchy can be stored in a recursive...
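Turning a flattened hierarchy into a recursive tree, and then querying it, can be sketched with Python and SQLite's `WITH RECURSIVE` clause. This is an illustrative sketch, not the book's procedure: it assumes the same fixed positions as the product hierarchy column and builds ancestry-prefixed node keys, so a shared code such as ENTR becomes a distinct node under each brand and every node has exactly one parent. The recursive query shows the kind of code the tree requires at read time.

```python
import sqlite3

# End positions of each level in the flattened value (division 2, brand 5,
# product group 9, standard product 14), matching the layout in the text.
LEVEL_ENDS = (2, 5, 9, 14)

# Sample values: standard-product keys from Figure 7.12.
SAMPLE = ["01035ENTR02319", "01160ENTR02319", "01160SIDE04392", "01035SIDE03401"]


def flattened_to_edges(values):
    """Convert flattened values into (node, parent) edges of a recursive tree.

    A node's key is the value's prefix up to the end of its level, so each
    node carries its ancestry and ends up with exactly one parent.
    """
    edges = set()
    for value in values:
        parent = None
        for end in LEVEL_ENDS:
            node = value[:end]
            edges.add((node, parent))
            parent = node
    return edges


def build_tree(values):
    """Store the edges in a self-referencing table (node, parent)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tree (node TEXT PRIMARY KEY, parent TEXT)")
    conn.executemany("INSERT INTO tree VALUES (?, ?)", flattened_to_edges(values))
    return conn


def descendants(conn, node):
    """Walk the tree with a recursive CTE, returning the node and everything below it."""
    rows = conn.execute("""
        WITH RECURSIVE sub(node) AS (
            SELECT node FROM tree WHERE node = ?
            UNION ALL
            SELECT t.node FROM tree AS t JOIN sub ON t.parent = sub.node
        )
        SELECT node FROM sub ORDER BY node
    """, (node,)).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(descendants(build_tree(SAMPLE), "01160"))
```

The table itself is compact and does not change shape when levels are added, but every consumer needs a recursive query like `descendants`, which is the trade-off the text raises for data marts.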
[...] [Figure residue: product hierarchy rows such as 01160SIDE Side Orders, 01160APPT Appetizers, 02152FLAV Flavored, 01035ENTR02319 Fried Chicken, 01160ENTR02319 Fried Chicken, 01160ENTR02323 Meat Loaf, 01160ENTR04392 Potatoes, and the six Like-a-Chef side-dish SKUs, numbered from 100 upward.]

[...] [Figure residue: an Exploded Customer Hierarchy table with columns Level, Distance, Bottom, Seq, Eff Dt, and Exp Dt, over customer rows keyed 1002 to 1016 with role codes such as HQ, PG, and SO; product descriptions such as 6oz Creamed Corn, 4oz Fried Chicken, and 3pc Calamari App, each tagged MIAB.]

Publishing the Data

When publishing this information to the data marts, it is better to avoid the need to explode the sales data in the data mart. This adds calculation burden on the mart that can adversely impact query performance. If a data mart is to provide exploded numbers, the data...

[...] ...to the hierarchy do not require changes to the data structure, it is compact, and it is easy to record changes and easy to maintain historical perspective. However, the downside is query complexity: the need for recursive code to use the structure makes the recursive tree less suitable for the data marts. From a data mart standpoint, flat hierarchy structures and bridges provide support for the needed reporting.

[Figure 7.14 residue: columns Row type, Business key, Surrogate primary key, and Parent; row types include SP, PR, and DV, with the standard products 01160ENTR02319, 01160ENTR02323, and 01160ENTR04392 keyed 118 to 120.]
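The fragments above mention an Exploded Customer Hierarchy with Level, Distance, Bottom, Seq, Eff Dt, and Exp Dt columns. A minimal Python sketch of building such rows from a recursive child-to-parent map might look like the following. The customer keys are hypothetical, the leaf flag is only an assumed stand-in for the book's Bottom column, and Seq and the effective dates are left out. Summarizing leaf-level sales per ancestor in the warehouse, as the last function does, is one way to publish totals so the mart does not have to explode sales itself.

```python
# Hypothetical customer tree (child -> parent); None marks the top.
CUSTOMER_PARENT = {
    "HQ1": None,                               # customer HQ
    "PG1": "HQ1", "PG2": "HQ1",                # planning groups
    "SO1": "PG1", "SO2": "PG1", "SO3": "PG2",  # sold-to customers
}


def level_of(node, parent):
    """Depth from the top, counting the top node as level 1."""
    level = 1
    while parent[node] is not None:
        node = parent[node]
        level += 1
    return level


def explode(parent):
    """Return (ancestor, descendant, ancestor_level, distance, is_leaf) rows, self-pairs included."""
    has_children = {p for p in parent.values() if p is not None}
    rows = []
    for node in parent:
        ancestor, distance = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, level_of(ancestor, parent), distance,
                         node not in has_children))
            ancestor, distance = parent[ancestor], distance + 1
    return sorted(rows)


def totals_by_ancestor(rows, leaf_sales, level):
    """Sum leaf-level sales to every ancestor at one level, ready to publish to a mart.

    Only leaf descendants are summed, so no sale is counted more than once.
    """
    totals = {}
    for ancestor, node, ancestor_level, _distance, is_leaf in rows:
        if ancestor_level == level and is_leaf:
            totals[ancestor] = totals.get(ancestor, 0) + leaf_sales.get(node, 0)
    return totals


if __name__ == "__main__":
    rows = explode(CUSTOMER_PARENT)
    for row in rows:
        print(row)
    print(totals_by_ancestor(rows, {"SO1": 10, "SO2": 5, "SO3": 7}, level=2))
```

The distance column makes it easy to ask for immediate children only (distance 1) or for the whole subtree, without any recursive code on the mart side.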

Posted: 08/08/2014, 22:20