Information Usage Unpredictable. Let us say you are building an operational system for order processing in your company. Based on the business functions that make up order processing and the data needed to support these functions, you can create a conceptual data model. You may use the E-R modeling technique. For an operational system such as order processing, users are able to give you precise details of the required functions, information content, and usage patterns.

In striking contrast, for a data warehousing system, users are generally unable to define their requirements precisely and clearly. They are unsure how exactly they will be using the data warehouse and cannot express how they would use the information or process it. Of course, they know that they will use the data warehouse for analysis, but they are not clear how exactly they would do that. The whole process of defining the information requirements for a data warehouse is nebulous. If so, how can you create a data model for something the users are unable to define clearly and precisely?

Dimensional Nature of Business Data. Fortunately, the situation is not as hopeless as it seems. Even though users cannot fully describe what they want in a data warehouse, they can provide you with some useful insights into how they think about the business. They can tell you what measurement units are important to them. Each department can let you know how they measure success in that department. Users can provide clues about how they combine the various pieces of information for strategic decision making.

Managers think of the business in terms of business dimensions. Let us understand what these business dimensions are by considering a few examples. Look at the following examples of the kinds of questions managers are likely to ask for decision making.

Marketing Vice President.
How much did my new product generate month by month, in the southern division, by user demographic, by sales office, relative to the previous version and compared with plan?

Marketing Manager. Give me sales statistics by products, summarized by product categories, daily, weekly, and monthly, by sales districts, by distribution channels.

Financial Controller. Show me expenses, listing actual versus budget, by month, quarter, and year, by budget line items, by district, division, summarized for the whole company.

The marketing vice president is interested in the revenue generated by her new product, but she is not interested in a single number. She wants the revenue numbers broken down by month, division, customer demographic, sales office, product version, and plan. These are the business dimensions along which she wants to analyze her revenue numbers.

Similarly, for the marketing manager, the business dimensions are product, product category, time (day, week, month), sales district, and distribution channel. For the financial controller, the business dimensions are budget line, time (month, quarter, year), district, and division.

If users of a data warehouse think in terms of business dimensions for decision making, then you, as a data modeler, must also think of business dimensions for the modeling process.

CHAPTER 9 MODELING FOR DECISION-SUPPORT SYSTEMS

Although the details of actual usage of the data warehouse could be unclear, the business dimensions used by managers are not ambiguous at all. So, as an early step in the data modeling process, determine the business dimensions.

Examples of Business Dimensions. The concept of business dimensions is fundamental to requirements definition and data modeling for the data warehouse.
Therefore, let us look at a few examples. Figure 9-4 displays business dimensions for four different cases.

FIGURE 9-4 Examples of business dimensions.

Let us quickly review the examples. For the supermarket chain, the measurements that are analyzed are the sales units. These are analyzed along four business dimensions: time, promotion, product, and store. For the insurance company, the business dimensions are different and more appropriate to that business. Here you want to analyze claim amounts by six business dimensions: time, agent, claim, insured party, policy, and status. Observe the other two examples and note the business dimensions. These are different, more appropriate for the nature of the businesses.

What we find from these examples is that the business dimensions are different and relevant to the industry and to the subject for analysis. We also note that generally the time dimension is a common dimension in all examples. Almost all business analyses are performed over time.

Information Package. Having understood the concept of business dimensions and how these enable us to move forward in data modeling, let us introduce the notion of an information package. Creation of an information package is a preliminary step for recording the requirements and preparing for the data modeling process.

An information package incorporates the basic measurements and the business dimensions along which the basic measurements may be analyzed. Each information package refers to one information subject. Figure 9-5 shows an information package for automaker sales. Go through the figure carefully and note the following.

Business Dimensions. These are shown as column headings.

Key Business Metrics or Facts. These are the measurements that are analyzed along the business dimensions. These are shown at the bottom in the information package diagram. In this example, many different metrics are meant to be available for analysis.
Dimension Hierarchies/Categories. These are various levels of individual dimensions for drilling down or up. The hierarchy for the time dimension ranges from year to actual date. Note these hierarchy indicators in the dimension columns. These columns are also used to indicate categories within business dimensions. For example, Single Brand Flag is a category indicator. Here the intention is to analyze the metrics by dealers who sell only single brands.

FIGURE 9-5 Information package: automaker sales.

DIMENSIONAL MODELING

You have reviewed information package diagrams. These diagrams reflect the ways in which managers and other professionals tend to make use of the data warehouse for analysis and decision making. Information packages are information matrices showing the metrics, business dimensions, and the hierarchies and categories within individual business dimensions.

The information package diagrams form the basis for proceeding to the data modeling process. You can use the information contained in these packages to come up with a conceptual model and, thereafter, a logical model. As you know, the logical model may be a relational model if you are implementing your data warehouse using a relational DBMS. The modeling process results in what is known as a dimensional data model.

Dimensional Modeling Basics

Dimensional modeling gets its name from the business dimensions we need to incorporate into the model from the nature of business analysis. Dimensional modeling is a modeling technique to structure business dimensions and the metrics that are analyzed along these dimensions. This modeling technique is intuitive for that purpose. In practice, a dimensional data model has proved to provide high performance for queries and analysis.

The multidimensional information package diagram is the foundation for the dimensional model.
First, the dimensional model consists of specific data structures needed to represent the business dimensions. These data structures also contain the metrics or facts.

Figure 9-5 shows the information package diagram for automaker sales. Go back and review the figure. In the bottom section of the diagram, you observe the list of measurements or metrics that the automaker wants to use for analysis. Next, look at the column headings. These are the business dimensions along which the automaker wants to analyze the metrics. Under each column heading, you notice the dimension hierarchies and categories within that business dimension. What you see under the column headings are the attributes relating to that business dimension.

Reviewing the information package diagram, we note three types of data elements: (1) measurements or metrics (called facts), (2) business dimensions, and (3) attributes for each business dimension. So, when we put together the dimensional model to represent the information contained in the information package, we need to come up with data structures to represent these three types of data elements. How do we do this?

Fact Entity Type. First, let us work with the measurements or metrics seen at the bottom of the diagram. These are facts for analysis. In the automaker sales diagram, the facts are as follows: actual sale price, MSRP sale price, options price, full price, dealer add-ons, dealer credits, dealer invoice, amount of down payment, manufacturer proceeds, and amount financed.

Each of these items is a measurement or fact. Actual sale price is a fact about what the actual price is for the sale. Full price is a fact about what the full price is relating to the sale. As we review each of these factual items, we find that we can group all of these into a single data structure. Borrowing the terminology used in the examples of the previous chapters, we may call this data structure a fact entity type.
For the automaker sales analysis, this fact entity type would be the AUTOMAKER-SALES entity type. Therefore, each fact item or measurement would be an attribute for this entity type.

We have determined one of the data structures to be included in the dimensional model for automaker sales and derived the AUTOMAKER-SALES fact entity type with its attributes from the information package diagram. This is shown in Figure 9-6.

Dimension Entity Type. Let us move on to the other sections of the information package diagram, taking the business dimensions one by one. Look at the product business dimension. This business dimension is used when we want to analyze by products. Sometimes our analysis could be a breakdown of sales by individual models. Another analysis could be at a higher level by product lines. Yet another analysis could be at an even higher level by product categories. The list of data items relating to the product dimension is as follows: model name, model year, package styling, product line, product category, exterior color, interior color, and first model year.

What can we do with all these data items in our dimensional model? All of these relate to the product in some way. We can, therefore, group all of these data items in one data structure and call it a dimension entity type. More specifically, this would be the PRODUCT dimension entity type. The data items listed above would all be attributes of the PRODUCT dimension entity type.

Look further into the information package diagram. You note the other business dimensions shown as column headings. In the case of the automaker sales information package, these other business dimensions are dealer, customer demographics, payment method, and time. Just as we formed the PRODUCT dimension entity type, we can put together the remaining dimension entity types. The data items shown in each column would then be the attributes for each corresponding dimension entity type.
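The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `EntityType`, `INFORMATION_PACKAGE`, and `derive_entity_types` are hypothetical, as are the hyphenated entity names for the dimensions other than PRODUCT. Only the metrics and the PRODUCT attributes are listed in the text, so the other attribute lists are left empty rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """A fact or dimension entity type with its attributes (hypothetical helper)."""
    name: str
    kind: str  # "fact" or "dimension"
    attributes: list = field(default_factory=list)


# Information package for automaker sales, limited to items named in the text.
INFORMATION_PACKAGE = {
    "subject": "AUTOMAKER-SALES",
    "metrics": [
        "actual sale price", "MSRP sale price", "options price", "full price",
        "dealer add-ons", "dealer credits", "dealer invoice",
        "amount of down payment", "manufacturer proceeds", "amount financed",
    ],
    "dimensions": {
        "PRODUCT": [
            "model name", "model year", "package styling", "product line",
            "product category", "exterior color", "interior color",
            "first model year",
        ],
        # The attribute columns for these dimensions appear only in the
        # figure, so they are left empty here instead of being invented.
        "DEALER": [],
        "CUSTOMER-DEMOGRAPHICS": [],
        "PAYMENT-METHOD": [],
        "TIME": [],
    },
}


def derive_entity_types(package):
    """Group the metrics into one fact entity type and each column into a dimension."""
    fact = EntityType(package["subject"], "fact", list(package["metrics"]))
    dimensions = [
        EntityType(name, "dimension", list(attrs))
        for name, attrs in package["dimensions"].items()
    ]
    return fact, dimensions


fact, dimensions = derive_entity_types(INFORMATION_PACKAGE)
print(fact.name, len(fact.attributes))
print([d.name for d in dimensions])
```

The point of the sketch is the mapping itself: every metric becomes an attribute of the single fact entity type, and every column heading becomes its own dimension entity type.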
Figure 9-7 puts all of these together. It shows how the various dimension entity types are formed from the information package diagram. Study the figure carefully and note how each dimension entity type is formed.

FIGURE 9-6 Formation of AUTOMAKER-SALES fact entity type.

Arrangement of Entity Types. Thus far, we have formed the fact and dimension entity types. How should these be arranged in a dimensional model? What are the relationships and how should the relationships be marked?

Before we decide how to arrange the fact and dimension entity types in our dimensional model and mark the relationships, let us go over what the dimensional model needs to achieve and what its purposes are. Here are some criteria for combining the entity types into a dimensional model.

- The model should provide the best data access.
- The whole model must be query-centric.
- It must be optimized for queries and analyses.
- The model must express that the dimension entity types are related to the fact entity type.
- It must be structured in such a way that every dimension entity type can have an equal chance of interacting with the fact entity type.
- The model should allow drilling down or rolling up along dimension hierarchies.

With these requirements, we find that a dimensional model with the fact entity type in the middle and the dimension entity types arranged around the fact entity type appears to be the ideal arrangement. In such an arrangement, each dimension entity type will have a direct relationship with the fact entity type in the middle. This is necessary because every dimension entity type with its attributes must have an even chance of participating in a query to analyze the attributes in the fact entity type.

Such an arrangement in the dimensional model looks like a star formation. The fact entity type is at the core of the star and the dimension entity types are along the spikes of the star.
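One way to picture this arrangement is as a set of relational tables, sketched here with Python's built-in `sqlite3` module. This is an assumption-laden illustration, not the book's design: the table names, the `_dim` suffix, the surrogate `_key` columns, and the choice of two sample fact columns are all hypothetical. What the sketch does show is the property the criteria call for, namely that the fact table references every dimension table directly, with no intermediary tables.

```python
import sqlite3

# Business dimensions named in the text for automaker sales.
DIMENSIONS = ["product", "dealer", "customer_demographics", "payment_method", "time"]

conn = sqlite3.connect(":memory:")

# One table per dimension, each with a short surrogate key (assumed layout).
for dim in DIMENSIONS:
    conn.execute(
        f"CREATE TABLE {dim}_dim ({dim}_key INTEGER PRIMARY KEY, description TEXT)"
    )

# The fact table sits in the middle and points at every dimension directly.
fk_columns = ",\n".join(
    f"    {dim}_key INTEGER NOT NULL REFERENCES {dim}_dim({dim}_key)"
    for dim in DIMENSIONS
)
conn.execute(
    f"""
    CREATE TABLE automaker_sales (
    {fk_columns},
        actual_sale_price REAL,
        amount_financed REAL
    )
    """
)

# Ask SQLite which tables the fact table references; column 2 is the target table.
referenced = {
    row[2] for row in conn.execute("PRAGMA foreign_key_list(automaker_sales)")
}
print(sorted(referenced))
```

Because each dimension is exactly one join away from the fact table, any dimension attribute can take part in a query on equal terms, which is the "even chance" requirement stated above.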
Figure 9-8 shows this star formation for automaker sales.

FIGURE 9-7 Formation of automaker dimension entity types.

STAR Schema

The STAR formation introduced in the previous subsection is known as the STAR schema. Now that you have been introduced to the STAR schema, let us take a simple example and examine its characteristics. Creating the STAR schema is the fundamental data modeling task for the data warehouse storage. It is necessary to gain a good grip of this task.

Review of a Simple STAR Schema. Let us take a simple STAR schema designed for order analysis. Assume this to be a schema for a manufacturing company and that the marketing department is interested in determining how they are doing with the orders received by the company.

Figure 9-9 shows this simple STAR schema. It consists of the ORDERS fact entity type shown in the middle of the schema diagram. Surrounding the fact entity type are the four dimension entity types of CUSTOMER, SALESPERSON, ORDER-DATE, and PRODUCT. Let us begin to examine this STAR schema. Look at the structure from the point of view of the marketing department. Users in this department will analyze the orders using dollar amounts of cost, profit margin, and sold quantity. This information is found in the fact entity type of the structure. Users will analyze these measurements by breaking down the numbers in combinations by customer, salesperson, date, and product. All these dimensions along which users will analyze are found in the structure. Thus, the STAR schema structure is a structure that can be easily understood by users and with which they can work comfortably. The structure mirrors how users normally view their critical measures along their business dimensions.

FIGURE 9-8 Star formation for automaker sales.

When you look at the order dollars, the STAR schema structure intuitively answers questions of what, when, by whom, and to whom.
For example, users can easily visualize answers to questions such as: For a given set of customers for a certain month, what is the quantity sold of a specific product, enabled by salespersons in a given territory?

Inside a Dimension Entity Type. A significant component of the STAR schema is the set of dimension entity types. These represent the business dimensions along which the metrics are analyzed. Let us look inside a dimension entity type and study its characteristics. See Figure 9-10 showing the contents of one of the dimension entity types and also a set of characteristics.

FIGURE 9-9 Simple STAR schema for order analysis.

Note the following comments about the dimension entity type and what is inside.

Dimension Entity Type Identifier. This is usually a surrogate identifier to uniquely identify each instance of the dimension entity type. Sometimes, one or more attributes can be used as the identifier. However, a short surrogate identifier is preferred.

Entity Type Is Wide. Typically, a dimension entity type is wide in the sense that it has many attributes. It is not uncommon for some dimension entity types to have more than 50 attributes.

Textual Attributes. You will seldom find any numeric attributes used for calculations. Attributes are of textual format, representing textual descriptions of components within the business dimension. Users will compose their queries using these textual descriptors.

Not Normalized. The attributes in a dimension entity type are used over and over again in queries. For efficient query performance, it is best if the query picks up the value of an attribute from the dimension entity type and goes directly to the fact entity type and not through intermediary entity types. If you normalize the dimension entity type, you will be creating such intermediary entity types and that will reduce the efficiency of query processing.

Drilling Down, Rolling Up.
The attributes in a dimension entity type provide the ability to get from higher levels of aggregation to lower levels of detail. For example, the three attributes zip, city, and state form a hierarchy. You may get the total sales by state, then drill down to total sales by city, and then by zip. Going the other way, you may first look at totals by zip and then roll up to totals by city and then state.

Multiple Hierarchies. In the example of the CUSTOMER dimension entity type, there is a single hierarchy going up from individual customer to zip, city, and state. But dimension entity types often provide for multiple hierarchies. For example, a dimension entity type such as product may have dimension hierarchies such as marketing–product–category, marketing–product–department, finance–product–category, and finance–product–department so that different user groups may drill down or roll up differently.

Fewer Occurrences. Usually, a dimension entity type has fewer instances than a fact entity type. A product dimension entity type for an automaker may have just 500 occurrences or fewer.

FIGURE 9-10 Inside a dimension entity type.

Inside the Fact Entity Type. Let us now get into a fact entity type and examine the components. Remember, this is the representation of where we keep the measurements. We may keep the details at the lowest possible level. In a department store's fact entity type for sales analysis, the level may be the units sold in each individual transaction at the cashier's checkout. Some fact entity types may represent storage of summary data. Such entity types are known as aggregate fact entity types.

Figure 9-11 shows the contents of a fact entity type and also a set of its characteristics. Note the following characteristics of the fact entity type.

Fact Entity Type Identifier. None of the attributes qualify to be an identifier for the fact entity type.
These attributes are numeric units for measurements. We will discuss the identifier in a later subsection when studying the transition to the logical model. It will be more meaningful at that point.

Data Grain. This is an important characteristic of the fact entity type. Data grain is the lowest level of detail for measurements or metrics. In the example of Figure 9-11, the metrics are at the detailed or lowest level. The quantity ordered relates to the quantity of a particular product on a certain day, for a specific customer, and procured by a specific sales representative. If we keep the quantity ordered as the quantity of a specific product for each month, then the data grain is different and it is at a higher level. So, when you model a fact entity type, be careful about the level of data grain it is supposed to represent through its attributes.

Fully Additive Measures. Let us look at the attributes OrderDollars, ExtendedCost, and QuantityOrdered. Each of these relates to a particular product on a certain date for a specific customer procured by an individual sales representative. In a certain query, let us say that the user wants the totals for the particular product, not for a specific customer, but for customers in a particular state. Then we need to find all the instances of the entity type relating to all the customers in that state and add OrderDollars, ExtendedCost, and QuantityOrdered to come up with the totals. The values of these attributes may be summed up by simple addition. Such measures are known as fully additive measures. While designing a fact entity type, you must be cognizant of fully additive measures and note them in the model.

Semiadditive Measures. Consider the MarginDollars attribute in the fact entity type. For example, if OrderDollars has a value of 120 and ExtendedCost 100, then [...]

FIGURE 9-11 Inside a fact entity type.
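The roll-up along the zip, city, state hierarchy and the simple addition of fully additive measures can be tried out with a small sketch using Python's built-in `sqlite3` module. Everything concrete here is an assumption made for illustration: the table and column names, the `totals_by` helper, and the sample rows are invented (only the 120 and 100 values echo the text), and the fact table is cut down to a single customer dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_key INTEGER PRIMARY KEY,
        name TEXT, zip TEXT, city TEXT, state TEXT
    );
    CREATE TABLE orders (
        customer_key INTEGER REFERENCES customer(customer_key),
        order_dollars REAL, extended_cost REAL, quantity_ordered INTEGER
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO customer VALUES
        (1, 'A', '10001', 'New York', 'NY'),
        (2, 'B', '10002', 'New York', 'NY'),
        (3, 'C', '14201', 'Buffalo',  'NY'),
        (4, 'D', '07102', 'Newark',   'NJ');
    INSERT INTO orders VALUES
        (1, 120.0, 100.0, 2),
        (2, 200.0, 150.0, 4),
        (3,  80.0,  60.0, 1),
        (4,  50.0,  40.0, 1),
        (1,  30.0,  20.0, 1);
    """
)

# Hierarchy from the highest level of aggregation down to the lowest.
HIERARCHY = ["state", "city", "zip"]


def totals_by(level):
    """Sum the fully additive measures, grouped down to the given hierarchy level."""
    if level not in HIERARCHY:
        raise ValueError(f"unknown hierarchy level: {level}")
    # Group by every level from the top of the hierarchy down to `level`.
    cols = ", ".join(f"c.{name}" for name in HIERARCHY[: HIERARCHY.index(level) + 1])
    sql = f"""
        SELECT {cols},
               SUM(o.order_dollars), SUM(o.extended_cost), SUM(o.quantity_ordered)
        FROM orders AS o
        JOIN customer AS c ON c.customer_key = o.customer_key
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


for level in HIERARCHY:  # drill down: state, then city, then zip
    print(level, totals_by(level))
```

Calling `totals_by("state")` gives the rolled-up view and `totals_by("zip")` the most detailed one. The same `SUM` works at every level precisely because these three measures are fully additive.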
context. Many of the data repositories supporting OLAP are proprietary databases from vendors. These specific data structures determine how data modeling has to be done for OLAP. You may be able to adapt general data modeling principles to suit the requirements of OLAP data structures.

Features and Functions of OLAP

A data warehouse stores data and provides simpler access and analysis of data. However, OLAP...

figure must be taken into consideration while performing data modeling for OLAP systems. Pay attention to the different types of data in an OLAP system. When you model the data structures for your OLAP system, you need to provide for these types of data.

Data Modeling for MOLAP

As a prerequisite to creation and storage of hypercubes in proprietary MDDBs, data must be in the form of multidimensional representations...

relational databases. Data is perceived as relational tables with rows and columns. However, the model presents data to users in the form of multidimensional hypercubes. In order to hide the storage structure from the user and present data multidimensionally, a semantic layer of metadata is created. The metadata layer supports the mapping of dimensions to the relational tables. Additional metadata supports...

directed to the relational database. Unlike the MOLAP model, static multidimensional hypercubes are not precreated and stored.

FIGURE 9-25 The MOLAP model.

FIGURE 9-26 The ROLAP model.

Data Modeling for OLAP

In order to perform data modeling for OLAP, let us first examine some significant characteristics of data in such a system. Review...
following list highlighting differences between OLAP and warehouse data:

- An OLAP system stores and uses much less data compared with a data warehouse.
- Data in an OLAP system is summarized. The lowest level of detail as in the data warehouse is very infrequent.
- OLAP data is more flexible for processing and analysis partly because there is much less data to work with.
- Every instance of the OLAP system is customized...

view of the data from the MDDB to the users. Multidimensional database systems are proprietary software systems. These systems provide capabilities to create hypercubes and to consolidate them where necessary during the process that loads data into the MDDB from the main data warehouse. Users who can use summarized data enjoy fast response times from the consolidated data.

ROLAP. In this model, data is store...

better MOLAP performance.

FIGURE 9-28 MOLAP data design and implementation.

FIGURE 9-29 ROLAP data design and implementation.

Data Modeling for ROLAP

As you know, ROLAP systems do not store prefabricated hypercubes in MDDBs; generally, relational DBMSs are used for implementation. Data storage implementations in data warehouses also generally use relational...

In today's data warehousing environment, with such tremendous progress in analysis tools from various vendors, you cannot have a data warehouse without OLAP. We will explore the nature of OLAP and why it is essential. There are two major methods of implementing OLAP. Each method is supported by specific data repositories. We will look at the structure of the data repositories and discuss data modeling in...
representing data points along the edges. With three groups of data (two groups of business dimensions and one group of metrics), we can easily visualize the data as being along the edges of a cube. Now add another business dimension to the model. Let us add the store dimension. That results in three business dimensions plus the metrics data, four data groups in all. How can you represent these four groups of data...

approach, data is stored in multidimensional databases (MDDB). It is, therefore, called MOLAP, or multidimensional OLAP. Typically, proprietary DBMSs by specific vendors run these MDDBs. In the second approach, data is stored in relational databases but used as multidimensional data by the OLAP applications. This approach is known as ROLAP, or relational OLAP. Regular, powerful relational DBMSs administer the data.