data model step by step, and discuss deployment issues and problems you may encounter along the way to creating a sustainable and maintainable business intelligence environment. By the end of the book, you should be fully qualified to begin constructing your BI environment armed with the best design techniques possible for your data warehouse.

CHAPTER 2
Fundamental Relational Concepts

Every data-modeling technique has its own set of terms, definitions, and techniques. This vernacular permits us to understand complex and difficult concepts and to use them to design complex databases. This book applies relational data-modeling techniques for developing the data warehouse data model. To that end, this chapter introduces the terms and terminology of relational data modeling. It then continues with an overview of normalization techniques and the rules for the different normalization levels (for example, first, second, and third normal form) and the purpose for each. Sample data models will be given, showing the progression of normalization. The chapter ends with a discussion of normalization of the data model and the associated benefits.

Before we get into the various types of data models we use in creating a data warehouse, it is necessary to first understand why a data model is important and the various types of data models you will create in developing your BI environment.

Why Do You Need a Data Model?

A model is an abstraction or representation of a subject that looks or behaves like all or part of the original. Examples include a concept car and a model of a building. All models have a common set of objectives. They are designed to help people envision how the parts fit together, help people understand how to use or apply the final product, reduce the development risk, and ensure that the people building the product and those requesting it have the same expectations.
Let’s look more closely at these benefits:

■■ A model reduces overall risk by ensuring that the requirements of the final product will be satisfactorily met. By examining a “mock-up” of the ultimate product, the intended users can make a reasonable determination of whether the product will indeed fulfill their needs and objectives.
■■ A model helps the developers envision how the final product will interface with other systems or functions. The level of effort needed to create the interfaces and their feasibility can be reasonably estimated if a detailed model is created. (In the case of a data warehouse, these interfaces include the data acquisition and the data delivery programs, where and when to perform data cleansing, audits, data maintenance processes, and so on.)
■■ A model helps all the people involved understand how to relate to the ultimate product and how it will pertain to their work function. The model also helps the developers understand the skills needed by the ultimate audience and what training needs to occur to ensure proper usage of the product.
■■ Finally, a model ensures that the people building the product and those requesting it have the same expectations about the ultimate outcome of the effort. By examining the model, the potential for a missed opportunity is greatly reduced, and the belief and trust by all parties that the ultimate product will be satisfactory is greatly enhanced.

We feel that a model is so important, especially when undertaking a set of projects as complex as building a business intelligence (BI) environment, that we recommend a project be halted or delayed until the justification for a solid set of models is made, signed off on, and funded.

Relational Data-Modeling Objects

Now that we understand the need for a model, let’s turn our attention to a specific type of model: the data model. Before describing the various levels of models, we need to come up with a common set of terms for use in describing these models.
NOTE This book is not intended to replace the many significant and authoritative books written on generic data modeling; rather, this section should only serve as a refresher on some of the more significant terms we will use throughout the book. If more detail is needed, please refer to the wealth of data-modeling books at your disposal and listed in the “Recommended Reading” section in this book.

Subject

The first term to describe is a subject. You will see us refer to a subject-oriented data warehouse and a subject area model. In both cases, the term subject refers to a data subject or a major category of data relevant to the business. A subject area is a subset of the enterprise’s data and consists of related entities and relationships. Customers, Sales, and Products are examples of subject areas.

Entity

An entity is generally defined as a person, place, thing, concept, or event in which the enterprise has both the interest and the capability to capture and store information. An entity is unique within the data model. For the third normal form data model, there is one and only one entry representing that entity.

In entity-relationship diagrams (ERD) or logical data modeling in the classical Codd and Date sense, there are four types of entities from which to build logical or business data models and data warehouse models (see Figure 2.1).

■■ A Primary or Fundamental Entity is defined as an entity that does not depend on any other entity for its existence. Generally each subject area is represented by a primary entity that has the same name (except that the subject area name is pluralized and the entity name is singular), such as Customer, Sale, and Product. These entities are a grouping of dependent data occurring singularly.
■■ A Subtype Entity is a logical division or category of a parent (supertype) entity. Examples of subtypes for the Customer entity are Retail Customer and Wholesale Customer.
The subtypes always inherit the characteristics, or attributes and relationships, of the parent entity; that is, the Retail Customer will inherit any attributes that describe the more generic parent entity, Customer (for example, Customer ID, Customer Name), as well as relationships such as “Customer acquires Product.”
■■ An Attributive or Characteristic Entity is an entity whose existence depends on another entity. It is created to handle a group of data that could occur multiple times for each instance of its parent entity. Customer Address is an attributive entity of Customer since each customer may have multiple addresses.
■■ An Associative or Intersection Entity is an entity that is dependent upon two or more entities for its existence, and that records data at the point of intersection. Order is an associative entity. Its key is composed of the keys of the two parent entities, Customer and Item, and a qualifier such as Date. Attributes that could be retained include the Quantity of the Item and Purchase Date.

With these four types of entities, we have all we will need in terms of components to create the business and data warehouse data models. We describe these models in the next section of this chapter and go through the steps to create them in Chapters 3 and 4.

Element or Attribute

An element or attribute is the lowest level of information relating to any entity. It models a specific piece of information or a property of a specific entity. Elements or attributes serve several purposes within an entity.

■■ A primary key serves to uniquely identify the entity and is used in the physical database to locate a record for storage or access. Examples include Customer ID for the Customer entity and Item ID for the Item entity.

Figure 2.1 Sample data model.
[Figure 2.1 depicts six entities. Customer (primary entity): Customer ID, Customer Name, Customer Type, Customer VIP Status, Related Customer ID. Retail Customer (subtype entity): Customer ID, No of Children, Homeowner Status. Commercial Customer (subtype entity): Customer ID, No of Employees, SIC. Customer Address (attributive entity): Customer ID, Address Type, Address, City, State, Postal Code, Country. Order (associative entity): Customer ID, Item ID, Purchase Date, Quantity. Item (primary entity).]

NOTE The key may be a single element or it may consist of multiple elements that are combined, in which case it is called a concatenated key. Finally, primary keys may or may not have meaning or intelligence. Care must be taken with intelligent primary keys. For example, an Account Code that also depicts geographic area or department is both confusing and erroneous in this data model. See the sidebar for further rules for good keys.

■■ A foreign key is a key that exists because of a parent-child relationship between a pair of entities. The foreign key in the child entity is the primary key in the parent entity and links the two entities together. For example, the Customer ID of the Customer entity is also found in the Order entity, relating the two.
■■ A nonkey element or attribute is not needed to uniquely identify the entity but is used to further describe or characterize information about the entity. Examples of nonkey elements or attributes are Customer Name, Customer Type, Item Color, and Item Quantity.

Characteristics of a Good Key

The following are characteristics of “well-behaved” keys, that is, keys that are maintainable and sustainable over the lifetime of the operational system and, therefore, the data warehouse:

◆ The key is not null over the scope of integration. It is imperative that there can never be a situation or event that could cause a null key.
◆ The key is unique over the scope of integration.
It is also imperative that there can never be a situation where duplicate keys could be generated.
◆ The key is unique by design, not by circumstance. Key generation has been carefully thought out and tested under all circumstances.
◆ The key is persistent over time. This is mandatory in the data warehouse environment where data has a very long lifetime.
◆ The key is in a manageable format, that is, there is no undue overhead produced in the creation or maintenance of the key structures. It consists of straightforward integers or character strings, with no embedded symbols or odd characters.
◆ The key should not contain embedded intelligence but rather is a generic string. (It may be created based on some intelligence but, once created, the intelligence embedded in the key is never used.)

Relationships

A relationship documents the business rule associating two entities. The relationship is used to describe how the two entities are naturally linked to each other. Customer places Order and Order is for Items are examples of relationships in Figure 2.1.

There are different characteristics of relationships used in documenting the business rules of the enterprise:

■■ Cardinality denotes the maximum number of occurrences of one entity that can be related to another entity. Usually these are expressed as “one” or “many.” In Figure 2.1, a Customer has many addresses (Bill-to, Ship-to) and every address belongs to one customer.
■■ Optionality or modality indicates whether an entity occurrence must participate in a relationship. This characteristic tells you the minimum number of occurrences in the relationship: zero if participation is optional, one if it is mandatory.

There are also different types of relationships:

■■ An identifying relationship is one in which the primary key of the parent entity becomes a part of the primary key of the child entity.
■■ A nonidentifying relationship is one in which the primary key of the parent entity becomes a nonkey attribute of the child entity.
An example of this type of relationship is a recursive relationship, that is, a situation in which an entity is related to itself. Customers who are related to other customers (for example, subsidiaries of corporations and families or households) are examples of recursive relationships. These are used to denote an entity occurrence that is related to another entity occurrence of the same entity. See Figure 2.2 for more on these types of relationships.

The components of a relationship in a data model consist of a verb phrase denoting the business rule (places, has, contains), the cardinality, and the modality or optionality of the relationship.

Figure 2.2 Identifying and nonidentifying relationships.

Types of Data Models

A data model is an abstraction or representation of the data in a given environment. It is a collection and a subsequent verification and communication method to fully document the data requirements used in the creation of accurate, effective, and efficient physical databases. The data model consists of entities, attributes, and relationships. Within the complete data model, appropriate meta data, such as definitions and physical characteristics, is defined for each of these.

As we stated earlier, we feel that the data models you create for your BI environment are critical to the overall success of your initiative as well as the long-term maintenance and sustainability of the environment. If the data model is so important, why isn’t it always developed? There are a number of reasons for this:

■■ It’s not easy. Creating the data model takes significant effort from the IT technical staff and business community. Either data modelers must be hired or internal resources must be trained in the disciplines of data modeling.
■■ It requires discipline and tools. Once the techniques for data modeling are learned, they must be applied with conformity and compliance.
The enterprise must create a set of documents detailing the standards it will use in the creation of its data models. Examples of these are naming standards, conflict resolution procedures, data steward roles and responsibilities (see Chapter 3 for more on this topic), and meta data capture and maintenance procedures.
■■ It requires significant business involvement. A company’s data model must (repeat, must) have business community involvement. We are, after all, designing the critical component of the business community’s ultimate competitive weapon. It is for them that we are creating this vast wealth of information.
■■ It postpones the visible work. Data modeling does not create tangible products that can be used by the business community. The models provide the technical staff creating the environment with information about the business environment and some requirements. The old joke goes something like this: “Start coding; I’ll go find out what they want.”
■■ It requires a broad view. The data model for the BI environment must encompass the entire enterprise. It will be used to create the ultimate decision-making components, the data marts, for all strategic analysis. Therefore, it must have a multidepartment and multiprocess perspective.
■■ The benefits of a data model are often not realized with the first project. The real productivity comes in its reuse and its enterprise perspective.

[Figure 2.2 panels. Identifying Relationship: Parent (Parent Identifier; Parent Nonkey Attribute) is the parent of Child (Child Identifier, Parent Identifier (FK); Child Nonkey Attribute). Non-identifying Relationship: Parent (Parent Identifier; Parent Nonkey Attribute) is the parent of Child (Child Identifier; Parent Identifier, Child Nonkey Attribute).]

Having said all this, what is the impact of not developing a data model?

■■ It becomes very difficult to extract desired data.
It is easy to implement something that either misses the users’ expectations or only partially satisfies them.
■■ Significant effort is spent on interfaces that generally provide little or no business value.
■■ The environment’s complexity increases significantly. When there is no data model to serve as a roadmap, it becomes difficult, if not impossible, to know what you already have in your data warehouse and what needs to be added.
■■ It virtually guarantees lack of data integration because you cannot visualize how things fit together. Data warehouse development will not be effective and efficient, and may not even be feasible.
■■ One of the most significant drawbacks is that, without a data model, data will not be effectively managed as an asset.

Now, having explained the need for data models, what types of data models will you need for your data warehouse implementation? Figure 2.3 shows the types of data models we recommend and the interaction between the models. The following sections describe the different data models necessary for a complete, successful, and maintainable BI environment. It is important to note the two-way arrows. The arrows pointing to the next lower level

[...]

... foundation and footing you need to deal with these issues before you begin your data warehouse design. Let’s start with a set of guidelines garnered from the many years of data modeling we have performed.

Guidelines and Best Practices

The goal of any data model is to completely and accurately reflect the data requirements and business rules for handling that data so that the business can perform...
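
To make the Chapter 2 vocabulary concrete, here is a minimal sketch, not the authors' implementation, of part of the Figure 2.1 sample model expressed as relational tables. It uses Python's built-in sqlite3 module. The table and column names, the helper functions build_sample_model and try_insert, and the choice of Customer ID plus Address Type as the Customer Address key are all assumptions made for illustration; the book's figure does not mark them. The sketch shows a primary entity, a subtype, an attributive entity, an associative entity with a concatenated key, and a recursive nonidentifying relationship.

```python
import sqlite3

# Illustrative sketch only. Names are hypothetical physical names for the
# entities in Figure 2.1; the customer_address key is an assumption.
DDL = """
CREATE TABLE customer (                 -- primary (fundamental) entity
    customer_id         INTEGER PRIMARY KEY,
    customer_name       TEXT NOT NULL,
    customer_type       TEXT,
    -- recursive, nonidentifying relationship: parent key held as a nonkey column
    related_customer_id INTEGER REFERENCES customer (customer_id)
);
CREATE TABLE retail_customer (          -- subtype entity, shares the parent key
    customer_id      INTEGER PRIMARY KEY REFERENCES customer (customer_id),
    no_of_children   INTEGER,
    homeowner_status TEXT
);
CREATE TABLE item (                     -- primary (fundamental) entity
    item_id    INTEGER PRIMARY KEY,
    item_color TEXT
);
CREATE TABLE customer_address (         -- attributive entity
    customer_id  INTEGER NOT NULL REFERENCES customer (customer_id),
    address_type TEXT NOT NULL,
    city         TEXT,
    PRIMARY KEY (customer_id, address_type)            -- identifying relationship
);
CREATE TABLE customer_order (           -- associative entity (the Order entity)
    customer_id   INTEGER NOT NULL REFERENCES customer (customer_id),
    item_id       INTEGER NOT NULL REFERENCES item (item_id),
    purchase_date TEXT NOT NULL,
    quantity      INTEGER NOT NULL,
    PRIMARY KEY (customer_id, item_id, purchase_date)  -- concatenated key
);
"""


def build_sample_model() -> sqlite3.Connection:
    """Create the sample tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run one INSERT; return False if a key or relationship rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_sample_model()
    add_customer = "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)"
    add_order = "INSERT INTO customer_order VALUES (?, ?, ?, ?)"
    print(try_insert(db, add_customer, (1, "Sample Customer")))
    print(try_insert(db, "INSERT INTO item VALUES (?, ?)", (10, "red")))
    print(try_insert(db, add_order, (1, 10, "2003-01-15", 2)))   # valid order
    print(try_insert(db, add_order, (1, 10, "2003-01-15", 5)))   # duplicate key
    print(try_insert(db, add_order, (99, 10, "2003-01-15", 1)))  # no such customer
```

In this sketch the database itself rejects null or duplicate keys and orphaned foreign keys, which is one way to get keys that are unique "by design, not by circumstance." A real warehouse model would involve many more design decisions than this fragment shows.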
... the data warehouse

PART TWO
Model Development

The data warehouse should represent the enterprise perspective of the data, and that perspective starts with the subject area and business data models. Using a fictitious company, we provide a step-by-step process to develop these two models in Chapter 3. Then, using the business data model as the starting point, Chapter 4 develops the data warehouse data...

... data warehouse, that is, it is used to capture snapshots of the data over time. Chapters 7 and 8 delve into modeling two types of data frequently stored in the data warehouse: hierarchies and transactions. The design of the data warehouse reflects a compromise. It recognizes both the structure of the source systems (typically relational) and the structure of the popular dimensional data...

... on the data used by the company. The business data model represents that data and is the foundation for all systems’ models, including the data warehouse model. In Chapter 2, we described how a third normal form model provides data consistency and restricts data redundancy. As we will present in this chapter, the business model is a third normal form model. Since one of the objectives of the data warehouse...

... diagram but focus on only the subject area(s) needed for your data warehouse for the business data model. You will continue to fill out the business and data warehouse data models as new areas are needed in the data warehouse but should not get sidetracked into trying to create the entire business data model before you generate your first data warehouse subject area. See Chapter 4 for more information on this...
... systems and how these should be addressed in the data warehouse data model. One of the distinguishing characteristics of the data warehouse is its historical perspective. In Chapter 6, we explain the importance of modeling the calendar in the data warehouse and different approaches for maintaining the historical perspective in this data model. These approaches help us deal with the unusual nature of the data...

... activities (for example, data warehouse model development, data transformation logic) are dependent on the number of data elements. Using the data entities (and attributes if available) as a starting point provides the project manager with a basis for estimating the effort. For example, the formula for developing the data warehouse model may consist of the number of entities and attributes2 multiplied by the...

... How you deliver the data from the data warehouse into the various data marts will have an impact on the design of the database. Considerations include whether the data is delivered via a portal or through a managed query process.
■■ Security. Many times the data warehouse contains highly sensitive data. You may choose to invoke security at the DBMS level by physically separating this data from the rest,...

... organizations, and a company embarking on the development of a subject area model can begin with these.

[Figure 2.3 Data model types: Subject Area Model, Business Data Model, Operational System Model, Data Warehouse System Model, Technology Models.]

1 In this context, “things” refers to physical items, concepts, events, people, and places.

These subject areas conform to standards...
... representation of the data in a given business environment, and it provides the benefits cited for any model. It helps people envision how the information in the business relates to other information in the business (“how the parts fit together”). Products that apply the business data model include operational systems, data warehouse, and data mart databases, and the model provides the meta data (or information about the data) for these databases to help...