CHAPTER 6  The Distributed Data Warehouse

Redundant detailed data is a very undesirable condition for the detailed level of data in the data warehouse and defeats its purpose. If multiple development groups will be doing concurrent design and population in the current level of detail, great care must be taken to ensure that no redundant detailed data is created.

To ensure that no redundant data is developed, it is necessary to create a data model that reflects the common detailed data. Figure 6.32 shows that multiple development groups have combined their interests to create a common data model. In addition to the currently active development groups, other groups that will have future requirements but who are not currently in a development mode may also contribute their requirements. (Of course, if a group knows it will have future requirements but is unable to articulate them, then those requirements cannot be factored into the common detailed data model.) The common detailed data model reflects the collective need among the different groups for detailed data in the data warehouse.

The data model forms the basis of the design for the data warehouse. Figure 6.33 shows that the data model will be broken up into many tables as design progresses, each of which physically becomes part of the warehouse. Because the data model is broken into multiple physical tables at the moment of implementation, the development process for the data warehouse can proceed in an iterative manner. There is no need to build all of the tables at once. In fact, a good reason to build only a few tables at a time is so that end-user feedback can be factored into the modification of the table, if necessary, with a minimum of fuss. In addition, because the common data model is broken into multiple tables, adding new tables at a later time to reflect requirements that are now unknown is not a problem.

Figure 6.31 Different development groups that are developing the current level of detail for the data warehouse. [Diagram: development groups A, B, C, D, and E.]

Uttama Reddy

Figure 6.32 A data model identifies data that is common to all the development groups. [Diagram: a common data model holding the data common to development groups A, B, C, and D, alongside the data unique to each of groups A, B, C, and D.]

Figure 6.33 The data warehouse is physically manifested over multiple physical tables and databases. [Diagram: the common data model realized as tables: customer movement history, sales history, vendor history, parts history, customer survey history, substitute part history, customer history, sales pricing history, shipment history, customer complaint history, parts reject history, shipment arrival history, shipment breakage history.]

Different Requirements at Different Levels

Normally different groups have unique requirements (see Figure 6.34). These requirements result in what can be termed “local” current-level detail. The local data is certainly part of the data warehouse. It is, however, distinctively different from the “common” part. The local data has its own data model, usually much smaller and simpler than the common detailed data model.

There is, of necessity, nonredundancy of data across all of the detailed data. Figure 6.35 makes this point clear.

Of course, the nonredundancy of the data is restricted to nonkey data. Redundancy exists at the key level because a form of foreign key relationships is used to relate the different types of data. Figure 6.36 shows the use of foreign keys.

The foreign keys found in the tables shown in Figure 6.36 are quite different from the classical foreign key relationships that are governed by referential integrity.
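The split between common and local current-level detail described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `partition_requirements`, the group names, and the idea of representing each group's requirements as a set of subject names are all hypothetical and do not come from the text. Subjects needed by some but not all groups are reported separately, since placing them in more than one local model would create the redundancy the text warns against.

```python
from collections import Counter


def partition_requirements(requirements):
    """Split subjects into common, shared-by-some, and local sets.

    `requirements` maps a development group name to the set of data
    subjects that group needs (a hypothetical representation).
    """
    counts = Counter(s for subjects in requirements.values() for s in subjects)
    n = len(requirements)

    # Needed by every group: candidates for the common data model.
    common = {s for s, c in counts.items() if n > 1 and c == n}
    # Needed by several groups but not all: must be reviewed so the data
    # is modeled once rather than copied into several local models.
    shared_by_some = {s for s, c in counts.items() if 1 < c < n}
    # Needed by exactly one group: that group's local current-level detail.
    local = {
        group: {s for s in subjects if counts[s] == 1 and s not in common}
        for group, subjects in requirements.items()
    }
    return common, shared_by_some, local


if __name__ == "__main__":
    example = {
        "group A": {"customer history", "sales history", "customer survey history"},
        "group B": {"customer history", "sales history", "shipment history"},
        "group C": {"customer history", "sales history", "shipment history",
                    "parts reject history"},
    }
    common, shared, local = partition_requirements(example)
    print("common:", sorted(common))
    print("shared by some:", sorted(shared))
    for group in sorted(local):
        print(group, "local:", sorted(local[group]))
```

A real data model would of course be expressed at the attribute and relationship level rather than as subject names; the sketch only shows the bookkeeping that keeps each piece of detailed data in one place.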
Because the data in the data warehouse is gathered by and stored in terms of snapshots of data, the foreign key relationships that are found are organized in terms of “artifacts” of relationships. For an in-depth explanation of artifacts of relationships, refer to the www.billinmon.com Tech Topic on the subject, found in the “References” section.

Figure 6.34 Just because data is not common to all development groups does not mean that it does not belong in the current-level detail of the data warehouse. [Diagram: local current-level detail (data unique to development groups A, B, C, and D) surrounding the common data model, which holds the data common to groups A, B, C, and D.]

An issue that arises is whether to place all of the detailed tables, common and local, under the same technology, as shown in Figure 6.37. There are many good arguments for doing so. One is that the cost of a single platform versus multiple platforms is much less. Another is that the cost of support and training will be less. In fact, about the only argument for multiple platforms for detailed data is that with multiple platforms, there may not be the need for a single massively large platform, and as a consequence, the cost of the multiple smaller platforms may be less than a single larger platform. In any case, many organizations adopt the strategy of a single platform for all their detailed data warehouse data, and the strategy works well.

Other Types of Detailed Data

Another strategy is to use different platforms for the different types of data found at the detailed level. Figure 6.38 shows one example of this option. Some of the local data is on one platform, the common data is on another platform, and other local data is on yet another.
This option is certainly one that is valid, and it often satisfies the different political needs of the organization. With this option each group doing development can feel that it has some degree of control of at least its own peculiar needs. Unfortunately, this option has several major drawbacks. First, multiple technologies must be purchased and supported. Second, the end user needs to be trained in different technologies. And finally, the boundaries between the technologies may not be easy to cross. Figure 6.39 illustrates this dilemma.

Figure 6.35 Nonredundancy of nonkey data throughout the many tables that make up the detailed level of the data warehouse. [Diagram: customer movement history, sales history, vendor history, parts history, customer survey history, substitute part history, customer history, sales pricing history, shipment history, customer complaint history, parts reject history, shipment arrival history, shipment breakage history.]

Figure 6.36 Foreign keys in the data warehouse environment. [Diagram: the vendor history, shipment history, sales history, parts history, and customer history tables, each with a key and one or more foreign keys.]

Figure 6.37 The different types of data in the detailed level of the data warehouse all on a common platform. [Diagram: data unique to development groups A, B, C, and D and data common to groups A, B, C, and D, all on a common technological platform.]
Figure 6.38 In this case, the different parts of the detailed level of the data warehouse are scattered across different technological platforms. [Diagram: data unique to development groups A, B, C, and D and data common to groups A, B, C, and D, spread over platforms A, B, and C.]

If there are to be multiple technologies supporting the different levels of detail in the data warehouse, it will be necessary to cross the boundaries between the technologies frequently. Software that is designed to access data across different technological platforms is available. Some of the problems that remain are shown in Figure 6.40.

One problem is in the passage of data. If multi-interfaced technology is used for the passage of small amounts of data, then there is no problem with performance. But if multi-interfaced technology is used to pass large amounts of data, then the software can become a performance bottleneck. Unfortunately, in a DSS environment it is almost impossible to know how much data will be accessed by any one request. Some requests access very little data; other requests access large amounts of data. This problem of resource utilization and management manifests itself when detailed data resides on multiple platforms.

Figure 6.39 Data transfer and multiple table queries present special technological problems. [Diagram: the same data as in Figure 6.38 on platforms A, B, and C, with data transfer between the platforms.]

Another related problem is “leaving” detailed data on one side of the data warehouse after it has been transported from the other side. This casual redeployment of detailed data has the effect of creating redundancy of data at the detailed level, something that is not acceptable.
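One way to picture the redundancy that leftover data creates is a simple check over a catalog of detailed tables. The sketch below is a hypothetical illustration, not a described tool: the catalog layout (a dictionary keyed by platform and table name, with separate `keys` and `nonkey` column sets), the function name `find_redundant_nonkey`, and all column names are assumptions. It flags nonkey columns that appear in more than one table while ignoring key columns, which are expected to repeat as foreign keys.

```python
def find_redundant_nonkey(catalog):
    """Return nonkey columns that occur in more than one detailed table.

    `catalog` maps (platform, table) to {"keys": set, "nonkey": set}.
    Key columns are skipped because redundancy at the key level is
    expected; only nonkey data must be nonredundant.
    """
    locations = {}
    for table_id, columns in catalog.items():
        for column in columns["nonkey"]:
            locations.setdefault(column, []).append(table_id)
    return {
        column: sorted(tables)
        for column, tables in locations.items()
        if len(tables) > 1
    }


if __name__ == "__main__":
    catalog = {
        ("platform B", "shipment history"): {
            "keys": {"shipment_id", "customer_id"},
            "nonkey": {"ship_date", "carrier"},
        },
        ("platform A", "customer history"): {
            "keys": {"customer_id"},
            "nonkey": {"customer_name"},
        },
        # A copy left behind on platform A after an analysis finished.
        ("platform A", "shipment history copy"): {
            "keys": {"shipment_id"},
            "nonkey": {"ship_date", "carrier"},
        },
    }
    for column, tables in sorted(find_redundant_nonkey(catalog).items()):
        print(column, "appears in", tables)
```

In the example, `customer_id` appears in two tables but is not reported, because it is a key; the leftover copy is reported because it duplicates nonkey data.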
Figure 6.40 Some problems with interfacing different platforms. [Diagram: platforms A, B, and C, annotated with “bulk transfer of data” and “leaving data after analysis is complete”.]

Figure 6.41 Meta data sits on top of the actual data contents of the data warehouse. [Diagram: meta data above the customer complaint history, shipment history, customer movement history, sales history, vendor history, parts history, customer history, and sales pricing history tables.]

Meta Data

In any case, whether detailed data is managed on a single technology or on multiple technologies, the role of meta data is not diminished. Figure 6.41 shows that meta data is needed to sit on top of the detailed data warehouse data.

Multiple Platforms for Common Detail Data

One other possibility worth mentioning is using multiple platforms for common detail data. Figure 6.42 outlines this scenario.

While such a possibility is certainly an option, it is almost never a good choice. Managing common current detailed data is difficult enough. The volumes of data found at that level present their own unique problems for management. Adding the complication of having to cross multiple technological platforms merely makes life more difficult. Unless there are very unusual mitigating circumstances, this option is not recommended.

The only advantage of multiple platforms for the management of common detail is that this option satisfies immediate political and organizational differences of opinion.

Figure 6.42 Common detailed data across multiple platforms: a real red flag in all cases. [Diagram: current detailed data that is common across many development groups, spread over platforms A, B, and C.]

Summary

Most environments operate from a single centralized data warehouse. But in some circumstances there can be a distributed data warehouse.
The three types of distributed data warehouses are as follows:

■ Data warehouses serving global businesses where there are local operations and a central operation
■ Technologically distributed data warehouses where the volume of data is such that the data is spread over multiple physical volumes
■ Disparate data warehouses that have grown separately through lack of organizational or political alignment

Each type of distributed data warehouse has its own considerations.

The most difficult aspect of a global data warehouse is the mapping done at the local level. The mapping must account for conversion, integration, and different business practices. The mapping is done iteratively. In many cases, the global data warehouse will be quite simple because only the corporate data that participates in business integration will be found in the global data warehouse. Much of the local data will never be passed to or participate in the loading of the global data warehouse. Access of global data is done according to the business needs of the analyst. As long as the analyst is focusing on a local business practice, access to global data is an acceptable practice.

The local data warehouses often are housed on different technologies. In addition, the global data warehouse may be on a different technology than any of the local data warehouses. The corporate data model acts as the glue that holds the different local data warehouses together, as far as their intersection at the global data warehouse is concerned. There may be local data warehouses that house data unique to and of interest to the local operating site. There may also be a globally distributed data warehouse. The structure and content of the distributed global data warehouse are determined centrally, whereas the mapping of data into the global data warehouse is determined locally.
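The division of responsibility described above (structure determined centrally, mapping determined locally) can be sketched as follows. Everything named here is a hypothetical assumption made for illustration: the field list `GLOBAL_FIELDS`, the sites `site_x` and `site_y`, their record layouts, and the made-up conversion rate of 1.5. The sketch shows each local site supplying its own mapping function, with local-only records never passed to the global warehouse.

```python
# Structure of the global warehouse row, determined centrally (hypothetical).
GLOBAL_FIELDS = ("site", "period", "revenue_usd")

# Made-up conversion rate, used only to show a conversion step.
SITE_X_RATE_TO_USD = 1.5


def map_site_x(record):
    """Site X keeps revenue in its local currency; convert it."""
    return {
        "site": "site_x",
        "period": record["period"],
        "revenue_usd": record["revenue_local"] * SITE_X_RATE_TO_USD,
    }


def map_site_y(record):
    """Site Y stores cents and split dates; some records are local-only."""
    if not record["corporate"]:
        return None  # local data that never reaches the global warehouse
    return {
        "site": "site_y",
        "period": f"{record['year']:04d}-{record['month']:02d}",
        "revenue_usd": record["sales_cents"] / 100,
    }


# Each mapping is owned and maintained by the local site.
LOCAL_MAPPINGS = {"site_x": map_site_x, "site_y": map_site_y}


def load_global(site, local_records):
    """Apply a site's own mapping and check rows against the central structure."""
    mapping = LOCAL_MAPPINGS[site]
    rows = []
    for record in local_records:
        row = mapping(record)
        if row is None:
            continue
        if set(row) != set(GLOBAL_FIELDS):
            raise ValueError(f"{site} mapping does not match the global structure")
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(load_global("site_x", [{"period": "2024-01", "revenue_local": 1000.0}]))
    print(load_global("site_y", [
        {"year": 2024, "month": 1, "sales_cents": 250000, "corporate": True},
        {"year": 2024, "month": 1, "sales_cents": 900, "corporate": False},
    ]))
```

Because the mappings are plain functions registered per site, a site can revise its mapping iteratively without any change to the centrally defined structure.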
The coordination and administration of the distributed data warehouse environment are much more complex than those of the single-site data warehouse. Many issues relate to the transport of the data from the local environment to the global environment, including the following questions:

■ What network technology will be used?
■ Is the transport of data legal?
■ Is there a processing window large enough at the global site?
■ What technological conversion must be done?

[...]

... of data for the EIS analyst. There is no infrastructure to support the EIS environment.

The Data Warehouse as a Basis for EIS

It is in the EIS environment that the data warehouse operates in its most effective state. The data warehouse is tailor-made for the needs of the EIS analyst. Once the data warehouse has been built, the job of the EIS is infinitely easier than when there is no foundation of data ...

... available supply of summary data
■ The structure of the data warehouse to support the drill-down process
■ Data warehouse meta data for the DSS analyst to plan how the EIS system is built
■ The historical content of the data warehouse to support the trend analysis that management wishes to see
■ The integrated data found throughout the data warehouse to look at data across the corporation

Event Mapping ...

... done, the potential to save on processing is huge. The unstructured data becomes an adjunct to the data warehouse. The unstructured data is connected to the data warehouse by means of an index, and the unstructured data is brought into the data warehouse only when there is a specific, prequalified request for it.

Meta Data and External Data

As we’ve discussed, meta data is an important component of the data ...
... notification data is merely a file created for users of the system that indicates classifications of data interesting to the users. When data is entered into the data warehouse and into the meta data, a check is made to see who is interested in it. The person is then notified that the external data has been captured.

[Diagram: unstructured data, external data, meta data, and the data warehouse; metadata: document ID, data of ...]

... the data warehouse serves the needs of the world of EIS is this: The data warehouse operates at a low level of granularity. The data warehouse contains, for lack of a better word, atomic data. The atomic data can be shaped one way, then another. When management has a new set of needs for information that has never before been encountered in the corporation, the very detailed data found in the data warehouse ...

Figure 7.7 The constantly changing interests of executives. [Diagram fragment: management's focus moves between areas such as financial and production: “There is a production problem”; on day 3, “Suddenly there is a shipment problem”.]

Figure 7.8 The data warehouse supports management’s need for EIS data. [Diagram: the data warehouse beneath the unpredictable nature of management’s focus.]

In short, the data warehouse provides the basis of data (the infrastructure) that ...

... Meta data takes on a new role in the face of external and unstructured data.

Figure 8.4 Another nice feature of external data and meta data is the ability to create a tailored notification file. [Diagram: unstructured data, external data, notification file, meta data, data warehouse.]

Storing External/Unstructured Data

External data and unstructured data ...
... strong affinity between the needs of the EIS analyst and the data warehouse. The data warehouse explicitly supports all of the EIS analyst’s needs. With a data warehouse in place, the EIS analyst can be in a proactive rather than a reactive position. The data warehouse enables the EIS analyst to deal with the following management needs:

■ Accessing information quickly
■ Changing their minds (i.e., flexibility) ...

... departmental (data mart) level is data at the data warehouse lightly summarized level. Finally, the light summarization at the data warehouse level is supported by archival/dormant data. The sequence of summaries just described is precisely what is required to support drill-down EIS analysis. Almost by default, the data warehouse lays a path for drill-down analysis. At the different levels of the data warehouse, ...

... of the granular atomic data that resides in the data warehouse, analysis is flexible and responsive. The detailed data in the data warehouse sits and waits for future unknown needs for information. This is why the data warehouse turns an organization from a reactive stance to a proactive stance.

Where to Turn

The EIS analyst can turn to various places in the architecture to get data. In Figure 7.9, the ...

... explore them further. The manager then looks at the regions that have contributed to the summary analysis. The figures analyzed are those of the Western region, the Southeast region, the Northeast