Oracle9 i Data Warehousing Guide Release 1 (9.0.1) June 2001 Part No. A90237-01 Oracle9i Data Warehousing Guide, Release 1 (9.0.1) Part No. A90237-01 Copyright © 2001 Oracle Corporation. All rights reserved. Primary Author: Paul Lane Contributing Author: Viv Schupmann (Change Data Capture) Contributors: Patrick Amor, Hermann Baer, Srikanth Bellamkonda, Randy Bello, Tolga Bozkaya, Benoit Dageville, John Haydu, Lilian Hobbs, Hakan Jakobsson, George Lumpkin, Jack Raitto, Ray Roccaforte, Gregory Smith, Ashish Thusoo, Jean-Francois Verrier, Gary Vincent, Andy Witkowski, Zia Ziauddin Graphic Designer: Valarie Moore The Programs (which include both the software and documentation) contain proprietary information of Oracle Corporation; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs is prohibited. The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Oracle Corporation does not warrant that this document is error free. Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Oracle Corporation. If the Programs are delivered to the U.S. Government or anyone licensing or using the programs on behalf of the U.S. Government, the following notice is applicable: Restricted Rights Notice Programs delivered subject to the DOD FAR Supplement are "commercial computer software" and use, duplication, and disclosure of the Programs, including documentation, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement. Otherwise, Programs delivered subject to the Federal Acquisition Regulations are "restricted computer software" and use, duplication, and disclosure of the Programs shall be subject to the restrictions in FAR 52.227-19, Commercial Computer Software - Restricted Rights (June, 1987). Oracle Corporation, 500 Oracle Parkway, Redwood City, CA 94065. The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy, and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and Oracle Corporation disclaims liability for any damages caused by such use of the Programs. Oracle is a registered trademark, and LogMiner, Oracle9i, Oracle Call Interface, Oracle Database Configuration Assistant, Oracle Enterprise Manager, Oracle interMedia, Oracle Net, Oracle Spatial, Oracle Store, Oracle Text, Oracle Trace, PL/SQL, and Real Application Clusters, and SQL*Plus ar e trademarks or registered trademarks of Oracle Corporation. Other names may be trademarks of their respective owners. iii Contents Send Us Your Comments xvii Preface xix Part I Concepts 1 Data Warehousing Concepts What is a Data Warehouse? . 1-2 Subject Oriented 1-2 Integrated . 1-2 Nonvolatile 1-3 Time Variant 1-3 Contrasting OLTP and Data Warehousing Environments . 1-3 Data Warehouse Architectures . 1-5 Data Warehouse Architecture (Basic) 1-5 Data Warehouse Architecture (with a Staging Area) 1-6 Data Warehouse Architecture (with a Staging Area and Data Marts) . 1-7 Part II Logical Design 2 Logical Design in Data Warehouses Logical versus Physical Design in Data Warehouses 2-2 Creating a Logical Design . 2-2 Data Warehousing Schemas . 2-3 iv Star Schemas 2-4 Other Schemas . 2-4 Data Warehousing Objects 2-5 Fact Tables 2-5 Dimension Tables . 2-6 Unique Identifiers . 2-7 Relationships . 2-8 Typical Example of Data Warehousing Objects and Their Relationships 2-8 Part III Physical Design 3 Physical Design in Data Warehouses Moving from Logical to Physical Design . 3-2 Physical Design . 3-2 Physical Design Structures 3-4 Tablespaces 3-4 Tables and Partitioned Tables . 3-5 Views 3-5 Integrity Constraints 3-5 Indexes and Partitioned Indexes 3-6 Materialized Views . 3-6 Dimensions 3-6 4 Hardware and I/O Considerations in Data Warehouses Overview of Hardware and I/O Considerations in Data Warehouses . 4-2 Why Stripe the Data? 4-2 Automatic Striping . 4-3 Manual Striping 4-4 Local and Global Striping 4-4 Analyzing Striping . 4-6 RAID Configurations . 4-9 RAID 0 (Striping) 4-10 RAID 1 (Mirroring) . 4-10 RAID 0+1 (Striping and Mirroring) . 4-10 v Striping, Mirroring, and Media Recovery . 4-10 RAID 5 4-11 The Importance of Specific Analysis 4-12 5 Parallelism and Partitioning in Data Warehouses Overview of Parallel Execution 5-2 When to Implement Parallel Execution . 5-2 Granules of Parallelism . 5-3 Block Range Granules 5-3 Partition Granules . 5-4 Partitioning Design Considerations . 5-4 Types of Partitioning 5-4 Partition Pruning 5-13 Partition-wise Joins . 5-15 6 Indexes Bitmap Indexes 6-2 Bitmap Join Indexes 6-6 B-tree Indexes 6-10 Local Indexes Versus Global Indexes . 6-10 7 Integrity Constraints Why Integrity Constraints are Useful in a Data Warehouse 7-2 Overview of Constraint States . 7-3 Typical Data Warehouse Integrity Constraints . 7-4 UNIQUE Constraints in a Data Warehouse . 7-4 FOREIGN KEY Constraints in a Data Warehouse . 7-5 RELY Constraints . 7-6 Integrity Constraints and Parallelism 7-7 Integrity Constraints and Partitioning 7-7 View Constraints 7-7 8 Materialized Views Overview of Data Warehousing with Materialized Views . 8-2 vi Materialized Views for Data Warehouses . 8-2 Materialized Views for Distributed Computing 8-3 Materialized Views for Mobile Computing 8-3 The Need for Materialized Views 8-3 Components of Summary Management . 8-5 Terminology 8-7 Schema Design Guidelines for Materialized Views 8-8 Types of Materialized Views 8-10 Materialized Views with Aggregates . 8-10 Materialized Views Containing Only Joins 8-16 Nested Materialized Views . 8-18 Creating Materialized Views 8-22 Naming . 8-23 Storage Characteristics . 8-23 Build Methods . 8-24 Enabling Query Rewrite 8-24 Query Rewrite Restrictions . 8-25 Refresh Options . 8-26 ORDER BY Clause 8-30 Materialized View Logs . 8-30 Using Oracle Enterprise Manager 8-31 Using Materialized Views with NLS Parameters 8-31 Registering Existing Materialized Views . 8-32 Partitioning and Materialized Views 8-34 Partition Change Tracking 8-34 Partitioning a Materialized View . 8-38 Partitioning a Prebuilt Table . 8-39 Rolling Materialized Views . 8-40 Choosing Indexes for Materialized Views . 8-40 Invalidating Materialized Views . 8-41 Security Issues with Materialized Views . 8-41 Altering Materialized Views . 8-42 Dropping Materialized Views 8-42 Analyzing Materialized View Capabilities . 8-43 Using the DBMS_MVIEW.EXPLAIN_MVIEW Procedure . 8-43 vii MV_CAPABILITIES_TABLE.CAPABILITY_NAME Details . 8-46 MV_CAPABILITIES_TABLE Column Details . 8-48 Overview of Materialized View Management Tasks 8-49 9 Dimensions What are Dimensions? . 9-2 Creating Dimensions . 9-4 Multiple Hierarchies 9-7 Using Normalized Dimension Tables . 9-9 Dimension Wizard 9-10 Viewing Dimensions 9-10 Using The DEMO_DIM Package 9-10 Using Oracle Enterprise Manager 9-11 Using Dimensions with Constraints . 9-11 Validating Dimensions 9-12 Altering Dimensions 9-13 Deleting Dimensions . 9-14 Part IV Managing the Warehouse Environment 10 Overview of Extraction, Transformation, and Loading Overview of ETL . 10-2 ETL Tools 10-3 Daily Operations . 10-4 Evolution of the Data Warehouse 10-4 11 Extraction in Data Warehouses Overview of Extraction in Data Warehouses . 11-2 Understanding Extraction Methods in Data Warehouses . 11-2 Logical Extraction Methods 11-3 Physical Extraction Methods . 11-4 Change Data Capture . 11-5 Data Warehousing Extraction Examples 11-8 Extraction Using Data Files . 11-8 viii Extraction Via Distributed Operations 11-11 12 Transportation in Data Warehouses Overview of Transportation in Data Warehouses 12-2 Understanding Transportation Mechanisms in Data Warehouses 12-2 Transportation Using Flat Files 12-2 Transportation Through Distributed Operations 12-2 Transportation Using Transportable Tablespaces . 12-3 13 Loading and Transformation Overview of Loading and Transformation in Data Warehouses . 13-2 Transformation Flow 13-2 Loading Mechanisms . 13-4 SQL*Loader . 13-5 External Tables 13-6 OCI and Direct-path APIs . 13-8 Export/Import . 13-8 Transformation Mechanisms 13-8 Transformation Using SQL . 13-9 Transformation Using PL/SQL 13-16 Transformation Using Table Functions . 13-16 Loading and Transformation Scenarios 13-26 Parallel Load Scenario 13-26 Key Lookup Scenario . 13-34 Exception Handling Scenario 13-34 Pivoting Scenarios . 13-35 14 Maintaining the Data Warehouse Using Partitioning to Improve Data Warehouse Refresh . 14-2 Optimizing DML Operations During Refresh . 14-5 Implementing an Efficient Merge . 14-5 Maintaining Referential Integrity . 14-7 Purging Data 14-8 Refreshing Materialized Views 14-9 ix Complete Refresh . 14-10 Fast Refresh . 14-11 ON COMMIT Refresh 14-11 Manual Refresh Using the DBMS_MVIEW Package 14-11 Refresh Specific Materialized Views with REFRESH 14-12 Refresh All Materialized Views with REFRESH_ALL_MVIEWS . 14-13 Refresh Dependent Materialized Views with REFRESH_DEPENDENT . 14-13 Using Job Queues for Refresh . 14-15 When Refresh is Possible . 14-15 Recommended Initialization Parameters for Parallelism . 14-15 Monitoring a Refresh . 14-15 Checking the Status of a Materialized View . 14-16 Tips for Refreshing Materialized Views with Aggregates . 14-16 Tips for Refreshing Materialized Views Without Aggregates . 14-19 Tips for Refreshing Nested Materialized Views 14-20 Tips After Refreshing Materialized Views 14-21 Using Materialized Views With Partitioned Tables . 14-22 Fast Refresh with Partition Change Tracking 14-22 Fast Refresh with CONSIDER FRESH . 14-26 15 Change Data Capture About Oracle Change Data Capture . 15-2 Publish and Subscribe Model . 15-3 Example of a Change Data Capture System . 15-4 Components and Terminology for Synchronous Change Data Capture . 15-5 Installation and Implementation . 15-8 Security . 15-8 Columns in a Change Table 15-8 Views . 15-10 Synchronous Mode of Data Capture . 15-11 Publishing Change Data . 15-11 Subscribing to Change Data . 15-13 Steps Required to Subscribe to Change Data . 15-13 What Happens to Subscriptions When the Publisher Makes Changes 15-16 Export and Import Considerations 15-17 x 16 Summary Advisor Overview of the Summary Advisor in the DBMS_OLAP Package 16-2 Summary Advisor Wizard 16-6 Using the Summary Advisor 16-6 Identifier Numbers . 16-7 Workload Management . 16-8 Loading a User-Defined Workload 16-9 Loading a Trace Workload 16-11 Loading a SQL Cache Workload 16-15 Validating a Workload . 16-17 Removing a Workload . 16-18 Using Filters with the Summary Advisor . 16-18 Removing a Filter 16-22 Recommending Materialized Views 16-23 SQL Script Generation 16-27 Summary Data Report . 16-29 When Recommendations are no Longer Required 16-31 Stopping the Recommendation Process 16-32 Sample Sessions 16-32 Estimating Materialized View Size . 16-37 ESTIMATE_MVIEW_SIZE Parameters . 16-37 Is a Materialized View Being Used? 16-38 DBMS_OLAP.EVALUATE_MVIEW_STRATEGY Procedure . 16-39 Part V Warehouse Performance 17 Schema Modeling Techniques Schemas in Data Warehouses . 17-2 Star Schemas 17-2 Optimizing Star Queries . 17-4 Tuning Star Queries 17-4 Using Star Transformation 17-5 [...]... This section introduces basic data warehousing concepts It contains the following chapter: s Data Warehousing Concepts 1 Data Warehousing Concepts This chapter provides an overview of the Oracle data warehousing implementation It includes: s What is a Data Warehouse? s Data Warehouse Architectures Note that this book is meant as a supplement to standard texts about data warehousing This book focuses... Analysis Sales Reporting Inventory Mining Metadata Operational System Summary Data Raw Data Flat Files Note: Data marts are an important part of many warehouses, but they are not the focus of this book See Also: Data Mart Suites documentation for further information regarding data marts Data Warehousing Concepts 1-7 Data Warehouse Architectures 1-8 Data Warehousing Guide ... customer." s Historical Data Data warehouses usually store many months or years of data This is to support historical analysis OLTP systems usually store data from only a few weeks or months The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction 1-4 Data Warehousing Guide Data Warehouse Architectures Data Warehouse Architectures Data warehouses... this typical architecture Figure 1–3 Architecture of a Data Warehouse with a Staging Area Data Sources Staging Area Warehouse Operational System Users Analysis Metadata Operational System Flat Files 1-6 Data Warehousing Guide Summary Data Raw Data Reporting Mining Data Warehouse Architectures Data Warehouse Architecture (with a Staging Area and Data Marts) Although the architecture in Figure 1–3 is... historical data be moved to an archive A data warehouse’s focus on change over time is what is meant by the term time variant Contrasting OLTP and Data Warehousing Environments Figure 1–1 illustrates key differences between an OLTP system and a data warehouse Figure 1–1 Contrasting OLTP and Data Warehousing Environments OLTP Data Warehouse Complex data structures (3NF databases) Multidimensional data structures... are: s Data Warehouse Architecture (Basic) s Data Warehouse Architecture (with a Staging Area) s Data Warehouse Architecture (with a Staging Area and Data Marts) Data Warehouse Architecture (Basic) Figure 1–2 shows a simple architecture for a data warehouse End users directly access data derived from several source systems through the data warehouse Figure 1–2 Architecture of a Data Warehouse Data Sources... texts are: s The Data Warehouse Toolkit by Ralph Kimball (John Wiley and Sons, 1996) s Building the Data Warehouse by William Inmon (John Wiley and Sons, 1996) Data Warehousing Concepts 1-1 What is a Data Warehouse? What is a Data Warehouse? A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing It usually contains historical data derived... Operational System Users Analysis Metadata Summary Data Raw Data Operational System Reporting Flat Files Mining In Figure 1–2, the metadata and raw data of a traditional OLTP system is present, as is an additional type of data, summary data Summaries are very valuable in data warehouses because they pre-compute long operations in advance For example, a typical data warehouse query is to retrieve something... about Oracle9i’s data warehousing capabilities This preface contains these topics: s Audience s Organization s Related Documentation s Conventions s Documentation Accessibility xix Audience Oracle9i Data Warehousing Guide is intended for database administrators, system administrators, and database application developers who perform the following tasks: s design, maintain, and use data warehouses To... relational database concepts, basic Oracle server concepts, and the operating system environment under which you are running Oracle Organization This document contains: Chapter 1, Data Warehousing Concepts This chapter contains an overview of data warehousing concepts Chapter 2, Logical Design in Data Warehouses This chapter discusses the logical design of a data warehouse Chapter 3, Physical Design in Data . Oracle9 i Data Warehousing Guide Release 1 (9.0.1) June 2001 Part No. A90237-01 Oracle9i Data Warehousing Guide, Release 1 (9.0.1) Part. VI Miscellaneous A Glossary B Sample Data Warehousing Schema xvii Send Us Your Comments Oracle9 i Data Warehousing Guide, Release 9.0.1 Part No. A90237-01