Oracle® Database Data Warehousing Guide
11g Release 1 (11.1)
B28313-02
September 2007

Copyright © 2001, 2007, Oracle. All rights reserved.

Primary Author: Paul Lane
Contributing Author: Viv Schupmann and Ingrid Stuart (Change Data Capture)
Contributor: Patrick Amor, Hermann Baer, Mark Bauer, Subhransu Basu, Srikanth Bellamkonda, Randy Bello, Paula Bingham, Tolga Bozkaya, Lucy Burgess, Donna Carver, Rushan Chen, Benoit Dageville, John Haydu, Lilian Hobbs, Hakan Jakobsson, George Lumpkin, Alex Melidis, Valarie Moore, Cetin Ozbutun, Ananth Raghavan, Jack Raitto, Ray Roccaforte, Sankar Subramanian, Gregory Smith, Margaret Taft, Murali Thiyagarajan, Ashish Thusoo, Thomas Tong, Mark Van de Wiel, Jean-Francois Verrier, Gary Vincent, Andreas Walter, Andy Witkowski, Min Xiao, Tsae-Feng Yu

The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited.

The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. This document is not warranted to be error-free. Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose.

If the Programs are delivered to the United States Government or anyone licensing or using the Programs on behalf of the United States Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the Programs, including documentation and technical data, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement, and, to the extent applicable, the additional rights set forth in FAR 52.227-19, Commercial Computer Software—Restricted Rights (June 1987). Oracle USA, Inc., 500 Oracle Parkway, Redwood City, CA 94065.

The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and we disclaim liability for any damages caused by such use of the Programs.

Oracle, JD Edwards, PeopleSoft, and Siebel are registered trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

The Programs may provide links to Web sites and access to content, products, and services from third parties. Oracle is not responsible for the availability of, or any content provided on, third-party Web sites. You bear all risks associated with the use of such content. If you choose to purchase any products or services from a third party, the relationship is directly between you and the third party. Oracle is not responsible for: (a) the quality of third-party products or services; or (b) fulfilling any of the terms of the agreement with the third party, including delivery of products or services and warranty obligations related to purchased products or services.
Oracle is not responsible for any loss or damage of any sort that you may incur from dealing with any third party.

Contents

Preface
    Audience
    Documentation Accessibility
    Related Documents
    Conventions

What's New in Oracle Database?
    Oracle Database 11g Release 1 (11.1) New Features in Data Warehousing
    Oracle Database 10g Release 2 (10.2) New Features in Data Warehousing

Part I Concepts

1 Data Warehousing Concepts
    What is a Data Warehouse?
    Subject Oriented
    Integrated
    Nonvolatile
    Time Variant
    Contrasting OLTP and Data Warehousing Environments
    Data Warehouse Architectures
    Data Warehouse Architecture: Basic
    Data Warehouse Architecture: with a Staging Area
    Data Warehouse Architecture: with a Staging Area and Data Marts
    Extracting Information from a Data Warehouse
    Data Mining
    Oracle Data Mining Functionality
    Oracle Data Mining Interfaces

Part II Logical Design

2 Logical Design in Data Warehouses
    Logical Versus Physical Design in Data Warehouses
    Creating a Logical Design
    Data Warehousing Schemas
    Star Schemas
    Other Data Warehousing Schemas
    Data Warehousing Objects
    Data Warehousing Objects: Fact Tables
    Requirements of Fact Tables
    Data Warehousing Objects: Dimension Tables
    Hierarchies
    Typical Dimension Hierarchy
    Data Warehousing Objects: Unique Identifiers
    Data Warehousing Objects: Relationships
    Example of Data Warehousing Objects and Their Relationships

Part III Physical Design

3 Physical Design in Data Warehouses
    Moving from Logical to Physical Design
    Physical Design
    Physical Design Structures
    Tablespaces
    Tables and Partitioned Tables
    Table Compression
    Views
    Integrity Constraints
    Indexes and Partitioned Indexes
    Materialized Views
    Dimensions

4 Hardware and I/O Considerations in Data Warehouses
    Overview of Hardware and I/O Considerations in Data Warehouses
    Configure I/O for Bandwidth not Capacity
    Stripe Far and Wide
    Use Redundancy
    Test the I/O System Before Building the Database
    Plan for Growth
    Storage Management

5 Partitioning in Data Warehouses

6 Indexes
    Using Bitmap Indexes in Data Warehouses
    Benefits for Data Warehousing Applications
    Cardinality
    How to Determine Candidates for Using a Bitmap Index
    Bitmap Indexes and Nulls
    Bitmap Indexes on Partitioned Tables
    Using Bitmap Join Indexes in Data Warehouses
    Four Join Models for Bitmap Join Indexes
    Bitmap Join Index Restrictions and Requirements
    Using B-Tree Indexes in Data Warehouses
    Using Index Compression
    Choosing Between Local Indexes and Global Indexes

7 Integrity Constraints
    Why Integrity Constraints are Useful in a Data Warehouse
    Overview of Constraint States
    Typical Data Warehouse Integrity Constraints
    UNIQUE Constraints in a Data Warehouse
    FOREIGN KEY Constraints in a Data Warehouse
    RELY Constraints
    NOT NULL Constraints
    Integrity Constraints and Parallelism
    Integrity Constraints and Partitioning
    View Constraints

8 Basic Materialized Views
    Overview of Data Warehousing with Materialized Views
    Materialized Views for Data Warehouses
    Materialized Views for Distributed Computing
    Materialized Views for Mobile Computing
    The Need for Materialized Views
    Components of Summary Management
    Data Warehousing Terminology
    Materialized View Schema Design
    Schemas and Dimension Tables
    Materialized View Schema Design Guidelines
    Loading Data into Data Warehouses
    Overview of Materialized View Management Tasks
    Types of Materialized Views
    Materialized Views with Aggregates
    Requirements for Using Materialized Views with Aggregates
    Materialized Views Containing Only Joins
    Materialized Join Views FROM Clause Considerations
    Nested Materialized Views
    Why Use Nested Materialized Views?
    Nesting Materialized Views with Joins and Aggregates
    Nested Materialized View Usage Guidelines
    Restrictions When Using Nested Materialized Views
    Creating Materialized Views
    Creating Materialized Views with Column Alias Lists
    Naming Materialized Views
    Storage And Table Compression
    Build Methods
    Enabling Query Rewrite
    Query Rewrite Restrictions
    Materialized View Restrictions
    General Query Rewrite Restrictions
    Refresh Options
    General Restrictions on Fast Refresh
    Restrictions on Fast Refresh on Materialized Views with Joins Only
    Restrictions on Fast Refresh on Materialized Views with Aggregates
    Restrictions on Fast Refresh on Materialized Views with UNION ALL
    Achieving Refresh Goals
    Refreshing Nested Materialized Views
    ORDER BY Clause
    Materialized View Logs
    Using the FORCE Option with Materialized View Logs
    Using Oracle Enterprise Manager
    Using Materialized Views with NLS Parameters
    Adding Comments to Materialized Views
    Registering Existing Materialized Views
    Choosing Indexes for Materialized Views
    Dropping Materialized Views
    Analyzing Materialized View Capabilities
    Using the DBMS_MVIEW.EXPLAIN_MVIEW Procedure
    DBMS_MVIEW.EXPLAIN_MVIEW Declarations
    Using MV_CAPABILITIES_TABLE
    MV_CAPABILITIES_TABLE.CAPABILITY_NAME Details
    MV_CAPABILITIES_TABLE Column Details

9 Advanced Materialized Views
    Partitioning and Materialized Views
    Partition Change Tracking
    Partition Key
    Join Dependent Expression
    Partition Marker
    Partial Rewrite
    Partitioning a Materialized View
    Partitioning a Prebuilt Table
    Benefits of Partitioning a Materialized View
    Rolling Materialized Views
    Materialized Views in Analytic Processing Environments
    Cubes
    Benefits of Partitioning Materialized Views
    Compressing Materialized Views
    Materialized Views with Set Operators
    Examples of Materialized Views Using UNION ALL
    Materialized Views and Models
    Invalidating Materialized Views
    Security Issues with Materialized Views
    Querying Materialized Views with Virtual Private Database (VPD)
    Using Query Rewrite with Virtual Private Database
    Restrictions with Materialized Views and Virtual Private Database
    Altering Materialized Views

10 Dimensions
    What are Dimensions?
    Creating Dimensions
    Dropping and Creating Attributes with Columns
    Multiple Hierarchies
    Using Normalized Dimension Tables
    Viewing Dimensions
    Using Oracle Enterprise Manager
    Using the DESCRIBE_DIMENSION Procedure
    Using Dimensions with Constraints
    Validating Dimensions
    Altering Dimensions
    Deleting Dimensions

Part IV Managing the Data Warehouse Environment

11 Overview of Extraction, Transformation, and Loading
    Overview of ETL in Data Warehouses
    ETL Basics in Data Warehousing
    Extraction of Data
    Transportation of Data
    ETL Tools for Data Warehouses
    Daily Operations in Data Warehouses
    Evolution of the Data Warehouse

12 Extraction in Data Warehouses
    Overview of Extraction in Data Warehouses
    Introduction to Extraction Methods in Data Warehouses
    Logical Extraction Methods
    Full Extraction
    Incremental Extraction
    Physical Extraction Methods
    Online Extraction
    Offline Extraction
    Change Data Capture
    Timestamps
    Partitioning
    Triggers
    Data Warehousing Extraction Examples
    Extraction Using Data Files
    Extracting into Flat Files Using SQL*Plus
    Extracting into Flat Files Using OCI or Pro*C Programs
    Exporting into Export Files Using the Export Utility
    Extracting into Export Files Using External Tables
    Extraction Through Distributed Operations

13 Transportation in Data Warehouses
    Overview of Transportation in Data Warehouses
    Introduction to Transportation Mechanisms in Data Warehouses
    Transportation Using Flat Files
    Transportation Through Distributed Operations
    Transportation Using Transportable Tablespaces
    Transportable Tablespaces Example
    Other Uses of Transportable Tablespaces

14 Loading and Transformation
    Overview of Loading and Transformation in Data Warehouses
    Transformation Flow
    Multistage Data Transformation
    Pipelined Data Transformation
    Loading Mechanisms
    Loading a Data Warehouse with SQL*Loader
    Loading a Data Warehouse with External Tables
    Loading a Data Warehouse with OCI and Direct-Path APIs
    Loading a Data Warehouse with Export/Import
    Transformation Mechanisms
    Transforming Data Using SQL
    CREATE TABLE AS SELECT And INSERT /*+APPEND*/ AS SELECT
    Transforming Data Using UPDATE
    Transforming Data Using MERGE
    Transforming Data Using Multitable INSERT
    Transforming Data Using PL/SQL
    Transforming Data Using Table Functions
    What is a Table Function?
    Error Logging and Handling Mechanisms
    Business Rule Violations
    Data Rule Violations (Data Errors)
    Handling Data Errors in PL/SQL
    Handling Data Errors with an Error Logging Table
    Loading and Transformation Scenarios
    Key Lookup Scenario
    Business Rule Violation Scenario
    Data Error Scenarios
    Pivoting Scenarios

15 Maintaining the Data Warehouse
    Using Partitioning to Improve Data Warehouse Refresh
    Refresh Scenarios
    Scenarios for Using Partitioning for Refreshing Data Warehouses
    Refresh Scenario 1
    Refresh Scenario 2
    Optimizing DML Operations During Refresh
    Implementing an Efficient MERGE Operation
    Maintaining Referential Integrity
    Purging Data
    Refreshing Materialized Views
    Complete Refresh
    Fast Refresh
    Partition Change Tracking (PCT) Refresh
    ON COMMIT Refresh
    Manual Refresh Using the DBMS_MVIEW Package
    Refresh Specific Materialized Views with REFRESH
    Refresh All Materialized Views with REFRESH_ALL_MVIEWS
    Refresh Dependent Materialized Views with REFRESH_DEPENDENT
    Using Job Queues for Refresh
    When Fast Refresh is Possible
    Recommended Initialization Parameters for Parallelism
    Monitoring a Refresh
    Checking the Status of a Materialized View
    Viewing Partition Freshness
    Scheduling Refresh
    Tips for Refreshing Materialized Views with Aggregates
    Tips for Refreshing Materialized Views Without Aggregates
    Tips for Refreshing Nested Materialized Views
    Tips for Fast Refresh with UNION ALL
    Tips After Refreshing Materialized Views
    Using Materialized Views with Partitioned Tables
    Fast Refresh with Partition Change Tracking
    PCT Fast Refresh Scenario 1
    PCT Fast Refresh Scenario 2
    PCT Fast Refresh Scenario 3
    Fast Refresh with CONSIDER FRESH

16 Change Data Capture
    Overview of Change Data Capture
    Capturing Change Data Without Change Data Capture
    Capturing Change Data with Change Data Capture
    Publish and Subscribe Model
    Publisher
    Subscribers
    Change Sources and Modes of Change Data Capture
    Synchronous Change Data Capture
    Asynchronous Change Data Capture
    Asynchronous HotLog Mode
    Asynchronous Distributed HotLog Mode
    Asynchronous AutoLog Mode
    Change Sets
    Valid Combinations of Change Sources and Change Sets
    Change Tables
    Getting Information About the Change Data Capture Environment
    Preparing to Publish Change Data
    Creating a User to Serve As a Publisher
    Granting Privileges and Roles to the Publisher
    Creating a Default Tablespace for the Publisher
    Password Files and Setting the REMOTE_LOGIN_PASSWORDFILE Parameter
    Determining the Mode in Which to Capture Data
    Setting Initialization Parameters for Change Data Capture Publishing
    Initialization Parameters for Synchronous Publishing
    Initialization Parameters for Asynchronous HotLog Publishing
    Initialization Parameters for Asynchronous Distributed HotLog Publishing
    Initialization Parameters for Asynchronous AutoLog Publishing
    Adjusting Initialization Parameter Values When Oracle Streams Values Change
    Tracking Changes to the CDC Environment
    Publishing Change Data
    Performing Synchronous Publishing
    Performing Asynchronous HotLog Publishing
    Performing Asynchronous Distributed HotLog Publishing
    Performing Asynchronous AutoLog Publishing
    Subscribing to Change Data
    Managing Published Data
    Managing Asynchronous Change Sources
    Enabling And Disabling Asynchronous Distributed HotLog Change Sources
    Managing Asynchronous Change Sets
    Creating Asynchronous Change Sets with Starting and Ending Dates
    Enabling and Disabling Asynchronous Change Sets
    Stopping Capture on DDL for Asynchronous Change Sets
    Recovering from Errors Returned on Asynchronous Change Sets
    Managing Synchronous Change Sets
    Enabling and Disabling Synchronous Change Sets
    Managing Change Tables
    Creating Change Tables
    Understanding Change Table Control Columns
    Understanding TARGET_COLMAP$ and SOURCE_COLMAP$ Values
    Using Change Markers
    Controlling Subscriber Access to Change Tables
    Purging Change Tables of Unneeded Data
    Dropping Change Tables
    Exporting and Importing Change Data Capture Objects Using Oracle Data Pump
    Restrictions on Using Oracle Data Pump with Change Data Capture
    Examples of Oracle Data Pump Export and Import Commands
    Publisher Considerations for Exporting and Importing Change Tables

[...]

... this typical architecture.

[Figure 1–3 Architecture of a Data Warehouse with a Staging Area. Labels: Data Sources (Operational System, Operational System, Flat Files); Staging Area; Warehouse (Metadata, Summary Data, Raw Data); Users (Analysis, Reporting, Mining)]

Data Warehouse Architecture: with a Staging Area and Data Marts

Although the architecture in...

... that historical data be moved to an archive.

Contrasting OLTP and Data Warehousing Environments

Figure 1–1 illustrates key differences between an OLTP system and a data warehouse.

Figure 1–1 Contrasting OLTP and Data Warehousing Environments

                    OLTP                                       Data Warehouse
                    Complex data structures (3NF databases)    Multidimensional data structures
    Indexes         Few                                        Many
    Joins           Many                                       Some
    Duplicated Data Normalized DBMS                            Denormalized...

... basic data warehousing concepts. It contains the following chapter:
■ Chapter 1, "Data Warehousing Concepts"

1 Data Warehousing Concepts

This chapter provides an overview of the Oracle data warehousing implementation. It includes:
■ What is a Data Warehouse?
■ Data Warehouse Architectures
■ Extracting Information from a Data Warehouse

Note that this book is meant as a supplement to standard texts about data...
Architecture: with a Staging Area
■ Data Warehouse Architecture: with a Staging Area and Data Marts

Data Warehouse Architecture: Basic

Figure 1–2 shows a simple architecture for a data warehouse. End users directly access data derived from several source systems through the data warehouse.

[Figure 1–2 Architecture of a Data Warehouse. Labels: Data Sources, Warehouse...]

... Derived Data and Aggregates Common

One major difference between the types of system is that data warehouses are not usually in third normal form (3NF), a type of data normalization common in OLTP environments.

Data warehouses and OLTP systems have very different requirements. Here are some examples of differences between typical data...

[Figure: ... Marts. Labels: Data Sources (Operational System, Operational System, Flat Files); Staging Area; Warehouse (Metadata, Summary Data, Raw Data); Data Marts (Purchasing, Sales, Inventory); Users (Analysis, Reporting, Mining)]

Note: Data marts are an important part of many data warehouses, but they are not the focus of this book.

Extracting Information from a Data...

... example, data mining can be used in the life sciences to discover gene and protein targets and to identify leads for new drugs.

Oracle Data Mining performs data mining in the Oracle Database. Oracle Data Mining does not require data movement between the database and an external mining server, thereby eliminating redundancy, improving efficient data storage and processing, ensuring that up-to-date data is...

[Figure labels: Operational System, Users, Analysis, Metadata, Summary Data, Operational System, Raw Data, Reporting, Data for Mining, Flat Files, Mining]

In Figure 1–2, the metadata and raw data of a traditional OLTP system are present, as is an additional type of data, summary data. Summaries are very valuable in data warehouses because they pre-compute long operations in advance. For example, a typical data warehouse query is to retrieve...

... in Oracle Database?

This section describes the new features of Oracle Database 11g Release 1 (11.1) and provides pointers to additional information. New features information from previous releases is also retained to help those users migrating to the current release.

The following section describes new features in Oracle Database:
■ Oracle Database 11g Release 1 (11.1) New Features in Data Warehousing...

...
    Distributed HotLog Configurations and Restrictions
    Oracle Database Releases for Source and Staging Databases
    Upgrading a Distributed HotLog Change Source to Oracle Release 11.1
    Hardware Platforms and Operating Systems
    Requirements for Multiple Publishers on the Staging Database
    Requirements for Database Links

Part V Data Warehouse Performance

17 Basic Query Rewrite
    Overview...