Microsoft SQL Server 2008 R2 Unleashed- P205 docx

CHAPTER 51 SQL Server 2008 Analysis Services

Every cube has a schema from which the cube draws its source data. The central table in a schema is the fact table, which yields the cube’s data measures. The other tables in the schema are the dimension tables, which are the source of the cube dimensions. A classic star-schema data warehouse design has this central fact table along with multiple dimension tables. This is a great starting point for OLAP cube creation, as you can see in Figure 51.3. Here, we show you a high-tech company’s computer sales star-schema data warehouse that can be used as the source for building an OLAP cube within SSAS.

FIGURE 51.3 A star-schema data warehouse design with a central fact table and multiple dimensions of these facts as the source for an OLAP cube in SSAS.

SSAS allows you to build dimensions and cubes from heterogeneous data sources. It can access relational OLTP databases, multidimensional databases, text data, and any other source that has an OLE DB provider available. You don’t have to move all your data first; you just connect to its source.

In SSAS, you can also design OLAP cubes from scratch. You can then have SSAS create the relational schema of tables in SQL Server that you want to populate with the transactional data that will drive the OLAP cube.

Essentially, cubes can be regular or local cubes. Regular cubes are based on real tables as the data source, have aggregations, and occupy physical storage space of some kind. If a data source that contributes to a regular cube changes, the cube must be reprocessed. Figure 51.4 shows this cube representation and that it consists of something called partitions.

Local cubes are entirely contained in portable SSAS files (that is, tables) and can be browsed without a connection to an SSAS instance. This is really like being in “disconnected” mode.
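The star-schema idea described at the start of this section can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table and column names (sales_fact, product_dim, time_dim, units) are hypothetical rather than taken from the book's sample database. It shows a central fact table joined to two dimension tables, with the measure summed at a chosen dimension level, which is roughly what an OLAP cube precomputes.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn.executescript("""
        CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_line TEXT);
        CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER, month TEXT);
        CREATE TABLE sales_fact (
            product_id INTEGER REFERENCES product_dim(product_id),
            time_id    INTEGER REFERENCES time_dim(time_id),
            units      INTEGER   -- the measure
        );
    """)
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                     [(1, "Laptops"), (2, "Servers")])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                     [(1, 2006, "Jan"), (2, 2006, "Feb")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                     [(1, 1, 10), (1, 2, 15), (2, 1, 4), (2, 2, 6), (1, 1, 5)])


def units_by(conn, level):
    """Sum the units measure grouped by one dimension level."""
    # Whitelist the level names so nothing unchecked reaches the SQL text.
    columns = {"product_line": "p.product_line", "year": "t.year", "month": "t.month"}
    col = columns[level]
    sql = f"""
        SELECT {col}, SUM(f.units)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.product_id = f.product_id
        JOIN time_dim AS t ON t.time_id = f.time_id
        GROUP BY {col}
        ORDER BY {col}
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(units_by(conn, "product_line"))
print(units_by(conn, "year"))
```

Every query follows the same shape (fact table in the middle, one join per dimension), which is why a star schema maps so directly onto cube dimensions and measures.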
Write-enabled dimensions within a cube enable updates (that is, writes) of data that can be shared back (that is, written back) with the data sources.

FIGURE 51.4 The SSAS cube representations: regular OLAP cubes and partitions.

Following is a quick summary of all the essential cube terms in SSAS:

• Database: A database is a logical container of one or more cubes. Cubes are defined within Analysis Server databases.
• Cube: A cube is a multidimensional representation of the business facts. Types of cubes are regular and local.
• Data source: The data source is the origin of a cube’s data.
• Measure group: This group is a collection (or grouping) of one or more measures into some type of logical unit for business purposes. A measure group does not occupy any physical space. It is metadata only.
• Measure: A measure is a data fact representation. A measure is typically a data value fact, such as price, unit, or quantity.
• Cell: A cell is the part of a data measure that is at the intersection of the dimensions. The cell contains the data value. If an intersection (that is, cell) has no value yet, it does not physically exist until it is populated.
• Dimension: A cube’s dimension is defined by the aggregation levels of the data that are needed to support the data requirements. A dimension can be shared with other cubes, or it can be private to a cube. The structure of a dimension is directly related to the dimension table columns, member properties, or structure of OLAP data mining models. This structure becomes the hierarchy and should be organized accordingly. You can also have strict parent/child dimensions in which two columns are identified as being parent and child and the dimension is organized according to them. In a regular dimension, each column in the dimension contributes a hierarchy level.
• Level: A level includes the nodes of the hierarchy or data mining model. Each level contains the members. Millions of members are possible for each level.
• Partition: A cube is made up of one or more partitions. Using a partition is a way to physically separate parts of a cube. This separation essentially lets you deal with individual slices of a data cube separately, querying only the relevant data sources. If you partition by dimension, you can perform incremental updates to change that dimension independently of the rest of the cube. Consequently, you have to reprocess only the aggregations that are affected by those changes. This is an excellent feature for scalability.
• Hierarchy: A hierarchy is a set of members in a dimension and their position relative to each other. Hierarchies can be either balanced or unbalanced. Being balanced simply means that all branches of the hierarchy descend to the same level. An unbalanced hierarchy allows for branches to descend to different levels. It is also possible to define more than one hierarchy for a single dimension. A great example of this is “fiscal calendar time” and “Gregorian calendar time” being defined in one dimension: a Time dimension that contains both time.gregorian and time.fiscal.

As mentioned previously, SSAS has many wizards. Which wizards you use depends on what you need to create. The “Creating an OLAP Database” section, later in this chapter, outlines the order and path through this maze of wizards.

OLAP Versus OLTP

One of the primary goals of OLAP is to increase data retrieval speed for business-related queries that are critical to decisions. Very often, there is a need to broaden the scope of a business query or to drill down into more granular details of the query. OLAP was created to facilitate this type of capability. A multidimensional schema is not a typical normalized relational database; redundant data is stored to facilitate quick retrieval.
The data in a multidimensional database should be relatively static; in fact, data is not useful for decision support if it changes constantly. The information in a data warehouse is built out of carefully chosen snapshots of business data from OLTP systems. If you capture data at the right times for transfer to the data warehouse, you can quickly make accurate comparisons of important business activities over time.

In an OLTP system, transaction speed is paramount. Data modification operations must be quick, deal with concurrency (locking/holding of resources), and provide transactional consistency. An OLTP system is constantly changing; snapshots of the OLTP system, even if taken only a few seconds apart, are all different. Although historical information is certainly available in an OLTP system, using it for BI-type analysis might be impractical. Storing old data in an OLTP system becomes expensive, and you might need to reconstruct history dynamically from a series of transactions. In addition, OLTP designs and indexes usually don’t support large-scale decision support querying.

SSAS supports three OLAP storage methods (MOLAP, ROLAP, and HOLAP), providing flexibility to the data warehousing solution and enabling powerful partitioning and aggregation optimization capabilities.

Figure 51.5 shows the MOLAP, HOLAP, and ROLAP storage continuum. MOLAP stores all data locally (to SSAS), and ROLAP is the opposite (storing all data in the relational database). MOLAP is by far the most often used storage approach. The following sections take a closer look at each of them.

FIGURE 51.5 MOLAP, HOLAP, and ROLAP storage continuum.

MOLAP

Multidimensional OLAP (MOLAP) is an approach in which cubes are built directly from OLTP data sources or from dimensional databases and downloaded to a persistent store. In SSAS, data is downloaded to the server, and the details and aggregations are stored in a native Microsoft OLAP format.
No zero-activity records are stored. The dimension keys in the fact tables are compressed, and bitmap indexing is used. A high-speed MOLAP query processor retrieves the data.

ROLAP

Relational OLAP (ROLAP) uses fact data in summary tables in the OLTP data source to make data much more current (real-time). The summary tables are populated by processes in the OLTP system and are not downloaded to SSAS. The summary tables are known as materialized views and contain various levels of aggregation, depending on the options you select when building data cubes with SSAS. SSAS builds the summary tables with a column for each dimension and each measure. It indexes each dimension column and creates an additional index on all the dimension columns.

HOLAP

SSAS implements a combination of MOLAP and ROLAP called hybrid OLAP (HOLAP). Here, the facts are left in the OLTP data source, and aggregations are stored in the SSAS server. You use SSAS to boost query performance. This approach helps avoid data duplication, but performance suffers a bit when you query fact data in the OLTP summary tables. The amount of performance degradation depends on the level of aggregation selected.

ROLAP and HOLAP are useful in situations in which an organization wants to leverage its investment in relational database technology and existing infrastructure. The summary tables of facts are also accessible in the OLTP system via normal data access methods. However, when you are using SSAS, both ROLAP and HOLAP require more storage space because they don’t use the storage optimizations of the pure MOLAP-compressed implementation.

An Analytics Design Methodology

A data warehouse can be built from the top down or from the bottom up. To build a top-down warehouse, you need to form a complete picture or logical data model for the entire organization (or all the subsystems within the scope of the project, such as all financial systems).
In contrast, building a warehouse from the bottom up takes a much more departmental or specific business-area focus (for example, a sales order system only). This breaks the task of modeling the data into more manageable chunks. Such a departmental approach produces data marts that are potentially subsets of the overall data warehouse.

The bottom-up approach can simplify implementation. It helps get departmental or business-area information to the people who need it, makes it easier to protect sensitive data, and results in better query response times because data marts deal with less data than a voluminous transactional system. The potential risk in the data mart approach is that disparity in data mart implementation can result in a logically disjointed enterprise data warehouse if efforts aren’t carefully coordinated across the organization.

Before you embark on an OLAP database creation effort, the time you spend understanding the underlying requirements is the best investment you can make in that effort. If scope is set correctly, you will be able to achieve an industrial-strength OLAP design without much difficulty. First, you need to take care of some groundwork:

1. Carefully assess the scope of what you want to represent in the BI environment. Start small, as the bottom-up approach suggests. For instance, just tackle the sales data facts.
2. Coordinate your efforts with other related BI efforts. Let people know that you are carving out a specific subject area or departmental data and, when you finish, publish your design to everyone.
3. Seek out any shared dimensions that might have already been created for other cubes. You want to leverage these as much as possible for the sake of data consistency and nonredundant processing.
4. Understand your data sources. The OLAP cube you create will be only as good as the data you put into it. It’s best to understand the dirty data issues of what you are about to touch long before you try to build an OLAP cube with it.
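Item 4 of the groundwork above (understanding dirty data before building a cube) can be partly automated. The following is a minimal sketch, not a prescribed method: it assumes fact rows are available as Python dictionaries, borrows the key names ProductID and TimeID from this chapter's CompSales example, and uses a hypothetical measure column named units. The rules it applies (no orphaned dimension keys, no missing or negative measures) are assumptions that you would adapt to your own data; negative values, for instance, may be legitimate in some sources.

```python
def find_dirty_fact_rows(fact_rows, dimension_keys):
    """Return (row_index, problem) pairs for rows that would pollute a cube.

    fact_rows      -- list of dicts, one per fact row
    dimension_keys -- dict mapping a key column name to the set of valid keys
    """
    problems = []
    for index, row in enumerate(fact_rows):
        # A fact row must point at an existing member in every dimension.
        for key_name, valid_keys in dimension_keys.items():
            if row.get(key_name) not in valid_keys:
                problems.append((index, f"orphaned {key_name}: {row.get(key_name)!r}"))
        # Assumed rule: the measure must be present and non-negative.
        units = row.get("units")
        if units is None:
            problems.append((index, "missing units"))
        elif units < 0:
            problems.append((index, "negative units"))
    return problems


# Hypothetical sample data for illustration only.
sample_rows = [
    {"ProductID": 1, "TimeID": 1, "units": 10},
    {"ProductID": 9, "TimeID": 1, "units": 5},     # no such product
    {"ProductID": 2, "TimeID": 2, "units": None},  # measure not loaded
    {"ProductID": 1, "TimeID": 2, "units": -3},    # suspicious value
]
valid_keys = {"ProductID": {1, 2}, "TimeID": {1, 2}}

for index, problem in find_dirty_fact_rows(sample_rows, valid_keys):
    print(f"row {index}: {problem}")
```

Running a check like this against each source before the construction phase gives you a concrete list of issues to resolve in the ETL logic rather than discovering them in cube totals.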
An Analytics Mini-Methodology

To successfully build OLAP solutions, you are advised to carefully assess the requirements of your end users in as detailed a fashion as possible. A mini-methodology that focuses on the essential usages and characteristics of an analytic solution can prove invaluable. The following sections outline a solid approach to nailing down your BI requirements and yielding optimal OLAP designs that solve your end users’ needs.

Assumption: You are building a business area–focused OLAP cube.

Requirements Phase

1. Identify the processing requirements for this decision support system (DSS). What analysis do you need to do? Are trend reporting, forecasting, and so on necessary? These can often be represented in use case form (via UML).
   a. Ask each user what business decision questions he or she needs to have answered.
   b. Ask each user how often he or she needs these questions answered and exactly when the questions must be answered.
   c. Ask each user how current the data must be to get accurate answers. (This speaks to data latency.)
2. Identify the data needed to fulfill these requirements. What data must be touched to provide answers? The best way to capture this type of information is a logical data model. Even a rough model is better than none at all. This is the point where you focus on the facts that need to be analyzed.
3. Identify all possible hierarchies and level representations (that is, aggregations). This is how the data is used. Most users are likely to tell you that they want to see product data in the product hierarchy structure that has already been set up (for example, product family, product groups).
4. Identify the time hierarchies that the users need. Because time is usually implicit, it just needs to be clarified in terms of levels of aggregation (for example, years, quarters, months, weeks, days) and whether it needs to be fiscal versus Gregorian calendar, both, or something else.
5. Understand the data that each user can view from a security point of view.

Design Phase

1. Analyze which data sources are needed to fulfill the requirements. See whether dimensions or OLAP cubes that already exist can be shared.
2. Understand what data transformations need to be done to the source data to provide it to the OLAP world. This might include pre-aggregation, reformatting, data integrity verifications, and so on.
3. Translate these requirements into an OLAP model design:
   a. Translate to MOLAP if your data sources are not going to be leveraged at all and you will be taking full advantage of OLAP storage.
   b. Translate to ROLAP if you are going to leverage an existing relational design and storage.
   c. Translate to HOLAP if you are going to partially utilize the source data storage and partially utilize OLAP storage. This is the most frequently used approach.

Construction Phase

1. Implement data extraction, transformation, and loading (ETL) logic (via T-SQL, SSIS, or other methods).
2. Create the data sources to be used.
3. Create the dimensions.
4. Create the cube.
5. Select data measures (that is, the data facts) for the cube.
6. Design the storage and aggregations.
7. Process the cube. This brings the data into the OLAP environment.
8. Verify data integrity.

Implementation Phase

1. Define the security roles in the cube.
2. Train the user to use the system.
3. Process the data into the OLAP environment (from production data sources).
4. Verify data integrity.
5. Allow users to use the OLAP cube.

Maintenance Phase

1. Evaluate access optimization in the OLAP cube via usage analysis.
2. Do data mining discovery, if desired.
3. Make schema changes/enhancements, as necessary.

An OLAP Requirements Example: CompSales International

Following is an abbreviated requirement that reflects an actual implementation that was done for a large Silicon Valley company.
We follow the mini-methodology as closely as possible to implement this requirement in SSAS, pointing out which facilities of SSAS should be used for which purpose along the way.

CompSales International Requirements

A large computer manufacturer named CompSales International needs to do basic analytical processing of its product data in a new BI environment. The main business issues at hand are related to minimizing channel inventory and better understanding market demand for the company’s most popular products. The detailed data processing requirements are as follows:

1. You want to view sales unit actuals and sales returns for system and nonsystem products for the past two years via the product hierarchy (All Products, Product Types, Product Lines, Product Families, SKUs), geography hierarchy (All Geos, Major Geos, Countries, Channels, Customers), and different time levels (All Time, Years, Quarters, Months).
2. You want to view data primarily at the yearly and monthly levels, although the finance department also uses it a little bit at quarterly levels.
3. You want to view net sales (sales minus returns) at all levels of the hierarchy.
4. The fiscal and Gregorian calendar are the same for CompSales International.
5. One day past month-end processing, all “actuals” data from the prior month is available (sales units and returns).

You need to implement some general design decisions using SSAS, including the following:

• Hierarchies (dimensions): This includes product, geography, and time.
• Facts (measures): This includes sales units, sales returns, and net sales (units minus returns) calculated.
• OLAP storage: This will be MOLAP or HOLAP (if you want to use the star-schema data mart that already contains most of what you are after).
• Physical tables that exist: This includes Geo_Dimension, Prod_Dimension, Time_Dimension, and CompSalesFactoid (the fact table that will become your measures in the OLAP cube). This data is updated weekly. Each of these tables uses an artificial key into the main facts table for performance reasons (GeoID, ProductID, TimeID). In addition, several member/value description tables are associated with each dimension table. Basically, there is one table for each level in a dimension. These description tables can be leveraged to make the result rows from OLAP queries much more user friendly (look back at Figure 51.3 and you can see all tables included in CompSales and how they are related via primary/foreign key references).

Figure 51.6 illustrates the desired hierarchies and facts for CompSales International’s requirements.

[Figure 51.6: an OLAP cube with TIME, PRODUCT, and GEOGRAPHY axes. Time levels: All Time, Year, Quarter, Month. Product levels: All Product, Product Type, Product Line, Product Family, SKU. Geography levels: All Geo, Major Geo, Country, Channel, Customer. Facts (measures) shown for Jan06, Feb06, Mar06, Apr06: Sales Units 450, 996, 333, 1203; Returns 20, 35, 14, 22; Net Sales 430, 961, 319, 1181.]

FIGURE 51.6 CompSales International’s multidimensional OLAP requirements.

OLAP Cube Creation

A star-schema data mart/warehouse named CompSales2008 is used as the basis for creating the OLAP cube example in this chapter. You can download this data mart, CompsSales2008.zip, from the Sams Publishing website for this book title at www.samspublishing.com, and it is also on this book’s CD. You can easily unzip and attach this database to any SQL Server 2008 database instance. This is not an SSAS database; it is a SQL Server database of a star-schema data warehouse/mart. We use this SQL Server database as the source for the exercises in this chapter. You will build the SSAS OLAP cube yourself (by following the steps outlined here).
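Before building the cube, it can help to see what requirement 3 (net sales at every level of a hierarchy) amounts to in plain code. The sketch below is an illustration only and says nothing about how SSAS computes aggregations internally. It uses the four monthly sales-unit and return values that appear in Figure 51.6; the pairing of each value with a month is read from the figure's column order, and the month-to-quarter mapping and the names MONTHLY_FACTS, TIME_DIMENSION, and roll_up_net_sales are assumptions introduced for this example.

```python
from collections import defaultdict

# Monthly facts as read from Figure 51.6: month -> (sales units, returns).
MONTHLY_FACTS = {
    "Jan06": (450, 20),
    "Feb06": (996, 35),
    "Mar06": (333, 14),
    "Apr06": (1203, 22),
}

# Assumed Time dimension rows: month -> (quarter, year).
# Requirement 4 says the fiscal and Gregorian calendars are the same,
# so calendar quarters are used here.
TIME_DIMENSION = {
    "Jan06": ("Q1-06", "2006"),
    "Feb06": ("Q1-06", "2006"),
    "Mar06": ("Q1-06", "2006"),
    "Apr06": ("Q2-06", "2006"),
}


def roll_up_net_sales(facts, time_dim):
    """Aggregate net sales (units minus returns) at every Time level."""
    levels = ("Month", "Quarter", "Year", "All Time")
    totals = {level: defaultdict(int) for level in levels}
    for month, (units, returns) in facts.items():
        net = units - returns
        quarter, year = time_dim[month]
        # Each fact contributes to one member at every level of the hierarchy.
        totals["Month"][month] += net
        totals["Quarter"][quarter] += net
        totals["Year"][year] += net
        totals["All Time"]["All Time"] += net
    return {level: dict(members) for level, members in totals.items()}


rollup = roll_up_net_sales(MONTHLY_FACTS, TIME_DIMENSION)
for level, members in rollup.items():
    print(level, members)
```

With these inputs, the monthly net sales match the Net Sales row of the figure (430, 961, 319, 1181), the first quarter totals 1710, and the year totals 2891. A cube stores or derives exactly these kinds of per-level totals so that a query at the Year level does not have to rescan every fact row.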
You’ll spend most of the construction phase using SQL Server Business Intelligence Development Studio (BIDS; also known as Visual Studio) and Microsoft SQL Server Management Studio (SSMS). All wizards and editors are invoked from either BIDS or SSMS.

As mentioned earlier, Microsoft has moved to a project orientation. For this reason, you need to start out in BIDS (which actually invokes Visual Studio with the BI plug-ins). You must have already installed SSAS. In general, here’s what you’ll be doing in this example:

1. Create a BI project.
2. Identify data sources and data source views that you want to use for a new cube.
3. Define the basic dimensions for the cube (Time, Geography, Product).
4. Define the hierarchies.
5. Process the dimensions.
6. Create a cube structure.
7. Define the measure groups/measures.
8. Process the cube.
9. Deploy the solution.
10. Use the cube.

Using SQL Server BIDS

The SQL Server BIDS (a.k.a. Visual Studio with the BI plug-ins) is launched from the SQL Server 2008 Program group on the Start menu or from the Visual Studio 2008 Program group on the Start menu. We will assume you have installed Visual Studio and SQL Server Analysis Services. When this is open, you choose File, New Project, Business Intelligence Projects. Figure 51.7 shows the New Project dialog from which you should highlight the Analysis Services Project template option and specify a project name, project location, and solution name for this new BI project. In this case, the solution name is CompSalesUnleashed.

NOTE
You can also start a new project by leveraging any other existing SSAS database project. You can easily clone an existing project and tweak it a bit to fit your new needs. To do this, you use the Import Analysis Services Database option.

FIGURE 51.7 The SQL Server BIDS New Project dialog.
After you create a new project, a set of objects is presented to you in the upper-right pane, which is the Solution Explorer. Figure 51.8 shows the Solution Explorer for the new project. All OLAP project objects reside here, including data sources, dimensions, cubes, mining structures, and roles.

FIGURE 51.8 The Solution Explorer view for the new CompSalesUnleashed project.
