
Microsoft SQL Server 2008 Study Guide, part 155 (PDF)


[Nielsen c71.tex V4 - 07/21/2009 3:53pm Page 1502]

Part X  Business Intelligence

By default, the pure MOLAP setting is chosen. This setting works well for traditional data warehousing applications because the partitions can be processed by the same procedure that loads large batches of data into the warehouse. Pure MOLAP is also an excellent choice for historical partitions for which data additions and updates are not expected. However, if a partition is built on frequently changing source data (e.g., directly on OLTP tables), then proactive caching can manage partition updates automatically.

Proactive caching

Proactive caching describes the many ways in which Analysis Services can automatically update the contents of a cube based on a relational data source. It is controlled by the many options on the Storage Options dialog, but these options are all controls on the same basic procedure: Analysis Services is notified each time the underlying data is updated; it waits for a pause in the updates, and then begins rebuilding the cache (partition). If an update notification is received before the rebuild completes, the rebuild is restarted. If the rebuild process takes longer than allowed, the cache reverts to ROLAP (SQL queries) until the rebuild is complete.

The options that control the rebuild process are located on the Storage Options' General tab (select the Enable Proactive Caching check box to enable updates):

■ Silence Interval: The amount of quiet time since the last update notification before the rebuild process begins. An appropriate setting depends on the table's usage profile, and should be long enough to identify when a batch of updates has completed.

■ Silence Override Interval: After this amount of time, the rebuild begins even if no silence interval has been detected.

■ Latency: The amount of time from receipt of the first notification until queries revert to ROLAP. Essentially, this guarantees that data returned by Analysis Services will never be more outdated than the specified amount of time. Of course, this may represent a significant extra load against the SQL database(s) and server(s) that must service the queries in the interim, depending on the number and complexity of the queries.

■ Rebuild Interval: Causes a cache rebuild even when no update notifications are received. The value specifies the time since the last rebuild after which a new rebuild will be triggered. This option can be used independently of data changes (e.g., don't listen for notifications, just rebuild this partition every four hours) or as a backup to update notifications, as update notification may not be guaranteed.

■ Bring online immediately: Causes a newly created cube to come online immediately in ROLAP mode without waiting for the MOLAP build process to complete. This improves availability at the possible expense of extra load on the relational database(s) and server(s) that process the ROLAP queries.

■ Enable ROLAP aggregations: Creates views to support ROLAP aggregations.

■ Apply storage settings to dimension: Available only at the cube or measure group level, this option applies the same storage settings to all dimensions related to the cube or measure group.

The Notifications tab specifies how Analysis Services will be notified of underlying relational data changes. Notification options must be set for each object (partition and dimension) individually. These options are relevant only when rebuilds on data change are enabled (that is, when Silence Interval and Silence Override Interval are set).

SQL Server notifications

SQL Server notifications use trace events to tell Analysis Services when an insert or an update has occurred in the underlying table.
Because event delivery is not guaranteed, this approach is often coupled with periodic rebuilds to ensure that missed events are included on a regular schedule. Enabling trace events requires that Analysis Services connect to SQL Server with an appropriately privileged account.

A partition that relies directly on an underlying table, without the use of query binding or a named query in the data source view, does not require tracking tables to be specified. Other partitions need to have tracking tables specified, listing the underlying tables that, when changed, indicate the partition is out of date. Dimensions must always list tracking tables. These tracking tables are simply the tables on which the intermediate named query or view is based.

Client-initiated notifications

Client-initiated notification enables a custom application that changes data tables to notify Analysis Services when a table has been changed. The application sends a NotifyTableChange command to the server to specify which table has been changed. Otherwise, processing behaves much like SQL Server notification.

Scheduled polling notifications

Scheduled polling notification is simple to configure and works for non-SQL Server data sources, but it only recognizes when new rows have been added to the table. If update-only transactions are common against a table, then combine polling with the periodic rebuild option to incorporate missed updates. Polling works by running a query that returns a high-water mark from the source table and noticing when the mark changes. For example, a partition built on the factSales table with an always-increasing primary key SalesID can poll using the following query:

    SELECT MAX(SalesID) FROM factSales;

Enable polling by selecting the polling interval and entering the corresponding polling query. Multiple polling queries can be used if the object (such as a multi-table dimension) relies on multiple tables for data.
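As a sketch of the multi-table case, a snowflaked product dimension might register one high-water-mark polling query per underlying table (the dimProduct and dimProductCategory table and column names here are hypothetical, not from the book):

```sql
-- One polling query per underlying table of the dimension.
-- Analysis Services re-runs each query on the polling interval and
-- treats a change in any returned value as a data-change notification.
SELECT MAX(ProductID) FROM dimProduct;
SELECT MAX(CategoryID) FROM dimProductCategory;
```

Each query is entered as a separate polling query on the Notifications tab.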
Notification based on polling can help implement incremental updates. Incremental updates become important as partition sizes grow, increasing processing time and resource requirements beyond convenient levels. Incremental processing is based on a query that returns only data added since the partition was last processed. Continuing the preceding example, Analysis Services replaces the first parameter (?) with the last value previously processed, and the second parameter with the current polled value:

    SELECT * FROM factSales
     WHERE SalesID > COALESCE(?,-1) AND SalesID <= ?;

The COALESCE function handles the empty-table case, where no data has been previously processed. Enable incremental updates by selecting the Enable Incremental Updates check box and entering a processing query and tracking table for each polling query.

Data Integrity

Data integrity functionality in Analysis Services addresses inconsistencies that would otherwise cause improper data presentation. Analysis Services views these inconsistencies in two categories:

■ Null Processing: When nulls are encountered in source data. For example, if a measure contains a null, should it be reported as zero or remain a null value?

■ Key Errors: When keys are missing, duplicated, or otherwise don't map between tables. For example, how should a CustomerID in the fact table without a corresponding entry in the customer dimension be handled?

Basing a cube on a traditional data warehouse helps minimize data integrity issues by addressing these problems during the warehouse load.

Best Practice

A key strength of OLAP in general and the UDM in particular is consistent interpretation of data. Data integrity settings and centralized calculations are examples of the many ways that the UDM centralizes data interpretation for downstream data consumers.
Address these issues in the design of the warehouse and UDM to deliver the most useful product. Think of it as building a "data object" complete with information hiding and opportunities for reuse.

Null processing

How nulls are treated depends on the NullProcessing property of the object in question. For measures, the NullProcessing property appears as part of the source definition with four possible values:

■ ZeroOrBlank: The server converts nulls to zero for numeric data items and to blank for string data items.

■ Automatic: Same as ZeroOrBlank.

■ Error: The server triggers an error and discards the record.

■ Preserve: Stores the null value without change.

A good way to choose among these settings is to consider how an average value should be calculated on the data for a given measure. If the best interpretation is averaging only the non-null values, then Preserve yields that behavior. Otherwise, ZeroOrBlank yields an average that counts null measures as zero.

For dimensions, the NullProcessing property can take on an additional value:

■ UnknownMember: Interprets the null value as the unknown member.

The NullProcessing property appears in several contexts for dimensions:

■ Each dimension attribute's NameColumn, if defined, contains NullProcessing as part of the source definition. This setting is used when a null name is encountered while building the dimension.

■ Each dimension attribute's KeyColumns collection contains a NullProcessing property for every column in the collection. This setting is used when null key column(s) are encountered while building the dimension.

■ Each cell on the Dimension Usage tab that relates a measure group and dimension via a regular relationship contains a NullProcessing property. This property is located on the Advanced (Measure Group Bindings) dialog, and is used when the related column in the fact table is null.

For dimension relationships, the default setting of Automatic NullProcessing is actually quite dangerous. As the key column is read as null, it is converted to 0 (or blank), which may be a valid key into some dimensions, causing null entries to be assigned to that member. Usually a better setting is UnknownMember if nulls are expected, or Error if nulls are not expected. Alternatively, a dimension member could be created for a 0 (or blank) key value and assigned a name such as "Invalid" to match the automatic processing behavior.

Unknown member

Choosing an unknown member option, either as part of null processing or in response to an error, requires the unknown member to be configured for the affected dimension. Once the unknown member is enabled for a dimension, the member is added to every attribute in the dimension. The UnknownMember dimension property can take on three possible settings:

■ None: The unknown member is not enabled for this dimension, and any attempt to assign data to the unknown member results in an error. This is the default setting.

■ Visible: The unknown member is enabled and is visible to queries.

■ Hidden: The unknown member is enabled, but not directly visible in queries. However, the (All) level of the dimension will contain the unknown member's contribution, and the MDX UnknownMember function can access the unknown member's contribution directly.

The default name of the unknown member is simply "Unknown," which can be changed by entering a value for the dimension's UnknownMemberName property.

Error Configuration

For the data integrity errors described here and several others, the ErrorConfiguration property specifies how errors will be handled. Initially, the setting for this property is (default), but choose the (custom) setting from the list and eight properties will appear.
The ErrorConfiguration properties are available on several objects, but are primarily set for dimensions and measure groups. The error configuration properties are as follows:

■ KeyDuplicate: Triggered when a duplicate key is seen while building the dimension. The default is IgnoreError; other settings are ReportAndContinue and ReportAndStop. IgnoreError causes all the attribute values to be incorporated into the dimension, but Analysis Services randomly chooses which values to associate with the key. For example, if a product dimension table has two rows for productID 73, one with the name "Orange" and the other with the name "Apple," then both Orange and Apple will appear as members of the product name attribute, but only one of those names will have transactions associated with it. Conversely, if both product names are Apple, then there will be only one Apple in the product name dimension, and users of the cube will be unable to tell that there were any duplicate records.

■ KeyErrorAction: Triggered when a KeyNotFound error is encountered. This occurs when a key value cannot be located in its associated table. For measure groups, this happens when the fact table contains a dimension key not found in the dimension table. For snowflaked dimension tables, it similarly implies one dimension table referencing a non-existent key in another dimension table. Settings are either ConvertToUnknown (the default) or DiscardRecord.

■ KeyErrorLimit: The number of key errors allowed before the KeyErrorLimitAction is taken. The default value of 0 causes the first error to trigger the KeyErrorLimitAction. Set to -1 for no limit.

■ KeyErrorLimitAction: The action triggered by exceeding the KeyErrorLimit. Settings are either StopProcessing (default) or StopLogging. StopLogging will continue processing and allow any number of key errors, but will log only the first KeyErrorLimit errors.

■ KeyErrorLogFile: The file to which all key errors are logged.

■ KeyNotFound: Determines how KeyNotFound errors interact with the KeyErrorLimit. The default setting of ReportAndContinue counts the error against KeyErrorLimit, whereas a setting of IgnoreError does not count the error against KeyErrorLimit. The setting of ReportAndStop logs the error and stops processing immediately, without regard for any KeyErrorLimit or KeyErrorAction settings. The IgnoreError setting is useful when multiple KeyNotFound errors are expected, allowing the expected mapping to an unknown member to occur while counting other types of key errors against the KeyErrorLimit.

■ NullKeyConvertedToUnknown: Identical in concept to the KeyNotFound property, but for null keys converted to an unknown member (NullProcessing=UnknownMember) instead of KeyNotFound errors. The default setting is IgnoreError.

■ NullKeyNotAllowed: Identical in concept to the KeyNotFound property, but for disallowed null keys (NullProcessing=Error) instead of KeyNotFound errors. The default setting is ReportAndContinue.

These same properties can be set as server properties to establish different defaults. They can also be set for a particular processing run to provide special handling for certain data.

Summary

Analysis Services provides the capability to build fast, consistent, and relevant repositories of data suitable for both end-user and application use. The details are extensive, but generally simple problems can be resolved easily by using default behaviors, while more complex problems need the flexibility provided by the breadth of this server and its design environment.
Other chapters in this section detail the design, loading, analyzing, querying, and reporting of BI data.

[Nielsen c72.tex V4 - 07/21/2009 3:55pm Page 1509]

Programming MDX Queries

IN THIS CHAPTER

Cube addressing basics
MDX SELECT statements
Commonly used MDX functions
MDX named sets and calculated members
Adding named sets, calculated members, and business intelligence to cube definitions

Multidimensional Expressions (MDX) is to Analysis Services what SQL is to the relational database, providing both definition (DDL) and query (DML) capabilities. MDX queries even look somewhat like SQL, but the ideas behind them are dramatically different. Certainly, MDX returns multidimensional cell sets instead of two-dimensional result sets, but more important, MDX does not contain a JOIN statement, as the cube contains explicit relationships between all the data it summarizes. Instead, hierarchically organized dimension data is manipulated in sets to determine both the content and the structure of the result.

Learning to write basic MDX queries goes quickly for most people, especially those with other database experience. However, many beginners have a tendency to stall at the basic query level. This learning plateau seems to stem from a lack of understanding of only a dozen or so terms and concepts. These are the same concepts presented at the beginning of this chapter: tuples, sets, the parts of a dimension, and so on. To avoid being stalled, attack MDX in manageable bites:

1. Read the "Basic Select Query" section following this list, and then practice basic queries until you become comfortable with them.

2. Return to and reread "Basic Select Query" carefully until you master the concepts and terminology. These basics will enable you to read the documentation of advanced features in "Advanced Select Query" later in this chapter with confidence. Practice advanced queries.
3. Get started defining sets and calculations within the cube structure by reading the "MDX Scripting" section toward the end of this chapter.

For background in creating the cubes that MDX queries, see Chapter 71, "Building Multidimensional Cubes with Analysis Services."

Basic Select Query

Like SQL, the SELECT statement in MDX is the means by which data is retrieved from the database. A common MDX form is as follows:

    SELECT { Set1 } ON COLUMNS,
           { Set2 } ON ROWS
      FROM Cube
     WHERE ( Set3 )

This query returns a simple table, with Set1 and Set2 defining the column and row headers, Cube providing the data, and Set3 limiting which parts of the cube are summarized in the table.

Cube addressing

A set is a list of one or more tuples. A tuple is an address that references some portion of the cube. Consider the example cube in Figure 72-1, which has been limited to three dimension hierarchies (Product, Year, Measure) so that it can be represented graphically. An MDX query summarizes the individual blocks of the cube, called cells, into the geometry specified by the query.

[FIGURE 72-1: A simple cube with three dimension hierarchies. Its axes are products (Apple, Orange, Pear), years (2007, 2008, 2009), and measures (Sales Amount, Order Count); one cell is marked "A".]

Tuples are specified by listing entries from every hierarchy, providing coordinates in the cube and selecting all the cells where those coordinates intersect. For example, (Pear, 2008, Order Count) addresses the cell marked as "A" in Figure 72-1. Tuples can also address groups of cells using a dimension's All level — for example, (Pear, All Years, Order Count) refers to three cells: the cell marked as "A" and the cells immediately above (Pear, 2007, Order Count) and below (Pear, 2009, Order Count) it. In fact, even when a tuple does not explicitly list a dimension, MDX uses the dimension's default member (usually the All level) to fill in the missing information. Thus, (Sales Amount) is the same as (All Products, All Years, Sales Amount). Sets are built from tuples, so {(Pear, 2008, Order Count)} is a set of one cell, while {(Apple), (Orange)} consists of 12 cells — all the cells that don't have Pear as the product. Of course, MDX syntax is a bit more formal than these examples show, and practical cubes are more complex, but the addressing concepts remain the same.

Dimension structure

Each dimension is a subject area that can be used to organize the results of an MDX query. Within a dimension are hierarchies, essentially topics within that subject area. For example, a customer dimension might have hierarchies for city, country, and postal code, describing where a customer lives. Each hierarchy in turn has one or more levels that actually contain the members, or dimension data. Members are used to build the sets that form the basis of MDX queries.

Dimension references

Referring to one of the components of a dimension is as simple as stringing together its lineage separated by periods. Here are some examples:

    [Customer]                                  Customer dimension
    [Customer].[Country]                        Country hierarchy
    [Customer].[Country].[Country]              Country level
    [Customer].[Country].[Country].&[Germany]   Germany member

While it is technically acceptable to omit the square brackets around each identifier, many cube names have embedded spaces and special characters, so it is customary to include them consistently. The ampersand (&) before the member denotes a reference by key — every member can be referenced by either its key or its name, although keys are recommended. In this case, the key and the name are the same, so [Customer].[Country].[Country].[Germany] refers to the same member as the sample does.
Members from other hierarchies may have more cryptic keys — for example, [Customer].[Customer].[Customer].&[20755] may be equivalent to [Customer].[Customer].[Customer].[Mike White].

In addition to referring to individual members in a dimension, most dimensions also have an [All] level that refers to all the members in that dimension. The default name for this level is [All], but it can be changed by the cube developer. For example, the [All] level for the AdventureWorks Customer dimension is named [All Customers]. The [All] level can be referenced from either the dimension or the hierarchy with equivalent results:

    [Customer].[All Customers]             Customer [All] level from dimension
    [Customer].[Country].[All Customers]   Customer [All] level from hierarchy
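Putting these pieces together, here is a minimal sketch of a complete query. The cube, measure, and member names assume the AdventureWorks sample cube and may differ in your deployment:

```mdx
-- Sales Amount and Order Count by country for calendar year 2008.
-- (AdventureWorks names assumed; the &[...] references are by key.)
SELECT
    { [Measures].[Sales Amount], [Measures].[Order Count] } ON COLUMNS,
    { [Customer].[Country].[Country].&[Germany],
      [Customer].[Country].[Country].&[France] } ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar Year].&[2008] )
```

The result is a 2x2 grid of cells, each summarizing the cube at the intersection of the row member, the column measure, and the 2008 slicer; any dimension not mentioned contributes its default member.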

Posted: 04/07/2014, 09:20
