
Microsoft SQL Server 2008 Tutorial, part 152




Nielsen c71.tex V4 - 07/21/2009 3:53pm Page 1472, Part X: Business Intelligence

Data can also be left in the relational database, or ROLAP store, which generally results in the fastest processing times at the expense of query times. Without aggregations, queries against a ROLAP store cause the equivalent SQL to be executed as needed. Aggregations can be pre-calculated for ROLAP, but doing so requires processing all the detailed data, so MOLAP is the preferred option. A relational database in this context is not limited to SQL Server, but may be any data source for which an OLE DB provider exists. A compromise between the speed of MOLAP storage and the need for preprocessing, called proactive caching, serves queries out of MOLAP storage when possible, but queries the relational database to retrieve the latest data not yet processed into the MOLAP store. Finally, the Analysis Services server uses XML for Analysis (XMLA) as its sole protocol, which is why you see XMLA inside the arrow in Figure 71-1.

Client

Clients communicate with Analysis Services, like any other web service, via the Simple Object Access Protocol (SOAP). Client applications can hide XMLA and SOAP details by using the provided data access interfaces to access Analysis Services:

■ All .NET languages can use ADOMD.NET.
■ Win32 applications (such as C++) can use the OLE DB for OLAP driver.
■ Other COM-based applications (such as VB6, VBA, scripting) can use ADOMD.

While the server will only speak XMLA via TCP/IP, clients have the option of using the HTTP protocol for their communications, if an appropriately configured IIS server is available to translate.
In addition to custom applications, Analysis Services can be accessed by several provided tools, including the following:

■ Business Intelligence Development Studio, for defining database structure
■ SQL Server Management Studio, for managing and querying the server
■ Reporting Services, which can base report definitions on Analysis Services data
■ Excel features and add-ins, for querying and analyzing data

A wide variety of third-party tools are also available to exploit the features of Analysis Services.

Building a Database

An Analysis Services database is built by identifying the data to include in the database, specifying the relationships between that data, defining dimension structures on that data, and finally building one or more cubes to combine the dimensions and measures. This section describes the overall process with an emphasis on gathering the data needed to define the database. Subsequent sections describe the many facets of dimensions and cubes.

Business Intelligence Development Studio

The process of building an Analysis Services database begins by opening a new Analysis Services project in the Business Intelligence Development Studio. Each project corresponds to a database that will be created on the target server when the project is deployed.

Best Practice

Along with opening an Analysis Services project, it is also possible to directly open an existing database in Business Intelligence Development Studio. While this is a useful feature for examining the configuration of a running server, changes should be made in a project, deployed first to a development server, and deployed to production only after testing. Keep the project and related files in source control. Be sure to set the target server before attempting to deploy your new database.
Right-click on the project in the Solution Explorer and choose Properties. Set the target server in the deployment property page for the configuration(s) of interest (for example, development vs. production). Taking care with this setup when you create a project will prevent inadvertently creating a database on the wrong server.

Data sources

Define a data source for each distinct database or other source of data needed for the Analysis Services database. Each data source encapsulates the connection string, authentication, and properties for reading a particular set of data. A data source can be defined on any data for which an OLE DB provider exists, enabling Analysis Services to use many types of data beyond the traditional relational sources.

Start the New Data Source Wizard by right-clicking the Data Sources folder in the Solution Explorer and selecting the New option. After you view the optional welcome screen, the "Select how to define a connection" screen appears and presents a list of connections. Select the appropriate connection if it exists. If the appropriate connection does not exist, bring up the connection manager by clicking the New button and add it. Within the connection manager, choose an appropriate provider, giving preference to native OLE DB providers for best performance. Then enter the server name, authentication information, database name, and any other properties required by the chosen provider. Review entries on the All tab and test the connection before clicking OK to complete the connection creation.

Work through the remaining wizard screens, choosing the appropriate login (impersonation) information for the target environment and finally the name of the data source. The choice of impersonation method depends on how access is granted in your environment. Any method that provides access to the necessary tables is sufficient for development.
■ Use a specific Windows user name and password allows the entry of the credential to be used when connecting to the relational database. This option is best when the developer and target server would not otherwise have access to the necessary data.
■ Use the service account will use the account that the Analysis Server service is logged in under to connect to the relational database. This is the simplest option provided that the login specified for the service has been granted access to the relational database.
■ Use the credentials of the current user uses the current developer's login to read the relational database. This can be a good choice for development, but it won't work when the database is deployed to a server because there is no "current user."
■ Inherit uses the Analysis Services database impersonation method, which defaults to using the service account, but it can be changed in database properties.

When managing multiple projects in a single solution, basing a data source in one project on information in another project can be useful. For those cases, instead of choosing a connection at the "Select how to define a connection" window, select the option to "Create a data source based on another object." This leads the wizard through the "Data sources from existing objects" page. This page offers two alternatives:

■ "Create a data source based on an existing data source in your solution" minimizes the number of places in which connection information must be edited when it changes.
■ "Create a data source based on an Analysis Services project" enables two projects to share data. This functionality is similar to using the Analysis Services OLE DB provider to access an existing database, but in this case the databases can be developed simultaneously without deployment complications.
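When the service-account impersonation option is used, the relational source must grant that account read access. A minimal T-SQL sketch of such a grant follows; the domain account name and the AdventureWorksDW database are placeholders, not values prescribed by the product:

```sql
-- Hypothetical setup: grant the Analysis Services service account
-- read-only access to the relational source database.
-- The login and database names below are assumptions for illustration.
CREATE LOGIN [DOMAIN\AnalysisServicesSvc] FROM WINDOWS;
GO
USE AdventureWorksDW;
GO
CREATE USER [DOMAIN\AnalysisServicesSvc]
    FOR LOGIN [DOMAIN\AnalysisServicesSvc];
GO
-- db_datareader is enough for processing; no write access is required.
EXEC sp_addrolemember 'db_datareader', 'DOMAIN\AnalysisServicesSvc';
```

Read-only membership in db_datareader keeps the service account least-privileged while still allowing Analysis Services to process dimensions and cubes from the source tables.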
Data source view

Whereas a data source describes where to look for tables of data, the data source view specifies which available tables to use and how they relate to each other. The data source view also associates metadata, such as friendly names and calculations, with those tables and columns.

Creating the data source view

The following steps create a data source view:

1. Add needed tables and named queries to a data source view.
2. Establish logical primary keys for tables without a primary key.
3. Establish relationships between related tables.
4. Annotate tables/columns with friendly names and calculations.

Begin by creating the data source view via the wizard: Right-click on the Data Source Views folder and select the New option. There are several pages in the wizard:

■ Select a Data Source: Choose one of the data sources to be included in this data source view. If more than one data source is to be included in the data source view, then the first data source must be a SQL Server data source. Pressing the Advanced button to limit the schemas retrieved can be helpful if there are many tables in the source database.
■ Name Matching: This page appears only when no foreign keys exist in the source database, providing the option of defining relationships based on a selection of common naming conventions. Matching can also be enabled via the NameMatchingCriteria property once the data source view has been created, identifying matches as additional tables are added to an existing view.
■ Select Tables and Views: Move tables to be included from the left pane (available objects) to the right (included objects) pane. To narrow the list of available objects, enter any part of a table name in the Filter box and press the Filter button.
To add objects related to included objects, select one or more included objects and press the Add Related Tables button. This same dialog is available as the Add/Remove Tables dialog after the data source view has been created.

■ Completing the Wizard: Specify a name for the data source view.

Once the data source view has been created, more tables can be added by right-clicking in the diagram and choosing Add/Remove Tables. Use this method to include tables from other data sources as well.

Similar to a SQL view, named queries can be defined, which behave as if they were tables. Either right-click on the diagram and choose New Named Query, or right-click on a table and choose Replace Table/With New Named Query to bring up a Query Designer to define the contents of the named query. If the resulting named query will be similar to an existing table, then it is preferable to replace that table, because the Query Designer will default to a query that is equivalent to the replaced table. Using named queries avoids the need to define views in the underlying data sources and allows all metadata to be centralized in a single model.

As tables are added to the data source view, primary keys and unique indexes in the underlying data source are imported as primary keys in the model. Foreign keys and selected name matches (see Name Matching, presented earlier in the section "Creating the data source view") are automatically imported as relationships between tables. For cases in which primary keys or relationships are not imported, they must be defined manually. For tables without primary keys, select one or more columns that define the primary key in a given table, right-click, and select Set Logical Primary Key. Once primary keys are in place, any tables without appropriate relationships can be related by dragging and dropping the related columns between tables. If the new relationship is valid, the model will show the new relationship without additional prompting.
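A named query is simply a SELECT statement over the underlying source that behaves like a table in the model. As a sketch, something along these lines could replace a raw customer table; the table and column names are assumptions for illustration only:

```sql
-- Hypothetical named query: acts like a table in the data source view.
-- It derives a display name and converts a key's data type so the
-- table can relate cleanly to tables from another data source.
SELECT  c.CustomerID,
        c.FirstName + ' ' + c.LastName AS CustomerName,
        CAST(c.RegionCode AS int)      AS RegionKey
FROM    dbo.Customer AS c;
```

Centralizing derivations like these in the data source view, rather than in relational views, keeps all such metadata in one model, as noted above.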
If errors occur, the Edit Relationship dialog will appear. Resolving the error may be as simple as pressing Reverse to correct the direction of the relationship, as shown in Figure 71-2, or it may take additional effort depending on the type of error. A common issue when working with multiple data sources is different data types. For example, a key in one database may be a 16-bit integer, while another database may store the same information in a 32-bit integer. This situation can be addressed by using a named query to cast the 16-bit integer as its 32-bit equivalent.

The Edit Relationship dialog can also be accessed by double-clicking an existing relationship, by right-clicking the diagram, and from toolbar and menu selections. Be sure to define all relationships, including relationships between different columns of the fact table and the same dimension table (for example, OrderDate and ShipDate both relate to the Time dimension table), as this enables role-playing dimension functionality when a cube is created.

Managing the data source view

As the number of tables participating in the data source view grows, it can become difficult to view all the tables and relationships at once. An excellent way to manage the complexity is to divide the tables into a number of diagrams. The Diagram Organizer pane in the upper-left corner of the Data Source View page is initially populated with a single <All Tables> diagram. Right-click in the Diagram Organizer pane and choose the New Diagram option to define a new diagram, and then drag and drop tables from the lower-left corner Tables pane to add tables to the new diagram. Alternately, right-click the diagram and use the Show Tables dialog to include tables currently in the <All Tables> diagram.
However, don't confuse the Show Tables dialog, which determines which tables appear in a given diagram, with the Add/Remove Tables dialog, which determines which tables are in the data source view as a whole.

FIGURE 71-2: The Edit Relationship dialog

Other tools for managing data source views include the following:

■ Tables pane: All the tables in a data source view are listed in the Tables pane. Click on any table, and it will be shown and highlighted in the current diagram (provided the table exists in the current diagram). You can also drag tables from the Tables pane onto diagrams as an alternative to the Show Tables dialog.
■ Find Table: Invoked from the toolbar or menu, this dialog lists only tables in the current diagram and allows filtering to speed the search process. Once chosen, the diagram shows and highlights the selected table.
■ Locator: The locator tool enables quick scrolling over the current diagram. Find it at the lower-right corner at the intersection of the scroll bars. Click and drag the locator to move around quickly within the diagram.
■ Switch layout: Right-click the diagram to toggle between rectangular and diagonal layout. The rectangular layout is table oriented and good for understanding many relationships at once. The diagonal layout is column oriented and thus good for inspecting relationship details.
■ Explore data: Looking at a sample of the data in a table can be very useful when building a data source view. Right-click any table to open the Explore page, which presents four tabbed views: The table view provides a direct examination of the sample data, while the pivot table and pivot chart views enable exploration of patterns in the data. The chart view shows a series of charts, breaking down the sample data by category based on columns in the sample data.
The columns selected for analysis are adjustable using the drop-down at the top of the page, as are the basic charting options. The size and type of sample are adjustable from the Sampling Options button on the page's toolbar. After adjusting sampling characteristics, press the Resample button to refresh the currently displayed sample.

The data source view can be thought of as a cache of underlying schemas that enables a responsive modeling environment, and like all caches it can become outdated. When the underlying schema changes, right-click on the diagram and choose Refresh to reflect the latest version of the schema in the data source view. The refresh function, also available from the toolbar and menu, opens the Refresh Data Source View dialog, which lists all the changes affecting the data source view. Before accepting the changes, scan the list for deleted tables, canceling changes if any deleted tables are found. Inspect the underlying schema for renamed and restructured tables to determine how equivalent data can be retrieved, and resolve any conflicts before attempting the refresh again. For example, right-click on a renamed table and choose Replace Table/With Other Table to select the new table. This approach prevents losing relationship and other context information during the refresh.

Refining the data source view

One of the strengths of the UDM is that queries against that model do not require an understanding of the underlying table structures and relationships. However, even the table name itself often conveys important semantics to the user. For example, referencing a column as accounting.hr.staff.employee.hourly_rate indicates that this hourly rate is on the accounting server, hr database, staff schema, and employee table, which suggests this hourly rate column contains an employee pay rate and not the hourly charge for equipment rental. Because the source of this data is hidden by the unified dimensional model, these semantics will be lost.
The data source view enables the definition of friendly names for every table and column. It also includes a description property for every table, column, and relationship. Friendly names and descriptions enable the preservation of existing semantics and the addition of others as appropriate.

Best Practice

Make the data source view the place where metadata lives. If a column needs to be renamed to give it context at query time, give it a friendly name in the data source view, rather than rename a measure or dimension attribute; the two names are displayed side by side in the data source view and help future modelers understand how data is used. Use description properties for non-obvious notes, capturing the results of research required in building and modifying the model.

Add a friendly name or description to any table or column by selecting the item and updating the corresponding properties in the Properties pane. Similarly, add a description to any relationship by selecting the relationship and updating the Properties pane, or by entering the description from the Edit Relationship dialog. The display of friendly names can be toggled by right-clicking the diagram.

Best Practice

Applications and reports based on Analysis Services data are likely a large change for the target organization. Assign friendly names that correspond to the names commonly used throughout the organization to help speed adoption and understanding.

Many simple calculations are readily included in the data source view as well. As a rule of thumb, place calculations that depend on a single row of a single table or named query in the data source view, but implement multi-row or multi-table calculations in MDX. Add calculations to named queries by coding them as part of the query. Add calculations to tables by right-clicking the table and choosing New Named Calculation.
Enter a name and any expression the underlying data provider can interpret. For example, if SQL Server's relational database is your data source, basic math, null replacement, and data conversion are all available for creating named calculations (think of any expression that can be written in T-SQL).

Creating a cube

The data source view forms the basis for creating the cubes, which in turn present data to database users. Running the Cube Wizard generally provides a good first draft of a cube. Begin by right-clicking the Cubes folder and selecting New, and then work through these pages:

■ Select Build Method: Choose "Use existing tables." The "generate tables in the data source" option is outlined previously in the "Analysis Services Quick Start" section. The option "Create an empty cube" does exactly that, essentially bypassing the wizard.
■ Select Measure Group Tables: Choose the appropriate data source view from the drop-down, and then indicate which tables are to be used as fact tables, meaning they will contain measures. Pressing the Suggest button will make an educated guess about which tables to check, but the guesses are not accurate in all cases.
■ Select Measures: The wizard presents a list of numeric columns that may be measures from the measure group tables. Check/uncheck columns as appropriate; measures can also be added/removed/adjusted at the conclusion of the wizard. Both measure groups and measures can be renamed from this page.
■ Select Existing Dimensions: If the current project already has dimensions defined, then this page will be displayed to enable those dimensions to be included in the new cube. Check/uncheck dimensions as appropriate for the created cube.
■ Select New Dimensions: The wizard presents a list of dimensions and the tables that will be used to construct those dimensions. Deselect any dimensions that are not desired or any tables that should not be included in that dimension. Dimensions, but not tables, can be renamed from this page.
■ Completing the Wizard: Enter a name for the new cube and optionally review the measures and dimensions that will be created.

Upon completion of the wizard, a new cube and associated dimensions will be created.

Dimensions

Recall from the discussion of star schema that dimensions are useful categorizations used to summarize the data of interest, the "group by" attributes that would be used in a SQL query. Dimensions created by a wizard generally prove to be good first drafts, but they need refinement before deploying a database to production.

Background on Business Intelligence and Data Warehousing concepts is presented in Chapter 70, "BI Design."

Careful study of the capabilities of a dimension reveals a complex topic, but fortunately the bulk of the work involves relatively simple setup. This section deals first with that core functionality and then expands into more complex topics in "Beyond Basic Dimensions."

Dimension Designer

Open any dimension from the Solution Explorer to use the Dimension Designer, shown in Figure 71-3. This designer presents information in four tabbed views:

■ Dimension Structure: Presents the primary design surface for defining the dimension.
Along with the ever-present Solution Explorer and Properties panes, three panes present the dimension's structure:

■ Data Source View (right): Shows the portion of the data source view on which the dimension is built
■ Attributes (left): Lists each attribute included in the dimension
■ Hierarchies (center): Provides a space to organize attributes into common drill-down paths

■ Attribute Relationships: Displays a visual designer to detail how dimension attributes relate
■ Translations: Provides a place to define alternative language versions of both object captions and the data itself
■ Browser: Displays the dimension's data as last deployed to the target analysis server

Unlike data sources and data source views, cubes and dimensions must be deployed before their behavior (e.g., browsing data) can be observed. The process of deploying a dimension consists of two parts. First, during the build phase, the dimension definition (or changes to the definition as appropriate) is sent to the target analysis server. Examine the progress of the build process in the Output pane. Second, during the process phase, the Analysis Services server queries underlying data and populates dimension data. Progress of this phase is displayed in the Deployment Progress pane, usually positioned as a tab of the Properties pane. The Business Intelligence Development Studio attempts to build or process only the changed portions of the project to minimize the time required for deployment.

FIGURE 71-3: Dimension Designer with AdventureWorks Customer dimension

New in 2008

Best Practice warnings are now displayed in the Dimension Designer, appearing as either caution icons or blue underlines. These warnings are informational and are not applicable to every situation.
Following the procedures outlined below, such as establishing hierarchies and attribute relationships, will eliminate many of these warnings. See the "Best Practice Warnings" topic later in this chapter for additional details.

Attributes

Attributes are the items that are available for viewing within the dimension. For example, a time dimension might expose year, quarter, month, and day attributes. Dimensions built by the Cube Wizard only have the key attribute defined. Other attributes must be manually added by dragging columns from the Data Source View pane to the Attributes pane. Within the attribute list, the key icon denotes the key attribute (Usage property = Key), which corresponds to the primary key in the source data used to relate to the fact table. There must be exactly one key attribute for each dimension.

Attribute source columns and ordering

Columns from the data source view are assigned to an attribute's KeyColumns and NameColumn properties to drive which data is retrieved in populating the attribute. During processing, Analysis Services will include both key and name columns in the SELECT DISTINCT it performs against the underlying data to populate the attribute. The KeyColumns assignment determines which items will be included as members in the attribute. The optional NameColumn assignment can give a display value to the key(s) when the key itself is not adequately descriptive. For example, a product dimension might assign ProductID to the KeyColumns and ProductName to the NameColumn. For the majority of attributes, the single key column assigned when the attribute is initially created will suffice.
For example, an Address attribute in a customer dimension is likely to be a simple string in the source table with no associated IDs or codes; the default of assigning that single Address column as the KeyColumns value with no NameColumn will suffice. Some scenarios beyond the simple case include the following:

■ Attributes with both an ID/code and a name: The approach for this case, which is very common for dimension table primary keys (key attributes), depends on whether the ID or code is commonly understood by those who will query the dimension. If the code is common, then leave its NameColumn blank to avoid hiding the code. Instead, model the ID/code and name columns as separate attributes. If the ID or code is an internal application or warehouse value, then hide the ID by assigning both the KeyColumns and NameColumn properties on a single attribute.
■ ID/code exists without a corresponding name: If the ID or code can take on only a few values (such as Yes or No), then derive a column to assign as the NameColumn by adding a named calculation in the data source view. If the ID or code has many or unpredictable values, then consider adding a new snowflaked dimension table to provide a name.
■ Non-unique keys: It is important that the KeyColumns assigned uniquely identify the members of a dimension. For example, a time dimension table might identify months with numbers 1 through 12, which are not unique keys from one year to the next. In this case, it makes sense to include both year and month columns to provide a good key value. Once multiple keys are used, a NameColumn assignment is required, so add a named calculation to the data source view to synthesize a readable name (e.g., Nov 2008) from existing month and year columns.

In the preceding non-unique keys scenario, it might be tempting to use the named calculation results (e.g., Jan 2009, Feb 2009) as the attribute's key column were it not for ordering issues.
Numeric year and month data is required to keep the attribute's members in calendar, rather than alphabetic, order. The attribute's OrderBy property enables members to be sorted by either key or name. Alternately, the OrderBy options AttributeKey and AttributeName enable sorting of the current attribute's members based on the key or name of another attribute, providing that the other attribute has been defined as a member property of the current attribute. Member properties are described in detail in the next section.
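For the Nov 2008 example above, a pair of named calculations along these lines could supply both a readable member name and a numeric key that sorts in calendar order. These are sketches only; the CalendarYear and MonthNumber column names are assumptions, and each expression is entered as the body of a named calculation (an expression, not a full statement):

```sql
-- Hypothetical named calculation 'MonthName': readable member name
-- such as 'Nov 2008', built from numeric year and month columns.
-- DATEADD/DATENAME resolve the month number to a month name.
LEFT(DATENAME(month, DATEADD(month, MonthNumber - 1, '20000101')), 3)
    + ' ' + CAST(CalendarYear AS varchar(4))

-- Hypothetical named calculation 'MonthKey': composite numeric key
-- such as 200811, unique across years and naturally in calendar order.
CalendarYear * 100 + MonthNumber
```

Assigning the numeric calculation to KeyColumns and the readable one to NameColumn keeps members unique across years while displaying friendly captions, consistent with the OrderBy discussion above.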

Posted: 04/07/2014, 09:20
