CHAPTER 51 SQL Server 2008 Analysis Services

reduced within the past year or so, and the ease of transparently applying this type of solution to OLAP is a natural fit. It affects both the OLAP data population process and the day-to-day what-if usage by the end users. You should keep these types of surgical incisions in mind when you face OLAP performance issues on this platform. They are easy to apply, the gains are huge, and you quickly get a return on your investment.

MPP Data Warehouse Option from Microsoft

A few years ago, Microsoft acquired DATAllegro, a maker of massively parallel data warehouse appliances. This basically lifted any data warehousing limitations that SSAS or SQL Server 2008 R2 itself had. Massively parallel means scaling horizontally on CPU and storage to grow with your data size and processing needs. There is no practical limit here. The underlying architecture relies on standards-based technologies. Essentially, storage nodes and compute nodes are separated, which allows you to spread your data across vast storage (EMC storage) so that it is very shallow (that is, quick to reach across all data storage). The compute power is also horizontally scalable, so any query can process its data access in parallel to surface the data it needs (and assemble it for delivery). Figure 51.70 shows the high-level architecture of Microsoft’s DATAllegro v3 offering.

Not only is the DATAllegro v3 architecture massively parallel and fast, but the multinode architecture also makes it highly available. If any node fails, hot spares kick in to pick up the load. Any failed node can easily be replaced and brought online with zero processing interruption. Moreover, multiple appliances can be combined on a common InfiniBand backbone to create large-scale and extremely powerful multitier or hub-and-spoke data warehouses with rapid, parallel data movement between the various appliances.
Believe it or not, there is an Ingres SQL engine at the heart of the database portion of this appliance.

FIGURE 51.70 The DATAllegro v3 MPP architecture. (Diagram labels: Compute Nodes, Ingres, 16GB RAM, Dual 4GB Fibre Channel, Cisco Redundant InfiniBand Network, Dual Fibre Channel Network, Storage Nodes, Dual 4GB FC Controller, Hot Spare.)

Master Data Services

Completing the business intelligence picture is a new focus on the data quality that is needed at all tiers of data information delivery. Microsoft has been pouring an enormous amount of effort (and money) into creating and embedding master data services throughout its BI and transactional platforms. By using Microsoft’s Master Data Services, organizations can align operational and analytical data across the enterprise and across line-of-business systems with a guaranteed level of data quality for most core data categories (such as customer data, product data, and other core data of the business). Microsoft has created data stewardship capabilities complete with workflows and notifications for any business user who might be affected by a core data change. Managing hierarchies is also an important part of mastering data that has a natural hierarchical structure, such as customer hierarchies (parent company to subsidiaries and so on).

Each master data change within the system is treated as a transaction; the user, date, and time of each change are logged, as well as pertinent audit details, such as type of change, member code, and prior versus new value. In addition to being a very useful audit trail, the transaction log can be used to selectively reverse changes. Customizable data quality rules create default values, enable data validation, and trigger actions such as email notifications and workflows.
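The transaction-style change log described above can be illustrated with a small sketch. This is not the Master Data Services API; it is a toy in-memory model in Python, and every class, field, and member name in it (ChangeRecord, MasterDataStore, CUST-001, and so on) is hypothetical. It shows only the idea: each change records who, when, what type, which member, and the prior versus new value, and a single logged change can later be reversed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One logged master data change (all field names are hypothetical)."""
    txn_id: int
    user: str
    changed_at: datetime
    change_type: str      # "update" or "reversal" in this sketch
    member_code: str
    attribute: str
    prior_value: object
    new_value: object


class MasterDataStore:
    """Toy in-memory store that logs every attribute change as a transaction."""

    def __init__(self):
        self.members = {}   # member_code -> {attribute: value}
        self.log = []       # append-only list of ChangeRecord

    def set_attribute(self, user, member_code, attribute, value,
                      change_type="update"):
        member = self.members.setdefault(member_code, {})
        record = ChangeRecord(
            txn_id=len(self.log) + 1,
            user=user,
            changed_at=datetime.now(timezone.utc),
            change_type=change_type,
            member_code=member_code,
            attribute=attribute,
            prior_value=member.get(attribute),   # None if not set before
            new_value=value,
        )
        member[attribute] = value
        self.log.append(record)
        return record

    def reverse(self, user, txn_id):
        """Selectively undo one logged change by restoring its prior value.

        The reversal is itself logged, so the audit trail stays complete.
        """
        original = self.log[txn_id - 1]
        return self.set_attribute(
            user, original.member_code, original.attribute,
            original.prior_value, change_type="reversal",
        )


store = MasterDataStore()
store.set_attribute("alice", "CUST-001", "parent_company", "Contoso")
bad = store.set_attribute("bob", "CUST-001", "parent_company", "Contso")  # typo
store.set_attribute("carol", "CUST-001", "region", "EMEA")
store.reverse("alice", bad.txn_id)   # undo only bob's change
```

The sketch deliberately ignores the case where the same attribute was changed again after the transaction being reversed; a real system would have to decide how to handle that conflict.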
Rules can be built by IT professionals or business users directly from the stewardship portal. Microsoft is still getting the kinks out of Master Data Services, so you should expect it to mature considerably over the next few years. Competing products with a head start of many years already provide this capability to companies around the globe, but Microsoft is catching up fast.

Security and Roles

Security is straightforward in SSAS. For each database or cube, roles are identified with varying levels of granularity for users. Roles are used when accessing the data in cubes. The process works like this: a role is defined, and then individual users or groups are added to that role as members. To create the roles you need for this data, you right-click the Roles entry in the Solution Explorer and select New Role. Figure 51.71 shows the creation of a database role with process database and read definition permissions.

The other tabs of the role designer allow you to further specify the controls, such as which members you want to have this role (Membership tab), what data source access you want (Data Sources tab), which cubes can be used (Cubes tab), what specific cell data the role has access to (Cell Data tab), what dimensions can be accessed (Dimensions tab), what dimensional data can be accessed (Dimension Data tab), and what mining structures are allowed to be used (Mining Structures tab). These permissions are additive. As you can see in Figure 51.72, you can also specify full MDX queries as part of the process of filtering what a member and role can have access to.

FIGURE 51.71 Creating a database role and permissions in the role designer.

FIGURE 51.72 Specifying MDX-based filtering, using the role designer.

Summary

This chapter discusses the OLAP approach, SSAS terms, and the tools Microsoft provides to enable OLAP cubes.
It presents a mini-methodology to follow that should help you get an OLAP project off the ground and running smoothly. These efforts are typically not simple, and a well-trained data warehouse analyst, BI specialist, or data architect is usually worth his or her weight in gold because of the results (and value) that can be achieved through good OLAP cube design.

Sometimes it is difficult to engage end users and get them to use an OLAP cube successfully. Easy-to-use third-party tools can greatly help with this problem. From an SSAS point of view, easy control of storage methods, dimension creation, degrees of aggregation, and cube partitioning, along with usage-based optimization, makes this product a serious data warehousing tool. It is getting easier and easier to publish OLAP data via websites or other means.

SSAS is truly the land of the wizards, but having a wizard lead you through a good OLAP cube design is critical. The wizards significantly reduce the expense and complexity of a data warehouse or data mart OLAP solution, enabling you to build many more much-needed solutions for your end users.

This chapter also introduces the new paths Microsoft is pursuing around massively parallel data warehouse appliances and the integration of Master Data Services into its business intelligence and transactional fabric to raise levels of performance and data quality across the board.

The next chapter, “SQL Server 2008 Integration Services,” ventures into Microsoft’s very robust offering for data enablement, manipulation, and aggregation, not only for Analysis Services but for most other production platforms that require complex data transformations.

CHAPTER 52 SQL Server Integration Services

IN THIS CHAPTER

- What’s New with SSIS
- SSIS Basics
- SSIS Architecture and Concepts
- SSIS Tools and Utilities
- A Data Transformation Requirement
- Running the SSIS Wizard
- The SSIS Designer
- The Package Execution Utility
- Connection Projects in Visual Studio
- Change Data Capture Addition with R2
- Using bcp
- Logged and Nonlogged Operations

As you may be aware, SQL Server 2000’s Data Transformation Services (DTS) was completely redeployed into and integrated with the Business Intelligence (BI) Development Studio, Visual Studio environments, and SQL Server Management Studio (SSMS). This chapter describes the SQL Server Integration Services (SSIS) environment and how SSIS addresses complex data movement and integration needs.

SSIS focuses on importing, exporting, and transforming data from one or more data sources to one or more data targets. This is Microsoft’s version of extraction, transformation, and loading (ETL) on steroids. Competing ETL products include Informatica, but Microsoft has simply bundled this functionality with SQL Server, giving you one more reason to purchase SQL Server rather than an expensive competing product.

Other Microsoft solutions exist for importing and exporting data (such as the Bulk Copy Program, bcp), but SSIS can be used for a larger variety of data transformation purposes, and its strength is in direct data access and complex data transformation. If you have existing DTS implementations (that is, DTS packages), you can convert them to SSIS packages with little to no effort, or you can simply execute them as is (with some restrictions).

If you still use the Bulk Copy Program (bcp), a section at the end of this chapter describes this legacy SQL Server capability. bcp is still the workhorse of many production environments and cannot just be discarded every time a new version of SQL Server comes along. We estimate that bcp will be around for years to come.
The alternatives to SSIS and bcp in the Microsoft SQL Server 2008 environment include replication, distributed queries, BULK INSERT, and SELECT INTO/INSERT. This chapter helps you determine how and when to use SSIS and bcp as opposed to these alternatives.

What’s New with SSIS

In SQL Server 2008, Microsoft has further extended the capabilities of SSIS into a much more comprehensive and robust data integration platform, with the emphasis on the word platform. The following are some of the highlights of SSIS 2008:

- Continued support for SQL Server 2000 Data Transformation Services (DTS). This includes the DTS runtime, the object model that it exposes, and the dtsrun.exe command-line utility. This support will likely be deprecated in the next full release of SQL Server, though. There are several 64-bit restrictions with DTS.
- Extensive performance enhancements to leverage caching for lookup transformations, previously a major performance bottleneck during transformations. This also includes sharing caches in a single package and between separate packages.
- New ADO.NET components for both sources and destinations.
- New data profiling tasks and a Data Profile Viewer.
- A new Integration Services Connections Project Wizard that speeds the creation of the connection information needed by packages.
- A new script environment called Visual Studio Tools for Applications (VSTA). VSTA supports both Microsoft Visual Basic 2008 and Visual C# 2008.
- Package upgrades from 2005 (or earlier) to the 2008 package format.
- Enhanced data type handling in the SQL Server Import and Export Wizard and a few new data types, such as the new Date and Time data types.
- SQL statement enhancements that allow you to perform multiple data manipulations at the same time with MERGE.
- The ability to use SQL Server 2008’s Change Data Capture technology from within Integration Services.
  This one is really a big deal and has been added for R2 via Microsoft partners.
- The ability to create debug dump files that provide information about your package’s execution.

SSIS Basics

As the world becomes ever more data oriented, much greater emphasis is being placed on getting data from one place to another. To complicate matters, data can be stored in many different formats, contexts, filesystems, and locations. In addition, the data often requires significant transformation and conversion processing as it is being moved around. Whether you are trying to move data from Excel to SQL Server, create a data mart (or data warehouse), or distribute data to heterogeneous databases, you are essentially enabling someone with data.

FIGURE 52.1 Distributing periodic updates to data marts. (Diagram labels: Master Data Warehouse, Data Mart, SQL Server 2000, SQL Server 2005, SQL Server 2008, Oracle, SSIS.)

This section describes the SSIS environment and how it addresses these needs. As mentioned earlier, the focus is on importing, exporting, and transforming data from one or more data sources to one or more data targets.

Common requirements of SSIS might include the following:

- Exporting data out of SQL Server tables to other applications and environments (for example, ODBC or OLE DB data sources or via flat files)
- Importing data into SQL Server tables from other applications and environments (for example, ODBC or OLE DB data sources or via flat files)
- Initializing data in some data replication situations, such as initial snapshots
- Aggregating data (that is, data transformation) for distribution to/from data marts or data warehouses
- Changing the data’s context or format before importing or exporting it (that is, data conversion)

Some typical business scenarios for SSIS might include the following:
- Enabling data marts to receive data from a master data warehouse through periodic updates (see Figure 52.1)
- Populating a master data warehouse from legacy systems (see Figure 52.2)
- Initializing heterogeneous replication subscriber tables on Oracle from a SQL Server 2008 Publisher (see Figure 52.3)
- Pulling sales data directly into SQL Server 2008 from an Access or Excel application (see Figure 52.4)
- Exporting static time-reporting data files (that is, flat files) for distribution to remote consultants
- Importing new orders directly or indirectly from a sales force automation system or distributed sales systems

FIGURE 52.2 Populating a data warehouse from one or more data sources.

In general, you need SSIS if any of the following conditions exist:

- You need to import data directly into SQL Server from one or more ODBC data sources, .NET and OLE DB data providers, or via flat files.
- You need to export data directly out of SQL Server to one or more ODBC data sources, .NET and OLE DB data providers, or via flat files.
- You need to perform data conversions, data cleansing/data standardization, transformations, merges, or aggregations on data from one or more data sources for distribution to one or more data targets.

You also need SSIS if you need to access the data directly via any ODBC data source, .NET or OLE DB data providers, or via flat files.

FIGURE 52.3 Initializing a heterogeneous replication subscriber (such as Oracle).

FIGURE 52.4 Pulling data from other disparate applications.
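To make the kind of work listed above more concrete, here is a minimal sketch of a flat-file import with cleansing, conversion, and aggregation before loading into a target table. It is not SSIS and uses none of its components: it is plain Python with the standard csv and sqlite3 modules standing in for a source and a target, and the file contents, column names, and table name are all made up for illustration.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical flat-file extract; in practice this would be a file on disk.
FLAT_FILE = """region,order_date,amount
 east ,2008-01-15,100.50
EAST,2008-01-20,49.50
west,2008-02-01,200
west,2008-02-03,not-a-number
"""


def extract(text):
    """Read rows from a delimited flat file (the data source)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Standardize and convert each row, then aggregate sales by region."""
    totals = defaultdict(float)
    rejected = []
    for row in rows:
        region = row["region"].strip().upper()   # data standardization
        try:
            amount = float(row["amount"])        # data conversion
        except ValueError:
            rejected.append(row)                 # cleansing: set bad rows aside
            continue
        totals[region] += amount                 # aggregation
    return dict(totals), rejected


def load(conn, totals):
    """Write the aggregated rows to the target table (the data target)."""
    conn.execute(
        "CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
totals, rejected = transform(extract(FLAT_FILE))
load(conn, totals)
result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```

An SSIS package would express the same extract, transform, and load stages as connected components rather than as hand-written functions; the sketch only shows the shape of the data flow.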