Nielsen c72.tex V4 - 07/21/2009 3:55pm Page 1512 Part X Business Intelligence

Tuples and simple sets

As outlined earlier in the section "Cube Addressing," tuples are constructed by listing one member from each hierarchy; if no member is explicitly specified for a particular hierarchy, the default member (usually the [All] level) for that hierarchy is implicitly included in the tuple. Parentheses are used to group the tuple's member list. For example, the following tuple references all the cells that contain Internet sales volume for German customers over all time, all territories, all products, and so on:

    ([Customer].[Country].[Country].&[Germany], [Measures].[Internet Sales Amount])

This example includes a measure, which is part of every tuple. Unlike other dimensions, the Measures dimension has no [All] level to fall back on when a measure is omitted, but the cube can be configured with a default measure, which is used if no measure is specified. Also notice the simplified syntax for referring to a measure: [Measures].[Measure Name].

When a simple tuple is specified with only one hierarchy member, the parentheses can be omitted, so all German customers becomes the following:

    [Customer].[Country].[Country].&[Germany]

The simplest way to build a set is by listing one or more tuples inside braces. For example, using simple tuples without parentheses, the following is a set of French and German customers:

    {[Customer].[Country].[Country].&[France], [Customer].[Country].[Country].&[Germany]}

Basic SELECT statement

Simple sets enable the construction of a basic SELECT statement.
The following example query returns Internet sales to French and German customers for the calendar years 2003 and 2004:

    SELECT
      {[Customer].[Country].[Country].&[France],
       [Customer].[Country].[Country].&[Germany]} ON COLUMNS,
      {[Date].[Calendar Year].[Calendar Year].&[2003],
       [Date].[Calendar Year].[Calendar Year].&[2004]} ON ROWS
    FROM [Adventure Works]
    WHERE ([Measures].[Internet Sales Amount])

All queries in this chapter are constructed to run against the AdventureWorks cube samples for SQL Server 2008. See www.codeplex.com/SqlServerSamples to download the sample Analysis Services and relational databases.

Result:

               France          Germany
    CY 2003    $1,026,324.97   $1,058,405.73
    CY 2004    $  922,179.04   $1,076,890.77

This example places sets on two axes, rows and columns, which become the row and column headers. The WHERE clause in this example limits the query to only those cells containing Internet sales amount. The WHERE clause is often called the slicer, as it limits the scope of the query to a particular slice of the cube. Think of the slicer as determining how each hierarchy that isn't part of some axis definition will contribute to the query.

Any number of headers can be specified for an axis by including more than one non-default hierarchy in each tuple that builds the axis set. The following example creates two row headers by listing both the Product Line and Sales Reason Type hierarchies in each row tuple:

    SELECT
      {[Customer].[Country].[Country].&[France],
       [Customer].[Country].[Country].&[Germany]} ON COLUMNS,
      {([Product].[Product Line].[Product Line].&[S],
        [Sales Reason].[Sales Reason Type].[Sales Reason Type].&[Marketing]),
       ([Product].[Product Line].[Product Line].&[S],
        [Sales Reason].[Sales Reason Type].[Sales Reason Type].&[Promotion]),
       ([Product].[Product Line].[Product Line].&[M],
        [Sales Reason].[Sales Reason Type].[Sales Reason Type].&[Marketing]),
       ([Product].[Product Line].[Product Line].&[M],
        [Sales Reason].[Sales Reason Type].[Sales Reason Type].&[Promotion])
      } ON ROWS
    FROM [Adventure Works]
    WHERE ([Measures].[Internet Sales Amount],
           [Date].[Calendar Year].[Calendar Year].&[2004])

Result:

                           France        Germany
    Accessory  Marketing   $    962.79   $    349.90
    Accessory  Promotion   $  2,241.84   $  2,959.86
    Mountain   Marketing   $    189.96   $    194.97
    Mountain   Promotion   $100,209.88   $126,368.03

This hierarchy-to-header mapping provides a way to control the geometry of the result set, but it also places a restriction on creating sets for an axis: the hierarchies specified for tuples in the set must remain consistent for every item in the set. For example, having some of the tuples in the preceding example use the Product Category hierarchy instead of the Product Line hierarchy would not be allowed. Think of inconsistencies between the tuples as causing blank header cells, which MDX doesn't know how to handle.

Another restriction on creating sets for an MDX query is that each hierarchy can appear on only one axis or slicer definition. If the Calendar Year hierarchy is explicitly named in the row definition, then it cannot appear again in the slicer. This restriction applies purely to the hierarchy: another hierarchy that contains the calendar year data (for example, the Calendar hierarchy in AdventureWorks) can appear on one axis while the Calendar Year hierarchy appears elsewhere.

Measures

Measures are the values that the cube is created to present. They are available in MDX as members of the always-present Measures dimension. The Measures dimension has no hierarchies or levels, so each measure is referenced directly from the dimension level as [Measures].[measure name]. If no measure is specified for a query, then the cube's default measure is used.
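The two restrictions on axis sets described earlier (every tuple in a set must use the same hierarchies, and a hierarchy may appear on only one axis or in the slicer) can be sketched as a small check. This is a toy model in Python rather than anything Analysis Services exposes: the (hierarchy, key) representation and the function names axis_hierarchies and check_query are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a member is (hierarchy, key); an MDX tuple is a tuple of members.
Member = Tuple[str, str]
MdxTuple = Tuple[Member, ...]


def axis_hierarchies(axis_set: List[MdxTuple]) -> Tuple[str, ...]:
    """Return the hierarchies shared by every tuple in the set, or raise ValueError."""
    if not axis_set:
        return ()
    expected = tuple(h for h, _ in axis_set[0])
    for tup in axis_set[1:]:
        found = tuple(h for h, _ in tup)
        if found != expected:
            raise ValueError(f"inconsistent hierarchies: {found} != {expected}")
    return expected


def check_query(axes: Dict[str, List[MdxTuple]], slicer: MdxTuple = ()) -> None:
    """Raise ValueError if a hierarchy is used by more than one axis or the slicer."""
    parts = {name: axis_hierarchies(s) for name, s in axes.items()}
    parts["slicer"] = tuple(h for h, _ in slicer)
    seen: Dict[str, str] = {}
    for name, hierarchies in parts.items():
        for h in hierarchies:
            if h in seen and seen[h] != name:
                raise ValueError(f"{h} appears on both {seen[h]} and {name}")
            seen[h] = name


# Mirrors the two-row-header query: Product Line and Sales Reason Type on rows.
columns = [(("Customer.Country", "France"),), (("Customer.Country", "Germany"),)]
rows = [
    (("Product.Product Line", "S"), ("Sales Reason.Sales Reason Type", "Marketing")),
    (("Product.Product Line", "M"), ("Sales Reason.Sales Reason Type", "Promotion")),
]
slicer = (("Measures", "Internet Sales Amount"), ("Date.Calendar Year", "2004"))
check_query({"columns": columns, "rows": rows}, slicer)  # passes silently
```

Swapping one row tuple to a different product hierarchy, or naming Date.Calendar Year on both rows and the slicer, makes the corresponding function raise ValueError, which is the toy equivalent of the query being rejected.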
Generating sets from functions

Developing MDX code would quickly become tedious if every set had to be built by hand. Several functions generate sets from data within the cube; some of the most popular are listed below. To save space, each example is written to supply the missing row set in this query:

    SELECT
      {[Measures].[Internet Sales Amount],
       [Measures].[Internet Total Product Cost]} ON COLUMNS,
      { } ON ROWS
    FROM [Adventure Works]

■ .Members: Lists all of the individual members of either a hierarchy or a level. Used with a level, all the members of that level are listed (e.g., [Date].[Calendar].[Month].Members returns all calendar months). Used with a hierarchy, all members from every level are listed (e.g., [Date].[Calendar].Members returns every year, semester, quarter, month, and day).
■ .Children: Lists all the children of a given member (e.g., [Date].[Calendar].[Calendar Quarter].&[2002]&[1].Children returns all the months in the first quarter of 2002).
■ Descendants(start [,depth [,show]]): Lists the children, grandchildren, and so on, of a member or set of members. Specify start as the member or set of members, and depth as either a specific level name or the number of levels below start. By default, if depth is specified, only descendants at that depth are listed; the show flag can alter that behavior by allowing levels above, at, or below to be shown as well. Values include SELF, AFTER, BEFORE, BEFORE_AND_AFTER, SELF_AND_AFTER, SELF_AND_BEFORE, and SELF_BEFORE_AFTER. Some examples are as follows:
    ■ Descendants([Date].[Calendar].[Calendar Year].&[2003]) lists the year, semesters, quarters, months, and days in 2003.
    ■ Descendants([Date].[Calendar].[Calendar Year].&[2003], [Date].[Calendar].[Month]) lists the months in 2003.
    ■ Descendants([Date].[Calendar].[Calendar Year].&[2003], 3, SELF_AND_AFTER) lists the months and days in 2003.
■ LastPeriods(n, member): Returns the last n periods ending with member (e.g., LastPeriods(12, [Date].[Calendar].[Month].&[2004]&[6]) lists July 2003 through June 2004). If n is negative, then future periods are returned beginning with member.
■ TopCount(set, count [,numeric_expression]): Returns the top count items of a set, sorted by numeric_expression (e.g., TopCount([Date].[Calendar].[Month].Members, 5, [Measures].[Internet Sales Amount]) returns the top five months for Internet sales). Omitting the numeric_expression argument simply returns the first count entries of the set. The BottomCount, TopPercent, and BottomPercent functions work in a very similar way.

Unlike TopCount and its cousins, most set functions do not sort as part of their work, and instead return a set with members in their default cube order. The Order function can be used to sort a set: Order(set, sort_by [ , { ASC | DESC | BASC | BDESC } ]). Specify the set to be sorted, the expression to sort by, and optionally the order in which to sort (defaults to ASC). The ASCending and DESCending options sort within the confines of the hierarchy. For example, when months within the AdventureWorks Calendar hierarchy are sorted using one of these options, months move around within a quarter but do not cross quarter (hierarchy) boundaries. The "break hierarchy" options, BASC and BDESC, sort without regard to the parent under which a member normally falls.

Generated sets frequently have members for which no measure data are available. These members can be suppressed by prefixing the axis definition with NON EMPTY. The following example shows sales by salesperson for months in 2004; NON EMPTY is used for the column headers because the cube does not contain data for all months in 2004, and NON EMPTY is useful in the row definition because not every employee is a salesperson.
In addition, the Order function is used to rank the salespeople by total sales in 2004. Note that the sort_by is a tuple specifying sales for the year of 2004. Had the [Date].[Calendar].[Calendar Year].&[2004] been omitted, the ranking would instead have been by sales over all time.

    SELECT
      NON EMPTY {Descendants([Date].[Calendar].[Calendar Year].&[2004], 3)} ON COLUMNS,
      NON EMPTY {
        Order(
          [Employee].[Employee].Members,
          ([Date].[Calendar].[Calendar Year].&[2004],
           [Measures].[Reseller Sales Amount]),
          BDESC
        )
      } ON ROWS
    FROM [Adventure Works]
    WHERE ([Measures].[Reseller Sales Amount])

The preceding query yields the following:

                         January 2004    February 2004   June 2004
    All Employees        $1,662,547.32   $2,700,766.80   $3,415,479.07
    Linda C. Mitchell    $  117,697.41   $  497,155.98   $  282,711.04
    Jae B. Pak           $  219,443.93   $  205,602.75   $  439,784.05
    Stephen Y. Jiang     $   70,815.36   (null)          $   37,652.92
    Amy E. Alberts       $      323.99   $   42,041.96   (null)
    Syed E. Abbas        $    3,936.02   $    1,376.99   $    4,197.11

These generated sets all contain a single hierarchy, so how are multiple headers generated? The Crossjoin function generates the cross-product of any number of sets, resulting in a single, large set with tuples made in every combination of the source sets. Alternatively, the cross-join operator "*" can be placed between sets to generate the cross-product. For example, the following query provides two levels of headers listing Product Line and Sales Territory Country:

    SELECT
      NON EMPTY {Descendants([Date].[Calendar].[Calendar Year].&[2004], 3)} ON COLUMNS,
      NON EMPTY {
        Crossjoin(
          [Product].[Product Line].[Product Line].Members,
          [Sales Territory].[Sales Territory Country].[Sales Territory Country].Members
        )
      } ON ROWS
    FROM [Adventure Works]
    WHERE ([Measures].[Reseller Sales Amount])

Using SQL Server Management Studio

The names of objects within a cube can be very long and difficult to type correctly.
Fortunately, SQL Server Management Studio provides a convenient drag-and-drop interface for specifying both object names and MDX functions. Begin by opening a new Analysis Services MDX query, then choose the appropriate Analysis Services database in the toolbar and the target cube in the upper left-hand corner of the query window. The Metadata tab (see Figure 72-2) is automatically populated with all the measures, dimensions, and so on, for that cube. MDX queries can then be built up by dragging objects onto the script pane, or by switching to the Functions tab and similarly dragging function definitions.

For more details about working in SQL Server Management Studio, see Chapter 6, "Using Management Studio."

The cube developer may choose to group dimension hierarchies into folders, also shown in Figure 72-2. Folders provide a handy way to organize long lists of hierarchies and have no effect on the structure of the cube or how MDX is written.

FIGURE 72-2 SQL Server Management Studio Metadata tab (callouts: currently selected database; currently selected cube/perspective; drag and drop function definitions from this tab; Account dimension; folder organizing hierarchies into groups; attribute (single-level) hierarchy; user (multi-level) hierarchy; first level (single dot) in hierarchy; members in first level of hierarchy)

Advanced Select Query

Beyond the basic table generation described so far in this chapter, the syntax described here includes the most commonly used features:

    [ WITH <calc | set> [ , <calc | set> ] ]
    SELECT [ <set> on 0 [ , <set> on 1 ] ]
    FROM <cube> | <subcube>
    [ WHERE ( <set> ) ]

The SELECT statement can return from 0 to 128 axes, with the first five having the aliases COLUMNS, ROWS, PAGES, SECTIONS, and CHAPTERS (axes 0 through 4, respectively). Alternatively, axis numbers can be specified as AXIS(0), AXIS(1), etc.
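The relationship between axis numbers and axis aliases can be sketched as a small helper that assembles the text of a SELECT statement. This is a minimal illustration in Python, not part of any Analysis Services client library: the names axis_clause and build_select are hypothetical, and the sketch assumes axis 0 is aliased COLUMNS and axis 1 ROWS, with the 128-axis limit taken from the text above.

```python
# Aliases for the first five axes, in axis-number order (axis 0 is COLUMNS).
AXIS_ALIASES = ("COLUMNS", "ROWS", "PAGES", "SECTIONS", "CHAPTERS")
MAX_AXES = 128  # limit stated in the text


def axis_clause(set_expr: str, axis: int) -> str:
    """Return '<set> ON <alias>' for axes 0-4, otherwise '<set> ON AXIS(n)'."""
    if not 0 <= axis < MAX_AXES:
        raise ValueError(f"axis number must be 0..{MAX_AXES - 1}, got {axis}")
    name = AXIS_ALIASES[axis] if axis < len(AXIS_ALIASES) else f"AXIS({axis})"
    return f"{set_expr} ON {name}"


def build_select(axis_sets, cube, slicer=None):
    """Assemble SELECT statement text from per-axis set expressions (hypothetical helper)."""
    lines = ["SELECT"]
    if axis_sets:
        clauses = ",\n  ".join(axis_clause(s, i) for i, s in enumerate(axis_sets))
        lines.append("  " + clauses)
    lines.append(f"FROM {cube}")
    if slicer:
        lines.append(f"WHERE ({slicer})")
    return "\n".join(lines)


print(build_select(
    ["{[Date].[Day Name].Members}",
     "{[Product].[Subcategory].[Subcategory].Members}"],
    "[Adventure Works]",
    slicer="[Measures].[Internet Order Count]",
))
```

The sets are listed in axis order, so the first expression always lands on COLUMNS; this mirrors the requirement that a query using axis 1 must also define axis 0.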
Best Practice

As the complexity of a query increases, the need for clarity and documentation increases as well. Break long queries onto several lines and use indentation to organize nested arguments. Add comments about the intent and meaning using "--" or "//" for end-of-line comments, or /*comment*/ for embedded or multi-line comments.

Subcubes

Subcubes are helpful for breaking complex logic into manageable segments. They can also be helpful for building applications when either a consistent view of a changing population (e.g., top five salespeople) or a fixed population with alternate views (e.g., only displaying sales figures to employees of that department) is desired. For example, an application that displays a subset of data based on which user is logged in could build all its queries based on a user-specific subcube.

Specify a subcube in the FROM clause by enclosing another SELECT within parentheses where a cube name would normally appear. This works much like a derived table in SQL, except that whereas a derived table includes only the columns explicitly identified, a subcube includes all hierarchies in the result, though some of the hierarchies will have limited membership. The following example creates a subcube of the top 10 product models and top five months for U.S. Internet sales, and then summarizes order counts by day of the week and subcategory:

    SELECT
      {[Date].[Day Name].Members} ON COLUMNS,
      {[Product].[Subcategory].[Subcategory].Members} ON ROWS
    FROM (SELECT
            {TOPCOUNT([Product].[Model Name].[Model Name].Members, 10,
                      [Measures].[Internet Sales Amount])} ON COLUMNS,
            {TOPCOUNT([Date].[Calendar].[Month], 5,
                      [Measures].[Internet Sales Amount])} ON ROWS
          FROM [Adventure Works]
          WHERE ([Customer].[Country].&[United States]))
    WHERE ([Measures].[Internet Order Count])

WITH clause

The WITH clause enables the creation of sets and calculated members.
While some of the functionality provided can be performed directly within axis definitions, it is good practice to use sets and members to break logic apart into units that can be more easily constructed and understood. Don't confuse these constructs with similar syntax in T-SQL, as they behave quite differently.

Best Practice

Sets and calculations can also be defined as part of the cube (see the "MDX Scripting" section that follows). If any item is to be used in more than a handful of queries, create it as part of the cube, making it globally available and adjustable by changes in a single location.

Sets

Add a named set to the WITH clause using the syntax SET set_name AS definition, where set_name is any legal identifier, and definition specifies a set appropriate for use in an axis or WHERE clause. The following example builds three sets to explore the nine-month trends on products with ratios over 5% in 2004:

    WITH
      SET [ProductList] AS
        Filter(
          [Product].[Product].[Product].Members,
          ([Date].[Calendar Year].&[2004],
           [Measures].[Internet Ratio to All Products]) > 0.05
        )
      SET [TimeFrame] AS
        LastPeriods(9, [Date].[Calendar].[Month].&[2004]&[6])
      SET [MeasureList] AS
        { [Measures].[Internet Order Count],
          [Measures].[Internet Sales Amount] }
    SELECT
      {[MeasureList] * [ProductList]} ON COLUMNS,
      {[TimeFrame]} ON ROWS
    FROM [Adventure Works]

The preceding query yields the following:

                      Internet        Internet        Internet        Internet
                      Order Count     Order Count     Sales Amount    Sales Amount
                      Mountain-200    Mountain-200    Mountain-200    Mountain-200
                      Silver, 38      Black, 46       Silver, 38      Black, 46
    October 2003      29              29              $ 67,279.71     $ 66,554.71
    November 2003     28              31              $ 64,959.72     $ 71,144.69
    December 2003     32              42              $ 74,239.68     $ 96,389.58
    January 2004      28              36              $ 64,959.72     $ 82,619.64
    February 2004     36              34              $ 83,519.64     $ 78,029.66
    March 2004        35              33              $ 81,199.65     $ 75,734.67
    April 2004        45              34              $104,399.55     $ 78,029.66
    May 2004          48              50              $111,359.52     $114,749.50
    June 2004         62              44              $143,839.38     $100,979.56

This example uses the Filter function to limit the set of products to those with ratios over 5%. The Filter function has the following general form: Filter(set, condition).

Best Practice

Perhaps the most important query optimization available is limiting the size of sets as early as possible in the query, before cross joins or calculations are performed. Many optimizations a developer can expect when writing T-SQL queries are not available in MDX.

Calculated Members

Although the syntax of a calculated member is similar to that of a set (MEMBER member_name AS definition), the member name must fit into an existing hierarchy, as shown in the following example:

    WITH
      MEMBER [Measures].[GPM After 5% Increase] AS
        ( [Measures].[Internet Sales Amount] * 1.05
          - [Measures].[Internet Total Product Cost] )
        / [Measures].[Internet Sales Amount],
        FORMAT_STRING = 'Percent'
      MEMBER [Product].[Subcategory].[Total] AS
        [Product].[Subcategory].[All Products]
    SELECT
      {[Measures].[Internet Gross Profit Margin],
       [Measures].[GPM After 5% Increase]} ON 0,
      NON EMPTY {[Product].[Subcategory].[Subcategory].Members,
                 [Product].[Subcategory].[Total]} ON 1
    FROM [Adventure Works]
    WHERE ([Date].[Calendar].[Calendar Year].&[2004])

This query yields the following:

                         Internet Gross Profit Margin   GPM After 5% Increase
    Bike Racks           62.60%                         67.60%
    Bike Stands          62.60%                         67.60%
    Bottles and Cages    62.60%                         67.60%
    Touring Bikes        37.84%                         42.84%
    Vests                62.60%                         67.60%
    Total                41.45%                         46.45%

This query examines the current and what-if gross profit margin by product subcategory, including a subcategory "total" across all products. Note how each calculated member's name places it in the same hierarchy as the other members on its query axis. FORMAT_STRING is an optional modifier to set the display format for a calculated member.
The source cube contains default formats for each measure, but new measures created by calculation will likely require formatting. [Product].[Subcategory].[Total], like most totals and subtotals, can rely on a parent member (in this case, the [All] level) to provide the appropriate value:

    WITH
      SET [Top20ProductList] AS
        TOPCOUNT([Product].[Product].[Product].Members, 20,
                 ([Date].[Calendar].[Calendar Year].&[2004],
                  [Measures].[Internet Order Count]))
      SET [NotTop20ProductList] AS
        Order(
          Filter(
            {[Product].[Product].[Product].Members - [Top20ProductList]},
            NOT IsEmpty([Measures].[Internet Order Count])),
          [Measures].[Internet Order Count], BDESC)
      MEMBER [Measures].[Average Top20ProductList Order Count] AS
        AVG([Top20ProductList], [Measures].[Internet Order Count])
      MEMBER [Measures].[Difference from Top20 Products] AS
        [Measures].[Internet Order Count]
        - [Measures].[Average Top20ProductList Order Count]
      MEMBER [Product].[Product].[Top 20 Products] AS
        AVG([Top20ProductList])
    SELECT
      {[Measures].[Internet Order Count],
       [Measures].[Difference from Top20 Products]} ON COLUMNS,
      {[Product].[Product].[Top 20 Products],
       [NotTop20ProductList]} ON ROWS
    FROM [Adventure Works]
    WHERE ([Date].[Calendar].[Month].&[2004]&[6])

Result:

                                Internet Order Count   Difference from Top20 Products
    Top 20 Products             176                       0
    Hydration Pack - 70 oz.      76                    -100
    Mountain-200 Silver, 38      62                    -114
    Touring-3000 Yellow, 54       4                    -172
    Touring-3000 Yellow, 58       4                    -172
    Mountain-500 Black, 40        2                    -174

This example compares the average June 2004 order count of the top 20 products to the other products ordered that month. A contrived example to be sure, but it demonstrates a number of concepts:

■ Top20ProductList: Builds a set of the top 20 products based on orders for the entire year of 2004.