Nielsen c18.tex V4 - 07/21/2009 1:01pm Page 452 Part III Beyond Relational

INNER JOIN Customers c ON c.CustomerID = oh.CustomerID
INNER JOIN Items i ON i.ItemNumber = x.value('@ItemNumber','CHAR(4)')
/*
OrderID Customer         Item                   Quantity Price
1       Jacob Sebastian  DELL XPS 1130 Laptop   1        900.00
1       Jacob Sebastian  XBOX 360 Console       1        200.00
2       Jacob Sebastian  DELL XPS 1130 Laptop   1        900.00
*/

The preceding example joins the OrderHeader table with the OrderXML table, and this join is between a relational column and an XML node. A second join is established between the Items table and the Item elements returned by the nodes() method.

Another way to write this query is to embed the join conditions in the XQuery expression itself. The following example demonstrates this:

SELECT
    oh.OrderID,
    c.Name AS Customer,
    i.ItemDescription AS Item,
    x.value('@Quantity','INT') AS Quantity,
    x.value('@Price','MONEY') AS Price
FROM OrderHeader oh
INNER JOIN Customers c ON c.CustomerID = oh.CustomerID
CROSS JOIN OrderXML
CROSS JOIN Items i
CROSS APPLY ItemData.nodes('
    /Order[@OrderID=sql:column("oh.OrderID")]
    /Item[@ItemNumber=sql:column("i.ItemNumber")]') o(x)
/*
OrderID Customer         Item                   Quantity Price
1       Jacob Sebastian  DELL XPS 1130 Laptop   1        900.00
1       Jacob Sebastian  XBOX 360 Console       1        200.00
2       Jacob Sebastian  DELL XPS 1130 Laptop   1        900.00
*/

The capability to join relational tables with XML nodes opens up a wide variety of possibilities for join operations between relational and XML data.

Using variables and filters in XQuery expressions

SQL Server allows only string literals as XQuery expressions. The following is illegal in SQL Server 2008:

DECLARE @node VARCHAR(100)
SELECT @node = '/Order/Item'
SELECT /* columns here */
FROM OrderXML
CROSS APPLY ItemData.nodes(@node) o(x)

Although you might want to use such an expression, it won't work; I see this in XML forums all the time.
Don't make the same mistake.

There are two common scenarios in which you might need to use variables in XQuery expressions:

■ To apply filters on the value of elements or attributes; for example, to retrieve the nodes with ItemNumber = "Z001" or OrderID = "1"
■ To retrieve the value of an element or attribute whose name is not known in advance, such as when the name of the element or attribute is passed as an argument

SQL Server allows variables to be used as part of an XQuery expression through the sql:variable() function. The following example uses a variable to filter the Item elements by item number:

DECLARE @ItemNumber CHAR(4)
SELECT @ItemNumber = 'D001'
SELECT
    x.value('@ItemNumber','CHAR(4)') AS ItemNumber,
    x.value('@Quantity','INT') AS Quantity,
    x.value('@Price','MONEY') AS Price
FROM OrderXML
CROSS APPLY ItemData.nodes('
    /Order/Item[@ItemNumber=sql:variable("@ItemNumber")]'
) o(x)
/*
ItemNumber Quantity Price
D001       1        900.00
D001       1        900.00
*/

Returning the values of elements or attributes whose names are not known in advance is a little trickier. It can be achieved by using the XQuery function local-name() and matching its result against the value of the given variable:

DECLARE @Att VARCHAR(50)
SELECT @Att = 'ItemNumber'
SELECT
    x.value('@*[local-name()=sql:variable("@Att")][1]', 'VARCHAR(50)') AS Value
FROM OrderXML
CROSS APPLY ItemData.nodes('/Order/Item') o(x)
/*
Value
D001
Z001
D001
*/

The preceding example retrieves the value of an attribute whose name is not known in advance. The name of the attribute is stored in a variable, and the XQuery function local-name() is used to match the name of each attribute against the variable.

Supporting variables as part of XQuery expressions greatly extends the power and flexibility of XQuery programming within T-SQL.
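The same local-name() technique should also work when the element name, rather than the attribute name, is supplied at run time. The following is a minimal, self-contained sketch: the XML literal assigned to @doc is a hypothetical stand-in for one row of OrderXML.ItemData, and @doc, @Elem, and @Att are illustrative names, not part of the sample database.

```sql
-- Hypothetical sample document standing in for one row of OrderXML.ItemData
DECLARE @doc XML
SET @doc = '<Order OrderID="1">
  <Item ItemNumber="D001" Quantity="1" Price="900.00" />
  <Item ItemNumber="Z001" Quantity="1" Price="200.00" />
</Order>'

-- Element and attribute names supplied at run time (illustrative values)
DECLARE @Elem VARCHAR(50), @Att VARCHAR(50)
SELECT @Elem = 'Item', @Att = 'Price'

-- local-name() matches the element name inside nodes()
-- and the attribute name inside value()
SELECT
    x.value('@*[local-name()=sql:variable("@Att")][1]', 'VARCHAR(50)') AS Value
FROM @doc.nodes('/Order/*[local-name()=sql:variable("@Elem")]') o(x)
```

Because the attribute is read as VARCHAR(50), the values come back exactly as they appear in the document (900.00 and 200.00 for this sample). Changing @Att to 'ItemNumber' or 'Quantity' returns those attributes instead, with no change to the query text.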
Accessing the parent node

Most of the time, a query retrieving information from an XML document needs to access information from nodes at different levels of the XML tree. The easiest way to achieve this may be to use the parent node accessor (..), as shown in the following example:

SELECT
    x.value('../@OrderID','INT') AS OrderID,
    x.value('@ItemNumber','CHAR(4)') AS ItemNumber,
    x.value('@Quantity','INT') AS Quantity,
    x.value('@Price','MONEY') AS Price
FROM OrderXML
CROSS APPLY ItemData.nodes('/Order/Item') o(x)
/*
OrderID ItemNumber Quantity Price
1       D001       1        900.00
1       Z001       1        200.00
2       D001       1        900.00
*/

The preceding example uses the parent node accessor (..) to retrieve the OrderID attribute of the enclosing Order element. While this syntax is simple and easy to use, it may not perform well: when the parent node accessor is used, the XQuery processor has to navigate backward to read the parent node while processing each row, which can slow down the query. The following example demonstrates a more efficient way of writing the preceding query using CROSS APPLY:

SELECT
    h.value('@OrderID','INT') AS OrderID,
    x.value('@ItemNumber','CHAR(4)') AS ItemNumber,
    x.value('@Quantity','INT') AS Quantity,
    x.value('@Price','MONEY') AS Price
FROM OrderXML
CROSS APPLY ItemData.nodes('/Order') o(h)
CROSS APPLY h.nodes('Item') i(x)
/*
OrderID ItemNumber Quantity Price
1       D001       1        900.00
1       Z001       1        200.00
2       D001       1        900.00
*/

The first CROSS APPLY operator in the preceding query returns an accessor to each Order element, and the second CROSS APPLY returns an accessor to each Item element within it. This eliminates the need for the parent node accessor when reading information from the Order element.

The parent node accessor may be fine with small tables and small XML documents, but it is not recommended for large XML documents or tables.
A better way of accessing the parent node is the CROSS APPLY approach demonstrated in the preceding example.

Generating XML Output Using FOR XML

FOR XML is a rowset aggregation clause that returns a one-row, one-column result set containing an NVARCHAR(MAX) value. The TYPE directive can be used along with FOR XML to produce XML data type output instead of NVARCHAR(MAX). FOR XML can be used with the AUTO, RAW, PATH, and EXPLICIT directives to achieve different levels of control over the structure and format of the XML output.

FOR XML AUTO

FOR XML AUTO is one of the easiest options available to generate XML output from the results of a SELECT query. It returns XML output with nested XML elements. Though it is easy to use and has a simple syntax, FOR XML AUTO does not provide much control over the structure of the XML output. As the name suggests, FOR XML AUTO "automatically" identifies the element names, hierarchies, and so on, based on the table names, aliases, and joins used in the query. The following example demonstrates a basic use of FOR XML AUTO:

SELECT OrderNumber, CustomerID
FROM OrderHeader
FOR XML AUTO
/*
<OrderHeader OrderNumber="SO101" CustomerID="1" />
<OrderHeader OrderNumber="SO102" CustomerID="1" />
*/

The element name is determined by the name or alias of the table. In the preceding example, an element named OrderHeader is created because the name of the table is OrderHeader. By adding an alias to the table name, a different element name can be generated:

SELECT OrderNumber, CustomerID
FROM OrderHeader o
FOR XML AUTO
/*
<o OrderNumber="SO101" CustomerID="1" />
<o OrderNumber="SO102" CustomerID="1" />
*/

These examples produce XML fragments, not well-formed XML documents: a well-formed XML document can have only one top-level element. A root element can be added to the output of a FOR XML AUTO query by specifying the ROOT directive.
The ROOT directive takes an optional argument that specifies the name of the root element. If this argument is not specified, the top-level element is named "root." The following example adds a top-level element named SalesOrder to the XML output:

SELECT OrderNumber, CustomerID
FROM OrderHeader
FOR XML AUTO, ROOT('SalesOrder')
/*
<SalesOrder>
  <OrderHeader OrderNumber="SO101" CustomerID="1" />
  <OrderHeader OrderNumber="SO102" CustomerID="1" />
</SalesOrder>
*/

If the query references more than one table, FOR XML AUTO generates hierarchical XML output based on the joins used in the query. The example given here joins the OrderHeader table (aliased as Order) with the Customers table (aliased as Customer):

SELECT
    [Order].OrderNumber,
    [Order].OrderDate,
    Customer.CustomerNumber,
    Customer.Name
FROM OrderHeader [Order]
INNER JOIN Customers Customer
    ON [Order].CustomerID = Customer.CustomerID
FOR XML AUTO
/*
<Order OrderNumber="SO101" OrderDate="2009-01-23T00:00:00">
  <Customer CustomerNumber="J001" Name="Jacob Sebastian" />
</Order>
*/

By default, FOR XML AUTO generates an element for each row, and column values are generated as attributes. This behavior can be changed by specifying the ELEMENTS directive, which forces SQL Server to generate column values as child elements instead:

SELECT
    [Order].OrderNumber,
    [Order].OrderDate,
    Customer.CustomerNumber,
    Customer.Name
FROM OrderHeader [Order]
INNER JOIN Customers Customer
    ON [Order].CustomerID = Customer.CustomerID
FOR XML AUTO, ELEMENTS
/*
<Order>
  <OrderNumber>SO101</OrderNumber>
  <OrderDate>2009-01-23T00:00:00</OrderDate>
  <Customer>
    <CustomerNumber>J001</CustomerNumber>
    <Name>Jacob Sebastian</Name>
  </Customer>
</Order>
*/

A FOR XML query returns a result set with one row and one column containing an NVARCHAR(MAX) value. The TYPE directive can be used to request that SQL Server return an XML data type value instead of NVARCHAR(MAX). The TYPE directive is explained in this chapter.
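As a rough illustration of what TYPE changes, the following self-contained sketch captures FOR XML AUTO output as an XML value and then queries it with nodes() and value(). The table variable @OrderHeader and its two rows are hypothetical stand-ins for the chapter's OrderHeader table; aliasing it as OrderHeader is an assumption made so that the generated element name matches the earlier examples.

```sql
-- Hypothetical stand-in for the chapter's OrderHeader table
DECLARE @OrderHeader TABLE (OrderNumber VARCHAR(10), CustomerID INT)
INSERT INTO @OrderHeader VALUES ('SO101', 1), ('SO102', 1)

-- With TYPE, the subquery returns an XML value rather than NVARCHAR(MAX)
DECLARE @result XML
SET @result = (
    SELECT OrderNumber, CustomerID
    FROM @OrderHeader AS OrderHeader   -- the alias becomes the element name
    ORDER BY OrderNumber
    FOR XML AUTO, ROOT('SalesOrder'), TYPE
)

-- The XML data type methods can be applied to the result directly
SELECT x.value('@OrderNumber', 'VARCHAR(10)') AS OrderNumber
FROM @result.nodes('/SalesOrder/OrderHeader') o(x)
```

Without TYPE, the subquery produces NVARCHAR(MAX) text, and assigning it to an XML variable relies on an implicit conversion. TYPE matters most when the result is consumed as XML directly, for example when one FOR XML query is nested inside another: a nested result without TYPE is treated as text and its markup is escaped.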
FOR XML AUTO can be used with additional directives such as XSINIL, XMLDATA, and XMLSCHEMA, each of which is covered in detail later in this chapter.

FOR XML RAW

FOR XML RAW is very similar to FOR XML AUTO and differs from it in only a couple of ways. One basic difference is that FOR XML AUTO doesn't allow altering the names of the elements: it always generates XML elements based on the name of the table or alias. Conversely, FOR XML RAW generates elements named <row> by default, and allows customizing that name. An optional element name can be specified with the RAW directive, and SQL Server will generate the elements with the specified name:

SELECT OrderNumber, CustomerID
FROM OrderHeader
FOR XML RAW('Order')
/*
<Order OrderNumber="SO101" CustomerID="1" />
<Order OrderNumber="SO102" CustomerID="1" />
*/

Another difference worth noting is that FOR XML AUTO generates a separate, nested element for each table used in the query, whereas FOR XML RAW generates a single element for each row in the query result:

SELECT OrderNumber, CustomerNumber
FROM OrderHeader o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
FOR XML RAW('Order')
/*
<Order OrderNumber="SO101" CustomerNumber="J001" />
<Order OrderNumber="SO102" CustomerNumber="J001" />
*/

If the preceding query were executed with FOR XML AUTO, it would create a top-level element for the order information and a child element for the customer information. If the ELEMENTS directive is specified, SQL Server will create child elements for each column instead of attributes.

Like the AUTO directive, RAW also supports an optional ROOT directive that generates a root element with the specified name.
If the root name is not specified, a top-level element named <root> will be created:

SELECT OrderNumber, CustomerID
FROM OrderHeader
FOR XML RAW('Order'), ROOT('Orders')
/*
<Orders>
  <Order OrderNumber="SO101" CustomerID="1" />
  <Order OrderNumber="SO102" CustomerID="1" />
</Orders>
*/

FOR XML RAW can also be used with additional directives such as XSINIL, XMLDATA, and XMLSCHEMA, covered in detail later in this chapter.

FOR XML EXPLICIT

FOR XML EXPLICIT is the most powerful clause available to generate XML output. It can generate very complex XML structures and offers a great deal of control over the output structure. However, most people find it too complicated to use.

To use FOR XML EXPLICIT, each row must have two mandatory columns: Tag and Parent. The data must be such that a hierarchical relationship is established between the rows using Tag and Parent. Other columns must be named in a specific format that provides additional metadata. The following FOR XML EXPLICIT query generates an XML document with elements nested two levels deep:

SELECT
    1 AS Tag,
    NULL AS Parent,
    CustomerNumber AS 'Customer!1!CustNo',
    NULL AS 'LineItems!2!ItemNo',
    NULL AS 'LineItems!2!Qty'
FROM OrderHeader o
INNER JOIN Customers c
    ON o.CustomerID = c.CustomerID
    AND o.OrderID = 1
UNION ALL
SELECT
    2 AS Tag,
    1 AS Parent,
    NULL,
    i.ItemNumber,
    o.Quantity
FROM Items i
INNER JOIN OrderDetails o
    ON i.ItemID = o.ItemID
    AND o.OrderID = 1
FOR XML EXPLICIT
/*
<Customer CustNo="J001">
  <LineItems ItemNo="D001" Qty="1" />
  <LineItems ItemNo="Z001" Qty="1" />
</Customer>
*/

The results of the query without the FOR XML EXPLICIT clause look like the following:

Tag  Parent  Customer!1!CustNo  LineItems!2!ItemNo  LineItems!2!Qty
1    NULL    J001               NULL                NULL
2    1       NULL               D001                1
2    1       NULL               Z001                1

Essentially, to run a FOR XML EXPLICIT query, SQL Server needs data in a format similar to the one given above.
Tag and Parent are mandatory columns, and they must maintain a valid parent-child (hierarchical) relationship. At least one additional column is required for a FOR XML EXPLICIT query to run successfully. The XML generator reads the results of the query and identifies the hierarchy of the XML document from the relationship specified by the Tag and Parent columns. The name of each remaining column specifies the name of the element it belongs to, the tag number of that element in the hierarchy, and the name of the attribute. For example, LineItems!2!Qty indicates that the value should be written to an attribute named Qty of the LineItems element, which has tag number 2.

Due to this complex structuring requirement of the query results, most people feel that FOR XML EXPLICIT is overly complex. On the positive side, however, it is very powerful and offers a number of XML generation capabilities that no other FOR XML directive offers.

FOR XML EXPLICIT allows creating XML documents to satisfy almost every complex customization requirement. The trick is to generate a result set with the required hierarchical relationship; the XML output will then be generated accordingly. As shown in the preceding example, the names of the columns can be used to pass important metadata to SQL Server. The previous example demonstrated a three-part column-naming convention that specified the element name, tag number, and attribute name. An optional fourth part can be used to control a number of aspects of the XML generation process.

By default, FOR XML EXPLICIT generates values as attributes. The ELEMENT directive, placed in the fourth part of a column name, instructs SQL Server to generate that column as an element instead. The directive is specified separately for each column.
The following example demonstrates a slightly modified version of the previous query that generates the customer number as an element:

SELECT
    1 AS Tag,
    NULL AS Parent,
    CustomerNumber AS 'Customer!1!CustNo!ELEMENT',
    NULL AS 'LineItems!2!ItemNo',
    NULL AS 'LineItems!2!Qty'
FROM OrderHeader o
INNER JOIN Customers c
    ON o.CustomerID = c.CustomerID
    AND o.OrderID = 1
UNION ALL
SELECT
    2 AS Tag,
    1 AS Parent,
    NULL,
    i.ItemNumber,
    o.Quantity
FROM Items i
INNER JOIN OrderDetails o
    ON i.ItemID = o.ItemID
    AND o.OrderID = 1
FOR XML EXPLICIT
/*
<Customer>
  <CustNo>J001</CustNo>
  <LineItems ItemNo="D001" Qty="1" />
  <LineItems ItemNo="Z001" Qty="1" />
</Customer>
*/

FOR XML EXPLICIT processes rows in the same order in which the SELECT query returns them. If the data in the output must appear in a specific order, an ORDER BY clause must be specified in the SELECT query.

Sometimes the ordering must be done on a column that is not needed in the output XML, which is a little tricky. Because the final result set is produced by a series of queries combined with UNION ALL, the sort can be applied only to a column that already exists in the query results, even though that column or expression may not be wanted in the XML output.

FOR XML EXPLICIT supports another directive, HIDE, that can be applied to columns that should be excluded from the final output. A column or expression that a query needs for sorting the final results can be marked as HIDE, and it will be excluded from the XML output.

An example might best explain this. The example demonstrated earlier in this section generated XML output with information taken from only one order. The next example tries to generate an XML document with information from all the orders in the sample database.
It attempts to generate XML output as follows:

<Orders>
  <Order CustNo="J001" OrderNo="SO101">
    <LineItems ItemNo="D001" Qty="1" />
    <LineItems ItemNo="Z001" Qty="1" />
  </Order>
  <Order CustNo="J001" OrderNo="SO102">
    <LineItems ItemNo="D001" Qty="1" />
  </Order>
</Orders>

Just as with RAW and AUTO, EXPLICIT supports the ROOT directive to generate a top-level element. The following example shows a slightly modified version of the previous query that adds a ROOT directive and removes the filter on order ID:

SELECT
    1 AS Tag,
    NULL AS Parent,
    CustomerNumber AS 'Order!1!CustNo',
    OrderNumber AS 'Order!1!OrderNo',
    NULL AS 'LineItems!2!ItemNo',
    NULL AS 'LineItems!2!Qty'
FROM OrderHeader o
INNER JOIN Customers c
    ON o.CustomerID = c.CustomerID
UNION ALL
SELECT
    2 AS Tag,
    1 AS Parent,
    NULL,
    NULL,
    i.ItemNumber,
    o.Quantity
FROM Items i
INNER JOIN OrderDetails o
    ON i.ItemID = o.ItemID
FOR XML EXPLICIT, ROOT('Orders')
/*
<Orders>
  <Order CustNo="J001" OrderNo="SO101" />
  <Order CustNo="J001" OrderNo="SO102">
    <LineItems ItemNo="D001" Qty="1" />
    <LineItems ItemNo="Z001" Qty="1" />