CHAPTER 47 Using XML in SQL Server 2008

<MountainBikeSpecials xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product>
    <Color>White</Color>
    <Id>710</Id>
    <Name>Mountain Bike Socks, L</Name>
    <Size>L</Size>
    <Offer>
      <Id>1</Id>
      <Desc>No Discount</Desc>
    </Offer>
  </Product>
  <Product>
    <Color>White</Color>
    <Id>709</Id>
    <Name>Mountain Bike Socks, M</Name>
    <Size>M</Size>
    <Offer>
      <Id>1</Id>
      <Desc>No Discount</Desc>
    </Offer>
    <Offer>
      <Id>2</Id>
      <Desc>Volume Discount 11 to 14</Desc>
    </Offer>
    <Offer>
      <Id>3</Id>
      <Desc>Volume Discount 15 to 24</Desc>
    </Offer>
    <Offer>
      <Id>4</Id>
      <Desc>Volume Discount 25 to 40</Desc>
    </Offer>
  </Product>
</MountainBikeSpecials>

With AUTO mode, the BINARY BASE64 keywords have the same effect as with RAW mode, with one major difference. In RAW mode, selecting binary data without specifying BINARY BASE64 raises an error, so the keywords are required there. In AUTO mode, binary data may be selected without BINARY BASE64. In that case, SQL Server requires that the primary key of the table containing the binary data also be selected. The primary key lets SQL Server emit a path to the binary column in place of the encoded data, using the key to address the row. The path has the following form:

'dbobject/SchemaName.TableName[@PrimaryKeyName="PrimaryKeyValue"]/@ColumnName'

This special XPath-like output is unique to AUTO mode. It is useful for applications that use SQLXML's URL-based querying to retrieve the binary data. Listing 47.7 illustrates this XML production.
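The path format above is plain string composition, and a helper makes the pieces explicit. The following is a minimal sketch, not part of SQL Server or SQLXML. The function name `dbobject_path` is hypothetical. It assumes a single-column primary key whose value contains no single quote, and it uses single quotes around the key value, as in the output of Listing 47.7.

```python
def dbobject_path(schema: str, table: str, pk_name: str,
                  pk_value: object, column: str) -> str:
    """Build an AUTO-mode style reference to a binary column (hypothetical helper).

    Assumes a single-column primary key; composite keys are not handled.
    """
    key_text = str(pk_value)
    if "'" in key_text:
        # This sketch does not attempt to escape quotes inside the key value.
        raise ValueError("key values containing a single quote are not supported")
    return f"dbobject/{schema}.{table}[@{pk_name}='{key_text}']/@{column}"


print(dbobject_path("Production", "ProductPhoto", "ProductPhotoID", 1, "ThumbNailPhoto"))
```

Called with the ProductPhoto values shown above, the helper yields the same reference that appears inside the ThumbNailPhoto element of Listing 47.7.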
LISTING 47.7 Addressing Binary Data That Uses FOR XML AUTO

SELECT TOP 1
    Photo.ProductPhotoId,
    ThumbNailPhoto,
    Color,
    Offer.SpecialOfferId Id,
    Product.ProductId Id,
    Name,
    Description [Desc],
    Size
FROM Sales.SpecialOffer Offer
JOIN Sales.SpecialOfferProduct OP
    ON OP.SpecialOfferId = Offer.SpecialOfferId
JOIN Production.Product Product
    ON Product.ProductId = OP.ProductId
JOIN Production.ProductProductPhoto PhotoJunction
    ON Product.ProductId = PhotoJunction.ProductId
JOIN Production.ProductPhoto Photo
    ON Photo.ProductPhotoId = PhotoJunction.ProductPhotoId
WHERE Name LIKE 'Mountain Bike%'
FOR XML AUTO, ELEMENTS XSINIL, ROOT('MountainBikeSpecials')
go

<MountainBikeSpecials xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Photo>
    <ProductPhotoId>1</ProductPhotoId>
    <ThumbNailPhoto>
      dbobject/Production.ProductPhoto[@ProductPhotoID='1']/@ThumbNailPhoto
    </ThumbNailPhoto>
    <Product>
      <Color>White</Color>
      <Id>710</Id>
      <Name>Mountain Bike Socks, L</Name>
      <Size>L</Size>
      <Offer>
        <Id>1</Id>
        <Desc>No Discount</Desc>
      </Offer>
    </Product>
  </Photo>
</MountainBikeSpecials>

Notice how you can add a level of nesting (the Photo element) to the XML hierarchy simply by selecting a value from an additional table.

SQL Server follows a set of rules for nesting elements in AUTO mode. As rows are streamed to output, the XML engine compares the column values of each row with those of the preceding row, from the first row down to the last, to detect differences. When one or more primary keys have been selected in the query, only the primary key values are used in the comparison. When no primary keys have been selected, all column values are used in the comparison. The exception is columns of type ntext, text, image, or xml, whose values are always assumed to be different.
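The comparison rule just described can be written as a small predicate. This is a sketch of the rule as stated in the text, not SQL Server's internal code. The function name and the way the primary-key and large-object (ntext, text, image, xml) columns are passed in are assumptions made for illustration.

```python
from typing import Mapping, Sequence, Set


def starts_new_element(prev: Mapping[str, object], cur: Mapping[str, object],
                       columns: Sequence[str], pk_columns: Set[str],
                       lob_columns: Set[str]) -> bool:
    """Return True if `cur` differs from `prev` under the AUTO-mode rule above.

    pk_columns:  names of primary-key columns (hypothetical input)
    lob_columns: names of ntext/text/image/xml columns (hypothetical input)
    """
    selected_pks = [c for c in columns if c in pk_columns]
    if selected_pks:
        # Primary keys were selected: only their values are compared.
        return any(prev[c] != cur[c] for c in selected_pks)
    # No primary key selected: every column is compared, and any
    # ntext/text/image/xml column is always treated as different.
    if any(c in lob_columns for c in columns):
        return True
    return any(prev[c] != cur[c] for c in columns)
```

For example, two rows that share ProductId 709 but differ in a non-key column count as the same element when ProductId is a selected key. They count as different rows when no key is selected.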
The following example includes primary keys in the SELECT statement:

SELECT Offer.SpecialOfferId, Product.ProductId, Name
FROM Sales.SpecialOffer Offer
JOIN Sales.SpecialOfferProduct OP
    ON OP.SpecialOfferId = Offer.SpecialOfferId
JOIN Production.Product Product
    ON Product.ProductId = OP.ProductId
WHERE Name LIKE 'Mountain Bike%'
go

SpecialOfferId  ProductId  Name
1               710        Mountain Bike Socks, L
1               709        Mountain Bike Socks, M
2               709        Mountain Bike Socks, M
3               709        Mountain Bike Socks, M
4               709        Mountain Bike Socks, M

(5 row(s) affected)

As the XML engine works down this result set, it sees that SpecialOfferId has the same value in the first and second rows, but ProductId differs between them. It therefore creates one Offer element and nests the two different products under it as Product subelements.

Column selection order also determines how AUTO mode composes the XML. Notice that ProductId remains 709 in rows 2–5, yet the XML engine still nests Product under Offer. It does so because Offer.SpecialOfferId is specified first in the list of selected columns.
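The grouping just described can be simulated for a flat, ordered result set. The following is a simplified sketch of the behavior the text describes, not SQL Server's algorithm. The function name, the tuple layout of `levels`, and the dictionary-based output are all assumptions made for illustration. Each level lists the element name, the columns compared to detect a new element (for example, the primary key), and the columns emitted as attributes. The order of `levels` stands in for the order in which each table's first column appears in the SELECT list.

```python
from typing import Dict, List, Sequence, Tuple

# (element name, columns compared to detect a new element, columns emitted)
Level = Tuple[str, Sequence[str], Sequence[str]]


def auto_nest(rows: List[Dict[str, object]], levels: List[Level]) -> List[dict]:
    """Nest an ordered result set, one level per table, in the spirit of FOR XML AUTO."""
    top: List[dict] = []
    open_nodes: List[dict] = []          # the currently open element at each depth
    prev = None
    for row in rows:
        depth = 0
        if prev is not None:
            # Find the shallowest level whose compared values changed.
            while depth < len(levels) and all(
                    row[c] == prev[c] for c in levels[depth][1]):
                depth += 1
            if depth == len(levels):
                # Identical row: this sketch just repeats the innermost element.
                depth -= 1
        del open_nodes[depth:]           # close elements at and below that level
        for d in range(depth, len(levels)):
            name, _, out_cols = levels[d]
            node = {"name": name,
                    "attrs": {c: row[c] for c in out_cols},
                    "children": []}
            (open_nodes[d - 1]["children"] if d else top).append(node)
            open_nodes.append(node)
        prev = row
    return top
```

Given the five rows above with levels [("Offer", ["SpecialOfferId"], ["SpecialOfferId"]), ("Product", ["ProductId"], ["ProductId", "Name"])], the sketch yields four Offer nodes, the first holding Products 710 and 709. Swapping the two levels and reordering the rows to match gives one Product node per product instead, mirroring the effect of changing the column order.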
When FOR XML AUTO, ROOT('MountainBikeSpecials') is added to the preceding query, it produces the following:

<MountainBikeSpecials>
  <Offer SpecialOfferId="1">
    <Product ProductId="710" Name="Mountain Bike Socks, L" />
    <Product ProductId="709" Name="Mountain Bike Socks, M" />
  </Offer>
  <Offer SpecialOfferId="2">
    <Product ProductId="709" Name="Mountain Bike Socks, M" />
  </Offer>
  <Offer SpecialOfferId="3">
    <Product ProductId="709" Name="Mountain Bike Socks, M" />
  </Offer>
  <Offer SpecialOfferId="4">
    <Product ProductId="709" Name="Mountain Bike Socks, M" />
  </Offer>
</MountainBikeSpecials>

To tell the XML engine that you prefer to nest Offer under Product, you simply change the column order in the SELECT statement:

SELECT Product.ProductId, Offer.SpecialOfferId, Name
FROM Sales.SpecialOffer Offer
JOIN Sales.SpecialOfferProduct OP
    ON OP.SpecialOfferId = Offer.SpecialOfferId
JOIN Production.Product Product
    ON Product.ProductId = OP.ProductId
WHERE Name LIKE 'Mountain Bike%'
FOR XML AUTO, ROOT('MountainBikeSpecials')
go

<MountainBikeSpecials>
  <Product ProductId="710" Name="Mountain Bike Socks, L">
    <Offer SpecialOfferId="1" />
  </Product>
  <Product ProductId="709" Name="Mountain Bike Socks, M">
    <Offer SpecialOfferId="1" />
    <Offer SpecialOfferId="2" />
    <Offer SpecialOfferId="3" />
    <Offer SpecialOfferId="4" />
  </Product>
</MountainBikeSpecials>

EXPLICIT Mode

FOR XML EXPLICIT is a powerful, oft-maligned, and somewhat daunting mode of SQL Server XML production. It can shape row data into any desired XML structure. However, the SQL required to produce it can easily run to hundreds (or, in some cases, thousands) of lines, which can become a maintenance headache. With EXPLICIT mode, the query author is responsible for making sure the XML is well formed. The author must also ensure that the rowset generated behind the scenes follows a very particular format.
FOR XML PATH mode renders FOR XML EXPLICIT obsolete except when you need to output column values as CDATA sections. This section therefore briefly covers the required query structure and provides an example of this particular case.

NOTE
It's not easy to understand EXPLICIT mode just by reading about it. Practice is essential. After you've used it successfully a few times, it will begin to feel like an intuitive, albeit complex, way of doing things.

Microsoft calls the relational structure behind EXPLICIT mode queries the universal table. The universal table has a hierarchical structure sometimes known as the adjacency list model. Put simply, the first column in the table acts as the primary key, and the second column is a foreign key referencing it. Together they create a parent–child relationship between rows in the same table. XML models the same kind of relationship through the nesting of elements, because nodes contained inside other nodes also hold a parent–child relationship.

Each level of hierarchical depth in the universal table is created by a separate SELECT statement. The SELECT statements are combined with UNION ALL to produce the complete rowset. Some details on the table structure help make this clearer:

• The first column in the universal table (think of it as the primary key) must be named Tag and hold an integer value. The value of Tag can be thought of as representing the depth of the node that will be produced.
• The second column must be named Parent and must either refer to a valid value of Tag or be NULL for the top-level elements.
• The remaining selected columns are mapped to attributes, subelements, or CDATA nodes. They can also be selected without appearing in the resulting XML.

Listing 47.8 shows a query that returns a universal table. Later, you can change it to return XML by adding FOR XML EXPLICIT.
LISTING 47.8 A Query That Generates the Universal Table Rowset Format

SELECT 1 as Tag,
    NULL as Parent,
    Reason.ScrapReasonId 'ScrapReason!1!ScrapReasonId!element',
    Name 'ScrapReason!1!!cdata',
    WorkOrderId 'WorkOrder!2!WorkOrderId',
    NULL 'WorkOrder!2!ScrappedQuantity'
FROM Production.ScrapReason Reason
JOIN Production.WorkOrder WorkOrder
    ON Reason.ScrapReasonId = WorkOrder.ScrapReasonID
WHERE Reason.ScrapReasonId = 12
UNION ALL
SELECT 2 as Tag,
    1 as Parent,
    Reason.ScrapReasonId,
    NULL,
    WorkOrderId,
    ScrappedQty
FROM Production.ScrapReason Reason
JOIN Production.WorkOrder WorkOrder
    ON Reason.ScrapReasonId = WorkOrder.ScrapReasonID
WHERE Reason.ScrapReasonId = 12

The first SELECT statement in the union must use a special column alias syntax that tells the XML generator how to shape each column:

element_name!corresponding_Tag_value!attribute_or_subelement_name[!directive]

The following list explains each part of this syntax:

• element_name: The name of the generated element associated with each row.
• corresponding_Tag_value: The value of Tag for the context rowset.
• attribute_or_subelement_name: The name of the attribute or subelement associated with the column in the context row.
• directive: An optional directive to the XML generator. The possible values include the following:
  • element: Tells the XML generator to produce the column associated with attribute_or_subelement_name as a subelement. (An attribute is produced by default.)
  • hide: Tells the XML generator not to show the associated column data at all in the produced XML. This is useful when a column must be selected for a side effect, such as sorting, but its data should not be shown.
  • cdata: Tells the XML generator to output the associated column data as a CDATA section.
  • xml: Disables entitization of text data. This can lead to XML that is not well formed, because the XML special characters (&, ', ", <, >) are output directly.
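Splitting an alias into the parts listed above is mechanical. The following sketch accepts only the three- or four-part form given in the syntax line. It does not validate directive names, since SQL Server accepts directives beyond the four listed here. The names `ExplicitColumn` and `parse_explicit_alias` are hypothetical.

```python
from typing import NamedTuple, Optional


class ExplicitColumn(NamedTuple):
    element_name: str
    tag: int
    attribute_or_subelement_name: str   # may be empty, e.g. for a cdata column
    directive: Optional[str]            # None when no directive is given


def parse_explicit_alias(alias: str) -> ExplicitColumn:
    """Split element_name!corresponding_Tag_value!attribute_or_subelement_name[!directive]."""
    parts = alias.split("!")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 '!'-separated parts: {alias!r}")
    element, tag_text, name = parts[:3]
    directive = parts[3] if len(parts) == 4 else None
    return ExplicitColumn(element, int(tag_text), name, directive)
```

Applied to the aliases in Listing 47.8, 'ScrapReason!1!!cdata' parses to an empty attribute name with the cdata directive. 'WorkOrder!2!WorkOrderId' has no directive, so that column becomes an attribute by default.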
In all subsequent SELECT statements, columns are matched by position to the layout that the aliases establish in the first SELECT. Notice that in Listing 47.8, the first SELECT selects NULL for WorkOrder!2!ScrappedQuantity. That column belongs to Tag 2, as its corresponding_Tag_value specifies, so its value is supplied by the SELECT statement whose Tag value is 2. Likewise, the second SELECT supplies NULL for ScrapReason!1!!cdata, because Name belongs to the Tag 1 rows. Instead, it supplies ScrappedQty in the final column. The primary key (ScrapReasonId) is the common thread joining both sets of rows, so it must be specified in both SELECT statements for this query to work.

Now that you understand the universal table structure that must be built, the only thing left to do is add FOR XML EXPLICIT to the query in Listing 47.8. You must also order the output according to the desired element hierarchy. Listing 47.9 illustrates the final query and its result.
LISTING 47.9 Using FOR XML EXPLICIT

SELECT 1 as Tag,
    NULL as Parent,
    Reason.ScrapReasonId 'ScrapReason!1!ScrapReasonId!element',
    Name 'ScrapReason!1!!cdata',
    WorkOrderId 'WorkOrder!2!WorkOrderId',
    NULL 'WorkOrder!2!ScrappedQuantity'
FROM Production.ScrapReason Reason
JOIN Production.WorkOrder WorkOrder
    ON Reason.ScrapReasonId = WorkOrder.ScrapReasonID
WHERE Reason.ScrapReasonId = 12
UNION ALL
SELECT 2 as Tag,
    1 as Parent,
    Reason.ScrapReasonId,
    NULL,
    WorkOrderId,
    ScrappedQty
FROM Production.ScrapReason Reason
JOIN Production.WorkOrder WorkOrder
    ON Reason.ScrapReasonId = WorkOrder.ScrapReasonID
WHERE Reason.ScrapReasonId = 12
ORDER BY 'ScrapReason!1!ScrapReasonId!element', 'WorkOrder!2!WorkOrderId'
FOR XML EXPLICIT, ROOT('ScrappedWorkOrders')
go

<ScrappedWorkOrders>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    <![CDATA[Thermoform temperature too high]]>
    <WorkOrder WorkOrderId="2573" ScrappedQuantity="14" />
  </ScrapReason>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    <![CDATA[Thermoform temperature too high]]>
    <WorkOrder WorkOrderId="4972" ScrappedQuantity="1" />
  </ScrapReason>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    <![CDATA[Thermoform temperature too high]]>
    <WorkOrder WorkOrderId="7771" ScrappedQuantity="6" />
  </ScrapReason>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    <![CDATA[Thermoform temperature too high]]>
    <WorkOrder WorkOrderId="9071" ScrappedQuantity="1" />
  </ScrapReason>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    <![CDATA[Thermoform temperature too high]]>
    <WorkOrder WorkOrderId="10274" ScrappedQuantity="1" />
  </ScrapReason>
  ...
</ScrappedWorkOrders>

In the ORDER BY clause, you tell the XML generator to produce the ScrapReason elements first and then nest the WorkOrder elements underneath them.

Like the other modes, FOR XML EXPLICIT supports the BINARY BASE64 keywords, although base-64 encoding is performed automatically even if they are not specified.
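The ordering requirement is easier to see with a simulation of how an ordered universal table turns into nested XML. The following is a simplified sketch, assuming a stack-based reading of the Tag and Parent columns. It models only the element, cdata, and hide directives. It is not SQL Server's implementation, and the function name `explicit_to_xml` and its arguments are hypothetical. For each row, it closes open elements until the row's Parent is the innermost open element. It then opens the row's own element.

```python
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape, quoteattr


def explicit_to_xml(columns: List[str], rows: List[Dict[str, object]], root: str) -> str:
    """Simplified FOR XML EXPLICIT: nest an ordered universal table using a stack."""
    specs = []
    for alias in columns:                      # element!Tag!name[!directive]
        parts = alias.split("!")
        directive = parts[3] if len(parts) > 3 else ""
        specs.append((alias, parts[0], int(parts[1]), parts[2], directive))

    top: List[dict] = []                       # children of the ROOT element
    stack: List[Tuple[int, dict]] = []         # (Tag, node) for each open element
    for row in rows:
        tag, parent = row["Tag"], row["Parent"]
        # Close open elements until this row's parent is the innermost one.
        while stack and stack[-1][0] != parent:
            stack.pop()
        if parent is not None and not stack:
            raise ValueError(f"parent Tag {parent} is not open")
        node: dict = {"name": None, "attrs": [], "content": []}
        for alias, element, spec_tag, name, directive in specs:
            if spec_tag != tag:
                continue                       # column belongs to another Tag level
            node["name"] = element
            value = row.get(alias)
            if value is None or directive == "hide":
                continue                       # NULLs and hidden columns emit nothing
            text = str(value)
            if directive == "element":
                node["content"].append(f"<{name}>{escape(text)}</{name}>")
            elif directive == "cdata":
                node["content"].append(f"<![CDATA[{text}]]>")
            else:                              # default: attribute
                node["attrs"].append(f" {name}={quoteattr(text)}")
        if node["name"] is None:
            raise ValueError(f"no column alias declares Tag {tag}")
        (stack[-1][1]["content"] if stack else top).append(node)
        stack.append((tag, node))

    def serialize(node: dict) -> str:
        inner = "".join(c if isinstance(c, str) else serialize(c)
                        for c in node["content"])
        head = f"<{node['name']}{''.join(node['attrs'])}"
        return f"{head} />" if not inner else f"{head}>{inner}</{node['name']}>"

    return f"<{root}>{''.join(serialize(n) for n in top)}</{root}>"
```

Fed the first four universal-table rows for work orders 2573 and 4972, ordered as the ORDER BY clause produces them, the sketch emits the first two ScrapReason elements shown above, without indentation. If the rows arrive in the wrong order, a Tag 2 row may find no open Tag 1 element, and the sketch raises an error. This is why the ORDER BY clause matters.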
The ROOT keyword can also be used, although not when XMLDATA is specified. XMLSCHEMA is not supported as of this writing. ELEMENTS and XSINIL are not supported either, probably because EXPLICIT mode's many shaping options let you get along without them.

PATH Mode

PATH mode is the latest and best addition to the FOR XML syntax. It provides a straightforward way to specify the shape of query-produced XML using a limited XPath syntax. Its syntax is also very compact compared with the other modes, especially EXPLICIT.

Let's look at how PATH mode works by re-creating the XML produced in Listing 47.9, this time using PATH mode. Listing 47.10 illustrates this mode.

LISTING 47.10 Using FOR XML PATH to Simplify an EXPLICIT Query

SELECT Reason.ScrapReasonId,
    Name 'text()',
    WorkOrderId 'WorkOrder/@WorkOrderId',
    ScrappedQty 'WorkOrder/@ScrappedQuantity'
FROM Production.ScrapReason Reason
JOIN Production.WorkOrder WorkOrder
    ON Reason.ScrapReasonId = WorkOrder.ScrapReasonID
WHERE Reason.ScrapReasonId = 12
FOR XML PATH('ScrapReason'), ROOT('ScrappedWorkOrders')
go

<ScrappedWorkOrders>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    Thermoform temperature too high
    <WorkOrder WorkOrderId="2573" ScrappedQuantity="14" />
  </ScrapReason>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    Thermoform temperature too high
    <WorkOrder WorkOrderId="4972" ScrappedQuantity="1" />
  </ScrapReason>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    Thermoform temperature too high
    <WorkOrder WorkOrderId="7771" ScrappedQuantity="6" />
  </ScrapReason>
  <ScrapReason>
    <ScrapReasonId>12</ScrapReasonId>
    Thermoform temperature too high
    <WorkOrder WorkOrderId="9071" ScrappedQuantity="1" />
  </ScrapReason>
  ...
</ScrappedWorkOrders>

The only difference between the output of Listing 47.10 and that of Listing 47.9 is in the ScrapReason.Name column. Here it appears as a plain text node rather than as a CDATA section. Which FOR XML query would you rather maintain?
As the query in Listing 47.10 illustrates, PATH mode works like RAW mode in that all column values are wrapped in a default element. Like RAW, PATH takes a parameter that specifies the name of this default element. If no name is specified, row is used, just as with RAW. Unlike RAW, however, PATH mode is element-centric. A column becomes an attribute only if its alias says so, for example with an XPath column alias such as WorkOrderId '@Id'. Otherwise, it is produced as a subelement of the default element.

You can also specify the ROOT keyword and the ELEMENTS XSINIL keywords in the same manner as with RAW. Specifying ELEMENTS is somewhat redundant, however, because PATH mode defaults to element-centric XML. ELEMENTS XSINIL is still the only way to represent NULL values in the XML.

You cannot specify XMLSCHEMA or XMLDATA. BINARY BASE64 may be specified, but it is not required, because base-64-encoded data is generated automatically.

To build the XML, the engine first works down the column list to determine the shape of the XML to output. It then generates XML for each row, based on the shape specified by the column names or aliases. Columns can be aliased with literal XPath strings, or they can have no alias at all. In Listing 47.10, the column selections specify the following structure:

• For Reason.ScrapReasonId, output a subelement of ScrapReason (specified by PATH('ScrapReason')) called ScrapReasonId. When no alias is specified, the default shape is element-centric.
• For Name, output the value as a text-only child node of ScrapReason.
• For WorkOrderId, output a child element of ScrapReason called WorkOrder, and add an attribute called WorkOrderId to it.
• For ScrappedQty, output an attribute called ScrappedQuantity on that same WorkOrder element.
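The four rules above can be mimicked for a single row. The following sketch covers only the alias forms that appear in Listing 47.10 plus '@name'. It omits NULL columns, as PATH mode does without ELEMENTS XSINIL. The function name `render_path_row` is hypothetical, and the sketch's grouping rule is an assumption: adjacent columns sharing a 'Child/@' prefix are treated as attributes of one shared subelement. Like SQL Server, the sketch rejects an attribute column that follows element content.

```python
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape, quoteattr


def render_path_row(row_element: str, columns: List[Tuple[str, object]]) -> str:
    """Render one row with a small subset of the FOR XML PATH alias rules.

    'Name'        -> <Name>value</Name> (also the shape of an unaliased column)
    '@Name'       -> attribute of the row element
    'text()'      -> text node inside the row element
    'Child/@Name' -> attribute of a <Child> subelement; adjacent columns with
                     the same 'Child/' prefix share one subelement
    """
    attrs: List[str] = []
    body: List[str] = []
    child: Optional[str] = None          # subelement currently collecting attributes
    child_attrs: List[str] = []

    def flush_child() -> None:
        nonlocal child, child_attrs
        if child is not None:
            body.append(f"<{child}{''.join(child_attrs)} />")
        child, child_attrs = None, []

    for alias, value in columns:
        if value is None:                # NULL columns are omitted (no ELEMENTS XSINIL)
            continue
        text = str(value)
        if "/@" in alias:
            name, attr = alias.split("/@", 1)
            if name != child:
                flush_child()
                child = name
            child_attrs.append(f" {attr}={quoteattr(text)}")
            continue
        if alias.startswith("@"):
            if body or child is not None:
                raise ValueError(f"attribute column {alias!r} follows element content")
            attrs.append(f" {alias[1:]}={quoteattr(text)}")
            continue
        flush_child()
        if alias == "text()":
            body.append(escape(text))
        else:
            body.append(f"<{alias}>{escape(text)}</{alias}>")
    flush_child()
    return f"<{row_element}{''.join(attrs)}>{''.join(body)}</{row_element}>"
```

For the first row of Listing 47.10, passing [("ScrapReasonId", 12), ("text()", "Thermoform temperature too high"), ("WorkOrder/@WorkOrderId", 2573), ("WorkOrder/@ScrappedQuantity", 14)] produces the first ScrapReason element shown above, without indentation.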
Usually, when you set out to shape XML, you intuitively know where you want your values to be, so PATH mode is more a matter of practice and application than of memorization. Once you know the basics, the syntax is intuitive enough to create whatever XML you desire.

FOR XML PATH has a few other neat features, which Listing 47.11 illustrates in one fell swoop.

LISTING 47.11 Demonstrating Several Features of FOR XML PATH in a Single Query

SELECT Reason.ScrapReasonId '*',
    'Comment: Name = ' + Name 'comment()',
    ModifiedDate 'processing-instruction(ModDatePI)',
    (
        SELECT WorkOrderId 'data()'
        FROM Production.WorkOrder WorkOrder
        JOIN Production.ScrapReason Reason
            ON Reason.ScrapReasonId = WorkOrder.ScrapReasonID
        WHERE Reason.ScrapReasonId = 12
        ORDER BY WorkOrderId desc
        FOR XML PATH('')
    ) 'WorkOrders/@WorkOrderIds'
FROM Production.ScrapReason Reason
WHERE Reason.ScrapReasonId = 12
FOR XML PATH('ScrappedWorkOrder'), ROOT('ScrappedWorkOrders')
go

<ScrappedWorkOrders>