Microsoft SQL Server 2008 R2 Unleashed - P193

CHAPTER 47 Using XML in SQL Server 2008

Removing XML Nodes by Using delete

delete uses its XPath parameter to locate the node to remove. In the example in Listing 47.22, any alph node that has a name attribute with a value of B is deleted. Then the remaining values for alph/@name are selected, using nodes() to illustrate the success of the deletion.

LISTING 47.22 Deleting Nodes Using delete

    DECLARE @XmlVar xml
    SET @XmlVar = '
    <alphnumerics>
      <item>
        <alph name="A" val="65"/>
      </item>
      <item>
        <alph name="B" val="66"/>
      </item>
      <item>
        <alph name="C" val="67"/>
      </item>
      <item>
        <num name="1" val="49"/>
      </item>
      <item>
        <num name="2" val="50"/>
      </item>
      <item>
        <num name="3" val="51"/>
      </item>
    </alphnumerics>'
    SET @XmlVar.modify('delete(//item/alph[@name="B"])')
    SELECT XmlTable.XmlCol.value('./@name', 'char(1)') as RemainingAlphNames
    FROM @XmlVar.nodes('//item/alph') as XmlTable(XmlCol)
    go

    RemainingAlphNames
    A
    C

    (2 row(s) affected)

Modifying XML with insert and replace value of

You can add new nodes to a document tree by using insert and update existing values by using replace value of. In these situations, node position counts most. Let's look at a real-world example for the scenarios in this section: say that a content author is building a structured document. Each node has both its respective level (or depth) and its order of appearance. Your DML operations must respect both. The markup and table storage for such a scenario might look something like the untyped XML in Listing 47.23.

LISTING 47.23 Simple Untyped XML Markup for a Book

    CREATE TABLE SimpleBook
    (BookId int IDENTITY(1,1) PRIMARY KEY CLUSTERED, BookXml xml)
    GO
    INSERT SimpleBook
    SELECT '<book book_id="1">
      <title>A Great Work</title>
      <chapter chapter_id="1">
        <title>An Excellent Chapter</title>
        <section id="1">
          <title>A Boring Section</title>
          <paragraph para_id="1">
            Something boring.
          </paragraph>
        </section>
        <section id="2">
          <title>Another Fine Section</title>
          <paragraph para_id="2">
            Another fine paragraph.
          </paragraph>
        </section>
      </chapter>
    </book>'

In this listing, notice that the XML element content in the first section seems out of place, considering the laudatory content of the chapter and book titles. You can fix this by using replace value of, which has the following syntax:

    replace value of old_expression with new_expression

NOTE
When you are updating typed xml values, the value specified in new_expression must be of the same XSD-declared type as the value selected in old_expression.

Here is the update for the book's incongruous content:

    UPDATE SimpleBook
    SET BookXml.modify('
      replace value of
      (/book/chapter/section[@id="1"]/title/text())[1]
      with "A Fine Section"
    ')
    WHERE BookId = 1
    GO
    UPDATE SimpleBook
    SET BookXml.modify('
      replace value of
      (/book/chapter/section/paragraph[@para_id="1"]/text())[1]
      with "A Fine Paragraph"
    ')
    WHERE BookId = 1

    (1 row(s) affected)

    (1 row(s) affected)

You can also add a new section to the document by using the insert function, which has the following syntax:

    insert new_node_expression
    ( {{{as first | as last} into} | after | before} reference_node_expression )

In new_node_expression, you specify the nodes to be inserted, using the familiar direct or computed constructor syntax, discussed earlier in this chapter, in the section "Selecting XML by Using query()." New to SQL Server 2008 is the capability to insert variable values of type xml using insert.

What's different about insert is that it allows for the specification of where, with respect to the reference_node_expression, the constructed nodes are to be placed. To specify that the new nodes are to be inserted as children of the reference node, you use as first into when specifying the first child. You use as last into when specifying the last child.
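The child placement keywords, together with the SQL Server 2008 ability to insert an xml-typed variable, can be tried on a small standalone value. This is only a minimal sketch: the variable names @Doc and @NewPara and the sample markup are illustrative assumptions, not part of the chapter's listings.

```sql
-- Hypothetical sample data; not one of the book's listings.
DECLARE @Doc xml = '<section id="1"><title>A Fine Section</title></section>';
DECLARE @NewPara xml = '<paragraph para_id="9">Inserted first.</paragraph>';

-- "as first into" makes the new node the first child of the reference node.
-- sql:variable() with an xml-typed variable is accepted by insert
-- in SQL Server 2008 and later.
-- The parentheses plus [1] guarantee that the reference expression is a singleton.
SET @Doc.modify('insert sql:variable("@NewPara") as first into (/section)[1]');

-- The paragraph should now appear ahead of the title inside section.
SELECT @Doc AS Result;
```

Swapping as first into for as last into in the same statement would place the paragraph after the title instead.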
To specify that the new node is to be inserted as a sibling of the reference node, you use after to make it the next sibling or before to make it the previous sibling (that is, the new node is placed immediately before the reference node).

You can finish the sample document by adding a new chapter to the book, using the code in Listing 47.24.

LISTING 47.24 Inserting Nodes by Using insert

    UPDATE SimpleBook
    SET BookXml.modify('
      insert
      <chapter chapter_id="2">
        <title>This is Chapter 2</title>
      </chapter>
      after
      (/book/chapter[@chapter_id=1])[1]
    ')
    WHERE BookId = 1
    GO
    UPDATE SimpleBook
    SET BookXml.modify('
      insert
      <section id="3">
        <title>This is Section 3</title>
      </section>
      as last into
      (/book/chapter[@chapter_id=2])[1]
    ')
    WHERE BookId = 1
    GO
    SELECT BookXml
    FROM SimpleBook
    GO

    <book book_id="1">
      <title>A Great Work</title>
      <chapter chapter_id="1">
        <title>An Excellent Chapter</title>
        <section id="1">
          <title>A Fine Section</title>
          <paragraph para_id="1">A Fine Paragraph</paragraph>
        </section>
        <section id="2">
          <title>Another Fine Section</title>
          <paragraph para_id="2">
            Another fine paragraph.
          </paragraph>
        </section>
      </chapter>
      <chapter chapter_id="2">
        <title>This is Chapter 2</title>
        <section id="3">
          <title>This is Section 3</title>
        </section>
      </chapter>
    </book>

The first call to modify() inserts a new chapter after the first chapter, as its next sibling. The second call to modify() inserts a new section as the last child of the new chapter.

TIP
Both reference_node_expression of insert and old_expression of replace value of require a singleton to be matched in their XPath expressions; otherwise, SQL Server raises an error. Using a singleton here is sometimes hard to do because you have to think like an XML parser in terms of how many possible nodes may be matched.
Even though you may know that there's only one node in the instance document matching a complex predicate such as /book/chapter/section/paragraph[@para_id="1"]/text(), the parser knows that more than one is possible because the position of the nodes has not been specified. It's usually best to enclose the matching XPath expression in parentheses and then apply the positional predicate (that is, [1]) to the entire sequence, as the examples illustrate. Otherwise, your XPath expressions need to look as ugly as the following, where the position is specified for every node in the sequence:

    /book[1]/chapter[1]/section[1]/paragraph[1][@para_id="1" and position() = 1]/text()[1]

All three XML DML functions that use modify() have the side effect of causing any XML indexes on the xml column to be repropagated to reflect the changes, just as with relational indexes. The next section covers how to create and maintain primary and secondary indexes on your xml columns.

Indexing and Full-Text Indexing of xml Columns

Just as with relational data, xml column data, whether typed or untyped, can be indexed.

Indexing xml Columns

Two levels of indexing are available for xml columns: primary and secondary. Three types of secondary indexing are available, based on the different kinds of XQuery queries that will be performed on the column: PATH for path-based querying, PROPERTY for property bag scenarios, and VALUE for value-based querying.

If you want to create a primary XML index on a table, it must meet a few requirements:

- The table must have a clustered primary key (with fewer than 16 columns in it). The reason is that the primary XML index contains a copy of the primary key for back referencing. It is also required for table partitioning because it ensures that the primary XML index is partitioned in the same manner as the table. The primary key of the table thus cannot be modified unless all the XML indexes on the table are dropped.
- Your SET options must have the following values when you're creating or rebuilding XML indexes or when you're attempting to use the modify() xml data type method, which triggers index maintenance:

    SET ANSI_NULLS ON
    SET ANSI_PADDING ON
    SET ANSI_WARNINGS ON
    SET ARITHABORT ON
    SET CONCAT_NULL_YIELDS_NULL ON
    SET NUMERIC_ROUNDABORT OFF
    SET QUOTED_IDENTIFIER ON

Note that these are the SET values in a default SQL Server installation. You can view them by calling DBCC USEROPTIONS in T-SQL.

As with many other operations, indexes can be created both by using the dialogs in SSMS and also in T-SQL. The following syntax can be used to create a primary XML index on an xml column:

    CREATE PRIMARY XML INDEX IndexName ON TableName(XmlColumnName)

For example, using the SimpleBook table from the previous section, you would execute

    CREATE PRIMARY XML INDEX PrimaryXmlIndex_BookXml ON SimpleBook(BookXml)

To drop an XML index, you execute

    DROP INDEX IndexName ON TableName

To do the same thing in SSMS, you right-click the table name in Object Explorer, click Modify, and then right-click the xml column and select XML Indexes. Then you use the Add or Delete buttons to create or drop indexes.

NOTE
Dropping the primary XML index also drops all secondary indexes because they are dependent on the columns of the shredded Infoset's table of the primary XML index (discussed in the next section).

You can disable XML indexes using the following syntax:

    ALTER INDEX XmlIndexName on TableName DISABLE

You can rebuild them using the following syntax:

    ALTER INDEX XmlIndexName on TableName REBUILD

You can also query XML indexes like other indexes, using the catalog view sys.indexes.

XML indexes are different from relational indexes in a few important ways. Let's consider their underlying structure and how they work at runtime.
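As a small sketch of inspecting XML indexes through the catalog, the query below uses sys.xml_indexes, a catalog view that extends sys.indexes with XML-specific columns. It assumes the SimpleBook table and the PrimaryXmlIndex_BookXml index from the preceding examples already exist in the current database.

```sql
-- Lists the XML indexes defined on SimpleBook.
-- Assumes the table and primary XML index from the text have been created.
SELECT xi.name,
       xi.type_desc,            -- 'XML' for XML indexes
       xi.secondary_type_desc,  -- NULL for a primary XML index
       xi.is_disabled           -- 1 after ALTER INDEX ... DISABLE
FROM sys.xml_indexes AS xi
WHERE xi.object_id = OBJECT_ID('SimpleBook');
```

Running it before and after ALTER INDEX ... DISABLE or REBUILD is a quick way to confirm the state of each index.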
Understanding XML Indexes

XML indexes store the xml column data for a table in a compressed B+ tree (pronounced B plus tree) data structure. The XML data is stored there in its shredded rather than original XML format (remember the universal table?). XML Infoset information items (that is, nodes), the navigational paths used to find each item, and other crucial data are stored in the columns of the index.

NOTE
XML Infoset is a W3C recommendation defining an abstract data set and a corresponding set of terms used to refer to any item in any well-formed XML document. For example, each element in a document is considered to be an element information item, each attribute an attribute information item, and so forth.

A B+ tree is a tree data structure that stores content such that the values for every node in the tree are exclusively kept in its leaves; the branches contain only pointers to the leaves. B+ trees are optimized for fast insertion and removal of nodes.

When retrieving xml, SQL Server builds a query plan that consists of both the relational and XML portions of the query. The XML portion is built using the primary XML index. Secondary indexes are chosen based on cost after the query is optimized.

The Primary XML Index

When the primary XML index is created, each xml column value is shredded into a relational representation of its Infoset and stored. The index itself is clustered on the column that contains the ordpath: a node labeling scheme that captures a document's order and hierarchy, which allows for insertion of new nodes without node relabeling and provides efficient access to nodes, using range scans. [1]

Let's look at an example of how ordpaths work. Assume that some node is labeled 1.1. All nodes are initially labeled in document order during index creation, using odd numbers, allowing inserted nodes to be labeled with even numbers without changing the existing node labels.
The original children of 1.1 would thus be labeled 1.1.1, 1.1.3, and so forth. Any children inserted after labeling would get an even number, such as 1.1.4. Each number in the ordpath represents a node, and each dot represents an edge of depth.

To see the actual columns of our primary XML index, you can run the following query:

    SELECT *
    FROM sys.columns sc
    JOIN sys.indexes si
    ON si.object_id = sc.object_id
    AND si.name LIKE 'PrimaryXmlIndex_BookXml'
    AND si.type = 1

1. S. Pal, I. Cseri, O. Seeliger, M. Rys, G. Schaller, W. Yu, D. Tomic, A. Baras, B. Berg, D. Churin, and E. Kogan. "XQuery Implementation in a Relational Database System," in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 2005), 1175-1186. New York: ACM Press, 2005.

Given the XML document used in Listings 47.23 and 47.24, the shredded rows for it in the index might look something like those shown in Table 47.1. The real index's column names are shown in parentheses beside the conceptual names; conceptual names and values are supplied to make the table easy to understand.

TABLE 47.1 Shredded Infoset Rows for the XML Instance in Listing 47.23 [2]

    BookId (pk1)  Ordpath (id)  Tag (nid)       NodeType (tid)  Value (value)              PathId (hid)
    1             1             1 (book)        1 (Element)     Null                       #1
    1             1.1           2 (book_id)     2 (Attribute)   1                          #2#1
    1             1.3           3 (title)       1               'A Great Work'             #3#1
    1             1.5           4 (chapter)     1               Null                       #4#1
    1             1.5.1         5 (chapter_id)  2               1                          #5#4#1
    1             1.5.3         6 (title)       1               'An Excellent Chapter'     #6#4#1
    1             1.5.5         7 (section)     1               Null                       #7#4#1
    1             1.5.5.1       8 (id)          2               1                          #8#7#4#1
    1             1.5.5.3       9 (title)       1               'A Boring Section'         #9#7#4#1
    1             1.5.5.5       10 (paragraph)  1               'Something boring.'        #10#7#4#1
    1             1.5.5.5.1     11 (para_id)    2               1                          #11#10#7#4#1
    1             1.5.7         7 (section)     1               Null                       #7#4#1
    1             1.5.7.1       8 (id)          2               2                          #8#7#4#1
    1             1.5.7.3       9 (title)       1               'Another Fine Section'     #9#7#4#1
    1             1.5.7.5       10 (paragraph)  1               'Another fine paragraph.'  #10#7#4#1
    1             1.5.7.5.1     11 (para_id)    2               2                          #11#10#7#4#1

2. S. Pal, I. Cseri, O. Seeliger, M. Rys, G. Schaller, W. Yu, D.
Tomic, A. Baras, B. Berg, D. Churin, and E. Kogan. "XQuery Implementation in a Relational Database System," in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 2005), 1175-1186. New York: ACM Press, 2005.

The NodeType column holds an integer based on the Infoset type of the node. The Value column holds the value of the node (if any) or a pointer to that value. The Tag column holds a nonunique integer assigned to each Infoset item. These numbers repeat for similar items, as when a second section or para_id appears in the content.

The PathId column is computed based on the path from the root to the current item. For example, the section element with Ordpath value 1.5.5 has the same Tag value as the section element with Ordpath value 1.5.7. When calculating PathId, SQL Server recognizes that the path from either section back to the root is the same. That is, from either section (Tag = 7), through chapter (Tag = 4), to book (Tag = 1), the path is the same: #7#4#1. The Tag and PathId values for these groups of rows are thus the same.

Another way of looking at this is to consider that the XPath /book/chapter/section would return both section nodes, regardless of their text values or positions. The PathId value is stored with the path in reverse order for the purpose of optimizing when the descendant-or-self (//) XPath axis is specified in the queries; in that case, only the final node names in a path such as //section/title are known.

When XQuery queries are executed against the xml columns, they are translated into relational queries against this Infoset table. First, the primary key of the table (in this case, BookId) is scanned to find the group of rows that contain the nodes. Then the PathId and Value columns are used to find the matching paths and values requested in the XPath of the XQuery. When found, the resulting nodes are serialized up from the Infoset table and reassembled into XML.
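To get a feel for what a shredded, row-per-node view of a document looks like, you can approximate one by hand with nodes() and value(). This is only a rough sketch: it does not read the internal Infoset table, it produces no ordpath or PathId values, and it covers elements only. It assumes the SimpleBook table from Listing 47.23; the aliases n and x are arbitrary.

```sql
-- Returns one row per element in the book document, in the spirit of
-- the conceptual Tag and Value columns of Table 47.1.
-- This is a hand-written approximation, not the index's real contents.
SELECT sb.BookId,
       n.x.value('local-name(.)', 'nvarchar(64)')  AS Tag,   -- element name
       n.x.value('(./text())[1]', 'nvarchar(200)') AS Value  -- first text child, if any
FROM SimpleBook AS sb
CROSS APPLY sb.BookXml.nodes('//*') AS n(x)
WHERE sb.BookId = 1;
```

Elements that have no text child of their own, such as book and chapter, should come back with a NULL Value, much like the Null entries in the table.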
The Secondary XML Indexes

Secondary XML indexes are useful when specific types of XQuery queries are run against the XML documents. The syntax for creating a secondary XML index is as follows:

    CREATE XML INDEX SecondaryXmlIndexName
    ON TableName(XmlColumnName)
    USING XML INDEX PrimaryXmlIndexName
    FOR ( PROPERTY | VALUE | PATH)

Secondary XML indexes are dropped in the same way as primary XML indexes.

The PATH Secondary XML Index

Generally speaking, the PATH secondary index is useful when the bulk of your queries attempt to locate nodes via a simple path to the node (for example, /book/chapter/section/title). At runtime, the XPath is translated to the value of PathId in the Infoset table, and then the matching PathId values are used to retrieve the unique Ordpath of the matching nodes. Note that Value is used secondarily to PathId in this type of index.

The VALUE Secondary XML Index

When many of the XPath queries to the XML are value based, meaning that the value of an element or attribute is specified in a predicate, a VALUE secondary index may improve seek times. In this case, the Value column of the Infoset table is primarily relied on during index searches, and then PathId.

The following example shows how to use a value-based XQuery:

    SELECT BookXml.query('
      /book[@book_id=1]/chapter[@chapter_id=1]//paragraph[contains(text()[1], "fine")]
    ')
    FROM SimpleBook
    WHERE BookId = 1
    go

    <paragraph para_id="2">Another fine paragraph.</paragraph>

    (1 row(s) affected)

The PROPERTY Secondary XML Index

When the XML in the xml column is used to encapsulate multiple properties of an object (for example, in an object serialization scenario) and these properties are often retrieved together, it may be useful to create a PROPERTY secondary index.
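Applied to the sample table, the secondary index syntax shown earlier might be used as follows. This is a sketch under stated assumptions: the primary XML index PrimaryXmlIndex_BookXml must already exist on SimpleBook(BookXml), and the three secondary index names are hypothetical.

```sql
-- Assumes PrimaryXmlIndex_BookXml already exists on SimpleBook(BookXml).
-- The secondary index names below are made up for illustration.
CREATE XML INDEX PathXmlIndex_BookXml
ON SimpleBook(BookXml)
USING XML INDEX PrimaryXmlIndex_BookXml
FOR PATH;

CREATE XML INDEX ValueXmlIndex_BookXml
ON SimpleBook(BookXml)
USING XML INDEX PrimaryXmlIndex_BookXml
FOR VALUE;

CREATE XML INDEX PropertyXmlIndex_BookXml
ON SimpleBook(BookXml)
USING XML INDEX PrimaryXmlIndex_BookXml
FOR PROPERTY;
```

In practice you would create only the secondary index types that match your query patterns, because each one adds maintenance cost whenever modify() changes the column.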
For example, if your markup resembles

    DECLARE @objectXml xml
    SET @objectXml =
    '<object id="111">
      <name>MyObject</name>
      <value>Value 1</value>
      <coordinateX>24</coordinateX>
      <coordinateY>636</coordinateY>
    </object>'

and your XQuery queries often retrieve multiple values simultaneously, such as

    SELECT
      @objectXml.value('(/object/name)[1]', 'varchar(20)') as OName,
      @objectXml.value('(/object/value)[1]', 'varchar(20)') as OValue,
      @objectXml.value('(/object/coordinateX)[1]', 'int') as X,
      @objectXml.value('(/object/coordinateY)[1]', 'int') as Y
    WHERE @objectXml.exist('(/object[@id=111])[1]') = 1

the PROPERTY index should help to optimize index seek time. The reason is that a PROPERTY index is keyed on the base table's primary key first, then on PathId, and then on Value, so the property values that belong to one row can be located together.

NOTE
Every call to value() requires an additional SELECT statement against the Infoset table, so it's important to try to index for this scenario, when applicable.

XML Index Performance Considerations

You know that indexing works well with untyped XML, but it actually works better with typed xml columns. When the XML is untyped, node values are stored internally as Unicode strings. Each time a value comparison must be made, those strings must be typecast to the SQL type that corresponds to the XML type used in the XQuery. This type conversion must also be made for every possible value match in the Infoset table, and this operation ...
