Microsoft SQL Server 2008 Study Guide, part 53

Nielsen c18.tex V4 - 07/21/2009 1:01pm Page 482 Part III Beyond Relational

The system stored procedure sp_xml_preparedocument takes an optional third argument that accepts namespace declarations. If the XML document contains namespace declarations, this parameter can be used to specify the namespaces declared in the XML document. The following example shows how to do this:

    DECLARE @hdoc INT
    DECLARE @xml VARCHAR(MAX)
    SET @xml = '
    <itm:Items xmlns:itm="http://www.sqlserverbible.com/items">
      <itm:Item ItemNumber="D001" Quantity="1" Price="900.0000" />
      <itm:Item ItemNumber="Z001" Quantity="1" Price="200.0000" />
    </itm:Items>'

    -- Step 1: initialize XML Document Handle
    EXEC sp_xml_preparedocument
        @hdoc OUTPUT,
        @xml,
        '<itm:Items xmlns:itm="http://www.sqlserverbible.com/items"/>'

    -- Step 2: Call OPENXML()
    SELECT * FROM OPENXML(@hdoc, 'itm:Items/itm:Item')
    WITH (
        ItemNumber CHAR(4) '@ItemNumber',
        Quantity INT '@Quantity',
        Price MONEY '@Price'
    )

    -- Step 3: Free document handle
    EXEC sp_xml_removedocument @hdoc

    /*
    ItemNumber Quantity Price
    ---------- -------- ------
    D001       1        900.00
    Z001       1        200.00
    */

Because OPENXML() needs a three-step process to shred each XML document, it is not suitable for set-based operations. It cannot be called from a scalar or table-valued function. If a table has an XML column, and a piece of information is to be extracted from more than one row, OPENXML() requires a WHILE loop. Row-by-row processing has significant overhead and will typically be much slower than a set-based operation. In such cases, XQuery is a better choice than OPENXML().

Using OPENXML() may be expensive in terms of memory usage too. It uses the MSXML parser internally through a COM invocation, which may not be cheap. A call to sp_xml_preparedocument parses the XML document and stores it in the internal cache of SQL Server. The MSXML parser can use up to one-eighth of the total memory available to SQL Server.
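The set-based XQuery alternative mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the table variable @Orders, its columns, and the second sample document are hypothetical and do not come from the original text. The nodes() and value() methods and WITH XMLNAMESPACES are the features the chapter summary lists for the XML data type.

```sql
-- Hypothetical table variable, used only for illustration.
DECLARE @Orders TABLE (OrderID INT, ItemData XML);

INSERT INTO @Orders (OrderID, ItemData)
VALUES
    (1, '<itm:Items xmlns:itm="http://www.sqlserverbible.com/items">
           <itm:Item ItemNumber="D001" Quantity="1" Price="900.0000" />
           <itm:Item ItemNumber="Z001" Quantity="1" Price="200.0000" />
         </itm:Items>'),
    (2, '<itm:Items xmlns:itm="http://www.sqlserverbible.com/items">
           <itm:Item ItemNumber="A001" Quantity="2" Price="50.0000" />
         </itm:Items>');

-- One set-based statement shreds every row; no document handle,
-- no WHILE loop, and nothing to release afterwards.
WITH XMLNAMESPACES ('http://www.sqlserverbible.com/items' AS itm)
SELECT o.OrderID,
       i.value('@ItemNumber', 'CHAR(4)') AS ItemNumber,
       i.value('@Quantity',   'INT')     AS Quantity,
       i.value('@Price',      'MONEY')   AS Price
FROM @Orders AS o
CROSS APPLY o.ItemData.nodes('/itm:Items/itm:Item') AS x(i);
```

Because nodes() is applied per row through CROSS APPLY, the same query works unchanged whether the table holds one XML document or thousands.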
Every document handle initialized by sp_xml_preparedocument should be released by calling the sp_xml_removedocument procedure to avoid memory leaks.

XSD and XML Schema Collections

XSD (XML Schema Definition) is a W3C-recommended language for describing and validating XML documents. SQL Server supports a subset of the XSD specification and can validate XML documents against XSD schemas.

SQL Server implements support for XSD schemas in the form of XML schema collections. An XML SCHEMA COLLECTION is a SQL Server database object, just like a table or a view. It can be created from an XML schema definition. Once a schema collection is created, it can be associated with an XML column or variable. An XML column or variable that is bound to a schema collection is called typed XML. SQL Server strictly validates typed XML documents when the value of the column or variable is modified, either by an assignment operation or by an XML DML operation (insert/update/delete).

Creating an XML Schema collection

An XML schema collection can be created with the CREATE XML SCHEMA COLLECTION statement. It creates a new XML schema collection with the specified name using the schema definition provided.
The following example shows an XML schema that describes a customer information XML document and implements a number of validation rules:

    CREATE XML SCHEMA COLLECTION CustomerSchema AS '
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Customer">
        <xs:complexType>
          <xs:attribute name="CustomerID" use="required">
            <xs:simpleType>
              <xs:restriction base="xs:integer">
                <xs:minInclusive value="1"/>
                <xs:maxInclusive value="9999"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="CustomerName" use="optional">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:maxLength value="40"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:complexType>
      </xs:element>
    </xs:schema>'
    GO

This schema defines a top-level element, Customer, with two attributes: CustomerID and CustomerName. The CustomerID attribute is made mandatory by setting its use attribute to required, and a restriction limits it to values from 1 through 9999. The CustomerName attribute is made optional by setting its use attribute to optional, and a restriction limits its length to 40 characters.

Creating typed XML columns and variables

Once the schema collection is created, typed XML columns and variables can be created that are bound to the schema collection. The following example creates a typed XML variable:

    DECLARE @x XML(CustomerSchema)

Similarly, a typed XML column can be created as follows:

    -- Create a table with a TYPED XML column
    CREATE TABLE TypedXML(
        ID INT,
        CustomerData XML(CustomerSchema))

Typed XML columns can be added to existing tables by using the ALTER TABLE ADD statement:

    -- add a new typed XML column
    ALTER TABLE TypedXML
    ADD Customer2 XML(CustomerSchema)

Typed XML parameters can be used as input and output parameters of stored procedures. They can also be used as input parameters and return values of scalar functions.
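A typed XML parameter can be sketched as follows. This is a minimal illustration, assuming the CustomerSchema collection and the TypedXML table from this section already exist; the procedure name dbo.AddCustomer is hypothetical and does not appear in the original text.

```sql
-- Hypothetical procedure; assumes CustomerSchema and TypedXML exist.
CREATE PROCEDURE dbo.AddCustomer
    @ID       INT,
    @Customer XML(CustomerSchema)   -- typed XML input parameter
AS
BEGIN
    SET NOCOUNT ON;

    -- The value has already been validated against CustomerSchema
    -- when it was bound to the typed parameter.
    INSERT INTO TypedXML (ID, CustomerData)
    VALUES (@ID, @Customer);
END
GO

-- Usage: the string is converted to typed XML and validated on the way in.
EXEC dbo.AddCustomer
     @ID = 1,
     @Customer = '<Customer CustomerID="1001" CustomerName="Jacob"/>';
```

Declaring the parameter as typed XML moves validation to the procedure boundary, so a call that passes a CustomerID outside the 1 to 9999 range would be expected to fail before the INSERT runs.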
Performing validation

When a value is assigned to a typed XML column or variable, SQL Server performs all the validations defined in the schema collection against the new value being inserted or assigned. The insert or assignment operation succeeds only if the validation succeeds.

The following code generates an error because the value being assigned to the CustomerID attribute is outside the range defined for it:

    DECLARE @x XML(CustomerSchema)
    SELECT @x = '<Customer CustomerID="19909" CustomerName="Jacob"/>'

The assignment fails with the following message:

    Msg 6926, Level 16, State 1, Line 2
    XML Validation: Invalid simple type value: '19909'. Location: /*:Customer[1]/@*:CustomerID

SQL Server performs the same set of validations whether a new value is being assigned or the existing value is modified by using XML DML operations (insert/update/delete).

An existing untyped XML column can be changed to a typed XML column by using the ALTER TABLE ALTER COLUMN command. SQL Server validates the XML value stored in each row for that column against the schema collection being bound to the column. The ALTER COLUMN operation succeeds only if all the existing values are valid according to the rules defined in the schema collection.

The same process happens if a typed XML column is altered and bound to a different schema collection. The operation succeeds only if all the existing values are valid according to the rules defined in the new schema collection.

XML DOCUMENT and CONTENT

A typed XML column or variable can accept two flavors of XML values: DOCUMENT and CONTENT. DOCUMENT is a complete XML document with a single top-level element. CONTENT usually is an XML fragment and can have more than one top-level element.
Depending upon the requirement, a typed XML column or variable can be defined as DOCUMENT or CONTENT when it is bound to the schema collection. The following code snippets show XML variables declared as DOCUMENT and CONTENT.

    -- XML Document
    DECLARE @x XML(DOCUMENT CustomerSchema)
    SELECT @x = '<Customer CustomerID="1001" CustomerName="Jacob"/>'

    -- XML Content
    DECLARE @x XML(CONTENT CustomerSchema)
    SELECT @x = '
    <Customer CustomerID="1001" CustomerName="Jacob"/>
    <Customer CustomerID="1002" CustomerName="Steve"/>'

If a content model is not specified, SQL Server assumes CONTENT when creating the typed XML column or variable.

Altering XML Schema collections

There are times when you might need to alter the definition of a given schema collection. This usually happens when the business requirement changes or you need to fix a missing or incorrect validation rule.

However, altering schema collections is cumbersome in SQL Server. Once created, an existing schema definition cannot be modified in place. The schema demonstrated earlier in this section defines customer name as an optional attribute. If the business requirement changes and this attribute has to be made mandatory, that will be a lot of work.

Because the definition of a schema collection cannot be altered, if a new schema definition is wanted, the existing schema collection must be dropped by executing the DROP XML SCHEMA COLLECTION statement and then re-created. Note that a schema collection cannot be dropped until all references to it are removed. All columns bound to the given schema collection must be dropped, changed to untyped XML, or altered and bound to another schema collection before the schema collection is dropped. Similarly, any XML parameters or return values that refer to the schema collection in stored procedures or functions must be removed or altered as well.

What's in the "collection"?

An XML schema collection can contain multiple schema definitions.
In most production use cases, it will likely have only one schema definition, but it is valid to have more than one schema definition in a single schema collection.

When a schema collection contains more than one schema definition, SQL Server will allow XML values that validate with any of the schema definitions available within the schema collection.

For example, a feed aggregator that stores valid RSS and ATOM feeds in a single column can create a schema collection containing two schema definitions, one for RSS and one for ATOM. SQL Server will then allow both RSS and ATOM feeds to be stored in the given column.

The following XML schema collection defines two top-level elements, Customer and Order:

    CREATE XML SCHEMA COLLECTION CustomerOrOrder AS '
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Customer">
        <xs:complexType>
          <xs:attribute name="CustomerID"/>
          <xs:attribute name="CustomerName"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="Order">
        <xs:complexType>
          <xs:attribute name="OrderID"/>
          <xs:attribute name="OrderNumber"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>'
    GO

A typed XML column or variable bound to this schema collection can store a Customer element, an Order element, or both (if the XML column or variable is defined as CONTENT). The following sample code presents an example to demonstrate this.
    -- Typed XML variable (CONTENT by default)
    DECLARE @x XML(CustomerOrOrder)
    SELECT @x = '<Customer CustomerID="1001" CustomerName="Jacob"/>'
    SELECT @x = '<Order OrderID="121" OrderNumber="10001"/>'
    SELECT @x = '
    <Customer CustomerID="1001" CustomerName="Jacob"/>
    <Order OrderID="121" OrderNumber="10001"/>'

A new schema definition can be added to an existing schema collection by using the ALTER XML SCHEMA COLLECTION ADD statement:

    ALTER XML SCHEMA COLLECTION CustomerOrOrder ADD '
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Item">
        <xs:complexType>
          <xs:attribute name="ItemID"/>
          <xs:attribute name="ItemNumber"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>'
    GO

Before creating or altering an XML schema collection, it is important to check whether a schema collection with the given name exists. XML schema collections are stored in a set of internal tables and are accessible through a number of system catalog views. The sys.xml_schema_collections view can be queried to determine whether a given XML schema collection exists:

    IF EXISTS(
        SELECT name FROM sys.xml_schema_collections
        WHERE schema_id = schema_id('dbo')
          AND name = 'CustomerSchema'
    )
        DROP XML SCHEMA COLLECTION CustomerSchema

What's new in SQL Server 2008 for XSD

SQL Server 2008 added a number of enhancements to the XSD implementation of the previous version.

The XSD implementation of the date, time, and dateTime data types required time zone information in SQL Server 2005, so a date value had to look like 2009-03-14Z or 2009-03-14+05:30, where the Z and +05:30 indicate the time zone. This requirement has been removed in SQL Server 2008. The XSD processor now accepts date, time, and dateTime values with or without time zone information.

Although the date, time, and dateTime data type implementation in SQL Server 2005 required time zone information, the XML document did not preserve it.
It normalized the value to a UTC date/time value and stored it. SQL Server 2008 added enhancements to preserve the time zone information. If the date, time, or dateTime value contains time zone information, then SQL Server 2008 preserves it.

Unions and lists are two powerful data models of XSD. Union types are simple types created with the union of two or more atomic types. List types are simple types that can store a space-delimited list of atomic values. Both of these types supported only atomic values in their implementation in SQL Server 2005. SQL Server 2008 enhanced these types so that lists of union types and unions of list types can be created.

Lax validation is the most important addition to the XSD validation in SQL Server 2008. In SQL Server 2005, wildcard elements could be validated either with "skip" (does not validate at all) or "strict" (performs a strict, or full, validation). SQL Server 2008 added "lax" validation, whereby the schema processor performs the validation only if the declaration for the target namespace is found in the schema collection.

This chapter provides only a brief overview of the XSD implementation in SQL Server. Detailed coverage of all the XSD features is beyond the scope of this chapter.

Understanding XML Indexes

SQL Server does not allow an XML column to be part of a regular index (SQL index). To optimize queries that extract information from XML columns, SQL Server supports a special type of index called an XML index. The query processor can use an XML index to optimize XQuery, just as it uses SQL indexes to optimize SQL queries. SQL Server supports four different types of XML indexes. Each XML column can have a primary XML index and three different types of secondary XML indexes.

A primary XML index is a clustered index created in document order, on an internal table known as a node table.
It contains information about all tags, paths, and values within the XML instance in each row. A primary XML index can be created only on a table that already has a clustered index on the primary key. The primary key of the base table is used to join the XQuery results with the base table. The primary XML index contains one row for each node in the XML instance. The query processor will use the primary XML index to execute every query, except for cases where the whole document is retrieved.

Just like SQL indexes, XML indexes should be created and used wisely. The size of a primary XML index may be around three times the size of the XML data stored in the base table, although this may vary based on the structure of the XML document. Document order is important for XML, and the primary XML index is created in such a way that document order and the structural integrity of the XML document are maintained in the query result.

If an XML column has a primary XML index, three additional types of secondary XML indexes can be created on the column. The additional index types are PROPERTY, VALUE, and PATH indexes. Based upon the specific query requirements, one or more of the index types may be used. Secondary indexes are non-clustered indexes created on the internal node table.

A PATH XML index is created on the internal node table and indexes the path and value of each XML element and attribute. PATH indexes are good for operations in which nodes with specific values are filtered or selected.

A PROPERTY XML index is created on the internal node table and contains the primary key of the table, the path to elements and attributes, and their values. The advantage of a PROPERTY XML index over a PATH XML index is that it helps to search multi-valued properties in the same XML instance.

A VALUE XML index is just like the PATH XML index, and contains the value and path of each XML element and attribute (instead of path and value).
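The index types described so far can be created with CREATE PRIMARY XML INDEX and CREATE XML INDEX. The following is a minimal sketch under stated assumptions: the table dbo.CustomerDocs and all index names are hypothetical and do not appear in the original text.

```sql
-- Hypothetical table; a clustered primary key is required before
-- a primary XML index can be created.
CREATE TABLE dbo.CustomerDocs (
    ID           INT NOT NULL
                 CONSTRAINT PK_CustomerDocs PRIMARY KEY CLUSTERED,
    CustomerData XML
);
GO

-- Primary XML index: materializes the node table for the XML column.
CREATE PRIMARY XML INDEX PXML_CustomerDocs_CustomerData
    ON dbo.CustomerDocs (CustomerData);
GO

-- Secondary XML indexes are built on top of the primary XML index.
CREATE XML INDEX IXML_CustomerDocs_Path
    ON dbo.CustomerDocs (CustomerData)
    USING XML INDEX PXML_CustomerDocs_CustomerData
    FOR PATH;
GO

CREATE XML INDEX IXML_CustomerDocs_Property
    ON dbo.CustomerDocs (CustomerData)
    USING XML INDEX PXML_CustomerDocs_CustomerData
    FOR PROPERTY;
GO

CREATE XML INDEX IXML_CustomerDocs_Value
    ON dbo.CustomerDocs (CustomerData)
    USING XML INDEX PXML_CustomerDocs_CustomerData
    FOR VALUE;
GO
```

In practice it is probably better to create only the secondary indexes that the workload's queries can actually use, since each one adds storage and maintenance cost on top of the primary XML index.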
VALUE indexes are helpful in cases where wildcards are used in the path expression.

XML indexes are a great addition to the XML capabilities of SQL Server. Wise usage of XML indexes helps optimize queries that use XQuery to fetch information from XML columns.

XML Best Practices

SQL Server comes with a wide range of XML-related functionalities, and the correct usage of these functionalities is essential for building a good system. A feature may be deemed "good" only if it is applied in an area where it is really required. If not, it might result in unnecessary overhead or add unwanted complexity to an otherwise simpler task.

■ XML should be used only where it is really required. Using XML where relational data would best be suited is not a good idea. Similarly, using a relational model where XML might run better won't produce the desired results.

■ XML is good for storing semi-structured or unstructured data. XML is a better choice if the physical order of values is significant and the data represents a hierarchy. If the values are valid XML documents and need to be queried, storing them in an XML column is a better choice than VARCHAR, NVARCHAR, or VARBINARY columns.

■ If the structure of the XML documents is defined, using typed XML columns will be a better choice. Typed XML columns provide better metadata information and allow SQL Server to optimize queries running over typed XML columns. Furthermore, typed XML provides storage optimization and static type checking.

■ Creating a primary XML index and a secondary XML index (or more, depending on the workload) might help improve XQuery performance. A primary XML index may use around three times the storage space of the XML data in the base table. This indicates that, just like SQL indexes, XML indexes also should be used wisely. Keep in mind that a full-text index can be created on an XML column.
A wise combination of a full-text index with XML indexes might be a better choice in many situations.

■ Creating property tables to promote multi-valued properties from the XML column may be a good idea in many cases. One or more property tables may be created from the data in an XML column, and these tables can be indexed to improve performance further.

■ Two common mistakes that add a lot of overhead to XQuery processing are usage of wildcards in the path expression and using a parent node accessor to read information from upper-level nodes.

■ Using specific markups instead of generic markups will enhance performance significantly. Generic markups do not perform well and do not allow XML index lookups to be done efficiently.

■ Attribute-centric markup is a better choice than element-centric markup. Processing information from attributes is much more efficient than processing information from elements. Attribute-centric markups take less storage space than element-centric markups, and the evaluation of predicates is more efficient because the attribute's value is stored in the same row as its markup in the primary XML index.

■ An in-place update of the XML data type gives better performance in most cases. If the update operation requires modifying the value of one or more elements or attributes, it is a better practice to modify those elements and attributes using XML DML functions, rather than replace the whole document.

■ Using the exist() method to check for the existence of a value is much more efficient than using the value() method. Parameterizing XQuery and XML DML expressions is much more efficient than executing dynamic SQL statements.

Summary

SQL Server 2008 is fully equipped with a wide range of XML capabilities to support the XML processing requirements needed by almost every modern application. SQL Server 2008 added a number of enhancements to the XML features supported by previous versions.
Key points to take away from this chapter include the following:

■ SQL Server 2008 is equipped with a number of XML processing capabilities, including support for generating, loading, querying, validating, modifying, and indexing XML documents.

■ The XML data type can be used to store XML documents. It supports the following methods: value(), exist(), query(), modify(), and nodes().

■ An XML data type column or variable that is associated with a schema collection is called typed XML. SQL Server validates typed XML columns and variables against the rules defined in the schema.

■ The OPENROWSET() function can be used with the BULK row set provider to load an XML document from a disk file.

■ XML output can be generated from the result of a SELECT query using FOR XML. FOR XML can be used with the AUTO, RAW, EXPLICIT, and PATH directives to achieve different levels of control over the structure and format of the XML output.

■ The query() method of the XML data type supports XQuery FLWOR operations, which allow complex restructuring and manipulation of XML documents. SQL Server 2008 added support for the let clause in FLWOR operations.

■ The XML data type supports XML DML operations through the modify() method. It allows performing insert, update, and delete operations on XML documents.

■ SQL Server 2008 added support for inserting an XML data type value into another XML document.

■ WITH XMLNAMESPACES can be used to process XML documents that have namespace declarations.

■ SQL Server supports XSD in the form of XML schema collections. SQL Server 2008 added a number of enhancements to the XSD support available with previous versions. These enhancements include full support for the date, time, and dateTime data types, support for lax validation, and support for creating unions of list types and lists of union types.
■ SQL Server supports a special category of indexes called XML indexes to index XML columns. A primary XML index and up to three secondary indexes (PATH, VALUE, and PROPERTY) can be created on an XML column.

Nielsen c19.tex V4 - 07/21/2009 1:02pm Page 491

Using Integrated Full-Text Search

IN THIS CHAPTER

Setting up full-text index catalogs with Management Studio or T-SQL code
Maintaining full-text indexes
Using full-text indexes in queries
Performing fuzzy word searches
Searching text stored in binary objects
Full-text search performance

Several years ago I wrote a word search for a large database of legal texts. For word searches, the database parsed all the documents and built a word-frequency table as a many-to-many association between the word table and the document table. It worked well, and word searches became lightning-fast. As much fun as writing your own word search can be, fortunately, you have a choice.

SQL Server includes a structured word/phrase indexing system called Full-Text Search. More than just a word parser, Full-Text Search actually performs linguistic analysis by determining base words and word boundaries, and by conjugating verbs for different languages. It runs circles around the simple word index system that I built.

ANSI Standard SQL uses the LIKE operator to perform basic word searches and even wildcard searches. For example, the following code uses the LIKE operator to query the Aesop's Fables sample database:

    USE Aesop;
    SELECT Title
    FROM Fable
    WHERE Fabletext LIKE '%Lion%'
      AND Fabletext LIKE '%bold%';

Result:

    Title
    The Hunter and the Woodman

The main problem with performing SQL Server WHERE LIKE searches is the slow performance. Indexes are searchable from the beginning of the word,
