Beginning Regular Expressions 2005, part 9

PersonData.xsd. If you want to validate the XML document and the schema is in some other location, you will need to change the value of the xsi:noNamespaceSchemaLocation attribute appropriately:

    xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\PersonData.xsd"

After XMLSpy has associated a W3C XML Schema document with an XML instance document, you can use XMLSpy to validate the XML instance document. The cursor in Figure 24-3 is hovering over the relevant toolbar button. Toward the bottom of Figure 24-3, you can see the message indicating that the document is valid according to the schema.

You can similarly validate an XML instance document, PersonDataAssocSchema.xml, in Stylus Studio (shown in Figure 24-4) or XMLWriter (shown in Figure 24-5). The arrow cursor in each figure shows you the relevant toolbar button to validate an XML instance document.

Figure 24-4

Whether you already have an XML editor or choose to use the trial downloads for XMLSpy, Stylus Studio, or XMLWriter, you should now be in a position to validate an XML instance document against its schema. So you can now try out the examples in this chapter.

Regular Expressions in W3C XML Schema (27_574892 ch24.qxd, 1/7/05 11:04 PM, page 597)

Figure 24-5

How Constraints Are Expressed in W3C XML Schema

In one sense, W3C XML Schema is all about applying constraints. One type of constraint is limiting how elements and attributes can be structured inside an XML instance document belonging to the class of XML documents to which the schema applies.

Another aspect of W3C XML Schema constraining the content of a class of XML documents is in constraining the content allowed as the value contained in an element or attribute. Two kinds of types can exist as the content of an element: a complex type (indicated by an xs:complexType element in the schema) and a simple type (which may be indicated by an xs:simpleType element in the schema).
This chapter focuses on constraining the values allowed in simple types in an XML instance document.

W3C XML Schema Datatypes

In the other uses of regular expressions you have seen in this book, the regular expression has been applied to a string value. In W3C XML Schema, it is possible to use regular expressions together with other datatypes.

The following table summarizes the datatypes built into W3C XML Schema. Datatypes are shown as having the xs namespace prefix as an indication that they belong to the XML namespace http://www.w3.org/2001/XMLSchema. Datatypes can be viewed as primitive or derived built-in datatypes.

Datatype            Description
xs:anyType          Functions as the root of the type hierarchy. Types derived from xs:anyType can be a complex type or a simple type.
xs:anySimpleType    The base type for all simple types.
xs:string           A sequence of XML characters of finite length.
xs:boolean          Expresses the binary notion of true and false.
xs:base64Binary     Represents base-64 encoded binary data.
xs:hexBinary        Represents hexadecimal encoded binary data.
xs:float            Represents an IEEE single-precision 32-bit floating-point number.
xs:decimal          Represents arbitrary precision decimal numbers.
xs:double           Represents an IEEE double-precision 64-bit floating-point number.
xs:anyURI           Represents a Uniform Resource Identifier, whether absolute or relative, and may include a fragment identifier.
xs:QName            An XML namespace-qualified name.
xs:NOTATION         Represents an XML 1.0 NOTATION.
xs:duration         Represents a duration with Gregorian year, month, day, hour, minute, and seconds components.
xs:dateTime         Represents a specific instant of time.
xs:time             Represents a specific instant of time that recurs every day.
xs:date             Represents a specified calendar day.
xs:gYearMonth       Represents the year and month parts of an xs:dateTime.
xs:gMonthDay        Represents a specified day of the year, such as September 25.
xs:gDay             Represents a specified day of the month, such as the 25th.
xs:gMonth           Represents a specified Gregorian calendar month.

In addition to the datatypes already listed, there are datatypes derived, directly or indirectly, from the xs:string and xs:decimal datatypes.

The following table summarizes the datatypes that are derived from xs:string.

Derived Datatype      Description
xs:normalizedString   The base type is xs:string. The xs:normalizedString type is the set of strings that do not contain the carriage return (#xD), linefeed (#xA), or tab (#x9) characters.
xs:token              The base type is xs:normalizedString. This datatype is the set of strings that do not contain the carriage return (#xD), linefeed (#xA), or tab (#x9) characters, have no leading or trailing space characters (#x20), and have no doubled internal space characters.
xs:language           The base type is xs:token. This datatype is the set of xs:token values that are language identifiers in the XML 1.0 (second edition) specification.
xs:Name               The base type is xs:token. This datatype is the set of strings that are legal XML names, as defined in the XML 1.0 (second edition) specification.
xs:NCName             The base type is xs:Name. This datatype is the set of strings that are XML names but do not contain a colon character.
xs:ID                 The base type is xs:NCName. This datatype represents values of ID type that are also NCNames.
xs:IDREF              The base type is xs:NCName. This datatype is the set of strings that represent values of type IDREF, which are NCNames.
xs:IDREFS             The item type is xs:IDREF. This datatype is a list of whitespace-separated values, each of which is of type xs:IDREF.
xs:NMTOKEN            The base type is xs:token. This datatype is the set of xs:token values that match the NMTOKEN definition in XML 1.0 (second edition).
xs:NMTOKENS           The item type is xs:NMTOKEN. This datatype is a list of whitespace-separated values, each of which is of type xs:NMTOKEN.
xs:ENTITY             The base type is xs:NCName. This datatype represents values that are of ENTITY type, as defined in the XML 1.0 (second edition) specification.
xs:ENTITIES           The item type is xs:ENTITY. This datatype is a list of whitespace-separated values, each of which is of type xs:ENTITY.

The following table summarizes the built-in datatypes that are derived, directly or indirectly, from the xs:decimal datatype.

Derived Datatype        Description
xs:integer              The base type is xs:decimal. This datatype represents positive and negative integer values.
xs:nonPositiveInteger   The base type is xs:integer. This datatype represents negative integers and zero.
xs:negativeInteger      The base type is xs:nonPositiveInteger. This datatype represents negative integers.
xs:long                 The base type is xs:integer. This datatype represents integer values from -9223372036854775808 to 9223372036854775807.
xs:int                  The base type is xs:long. This datatype represents integer values from -2147483648 to 2147483647 inclusive.
xs:short                The base type is xs:int. This datatype represents integer values from -32768 to 32767 inclusive.
xs:byte                 The base type is xs:short. This datatype represents integer values from -128 to 127 inclusive.
xs:nonNegativeInteger   The base type is xs:integer. This datatype represents integer values that are positive integers and zero.
xs:unsignedLong         The base type is xs:nonNegativeInteger. This datatype represents integer values from 0 to 18446744073709551615.
xs:unsignedInt          The base type is xs:unsignedLong. This datatype represents integer values from 0 to 4294967295 inclusive.
xs:unsignedShort        The base type is xs:unsignedInt. This datatype represents integer values from 0 to 65535 inclusive.
xs:unsignedByte         The base type is xs:unsignedShort. This datatype represents integer values from 0 to 255 inclusive.
xs:positiveInteger      The base type is xs:nonNegativeInteger. This datatype represents integer values of 1 and greater.
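The ranges quoted for the fixed-width integer datatypes follow from their bit widths. The following Python sketch recomputes them; the names SIGNED_BITS, UNSIGNED_BITS, value_range, and in_range are hypothetical helpers introduced only for illustration, and the lexical check is a simplification rather than a full implementation of the schema rules.

```python
import re

# Hypothetical lookup tables (not part of any XML library): bit widths of the
# fixed-width integer datatypes listed in the table above.
SIGNED_BITS = {"xs:byte": 8, "xs:short": 16, "xs:int": 32, "xs:long": 64}
UNSIGNED_BITS = {
    "xs:unsignedByte": 8,
    "xs:unsignedShort": 16,
    "xs:unsignedInt": 32,
    "xs:unsignedLong": 64,
}


def value_range(datatype):
    """Return (minimum, maximum) for a fixed-width integer datatype."""
    if datatype in SIGNED_BITS:
        bits = SIGNED_BITS[datatype]
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if datatype in UNSIGNED_BITS:
        return 0, 2 ** UNSIGNED_BITS[datatype] - 1
    raise ValueError("not a fixed-width integer datatype: " + datatype)


def in_range(datatype, text):
    """Simplified check: optional sign plus ASCII digits, then a range test."""
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False
    low, high = value_range(datatype)
    return low <= int(text) <= high
```

For example, value_range("xs:byte") yields the pair -128 and 127, matching the table.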
Fuller details on how the built-in datatypes are specified can be found in XML Schema Part 2 at www.w3.org/TR/2001/REC-xmlschema-2-20010502, XML 1.0 (second edition) at www.w3.org/TR/2000/WD-xml-2e-20000814, and Namespaces in XML at www.w3.org/TR/REC-xml-names.

The programmer can develop custom types from these built-in types by any of the three mechanisms in the following list:

❑ Derivation by restriction: Values of an existing datatype are constrained by restricting the allowed values.
❑ Derivation by list: A list of values of a built-in or user-defined datatype.
❑ Derivation by union: The user-defined datatype is the union of two other datatypes (which can be built-in datatypes or user-defined datatypes).

Derivation by Restriction

When using W3C XML Schema, there are often several ways to specify a specific desired structure. Of the methods of derivation in the preceding list, derivation by restriction is the most commonly used. One method of restriction is to specify an enumeration.
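As a rough mental model of the three derivation mechanisms listed above, each can be pictured as a way of building a new value check from existing ones. The Python sketch below is only an analogy under that assumption; is_integer, restrict, list_of, and union_of are hypothetical names, and nothing here is part of W3C XML Schema or of any schema processor.

```python
import re


def is_integer(text):
    """Stand-in for a built-in base type: optional sign followed by digits."""
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def restrict(base, allowed):
    """Derivation by restriction: base values that are also in `allowed`."""
    return lambda text: base(text) and text in allowed


def list_of(item):
    """Derivation by list: whitespace-separated items, each valid for `item`."""
    return lambda text: all(item(part) for part in text.split())


def union_of(*members):
    """Derivation by union: valid if any member type accepts the value."""
    return lambda text: any(member(text) for member in members)


# Illustrative user-defined types built from the pieces above.
chapter_number = restrict(is_integer, {"1", "2", "3", "4", "5"})
chapter_list = list_of(chapter_number)
number_or_keyword = union_of(chapter_number, lambda text: text == "appendix")
```

In this analogy, chapter_number plays the role of a restricted simple type, chapter_list accepts a value such as "1 2 5", and number_or_keyword accepts either a chapter number or the made-up keyword "appendix".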
The following XML instance document, BookEnum.xml, is associated with a W3C XML Schema document that contains an enumeration:

    <?xml version="1.0" encoding="UTF-8"?>
    <Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\BookEnum.xsd">
      <Chapter number="1">Some content</Chapter>
      <Chapter number="2">Some content</Chapter>
      <Chapter number="3">Some content</Chapter>
      <Chapter number="4">Some content</Chapter>
      <Chapter number="5">Some content</Chapter>
    </Book>

The associated W3C XML Schema document, BookEnum.xsd, created by XMLSpy, constrains the values of the number attribute of the Chapter element to be an enumeration of values from 1 through 5:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
      <xs:element name="Book">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="Chapter" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="Chapter">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="number" use="required">
                <xs:simpleType>
                  <xs:restriction base="xs:NMTOKEN">
                    <xs:enumeration value="1"/>
                    <xs:enumeration value="2"/>
                    <xs:enumeration value="3"/>
                    <xs:enumeration value="4"/>
                    <xs:enumeration value="5"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:attribute>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:schema>

The value of the number attribute is a simple type value. The schema document that XMLSpy creates uses the xs:NMTOKEN datatype, because the sample values of 1, 2, 3, 4, and 5 in the XML instance document allow for that datatype.
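To make the effect of the enumeration concrete, the following Python sketch parses a copy of the instance document with the standard library and reports any number attribute outside the enumerated values. It is not a schema validator and does not read BookEnum.xsd; the names BOOK_XML, ALLOWED_NUMBERS, and invalid_chapter_numbers are assumptions made for this illustration, and the xsi attributes are omitted from the embedded document for brevity.

```python
import xml.etree.ElementTree as ET

# Copy of the instance document, without the xsi attributes.
BOOK_XML = """<Book>
  <Chapter number="1">Some content</Chapter>
  <Chapter number="2">Some content</Chapter>
  <Chapter number="3">Some content</Chapter>
  <Chapter number="4">Some content</Chapter>
  <Chapter number="5">Some content</Chapter>
</Book>"""

# The values enumerated in the schema, written out by hand.
ALLOWED_NUMBERS = {"1", "2", "3", "4", "5"}


def invalid_chapter_numbers(xml_text):
    """Return the number attribute of each Chapter that is not enumerated.

    A Chapter with no number attribute is reported as None, which mirrors
    use="required" only loosely.
    """
    root = ET.fromstring(xml_text)
    return [
        chapter.get("number")
        for chapter in root.iter("Chapter")
        if chapter.get("number") not in ALLOWED_NUMBERS
    ]
```

Calling invalid_chapter_numbers(BOOK_XML) returns an empty list, while a document containing a Chapter numbered 6 would have that value reported.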
However, the same constraint on values could be applied using the xs:pattern element as in BookPattern.xsd, shown here:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
      <xs:element name="Book">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="Chapter" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="Chapter">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="number" use="required">
                <xs:simpleType>
                  <xs:restriction base="xs:NMTOKEN">
                    <xs:pattern value="(1|2|3|4|5)"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:attribute>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:schema>

An XML instance document associated with BookPattern.xsd is provided as BookPattern.xml in the code download. The only change from BookEnum.xml is that the xsi:noNamespaceSchemaLocation attribute points to the BookPattern.xsd file:

    <Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\BookPattern.xsd">

The xs:pattern element is featured prominently in the remainder of this chapter, because it is the W3C XML Schema element that uses regular expressions. The value of the xs:pattern element's value attribute is a regular expression pattern, hence the name of the element.

In the pattern shown in the preceding code listing, notice that the value of the value attribute is a fairly simple example of alternation, (1|2|3|4|5), which allows the value to be any one value of 1, 2, 3, 4, or 5.

Before looking at the range of metacharacters supported in W3C XML Schema and how those metacharacters can be used, read about how Unicode is relevant to regular expressions in W3C XML Schema documents.

Unicode and W3C XML Schema

XML documents consist of sequences of Unicode characters.
Unicode contains many thousands of characters. In reality, few, if any, applications can display all Unicode characters, and very few human beings could easily understand all Unicode characters. To make Unicode more manageable, the characters are divided into Unicode character classes and Unicode blocks. Each of these is discussed later in this section.

Unicode Overview

The Unicode Standard defines the universal character set. The aim of Unicode is to allow the interchange of text content across all the languages of planet Earth. Unicode specifies a text encoding for most characters of most languages, as well as characters to assist in interoperability with older character encodings.

The Windows Character Map utility provides a convenient way to examine the Unicode codes for many individual characters. Figure 24-6 shows the uppercase A selected. Notice in the lower part of the figure that uppercase A is U+0041. The number following the U and the + sign is a sequence of at least four hexadecimal digits. In this example, uppercase A is hexadecimal 0041, which is 65 in decimal notation.

Figure 24-6

Full information about Unicode is located at www.unicode.org. At the time of this writing, the current version of the Unicode Standard is version 4.0.1. Further information about the Unicode Standard is located at www.unicode.org/standard/standard.html.

In XML, uppercase A can also be written as &#x0041;. In most situations, it is simpler to express characters commonly used in English literally.

A Unicode character class indicates the type of usage for a set of characters, for example, lowercase letters. A Unicode character block indicates a language or other means of expression associated with that block of characters.
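The relationship between a character, its U+ notation, its XML character reference, and its character class can be checked with a few lines of Python. This is a small sketch using the standard unicodedata module; the function name describe is hypothetical, and the categories reported come from the Unicode version bundled with the Python build, which is likely to be newer than the version 4.0.1 mentioned above.

```python
import unicodedata


def describe(ch):
    """Summarize one character: code point notations and general category."""
    code_point = ord(ch)
    return {
        # At least four hexadecimal digits, zero-padded on the left.
        "notation": "U+%04X" % code_point,
        "decimal": code_point,
        # Hexadecimal character reference as it could appear in XML.
        "xml_reference": "&#x%04X;" % code_point,
        # Two-letter general category, such as Lu for uppercase letters.
        "category": unicodedata.category(ch),
    }
```

For uppercase A, describe("A") reports the notation U+0041, the decimal value 65, the reference &#x0041;, and the category Lu.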
Using Unicode Character Classes

When using a Unicode character class in W3C XML Schema documents, the character class is specified as follows:

    \p{characterClass}

The following table summarizes the Unicode character classes supported in W3C XML Schema.

Unicode Character Class   Description
C                         Other characters
Cc                        Control characters
Cf                        Format characters
Cn                        Unassigned code points
L                         Letters
Ll                        Lowercase letters
Lm                        Modifier letters
Lo                        Other letters
Lt                        Title-case letters
Lu                        Uppercase letters
M                         All marks
Mc                        Spacing combining marks
Me                        Enclosing marks
Mn                        Nonspacing marks
N                         Numbers
Nd                        Decimal digits
Nl                        Letter numbers
No                        Other numbers
P                         Punctuation
Pc                        Connector punctuation

Table continued on following page

[...]

    [...] xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\Name2.xsd"> John Smith Alicia Manton Pierre Laval Maria Von Trapp John James Manton

2. Specify a pattern using Unicode character classes that will match the following part numbers:

❑ A 99
❑ BC 993 3
❑ DEF88125
❑ Z1

A sample document, PartNumbers.xml, is shown here for convenience:

    [...] xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\PartNumbers.xsd"> A 99 BC 993 3 DEF88125 Z1

25 Regular Expressions in Java

Java is a widely used programming language that can be used on a variety of platforms in addition to Windows. Several packages written in or for Java support regular expression functionality. However, [...]
Date posted: 13/08/2014, 12:21
