60 Part I ✦ Introducing XML

While the first few elements provided opportunities to introduce you to the basic declarations and syntax of DTD documents, the rest of the DTD provides additional examples of DTD descriptions of XML documents.

The next element defined in the DTD is the quote element, which is a child of the quotelist element. The quote element has two optional attributes, source and author.

<!ELEMENT quote (#PCDATA)>
<!ATTLIST quote
    source CDATA #IMPLIED
    author CDATA #IMPLIED
>

The catalog element must contain two elements in sequence, starting with amazon and ending with elcorteingles. The catalog element has one required attribute, called items, which contains a count of the items in the catalog:

<!ELEMENT catalog (amazon, elcorteingles)>
<!ATTLIST catalog
    items CDATA #REQUIRED
>

The amazon element contains one type of child element, called product. The + cardinality operator indicates that there can be one or more product child elements under amazon:

<!ELEMENT amazon (product+)>
<!ATTLIST amazon
    items CDATA #REQUIRED
>

The elcorteingles element also contains one type of child element, called product. Because no cardinality operator is specified, there must be exactly one product child element under elcorteingles:

<!ELEMENT elcorteingles (product)>
<!ATTLIST elcorteingles
    items CDATA #REQUIRED
>

The next element declaration is a great example of the DTD element declaration, sequence and choice list operators, and cardinality operators working in concert to solve a tricky data validation problem. The XML document supports both English and Spanish translations in nested elements of the product element. Unfortunately, parsers have no way of automatically recognizing and translating the element names, so it’s up to the DTD developer to make sure that all possibilities in both formats are covered as part of the validation process.
c538292 ch03.qxd 8/18/03 8:43 AM Page 60
61 Chapter 3 ✦ XML Data Format and Validation

In the product declaration, all elements that have English and Spanish translations are offered as choice lists inside a sequence list of nested elements under the product element. Each translation choice list is followed by the + cardinality operator outside of the parentheses that contain the list choices, which means that at least one instance of the element has to be present in one of the languages, and more instances are permissible. The Amazon.com product element also contains some nested elements that the elcorteingles product element does not (ranking, small_image, and availability). Those elements are listed in sequence and end with a ? cardinality operator, indicating that they are optional, but if they are present they must appear in the position specified in the listing. In summary, the product DTD element declaration is designed to accept either an English product listing from Amazon.com, or a smaller Spanish listing from the elcorteingles.com Website.

<!ELEMENT product (ranking?, (title | titulo)+, (asin | isbn)+,
    (author | autor)+, (image | imagen)+, small_image?,
    (list_price | precio)+,
    (release_date | fecha_de_publicación)+,
    (binding | Encuadernación)+, availability?,
    (tagged_url | librourl)+)>

The product element has one optional attribute, the predefined xml:lang attribute, which identifies the language of the product element for the elcorteingles listing. In the DTD it is declared as an optional attribute of product:

<!ATTLIST product
    xml:lang CDATA #IMPLIED
>

The rest of the elements have no children or attributes and are represented by PCDATA (Parsed Character Data) element declarations. Every element named in a parent element declaration must itself be declared in the DTD. The PCDATA declaration indicates a text-only content model, which means that these elements can contain text (and any attributes declared for them) but not nested elements.
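As a rough illustration of what the product content model accepts, here is a sketch of a Spanish listing. It is not taken from the book's sample files: every value is a hypothetical placeholder, and the example.com URLs are assumptions made only to give the text-only elements some content. The element names and their order come from the product declaration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Spanish listing; every value below is a placeholder. -->
<product xml:lang="es">
  <!-- ranking is optional and omitted here -->
  <titulo>Macbeth</titulo>
  <isbn>0000000000</isbn>
  <autor>William Shakespeare</autor>
  <imagen>http://www.example.com/macbeth.jpg</imagen>
  <!-- small_image is optional and omitted here -->
  <precio>0,00</precio>
  <fecha_de_publicación>2003-01-01</fecha_de_publicación>
  <Encuadernación>Tapa blanda</Encuadernación>
  <!-- availability is optional and omitted here -->
  <librourl>http://www.example.com/libro/macbeth</librourl>
</product>
```

Moving precio ahead of imagen, or leaving out librourl, should make a validating parser reject the document, because the sequence list fixes both the order and the required members. Note also that each choice list accepts either name on every repetition, so the declaration by itself does not prevent a listing from mixing English and Spanish element names. Each child used here is a text-only leaf, which is why every one of them needs its own #PCDATA declaration.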
<!ELEMENT Encuadernación (#PCDATA)>
<!ELEMENT asin (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT autor (#PCDATA)>
<!ELEMENT availability (#PCDATA)>
<!ELEMENT binding (#PCDATA)>
<!ELEMENT fecha_de_publicación (#PCDATA)>
<!ELEMENT image (#PCDATA)>
<!ELEMENT imagen (#PCDATA)>
<!ELEMENT librourl (#PCDATA)>
<!ELEMENT list_price (#PCDATA)>
<!ELEMENT precio (#PCDATA)>
<!ELEMENT ranking (#PCDATA)>
<!ELEMENT release_date (#PCDATA)>
<!ELEMENT small_image (#PCDATA)>
<!ELEMENT tagged_url (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT titulo (#PCDATA)>

While DTDs are still in use and are still often the data validation tool of choice for many XML developers, the W3C Schema promises, and in most cases delivers, much more control over data validation than DTDs. In the next section of this chapter, I’ll introduce you to Schemas and show how Schemas are structured and how they validate XML data.

W3C XML Schemas

Schemas are an updated document format for XML data validation. Schemas are much more verbose than DTDs, but they are less cryptic and much easier for XML developers to grasp, because Schemas are more closely based on XML syntax. Nested elements are represented by nested elements, and attributes are assigned explicitly as part of the element. Cardinality operators, attribute data types, and choice lists are replaced by element representations and attribute keywords, and there is much more control over data types.

The XML Schema 1.0 is an official W3C Recommendation as of May 2001, and XML 1.1 is in the works at the W3C. More information can be found at http://www.w3.org/TR/2001/REC-xmlschema-1-20010502.

A good listing of Schema editors can be found on the XML.com Website at http://www.xml.com/pub/pt/2. Most of the Schema tools listed are free or have free trial downloads available.
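To make the DTD-to-Schema mapping described above more concrete, the quote element from earlier in the chapter can be sketched as a Schema declaration. This is a hand-written illustration, not xmlspy output and not part of Listing 3-2; it assumes that the DTD's CDATA attributes map to xs:string and that #IMPLIED maps to use="optional".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: a Schema counterpart to
     <!ELEMENT quote (#PCDATA)> with two #IMPLIED CDATA attributes. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="quote">
    <xs:complexType>
      <!-- simpleContent: text-only content that also carries attributes -->
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="source" type="xs:string" use="optional"/>
          <xs:attribute name="author" type="xs:string" use="optional"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Two lines of DTD become roughly a dozen lines of Schema, which is the verbosity trade-off mentioned above. In return, each attribute carries an explicit type that could be tightened later, for example by swapping xs:string for a more specific data type.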
As with the DTD example earlier in this chapter, the Schema example for this chapter (Listing 3-2) was edited using Altova’s xmlspy (http://www.altova.com). I was also able to use xmlspy to translate the DTD used in the previous example to a Schema that almost worked. As with DTDs, xmlspy’s W3C Schema generator is probably the best on the market, but there was one crucial item that xmlspy missed in the DTD to Schema translation that had to be added manually, which I will get into later in this chapter. The point is that, as with DTDs, developers still need to know something about Schema structure if they want to make sure that the generated Schema is the best format possible for validating XML document data, or to fix a generated Schema if there is a problem with it.

W3C Schema data types

DTDs were developed as part of the original SGML specifications, and extended to describe HTML markup as well. They are great as a legacy data validation tool, but have several drawbacks when applied to modern XML documents. DTDs require that elements be text, nested elements, or a combination of nested elements and text. DTDs also have limited support for predefined data types.

Schemas can support all of the DTD attribute data types (ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, and NOTATION). CDATA is replaced by the primitive string data type. Other data types can be used in a multitude of formats, as shown in Table 3-5.

Table 3-5: Schema Data Types

Name | Base Type | Description
---- | --------- | -----------
String | |
string | Primitive | Any well-formed XML string.
normalizedString | string | Any well-formed XML string that also does not contain line feeds, carriage returns, or tabs.
token | normalizedString | Any well-formed XML string that does not contain line feeds, carriage returns, tabs, leading or trailing spaces, or more than one space in a row.
language | token | A valid language ID, matching the xml:lang format, which is usually International Organization for Standardization (ISO) 639 format.
QName | Primitive | XML namespace qualified name (QName).
Name | token | A string based on well-formed element and attribute name rules.
NCName | Name | A name without a colon, such as the part of a qualified name to the right of the namespace prefix and colon.
Date | |
date | Primitive | Date value in the format YYYY-MM-DD.
time | Primitive | Time value in the format HH:MM:SS.
dateTime | Primitive | Combined date and time value in the format YYYY-MM-DDTHH:MM:SS.
gDay | Primitive | The day part of a date in the format ---DD. Also the national greeting of Australia.
gMonth | Primitive | The month part of a date in the format --MM.
gMonthDay | Primitive | The month and day part of a date in the format --MM-DD.
gYear | Primitive | The year part of a date in the format YYYY.
gYearMonth | Primitive | The year and month part of a date in the format YYYY-MM.
duration | Primitive | Represents a time interval in the ISO 8601 extended format, for example P1Y1M1DT1H1M1S. This example represents one year, one month, one day, one hour, one minute, and one second.
Numeric | |
decimal | Primitive | Any decimal number. Processors must support at least 18 decimal digits.
float | Primitive | Any 32-bit floating-point real number.
double | Primitive | Any 64-bit floating-point real number.
integer | decimal | Any integer.
byte | short | Any signed 8-bit integer.
short | int | Any signed 16-bit integer.
int | long | Any signed 32-bit integer.
long | integer | Any signed 64-bit integer.
unsignedByte | unsignedShort | Any unsigned 8-bit integer.
unsignedShort | unsignedInt | Any unsigned 16-bit integer.
unsignedInt | unsignedLong | Any unsigned 32-bit integer.
unsignedLong | nonNegativeInteger | Any unsigned 64-bit integer.
positiveInteger | nonNegativeInteger | Any integer with a value greater than 0.
nonPositiveInteger | integer | Any integer with a value less than or equal to 0.
negativeInteger | nonPositiveInteger | Any integer with a value less than 0.
nonNegativeInteger | integer | Any integer with a value greater than or equal to 0.
Other | |
anyURI | Primitive | Represents a URI, and can contain any URL or URN.
boolean | Primitive | Standard binary logic, in the format of 1, 0, true, or false.
hexBinary | Primitive | Hex-encoded binary data.
base64Binary | Primitive | Base64-encoded binary data.

Primitive and derived data types can be extended to create new data types. Data types that extend existing data types are called user-derived data types.

W3C Schema elements

Data types are formatted as attributes in element declarations of Schema documents, just as data types are usually defined by attributes in XML documents. Data types are contained in four types of elements:

✦ Element declarations: Describe an element in an XML document.
✦ Simple type definitions: Contain values in a single element, usually with attributes that define one of the primitive or derived W3C data types, but can contain user-derived data types as well.
✦ Complex type definitions: A series of nested elements with attributes that describe a complex XML document structure and primitive, derived, or user-derived data types.
✦ Attribute declarations: Elements that describe attributes, with attributes that define a data type for the attribute.

Element declarations, simple type definitions, complex type definitions, and attribute declarations are all defined by declaring one or more of the Schema elements listed in Table 3-6 in a Schema document:

Table 3-6: Schema Elements

Element | Description
------- | -----------
all | Nested elements can appear in any order. Each child element can occur no more than one time, and can be made optional.
annotation | Schema comments. Contains appinfo and documentation.
appinfo | Information for parsing and destination applications; must be a child of annotation.
documentation | Schema text comments; must be a child of annotation.
any | Any type of well-formed XML can be nested under the any element, in any order. Same as the DTD <!ELEMENT element_name ANY> declaration.
anyAttribute | Any attributes composed of well-formed XML can be nested under the anyAttribute element, in any order.
attribute | An attribute.
attributeGroup | Reusable attribute group for complex type definitions.
choice | A list of choices, one of which must be chosen. Same as using the vertical bar character in a DTD choice list.
complexContent | Definition of mixed content or elements in a complex type.
complexType | Declares a complex type definition.
element | Declares an element.
extension | Extends a simpleType or complexType.
field | An element or attribute that is referenced for a constraint. Similar to the DTD IDREF attribute data type, but uses an XPath expression for the reference.
group | A group of elements for complex type definitions.
import | Imports external Schemas with different namespaces.
include | Includes external Schemas with the same namespace.
key | Defines a nested attribute or element as a unique key. Similar to the DTD ID attribute data type.
keyref | Refers to a key element. Similar to the DTD IDREF attribute data type.
list | A list of values in a simple type element.
notation | Defines the format of non-parsed data within an XML document. Same as the DTD NOTATION attribute data type.
restriction | Imposes restrictions on a simpleType, simpleContent, or complexContent element.
schema | The root element of every W3C Schema document.
selector | Groups a set of elements for identity constraints using an XPath expression.
sequence | Specifies a strict order on child elements. Same as using the comma to separate nested elements in a DTD sequence.
simpleContent | Definition of text-only content (text and attributes, but no child elements) in a complex type.
simpleType | Declares a simple type definition.
union | Groups simple types into a single union of values.
unique | Defines an element or an attribute as unique at a specified nesting level in the document.

W3C Schema element and data type restrictions

Aside from the elements listed in Table 3-6, there are several other types of elements that define constraints on other elements in the Schema. Data type properties on simple data types, including constraints, are called facets. Fundamental facets describe basic properties of a data type, such as whether its values are ordered and its cardinality, much as the DTD cardinality operators (+, ?, *), commas, and vertical bar characters predefine DTD element constraints. Constraining facets extend beyond predefined rules to control behavior based on Schema definitions. Table 3-7 lists the W3C Schema constraining facets that restrict simple data types.

Table 3-7: Schema Element Restrictions

Restriction | Description
----------- | -----------
enumeration | A list of choices predefined in the Schema document. Same as the DTD enumeration for attribute list data types.
fractionDigits | Maximum number of decimal places for a value. For integers this is 0.
length | Number of characters, or for lists, number of list items.
maxExclusive | Maximum up to, but not including, the number specified.
maxInclusive | Maximum including the number specified.
maxLength | Maximum number of characters, or for lists, number of list items.
minExclusive | Minimum down to, but not including, the number specified.
minInclusive | Minimum including the number specified.
minLength | Minimum number of characters, or for lists, number of list items.
pattern | Defines a pattern and sequence of acceptable characters.
totalDigits | Maximum total number of digits in a decimal value.
whiteSpace | How line feeds, tabs, spaces, and carriage returns are treated when the document is parsed.

A listing of which constraints apply to which simple data types can be found as part of the W3C Schema Recommendation at http://www.w3.org/TR/xmlschema-2.

Namespaces and W3C Schemas

One of the additional features of Schemas is the ability to handle XML namespaces as part of the Schema. One of the best examples of this is the XML Schema Schema. Schema namespaces and data types are defined by a Schema that is referenced by the root element of every W3C Schema. The namespace declaration looks like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

The URL, http://www.w3.org/2001/XMLSchema, actually resolves to a document that links to the Schema Schema. The Schema Schema specifies the elements and data types used in a Schema. It is also a very long Schema document that includes embedded DTDs, imported and included external Schemas, and just about every type of Schema situation imaginable. This makes it a great place to start when looking for working examples of Schema structure and syntax.

An example W3C Schema document

Listing 3-2 shows the Schema that I will be using as an example for this chapter. AmazonMacbethSpanish.xsd is referenced by, and validates the contents of, the AmazonMacbethSpanishwithXSDref.xml document.

[...]

Listing 3-2: Contents of AmazonMacbethSpanish.xsd

<?xml version="1.0" encoding="UTF-8"?>
... 2 U (http://www.xmlspy.com) ...

... didn’t recognize the xml:lang attribute until this line was added to the Schema: [...] This imported the Schema from http://www.w3.org/2000/10/xml.xsd as part of the current Schema document. This Schema defines the xml:lang, xml:space, and xml:base attributes and prefix names. For xml:lang, the declaration...
... is AmazonMacbethSpanishwithXSDRef2.xsd, which should be in the same directory as the XML file to be validated by the Schema.

Schema structure and syntax

The example Schema in Listing 3-2 starts with an XML declaration, followed by a comment that tells you that this Schema was generated using xmlspy. Note that the Schema comment format is the same as the XML and DTD document comment formats:

<?xml...

... by XMLSPY v5 rel. 2 U (http://www.xmlspy.com) >
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:import namespace="http://www.w3.org/XML/1998/namespace" ...

... microsoft.com/xmlnotepad

✦ The IBM XML Viewer is great for viewing XML documents on non-Windows machines that support Java. You can download it at http://alphaworks.ibm.com/tech/xmlviewer. It’s a simple tool very similar to XML Notepad, but it is better at handling more advanced XML such as namespaces. The tradeoff is that it lacks the basic editing capabilities of the Microsoft XML Notepad.

... SAX 2

Most current parsers implement the SAX2 interfaces. Unlike DOM 1 and 2, SAX 2 parsers are usually backward compatible with SAX 1. SAX 1 supports navigation around a document and manipulation of content via SAX 1 events via the SAX 1 Parser class. SAX 2 supports namespaces, filter chains, and querying and setting features and properties via SAX events via the SAX 2 XMLReader...

Listing 4-1: A Very Simple XML Document

<?xml version="1.0" encoding="UTF-8"?>
This is level 1 of the nested elements
This is level 2 of the nested elements

You can easily visualize XML document structures...

Applying Schemas

Referencing Schemas in XML documents is done via namespace declarations in the root element of the document: [...] In this case, the namespace declaration reference to http://www.w3.org/2001/XMLSchema-instance resolves to an actual document at...

✦ Simple API for XML (SAX) parsing breaks XML documents down into events in a SAX document representation.

These nodes and events, once identified, can be used to convert the original XML document elements...

In this chapter:
✦ Parsing XML documents
✦ About XML parsers
✦ Tree parsers
✦ Event-driven parsers
✦ Document Object Model (DOM)
✦ Simple API for XML (SAX)
✦ DOM versus SAX: when to use what

About XML Parsers

There are several XML parsers on the market, and a fairly complete listing of parsers can be found at http://www.xmlsoftware.com/parsers.html. Of all the parsers on the market, three stand out from the pack in terms of standards support and general marketplace acceptance: Apache Xerces, IBM XML4J (XML for Java), and Microsoft’s MSXML parser. All of these...

... of using Xerces in J2EE applications, please refer to Chapter 16.

IBM’s XML4J

The IBM XML for Java (XML4J) libraries, with some more recent help from the Apache Xerces project and Sun (via project Crimson), are the mother of all Java-based XML parsers, starting with version 1.0 in 1998. IBM and the Apache group work closely on XML document parsing technologies. Consequently the IBM XML4J libraries are based...