Appendix A Creating Markup with XML 1621

In Fig. A.5, two distinct file elements are differentiated using namespaces. Lines 6–7 use the XML namespace keyword xmlns to create two namespace prefixes: text and image. The values assigned to attributes xmlns:text and xmlns:image are called Uniform Resource Identifiers (URIs). By definition, a URI is a series of characters used to differentiate names. To ensure that a namespace is unique, the document author must provide a unique URI. Here, we use the text urn:deitel:textInfo and urn:deitel:imageInfo as URIs. A common practice is to use Uniform Resource Locators (URLs) for URIs, because the domain names (e.g., deitel.com) used in URLs are guaranteed to be unique. For example, lines 6–7 could have been written as

   <text:directory xmlns:text = "http://www.deitel.com/xmlns-text"
      xmlns:image = "http://www.deitel.com/xmlns-image">

where we use URLs related to the Deitel & Associates, Inc., domain name (www.deitel.com). These URLs are never visited by the parser; they only represent a series of characters for differentiating names and nothing more. The URLs need not even exist or be properly formed.

Lines 9–11 use the namespace prefix text to describe elements file and description. Notice that end tags have the namespace prefix text applied to them as well. Lines 13–16 apply namespace prefix image to elements file, description and size.

 9     <text:file filename = "book.xml">
10        <text:description>A book list</text:description>
11     </text:file>
12
13     <image:file filename = "funny.jpg">
14        <image:description>A funny picture</image:description>
15        <image:size width = "200" height = "100"/>
16     </image:file>
17
18  </text:directory>

<?xml version="1.0" encoding="UTF-8"?>
<!-- Fig. A.5 : namespace.xml -->
<!-- Namespaces -->
<text:directory xmlns:text="urn:deitel:textInfo" xmlns:image="urn:deitel:imageInfo">
   <text:file filename="book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename="funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width="200" height="100"/>
   </image:file>
</text:directory>

Fig. A.5 Demonstrating XML namespaces (part 2 of 2).

To eliminate the need to place a namespace prefix in each element, authors may specify a default namespace for an element and all of its child elements. Figure A.6 demonstrates the use of default namespaces.

We declare a default namespace using the xmlns attribute with a URI as its value (line 6). Once this default namespace is in place, child elements that are part of the namespace do not need a namespace prefix. Element file (line 9) is in the namespace corresponding to the URI urn:deitel:textInfo. Compare this usage with that in Fig. A.5, where we prefixed the file and description elements with the namespace prefix text (lines 9–11).

The default namespace applies to all elements contained in the directory element. However, we may use a namespace prefix to specify a different namespace for particular elements. For example, the file element on line 13 uses the prefix image to indicate that the element is in the namespace corresponding to the URI urn:deitel:imageInfo.

 1  <?xml version = "1.0"?>
 2
 3  <!-- Fig. A.6 : defaultnamespace.xml -->
 4  <!-- Using Default Namespaces -->
 5
 6  <directory xmlns = "urn:deitel:textInfo"
 7     xmlns:image = "urn:deitel:imageInfo">
 8
 9     <file filename = "book.xml">
10        <description>A book list</description>
11     </file>
12
13     <image:file filename = "funny.jpg">
14        <image:description>A funny picture</image:description>
15        <image:size width = "200" height = "100"/>
16     </image:file>
17
18  </directory>

C:\>java -jar ParserTest.jar defaultnamespace.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Fig. A.6 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->
<directory xmlns="urn:deitel:textInfo" xmlns:image="urn:deitel:imageInfo">
   <file filename="book.xml">
      <description>A book list</description>
   </file>
   <image:file filename="funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width="200" height="100"/>
   </image:file>
</directory>

Fig. A.6 Using default namespaces.

A.7 Internet and World Wide Web Resources

www.w3.org/XML
World Wide Web Consortium Extensible Markup Language home page. Contains links to related XML technologies, recommended books, a time-line for publications, developer discussions, translations, software, etc.

www.w3.org/Addressing
World Wide Web Consortium addressing home page. Contains information on URIs and links to other resources.

www.xml.com
This is one of the most popular XML sites on the Web. It has resources and links relating to all aspects of XML, including articles, news, seminar information, tools, Frequently Asked Questions (FAQs), etc.

www.xml.org
"The XML Industry Portal" is another popular XML site that includes links to many different XML resources, such as news, FAQs and descriptions of XML-derived markup languages.

www.oasis-open.org/cover
OASIS XML Cover Pages home page is a comprehensive reference for many aspects of XML and its related technologies. The site includes links to news, articles, software and events.

html.about.com/compute/html/cs/xmlandjava/index.htm
This site contains articles about XML and Java and is updated regularly.

www.w3schools.com/xml
Contains a tutorial that introduces the reader to the major aspects of XML. The tutorial contains many examples.

java.sun.com/xml
Home page of Sun's JAXP and parser technology.
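Returning briefly to the default-namespace document of Fig. A.6, the following Java sketch shows one way to confirm which namespace each file element belongs to. It uses the JAXP DOM parser that ships with the JDK. The class name NamespaceDemo and the method fileNamespaces are our own names for illustration, and the document is held in a string rather than read from defaultnamespace.xml, so treat this as a sketch under those assumptions rather than the book's ParserTest program.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the book's examples.
class NamespaceDemo {

    // The markup of Fig. A.6, held in a string so the sketch is self-contained.
    static final String XML =
        "<directory xmlns = \"urn:deitel:textInfo\"\n"
      + "   xmlns:image = \"urn:deitel:imageInfo\">\n"
      + "   <file filename = \"book.xml\">\n"
      + "      <description>A book list</description>\n"
      + "   </file>\n"
      + "   <image:file filename = \"funny.jpg\">\n"
      + "      <image:description>A funny picture</image:description>\n"
      + "      <image:size width = \"200\" height = \"100\"/>\n"
      + "   </image:file>\n"
      + "</directory>";

    // Returns "namespaceURI filename" for every element whose local name is file.
    static List<String> fileNamespaces(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));

            List<String> result = new ArrayList<>();
            // "*" matches any namespace; "file" is the local (unprefixed) name.
            NodeList files = document.getElementsByTagNameNS("*", "file");
            for (int i = 0; i < files.getLength(); i++) {
                Element file = (Element) files.item(i);
                result.add(file.getNamespaceURI() + " " + file.getAttribute("filename"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        for (String line : fileNamespaces(XML)) {
            System.out.println(line);
        }
    }
}
```

Running main prints urn:deitel:textInfo book.xml followed by urn:deitel:imageInfo funny.jpg: the unprefixed file element picks up the default namespace, while image:file is placed in the namespace bound to the image prefix.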
SUMMARY
• XML is a technology for creating markup languages to describe data of virtually any type in a structured manner.
• XML allows document authors to describe data precisely by creating their own tags. Markup languages can be created using XML for describing almost anything.
• XML documents are commonly stored in text files that end in the extension .xml. Any text editor can be used to create an XML document. Many software packages allow data to be saved as XML documents.
• The XML declaration specifies the version to which the document conforms.
• All XML documents must have exactly one root element that contains all of the other elements.
• To process an XML document, a software program called an XML parser is required. The XML parser reads the XML document, checks its syntax, reports any errors and allows access to the document's contents.
• An XML document is considered well formed if it is syntactically correct (i.e., the parser did not report any errors due to missing tags, overlapping tags, etc.). Every XML document must be well formed.
• Parsers may or may not support the Document Object Model (DOM) and/or the Simple API for XML (SAX) for accessing a document's content programmatically by using languages such as Java, Python and C.
• XML documents may contain carriage returns, line feeds and Unicode characters. Unicode is a standard that was released by the Unicode Consortium in 1991 to expand character representation for most of the world's major languages. The American Standard Code for Information Interchange (ASCII) is a subset of Unicode.
• Markup text is enclosed in angle brackets (i.e., < and >). Character data are the text between a start tag and an end tag. Child elements are considered markup, not character data.
• Spaces, tabs, line feeds and carriage returns are whitespace characters. In an XML document, the parser considers whitespace characters to be either significant (i.e., preserved by the parser) or insignificant (i.e., not preserved by the parser).
• Almost any character may be used in an XML document. However, the characters ampersand (&) and left-angle bracket (<) are reserved in XML and may not be used in character data, except in CDATA sections. Angle brackets are reserved for delimiting markup tags. The ampersand is reserved for beginning entity references, including references to hexadecimal values that identify a specific Unicode character. These expressions are terminated with a semicolon (;). The apostrophe and double-quote characters are reserved for delimiting attribute values.
• XML provides built-in entities for ampersand (&amp;), left-angle bracket (&lt;), right-angle bracket (&gt;), apostrophe (&apos;) and quotation mark (&quot;).
• All XML start tags must have a corresponding end tag, and all start and end tags must be properly nested. XML is case sensitive; therefore, start tags and end tags must have matching capitalization.
• Elements define a structure. An element may or may not contain content (i.e., child elements or character data). Attributes describe elements. An element may have zero, one or more attributes associated with it. Attributes are nested within the element's start tag. Attribute values are enclosed in quotes, either single or double.
• XML element and attribute names can be of any length and may contain letters, digits, underscores, hyphens and periods; they must begin with either a letter or an underscore.
• A processing instruction's (PI's) information is passed by the parser to the application using the XML document. Document authors may create their own processing instructions. Almost any name may be used for a PI target except the reserved word xml (in any mixture of case). Processing instructions allow document authors to embed application-specific data within an XML document.
These data are not intended to be readable by humans, but to be readable by applications.
• CDATA sections may contain text, reserved characters (e.g., <), words and whitespace characters. XML parsers do not process the text in CDATA sections. CDATA sections allow the document author to include data that is not intended to be parsed. CDATA sections cannot contain the text ]]>.
• Because document authors can create their own tags, naming collisions (e.g., conflicts that arise when document authors use the same names for elements) can occur. Namespaces provide a means for document authors to prevent naming collisions. Document authors create their own namespaces. Virtually any name may be used for a namespace, except the reserved namespace xml.
• A Uniform Resource Identifier (URI) is a series of characters used to differentiate names. URIs are used with namespaces.

TERMINOLOGY
<![CDATA[ and ]]> to delimit a CDATA section
<? and ?> to delimit a processing instruction
ampersand (&)
angle brackets (< and >)
apostrophe (')
application
ASCII (American Standard Code for Information Interchange)
attribute
built-in entity
CDATA section
character data
child
child element
comment
container element
content
element
empty element
end tag
entity references
insignificant whitespace character
Java API for XML Parsing (JAXP)
left angle bracket (<)
markup language
markup text
namespace
namespace prefix
namespace xml
naming collision
node
parser
PI target
PI value
processing instruction (PI)
quotation mark (")
reserved character
reserved keyword
reserved namespace
right angle bracket (>)
root element
SAX-based parser
significant whitespace character
Simple API for XML (SAX)
start tag
structured data
tree structure of an XML document
Unicode
Unicode Consortium
Uniform Resource Identifier (URI)
XML
XML declaration
XML document
XML namespace
XML parser
XML processor
XML version

SELF-REVIEW EXERCISES
A.1 State whether the following are true or false. If false, explain why.
a) XML is a technology for creating markup languages.
b) XML markup text is delimited by forward and backward slashes (/ and \).
c) All XML start tags must have corresponding end tags.
d) Parsers check an XML document's syntax and may support the Document Object Model and/or the Simple API for XML.
e) An XML document is considered well formed if it contains whitespace characters.
f) SAX-based parsers process XML documents and generate events when tags, text, comments, etc., are encountered.
g) When creating new XML tags, document authors must use the set of XML tags provided by the W3C.
h) The pound character (#), the dollar sign ($), the ampersand (&), the greater-than symbol (>) and the less-than symbol (<) are examples of XML reserved characters.
i) Any text file is automatically considered to be an XML document by a parser.

A.2 Fill in the blanks in each of the following statements:
a) A/An ________ processes an XML document.
b) Valid characters that can be used in an XML document are the carriage return, line feed and ________ characters.
c) An entity reference must be preceded by a/an ________ character.
d) A/An ________ is delimited by <? and ?>.
e) Text in a/an ________ section is not parsed.
f) An XML document is considered ________ if it is syntactically correct.
g) ________ help document authors prevent element-naming collisions.
h) A/An ________ tag does not contain character data.
i) The built-in entity for the ampersand is ________.

A.3 Identify and correct the error(s) in each of the following:
a) <my Tag>This is my custom markup<my Tag>
b) <!PI value!> <!-- a sample processing instruction -->
c) <myXML>I know XML!!!</MyXML>
d) <CDATA>This is a CDATA section.</CDATA>
e) <xml>x < 5 && x > y</xml> <!-- mark up a Java condition **>

ANSWERS TO SELF-REVIEW EXERCISES
A.1 a) True. b) False. In an XML document, markup text is any text delimited by angle brackets (< and >), with a forward slash being used in the end tag. c) True. d) True. e) False. An XML document is considered well formed if it is parsed successfully. f) True. g) False. When creating new tags, programmers may use any valid name except the reserved word xml (in any mixture of case). h) False. XML reserved characters include the ampersand (&) and the left angle bracket (<), but not the right angle bracket (>), # or $. i) False. The text file must be parsable by an XML parser. If parsing fails, the document cannot be considered an XML document.

A.2 a) parser. b) Unicode. c) ampersand (&). d) processing instruction. e) CDATA. f) well formed. g) namespaces. h) empty. i) &amp;.

A.3 a) Element name my Tag contains a space. The forward slash, /, is missing in the end tag. The corrected markup is
   <myTag>This is my custom markup</myTag>
b) Incorrect delimiters for a processing instruction. The corrected markup is
   <?PI value?> <!-- a sample processing instruction -->
c) Incorrect mixture of case in end tag. The corrected markup is
   <myXML>I know XML!!!</myXML>
or
   <MyXML>I know XML!!!</MyXML>
d) Incorrect syntax for a CDATA section. The corrected markup is
   <![CDATA[This is a CDATA section.]]>
e) The name xml is reserved and cannot be used as an element. The characters <, & and > must be represented using entities. The closing comment delimiter should be two hyphens, not two stars. Corrected markup is
   <someName>x &lt; 5 &amp;&amp; x &gt; y</someName> <!-- mark up a Java condition -->

B Document Type Definition (DTD)

Objectives
• To understand what a DTD is.
• To be able to write DTDs.
• To be able to declare elements and attributes in a DTD.
• To understand the difference between general entities and parameter entities.
• To be able to use conditional sections with entities.
• To be able to use NOTATIONs.
• To understand how an XML document's whitespace is processed.
To whom nothing is given, of him can nothing be required.
Henry Fielding

Like everything metaphysical, the harmony between thought and reality is to be found in the grammar of the language.
Ludwig Wittgenstein

Grammar, which knows how to control even kings.
Molière

B.1 Introduction
In this appendix, we discuss Document Type Definitions (DTDs), which define an XML document's structure (e.g., what elements, attributes, etc. are permitted in the document). An XML document is not required to have a corresponding DTD. However, DTDs are often recommended to ensure document conformity, especially in business-to-business (B2B) transactions, where XML documents are exchanged. DTDs are themselves defined using EBNF (Extended Backus-Naur Form) grammar, not the XML syntax introduced in Appendix A.

B.2 Parsers, Well-Formed and Valid XML Documents
Parsers are generally classified as validating or nonvalidating. A validating parser is able to read a DTD and determine whether the XML document conforms to it. If the document conforms to the DTD, it is referred to as valid. If the document fails to conform to the DTD but is syntactically correct, it is well formed, but not valid. By definition, a valid document is well formed.

A nonvalidating parser is able to read the DTD, but cannot check the document against the DTD for conformity. If the document is syntactically correct, it is well formed.

In this appendix, we use a Java program we created to check a document's conformance. This program, named Validator.jar, is located in the Appendix B examples directory. Validator.jar uses the reference implementation for the Java API for XML Processing 1.1, which requires crimson.jar and jaxp.jar.
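To make the distinction between well formed and valid concrete, the following Java sketch parses the same markup twice, once with a validating parser and once with a nonvalidating one. It is not the source of Validator.jar, which is not reproduced here; the class name ValidityCheck, its method and its string constants are our own, and we assume the JAXP parser bundled with a current JDK rather than crimson.jar. The small DTD is placed in the document's internal subset, whose syntax is covered in Section B.3.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper class; a sketch, not the book's Validator program.
class ValidityCheck {

    // A DTD (assumed for illustration) requiring one message child in myMessage.
    static final String DTD =
        "<!DOCTYPE myMessage [\n"
      + "   <!ELEMENT myMessage ( message )>\n"
      + "   <!ELEMENT message ( #PCDATA )>\n"
      + "]>\n";

    static final String VALID =
        DTD + "<myMessage><message>Welcome to XML!</message></myMessage>";

    // Well formed, but the required message element is missing.
    static final String INVALID = DTD + "<myMessage></myMessage>";

    // Returns the validity errors the parser reports for xml.
    // A document that is not well formed causes an IllegalStateException.
    static List<String> validationErrors(String xml, boolean validating) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations arrive here as recoverable errors.
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                // Well-formedness violations are fatal.
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("document is not well formed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("valid document, validating parser:      "
            + validationErrors(VALID, true).size() + " error(s)");
        System.out.println("invalid document, validating parser:    "
            + validationErrors(INVALID, true));
        System.out.println("invalid document, nonvalidating parser: "
            + validationErrors(INVALID, false).size() + " error(s)");
    }
}
```

The nonvalidating parser accepts both documents because both are well formed; only the validating parser notices that the second document does not conform to its DTD.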
Outline
B.1 Introduction
B.2 Parsers, Well-Formed and Valid XML Documents
B.3 Document Type Declaration
B.4 Element Type Declarations
   B.4.1 Sequences, Pipe Characters and Occurrence Indicators
   B.4.2 EMPTY, Mixed Content and ANY
B.5 Attribute Declarations
B.6 Attribute Types
   B.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN)
   B.6.2 Enumerated Attribute Types
B.7 Conditional Sections
B.8 Whitespace Characters
B.9 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises

B.3 Document Type Declaration
DTDs are introduced into XML documents using the document type declaration (i.e., DOCTYPE). A document type declaration is placed in the XML document's prolog (i.e., all lines preceding the root element), begins with <!DOCTYPE and ends with >. The document type declaration can point to declarations that are outside the XML document (called the external subset) or can contain the declaration inside the document (called the internal subset). For example, an internal subset might look like

   <!DOCTYPE myMessage [
      <!ELEMENT myMessage ( #PCDATA )>
   ]>

The first myMessage is the name of the document type declaration. Anything inside the square brackets ([]) constitutes the internal subset. As we will see momentarily, ELEMENT and #PCDATA are used in "element declarations."

External subsets physically exist in a different file that typically ends with the .dtd extension, although this file extension is not required. External subsets are specified using either the keyword SYSTEM or the keyword PUBLIC. For example, the DOCTYPE external subset might look like

   <!DOCTYPE myMessage SYSTEM "myDTD.dtd">

which points to the myDTD.dtd document. The PUBLIC keyword indicates that the DTD is widely used (e.g., the DTD for HTML documents). The DTD may be made available in well-known locations for more efficient downloading.
We used such a DTD in Chapters 9 and 10 when we created XHTML documents. The DOCTYPE

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

uses the PUBLIC keyword to reference the well-known DTD for XHTML version 1.0. XML parsers that do not have a local copy of the DTD may use the URL provided to download the DTD to perform validation.

Both the internal and external subset may be specified at the same time. For example, the DOCTYPE

   <!DOCTYPE myMessage SYSTEM "myDTD.dtd" [
      <!ELEMENT myElement ( #PCDATA )>
   ]>

contains declarations from the myDTD.dtd document, as well as an internal declaration.

Software Engineering Observation B.1
The document type declaration's internal subset plus its external subset form the DTD.

Software Engineering Observation B.2
The internal subset is visible only within the document in which it resides. Other external documents cannot be validated against it. DTDs that are used by many documents should be placed in the external subset.

B.4 Element Type Declarations
Elements are the primary building blocks used in XML documents and are declared in a DTD with element type declarations (ELEMENTs). For example, to declare element myElement, we might write

   <!ELEMENT myElement ( #PCDATA )>

The element name (e.g., myElement) that follows ELEMENT is often called a generic identifier. The set of parentheses that follows the element name specifies the element's allowed content and is called the content specification. Keyword PCDATA specifies that the element may contain only parsable character data. These data will be parsed by the XML parser; therefore, any markup text (i.e., <, >, &, etc.) will be treated as markup. We will discuss the content specification in detail momentarily.
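The following Java sketch ties together the combined internal and external subsets of Section B.3 and the PCDATA content specification above. Several things here are assumptions made for illustration: the class name SubsetDemo, the contents we give to myDTD.dtd, and the use of a JAXP EntityResolver to serve that external subset from a string so that no file is needed. The sketch counts the validity errors a validating parser reports for a given root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper class; a sketch under the assumptions stated above.
class SubsetDemo {

    // Assumed contents of myDTD.dtd (the external subset).
    static final String EXTERNAL_SUBSET = "<!ELEMENT myMessage ( message )>";

    // The external subset declares myMessage; the internal subset declares message.
    static final String DOCTYPE =
        "<!DOCTYPE myMessage SYSTEM \"myDTD.dtd\" [\n"
      + "   <!ELEMENT message ( #PCDATA )>\n"
      + "]>\n";

    // Parses DOCTYPE + body with a validating parser and counts validity errors.
    static int countValidationErrors(String body) {
        final int[] count = { 0 };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the external subset from memory instead of reading myDTD.dtd.
            builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader(EXTERNAL_SUBSET)));
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { count[0]++; }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException("document is not well formed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        // Character data only: conforms to ( #PCDATA ).
        System.out.println(countValidationErrors(
            "<myMessage><message>Welcome to XML!</message></myMessage>") == 0);
        // <b> is parsed as markup, so message no longer holds only character data.
        System.out.println(countValidationErrors(
            "<myMessage><message>Welcome to <b>XML</b>!</message></myMessage>") > 0);
        // The same characters written as entities are ordinary character data.
        System.out.println(countValidationErrors(
            "<myMessage><message>Welcome to &lt;b&gt;XML&lt;/b&gt;!</message></myMessage>") == 0);
    }
}
```

All three lines print true. The second case shows why reserved characters in PCDATA must be written with the built-in entities: left as raw markup, the b tags are treated as an undeclared child element and the document is no longer valid.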
Common Programming Error B.1
Attempting to use the same element name in multiple element type declarations is an error.

Figure B.1 lists an XML document that contains a reference to an external DTD in the DOCTYPE. We use Validator.jar to check the document's conformity against its DTD.

The document type declaration (line 6) specifies the name of the root element as myMessage. The element myMessage (lines 8–10) contains a single child element named message (line 9).

Line 3 of the DTD (Fig. B.2) declares element myMessage. Notice that the content specification contains the name message. This indicates that element myMessage contains exactly one child element named message. Because myMessage can have only an element as its content, it is said to have element content. Line 4 declares element message, whose content is of type PCDATA.

Common Programming Error B.2
Having a root element name other than the name specified in the document type declaration is an error.

If an XML document's structure is inconsistent with its corresponding DTD, but is syntactically correct, the document is only well formed, not valid. Figure B.3 shows the messages generated when the required message element is omitted.

 1  <?xml version = "1.0"?>
 2
 3  <!-- Fig. B.1: welcome.xml -->
 4  <!-- Using an external subset -->
 5
 6  <!DOCTYPE myMessage SYSTEM "welcome.dtd">
 7
 8  <myMessage>
 9     <message>Welcome to XML!</message>
10  </myMessage>

Fig. B.1 XML document declaring its associated DTD.

[...]

... that assigns shippedBy (line 32) the value "bug". No shipID attribute has a value "bug", which results in an invalid XML document.

...

   // Java core libraries
   import java.io.*;

   // Java standard extensions
   import javax.xml.parsers.*;

   // third-party libraries
   import org.w3c.dom.*;
   import org.xml.sax.*;

   public class XMLInfo {

Fig. C.2 XMLInfo displays information about XML input (part 1 of 3).

Document Object Model (DOM™) Appendix C

... declared in the DTD as an NMTOKEN and an enumeration, respectively. Both these attributes are normalized by the parser.

... lists an XML document that demonstrates the use of entities and entity attribute types.

Fig. B.9 Error displayed when an invalid ID is referenced (part 1 of 2).

C:\>java -jar Validator.jar invalid-IDExample.xml
error: No element has an ID attribute with value "bug"

Fig. B.9 Error displayed when an invalid ID is referenced (part 2 of 2).

Line 7 declares a notation...

... isbnXML "0-13-028417-3"> ]>

...
Object Model, we begin with a simple example that uses Java. This example takes an XML document (Fig. C.1) that marks up an article and uses the JAXP API to display the document's element names and values. Figure C.2 lists the Java code that manipulates this XML document and displays its content.

... 2001 Parent node of date is: article

Fig. C.2 XMLInfo displays information about XML input (part 3 of 3).

Lines 7–12 import several packages related to XML. Package javax.xml.parsers provides classes related to parsing an XML document. Package org.w3c.dom provides the DOM-API programmatic interface (i.e., classes, methods, etc.). Lines 27–28 create a new DocumentBuilderFactory. The DocumentBuilderFactory...

... entity is placed to the right of NDATA. Line 11 declares attribute tour for element company. Attribute tour specifies a required ENTITY attribute type. Line 16 assigns entity city to attribute tour. If we replaced line 16 with ... the document fails to conform to the DTD because entity country does not exist. Figure B.11 shows the error.

...

    <bookstore>
19     <shipping shipID = "bug2bug">
20        <duration>2 to 4 days</duration>
21     </shipping>
22
23     <shipping shipID = "Deitel">
24        <duration>1 day</duration>
25     </shipping>
26
27     <book shippedBy = "Deitel" isbn = "&isbnJava;">
28        Java How to Program 4th edition.
29     </book>
30
31     <book ... "&isbnXML;">
32        XML How to Program.
33     </book>
34
35     <book shippedBy = "bug2bug" isbn = "&isbnCPP;">
36        C++ How to Program 3rd edition.
37     </book>
38  </bookstore>

C:\>java