
XML in 60 Minutes a Day, part 3 (ppt)



DOCUMENT INFORMATION

Basic information

Format
Pages: 72
File size: 9.08 MB

Content

Review Questions

1. What is the difference between an application and an XML application?
2. What are the names of the four basic operators in a validating parser?
3. What are the two most fundamental components of an XML document?
4. Match the following:
   a. Comments                     i. Speak to the application
   b. Processing instructions      ii. Speak to the parser
   c. Document type declarations   iii. Speak to human beings
5. What are the two types of empty elements?
6. What is the difference between attributes and pseudo-attributes?
7. What are the components of a qualified name resulting from a prefix namespace declaration?
8. Which namespace declaration “turns off” previous namespace declarations?
   a. Prefix
   b. Empty string
   c. Default
   d. None of the above
9. General entity references deal with entities used for constructing _______________________, while parameter entity references deal with entities used for constructing __________________________.
10. What are the five characters reserved for markup characters in XML, and what are their corresponding predefined entities?
11. What are the six W3C well-formedness constraints?
12. What is the definition of a valid XML document?

114 Chapter 3 422541 Ch03.qxd 6/19/03 10:09 AM Page 114

Answers to Review Questions

1. Used alone, the term application means a program or group of programs intended for end users and designed to access and manipulate XML documents. An XML application is one of several terms used to refer to a derivative markup language created according to XML 1.0.
2. The four basic operators in a validating parser are a content handler, an error handler, a DTD and schema handler, and an entity resolver.
3. The two most fundamental components of an XML document are the prolog and the data instance.
4. a. and iii.; b. and i.; c. and ii.
5. Those that are termed declared empty and those that are termed elements with no content.
6. Attributes appear in the data instance component within the start tags of elements. They provide additional description of an element or its data. Pseudo-attributes look similar to attributes but appear in declarations or instructions in the prolog component. Their descriptions pertain to a whole document.
7. The components are the prefix, the colon delimiter, and the local part of the name.
8. b. There are two considerations here. As discussed in the text, the latest namespace declaration overrides previous namespace declarations. Also, when an empty string is specified as a prefix, the subsequent relevant names only need the local part to qualify as universal names; they don’t need qualifying URLs. The effect is to “shut off” namespace declarations for the extent that the empty string namespace is in effect.
9. General entity references deal with entities used for constructing XML documents, while parameter entity references deal with entities used for constructing DTDs or schemas.
10. The five reserved characters and their predefined entities are as follows:
   a. The left angle bracket, or less-than symbol (<); its entity is &lt;
   b. The right angle bracket, or greater-than symbol (>); its entity is &gt;
   c. The quotation mark ("); its entity is &quot;
   d. The apostrophe ('); its entity is &apos;
   e. The ampersand (&); its entity is &amp;
11. The six well-formedness constraints are as follows:
   a. An XML document must contain at least one element.
   b. Each parsed entity referenced directly or indirectly within an XML document must also be well-formed.
   c. An XML document can have only one root element and all other elements must be nested within it.
   d. Non-root elements must nest properly within each other and cannot “overlap.”
   e. Every start tag must have a corresponding end tag. The declared empty start tag is not a classic XML start tag, so it is an exception.
   f. Element names must obey XML naming conventions.
12. A valid XML document is a well-formed XML document that also conforms to the declarations, structures, and other rules defined in the document’s respective DTD or schema.

CHAPTER 4
Document Type Definitions

Chapter 1, “XML Backgrounder,” explains that XML is derived from SGML and that many markup and metalanguages have been derived, in turn, from XML. New XML-based markup languages are created by developers who can’t find an existing XML language to meet their industry or organizational needs. They want to create one or more specific types of documents, with specific components related to one another and combined in specific ways. Thus, they have two basic requirements: a way to define the structure and content of their new markup language, and a way to link the relevant documents they will eventually create back to that markup language for validation purposes.

The second requirement (creating and linking relevant documents) will probably turn out to be the easier task. But the first one, defining the new markup language, can be a long and involved process. Whole books have been written on that topic. Nevertheless, after you have developed a robust, comprehensive, and extensible document type definition, and when you see that the well-formed and valid documents based on it are properly processed by your applications, you will conclude that those rewards are worth the effort.

Presently, XML provides two methods for defining new markup languages: the document type definition (DTD) and the schema. In this chapter, we introduce you to basic DTD concepts and syntax. In the next chapter, we introduce you to XML schemas, which are becoming increasingly popular, but which differ significantly from DTDs in a number of areas.

By the end of this chapter, you will know how to create small, simple DTDs and how to create simple, relevant documents based on those DTDs.
You will also see how the guided editing capability of the XML editor used in your lab exercises really comes in handy.

What Are Document Type Definitions?

Each XML-related language is a unique markup solution that meets the specific needs of an organization, industry, group, or even individual. So each language varies from all the others in scope and intent. That is, the names of their document types, element types, and other components are unique and different. But they all have several aspects in common. Each is written according to the XML 1.0 specifications, which makes all of them members of the same extended markup family. Each is readable by any XML-compliant browser. Each language must be built according to a consistent set of rules, structures, and semantics. After that consistent set has been developed, related XML documents can be created.

Document type definitions have historically been the most common method for defining an XML-related language and, thereafter, for developing the related documents. They are a form of metamarkup, which we defined in Chapter 1, that was born during the development of GML in the late 1960s and, later, made part of the ISO’s SGML standard (ISO 8879:1986). XML inherited the DTD, with its distinctly non-XML vocabulary, grammar, and syntax, from SGML.

DTDs define (the W3C’s term is declare, which is the term we’ll use most often) all of the components that an XML language or document is allowed to contain, as well as the structural relationships among those components. Thus, each unique XML vocabulary, along with its related XML documents, will be created according to the content and structure rules declared within its respective DTD or schema. (Each language can have only one of those documents, and that one document must be either a DTD or a schema.)
DTDs are composed of the following:

■ An internal subset of declarations located within an XML document
■ An actual separate, external document that contains such declarations
■ A combination of both

If there is only one set of declarations and it is found within the XML document, the declarations are called an internal DTD. If the declarations are in a separate document, they are called an external DTD. If there is a combination of internal and external declarations, each is called a subset and, together, they are considered to be the DTD.

To define document types, a DTD must contain several kinds of information (each is discussed in detail in this chapter):

Element type declarations. You can’t create just any element types in your XML documents. All element types have to be declared in the DTD, too, and so become part of the DTD’s set of allowed element types (that is, part of the language’s vocabulary).

Attribute declarations. Similarly, a DTD declares the set of attributes that can be included in the start tag for each element. Each attribute declaration defines the name, default values, and behavior of the attribute.

Entity declarations. DTDs contain the specified name and definitions for general and parameter entities. Often, entities are declared in the internal subsets (which we’ll define soon) as well as in the external subsets.

Notation declarations. Notation declarations are labels that specify various types of nonparsed binary data (and text data, too, occasionally).

Other information. This type of information consists of the XML declaration at the beginning of the document, as well as comments and white space that help to structure the document and communicate other relevant information.

These declarations are discussed in detail later in this chapter. We’ll see how their syntax defines the relationships among the components they define.
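As a rough orientation, the four declaration kinds listed above could appear together in one internal subset along the following lines. This is only a sketch under stated assumptions: the photo element, the id and source attributes, the company and smokeyPhoto entities, the jpeg notation, and the file name smokey.jpg are all hypothetical names chosen for illustration and do not come from the book's figures.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE diamonds [
  <!-- Element type declarations: the allowed vocabulary and nesting -->
  <!ELEMENT diamonds (gem*)>
  <!ELEMENT gem (name, photo?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT photo EMPTY>

  <!-- Attribute declarations: name, type, and default behavior -->
  <!ATTLIST gem id ID #REQUIRED>
  <!ATTLIST photo source ENTITY #REQUIRED>

  <!-- Notation declaration: a label for one kind of non-XML data -->
  <!NOTATION jpeg SYSTEM "image/jpeg">

  <!-- Entity declarations: a general (text) entity and an unparsed entity -->
  <!ENTITY company "Space Gems, Inc.">
  <!ENTITY smokeyPhoto SYSTEM "smokey.jpg" NDATA jpeg>
]>
<diamonds>
  <gem id="g1">
    <name>Smokey (&company;)</name>
    <photo source="smokeyPhoto"/>
  </gem>
</diamonds>
```

The data instance at the bottom uses each declared component once, which makes it easy to see which declaration governs which part of the document.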
The relationships among declared components form the content model (that is, the nesting aspects, order, number, frequency, and required or optional nature of the components) and, thus, the XML-related language’s grammar. The declarations are so important that a large portion of the W3C XML Recommendation is dedicated to defining the various declarations that are allowed in DTDs.

Why Use Document Type Definitions?

We’ve discussed already how XML is powerful, because with it you can create your own unique element types with meaningful tags. Furthermore, it is possible, but not recommended, to write XML in a freeform style, where elements occur in a fairly arbitrary order. (Even then, well-formedness still requires that elements nest properly and never overlap.) However, the vast majority of XML-related applications are not able to process your documents if the elements occur in an arbitrary order. To ensure that an XML document always communicates what the author intends, there should be some structure and content rules (also called constraints). Those rules are manifested in DTDs and schemas.

Classroom Q & A

Q: So, when would you use a DTD or schema?

A: On several occasions you would consider using DTDs. Here are some examples: when you want to specify default values for attributes, or when you want to use style sheets or transformation style sheets. Also, the use of DTDs and schemas would lead to the development of smaller-size XML-related browsers, unlike those HTML browsers that have to carry extra logic in order to “guess” the meaning of bad HTML coding. Another occasion is when you want to conduct commerce transactions, where it is important for all parties to use applications and documents that recognize common components. Yet another is when you are a member of a user community (that is, within an organization or an industry) that shares data.

The declarations within a DTD communicate meta information about the DTD and its related documents to an XML parser.
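To make the content-model vocabulary (order, number, frequency, required or optional) a little more concrete, here is a minimal sketch of a DTD fragment. The element names follow Figure 4.1, but the particular content models and the currency attribute are assumptions made for illustration; they are not the book's declarations.

```xml
<!-- Sequence (,): location comes first, followed by one or more (+) gem elements -->
<!ELEMENT diamonds (location, gem+)>

<!-- name and carats are required; color is optional (?);
     then exactly one of cost or reserved must appear (|) -->
<!ELEMENT gem (name, carats, color?, (cost | reserved))>

<!ELEMENT location (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT carats (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT cost (#PCDATA)>
<!ELEMENT reserved EMPTY>

<!-- An attribute with a default value: if currency is omitted in a
     document, a parser that reads this DTD supplies "USD" -->
<!ATTLIST cost currency CDATA "USD">
```

A validating parser that reads these declarations would reject a gem that has both cost and reserved, or a diamonds element with no gem at all, which is the kind of constraint the section describes.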
The meta information in a DTD includes the type, frequency, sequencing, and nesting of elements; attribute information; various types of entities; the names and types of external files that may be referenced; and the formats of some external (non-XML) data that also may be referenced.

Creating DTDs: General

In this chapter, we show you how to create the declarations found in a basic DTD. But we won’t be discussing DTD design in detail. Detailed design (that is, the best content model; the number and semantics of element types, attributes, and other components; the jurisdiction over DTDs; and many other aspects) depends on the specific challenge and context facing the developer. However, we will make a few general comments.

XML DTDs must be designed to comply with the XML well-formedness and validity constraints. The job of the DTD is to ensure validity, so it must be well formed and valid itself. In addition, a DTD must not contain any SGML features that are not allowed in XML.

The design and implementation of DTDs (at least, those used by an organization, industry, society, or other data-sharing group) can be a complex process, rivaling the management of any complex project. So, like project management, the process usually involves several stages: planning and design; creation and testing (some call it validating or verification); deployment and commissioning; and finally, documentation. Please recognize that there may eventually be an extension phase (that is, a revisit to the definition of the language to add components) based on experience gained during the initial use of the XML-related language and its documents. So it is important to design a DTD for extensibility.

We recommend that, during the documentation stage, DTD developers provide complete and detailed documentation with every DTD suite (XML documents, relevant DTDs, and other referenced entities).
The documentation should be designed for use by XML novices and experts, and it should detail the syntax, proper use, and client-specific definition for each element in a DTD. Additional relevant information about each element, such as probable audio/visual presentation, should also be included as comments. You should also produce documentation for all other XML documents (including all of their relevant DTDs and other documents) that will interoperate with the subject XML document and DTD suite. An XML application isn’t considered complete or stable until it is fully documented.

If you are working on the development of an XML application or on the development of individual DTDs or schemas, consult one or more of the several books dedicated to DTD design on the market. This chapter can only provide an introduction and overview to the syntax, components, and processes.

For any mature XML application, its DTDs are usually referenced by more than one document. So DTDs should be designed to be flexible, reusable, and practical. The more detailed the DTD, the more detailed the related documents’ structures, element types, and attributes will be. Consequently, there is a greater likelihood that, when the related applications access XML documents, they will obtain the data they need from them. But remember that the development of each DTD and document component costs time and money.

DTD Types and Locations

As we learned in Chapter 3, “Anatomy of an XML Document,” a valid XML document is a well-formed XML document with a document type declaration that contains or refers to a DTD or schema and that conforms to the declarations found in that DTD or schema. The respective W3C Recommendations for XML and XML schemas identify all of the criteria in detail.

In Chapter 3, we also discussed how the structure of a conforming XML document consists of two major parts: the prolog and the data instance (which contains the root element and other components).
A document type declaration statement (also called a DOCTYPE definition) should always be included in the prolog. That declaration states what class or type the document is and may also refer to internal and external DTD declarations to which the document must adhere to be valid.

As we stated earlier, then, within its document type declaration statement, there may be an internal set of declarations (an internal DTD or internal subset), the name and location of an external document containing declarations (an external DTD or an external subset), or both. In other words, there may be a standalone internal DTD, an external DTD, or a combination of an internal DTD plus a reference to an external DTD.

To determine whether a document is valid, the XML processor must read the entire document type definition, including internal and external subsets. For some applications, however, validity may not be required, and it may be sufficient for the processor to read only the internal subset.

Internal DTD Subsets

Figure 4.1 is an example of an XML document that contains an internal DTD subset. In Figure 4.1, the standalone pseudo-attribute states standalone="yes", so we can say that the document contains only an internal DTD. The value "yes" indicates that the components in the document need to be validated against the internal declarations only; no external DTD subset needs to be consulted. Because the standalone specification is "yes", the parser looks for an internal DTD in the document type declaration statement, between the opening and closing square brackets ([ and ]).

Internal DTDs are handy during early development stages. An author can check validity and save time and resources without installing applications or altering server or directory systems. A validating parser, which merely has to check a document against the document’s own internal declarations, is all that is needed.
A developer is not restricted to using either an internal DTD or an external DTD. Developers can combine internal declaration subsets with external DTD subsets. In combination cases, the value of standalone is set to "no". The parser would then consult the declarations in the internal subset and in the external subset.

External DTD Subsets

DTD declarations can be stored in an external document, which is referred to in the DOCTYPE definition of one or more XML documents. There are three types of external DTDs:

■ Private external DTDs
■ External DTDs located at Web sites
■ External DTDs with public access

Figure 4.1 A simple XML document with an internal DTD subset.

Private External DTDs

Figure 4.2 illustrates another XML document, whose standalone pseudo-attribute has been set to "no" in the XML declaration statement. In the DOCTYPE definition statement, the parser is told that an external DTD subset must be consulted. In this case, the external subset can be called the external DTD, because it alone contains the declarations. In the figure, the name of the external DTD document is diamonds2.dtd. The XML document must follow the syntax and structure rules found in diamonds2.dtd.

There is an indication that the physical location of the diamonds2.dtd document is on the local system, because the keyword SYSTEM has been inserted after the class specification diamonds. In fact, the diamonds2.dtd document appears to be in the same directory as the XML document itself, because there are no additional paths (that is, folders or directories) specified with diamonds2.dtd.
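The arrangement described for a private external DTD might look roughly like the two files below. This is a sketch, not a reproduction of Figure 4.2: it assumes that diamonds2.dtd holds the same element declarations as the internal subset of Figure 4.1, since the actual contents of that file are not given in the text.

```xml
<!-- diamonds2.dtd (assumed contents, mirroring the internal subset of Figure 4.1) -->
<!ELEMENT diamonds (location,gem)*>
<!ELEMENT location (#PCDATA)>
<!ELEMENT gem (name,carats,color,clarity,cut,cost,reserved)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT carats (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT clarity (#PCDATA)>
<!ELEMENT cut (#PCDATA)>
<!ELEMENT cost (#PCDATA)>
<!ELEMENT reserved EMPTY>
```

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- SYSTEM plus a bare file name: the DTD is looked up relative to this document -->
<!DOCTYPE diamonds SYSTEM "diamonds2.dtd">
<diamonds>
  <location>Ursae Majoris</location>
  <gem>
    <name>Smokey</name>
    <carats>1003.29</carats>
    <color>F</color>
    <clarity>IF</clarity>
    <cut>Ideal</cut>
    <cost>2250000</cost>
    <reserved />
  </gem>
</diamonds>
```

Because the system identifier is a relative reference with no directory part, both files would need to sit in the same folder for a parser to find the DTD, which matches the observation in the text.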
Listing for Figure 4.1:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <?xml-stylesheet type="text/css" href="diamonds1.css"?>
    <!DOCTYPE diamonds [
      <!ELEMENT diamonds (location,gem)*>
      <!ELEMENT location (#PCDATA)>
      <!ELEMENT gem (name,carats,color,clarity,cut,cost,reserved)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT carats (#PCDATA)>
      <!ELEMENT color (#PCDATA)>
      <!ELEMENT clarity (#PCDATA)>
      <!ELEMENT cut (#PCDATA)>
      <!ELEMENT cost (#PCDATA)>
      <!ELEMENT reserved EMPTY>
    ]>
    <!-- Gems Version 1 - Space Gems, Inc. -->
    <!-- filename: gems_excerpt_04.xml -->
    <diamonds>
      <location>Ursae Majoris</location>
      <gem>
        <name>Smokey</name>
        <carats>1003.29</carats>
        <color>F</color>
        <clarity>IF</clarity>
        <cut>Ideal</cut>
        <cost>2250000</cost>
        <reserved />
      </gem>
    </diamonds>

[...]

... several DTDs are accessed by several XML documents.

Notation Declarations

We’ve mentioned that XML documents can contain parsed text data and unparsed data (for example, audio, video, and other document files). In Chapter 3, “Anatomy of an XML Document,” we showed you how to incorporate XML data. Now we show you how to incorporate non-XML data. There are two basic methods for incorporating non-XML data into...

... Definitions and xml:lang NMTOKEN 'x-cancri-au'>

In each case, the names are character strings that begin with a letter. In each case, too, there is a default value specified between quotation marks.

Entity Declarations

We learned in Chapter 3 that entities are the physical storage units for the parsed and unparsed data that compose every XML document. They are references that are passed along...
... #REQUIRED indicates that no default value exists. Eventually, the XML parser reads the DTD as it validates the XML document and passes the attribute specification data to the application. CDATA is one of XML's 10 possible attribute types. Table 4.1 lists all the attribute types available.

Table 4.1 Attribute Types

ATTRIBUTE TYPE    VALUE SPECIFICATION
CDATA             Value is a character string. Any text...

... White-space maintenance requires two steps: inserting the xml:space attribute in the relevant element start tags, and the corresponding declaration of the attribute in the DTD. Both of these are needed to advise the parser to maintain white space. Remember that the only legal values for xml:space are preserve and default. The value default indicates that the author does not mind whatever...

... declaration for the default namespace in its start tag look like? We would be correct if we created the following attribute declaration: The declaration states that in the extent of any instance of the element type, an attribute named xmlns appears, whose value contains parseable character data. The value does not change;...

... space. If you would like more information on normalization, refer to the XML 1.0 Recommendation.

Chapter 4 Labs: Creating a DTD

In the Chapter 3 lab exercises, you created an XML document whose data instance consisted of a structure of several elements containing the names and relevant characteristics of several diamonds. That document was created to introduce you to the nature of XML data structuring and...

Figure 4.6 Example of an internal parameter entity.

External Parameter Entities

Parameter entities can be added to external DTD subsets in a manner similar to the way the internal parameter entity example appears in Figure 4.6. Parameter entity advantages are multiplied if you add them to external DTDs, especially if each...
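The attribute-related fragments above (#REQUIRED, the xml:space and xml:lang attributes, a fixed xmlns attribute for a default namespace, and an internal parameter entity) can be sketched together in one small document. Everything specific here is an assumption made for illustration: the gems, gem, and note element names, the id attribute, the nameDecl entity, and the namespace URI http://www.example.com/gems are hypothetical and are not the declarations that were printed in the book.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gems [
  <!-- Internal parameter entity holding one complete declaration. In the
       internal subset, a parameter entity reference may appear only
       between declarations, not inside one. -->
  <!ENTITY % nameDecl "<!ELEMENT name (#PCDATA)>">

  <!ELEMENT gems (gem+)>
  <!ELEMENT gem (name, note?)>
  %nameDecl;
  <!ELEMENT note (#PCDATA)>

  <!-- Default namespace declared as a fixed attribute: the value cannot change -->
  <!ATTLIST gems xmlns CDATA #FIXED "http://www.example.com/gems">

  <!-- #REQUIRED: no default value exists, so every gem must supply id -->
  <!ATTLIST gem id CDATA #REQUIRED>

  <!-- xml:space may only be default or preserve; both attributes here
       carry a default value given between quotation marks -->
  <!ATTLIST note xml:space (default|preserve) "preserve"
                 xml:lang  NMTOKEN            "en">
]>
<gems xmlns="http://www.example.com/gems">
  <gem id="g-001">
    <name>Smokey</name>
    <note xml:space="preserve">  Reserved
  for inspection  </note>
  </gem>
</gems>
```

Declaring xml:space in the DTD and also writing it in the start tag reflects the two steps the text mentions for white-space maintenance.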
Attributes in the DTD

In Chapter 3, we learned how namespace declarations are a specialized form of attribute specifications. Thus, for their documents to be valid, declarations for namespaces must also appear in DTDs and schemas. A declaration must appear for each namespace. But just as default namespaces differ from prefix namespaces, their declarations also differ. The next sections describe the specific approaches...

... and formatting. In practice, though, a DTD is created first and then is used as a template to create XML documents (TurboXML calls them instances). So the labs in this chapter represent a restart. In the first lab, you construct a DTD that declares the properties of several diamond-related components. In the second, you create an XML data instance from that DTD. That instance is identical to the lab exercise...

... declaration that we discussed earlier in this chapter. We also mentioned earlier that the “character data only” element type declaration is actually an example of the mixed content element type declaration. This example declaration states that there is an element type named that may contain one or more child element types. If it does, the child element type can be parsed character data or parsed...

... various element types are declared in DTDs.

Elements Containing Parsed Character Data

If you are creating a declaration for an element type that is intended to contain parsed character data, ...
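The element type declarations touched on in these fragments (character data only, mixed content, and declared empty) follow a small number of standard DTD forms. The sketch below shows one of each; the description, emphasis, and term element names are hypothetical and were chosen only for illustration, while name and reserved follow Figure 4.1.

```xml
<!-- Character data only: the simplest case of mixed content -->
<!ELEMENT name (#PCDATA)>

<!-- Mixed content: character data interleaved with the listed child
     element types, in any order, zero or more times. #PCDATA must
     come first and the group must end with * -->
<!ELEMENT description (#PCDATA | emphasis | term)*>
<!ELEMENT emphasis (#PCDATA)>
<!ELEMENT term (#PCDATA)>

<!-- Declared empty: no content at all -->
<!ELEMENT reserved EMPTY>
```

Under these declarations, a description such as `<description>An <emphasis>ideal</emphasis> cut</description>` would be acceptable, because text and the listed child elements may be freely interleaved.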

Date posted: 14/08/2014, 12:20
