Here is a line. I think I'll insert a break. <br/> Here is a line separated from the previous one by a break.

An element can contain text, one or more other elements, or both. You can see this in the resume and JavaBeans examples. If you keep in mind the idea that elements are nodes on a tree and can be moved and manipulated, then it will make sense to you that elements must be properly nested. For example, the following is not allowed:

<outer> <inner> </outer> </inner>

If you wanted to pick up the entire <inner> element and place it before the <outer> element, you would be taking the end tag for <outer> with you. Instead, you have to nest properly, as follows:

<outer> <inner> </inner> </outer>

In these last two snippets we've omitted the indentation that we usually include for readability. There was no way to properly indent the first snippet, and we didn't want to imply in the second one that the indentation was why it parsed properly.

Another of the rules is that XML is case-sensitive. Again, many of us have gotten sloppy in HTML and written something like the following:

<html>
<Body>
</body>
</HTML>

As Java developers, this restriction shouldn't bother us. We often use different cases to indicate a class and an instance of the class. To declare an object of type Dog named dog, we might write something like the following:

Dog dog = new Dog();

The point isn't whether or not you like this naming convention, but that you aren't in need of case-sensitivity training.

As you choose an element name, you should make sure that it starts with a letter or underscore and that it doesn't contain any spaces. Following your Java naming conventions, you should choose names that are descriptive and that help you or other developers understand what you are describing.

Namespaces

You need namespaces in XML for the same reasons that you use packages in Java. You may have constructed your own version of a resume, wherein your concept of an address is different from mine.
To distinguish your address element from mine, prefix the element name with the name of a namespace. My <address> consists of <street>, <city>, <state>, <zip>, and <phone>. Just as you would tend to package these together in Java, you should put them in the same namespace. Again, this way your <address> will use your <street>, and so on. Let's say that our namespace will be called J2EEBible, and that yours will be called reader. Then we will refer to our <address> with the qualified name <J2EEBible:address>, and to yours as <reader:address>. In each case, the part before the colon is the prefix, and the part after is the local part.

Chapter 10: Building an XML Foundation

Really J2EEBible is not the namespace; it is the prefix that we will bind to a particular namespace using the following syntax:

xmlns:prefix="URI"

Here's how the use of namespaces might change the earlier resume document:

<?xml version="1.0"?>
<J2EEBible:resume xmlns:J2EEBible="http://www.hungryminds.com/j2eebible/">
  <J2EEBible:name> A. Employee </J2EEBible:name>
  <J2EEBible:address>
    <J2EEBible:street> 1234 My Street </J2EEBible:street>
    <J2EEBible:city> My City</J2EEBible:city>
    <J2EEBible:state> OH </J2EEBible:state>
    <J2EEBible:zip> 44120 </J2EEBible:zip>
    <J2EEBible:phone> (555) 555-5555 </J2EEBible:phone>
  </J2EEBible:address>
  <J2EEBible:education>
    <J2EEBible:school> Whatsamatta U.</J2EEBible:school>
    <J2EEBible:degree> B.S. </J2EEBible:degree>
    <J2EEBible:yeargraduated> 1920 </J2EEBible:yeargraduated>
  </J2EEBible:education>
</J2EEBible:resume>

The xmlns:J2EEBible portion of the first start tag shows where the namespace is declared. It is an attribute placed inside the start tag for the <resume> element. (We'll say more about attributes in the next subsection.) First we prefixed the tag name with J2EEBible, and then we bound the prefix J2EEBible to the URI http://www.hungryminds.com/j2eebible/.
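The split of a qualified name into prefix, local part, and namespace URI can be observed from Java. The following is a minimal sketch, assuming the standard JAXP DOM parser that ships with the JDK; the class name QualifiedNameDemo, the helper parseRoot, and the shortened resume document are made up for illustration and are not part of the book's examples.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class QualifiedNameDemo {

    // Shortened version of the namespaced resume (illustrative only).
    static final String RESUME =
        "<?xml version=\"1.0\"?>"
        + "<J2EEBible:resume xmlns:J2EEBible=\"http://www.hungryminds.com/j2eebible/\">"
        + "<J2EEBible:name> A. Employee </J2EEBible:name>"
        + "</J2EEBible:resume>";

    /** Parses the XML with namespace processing switched on and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(RESUME);
        System.out.println("qualified name: " + root.getTagName());
        System.out.println("prefix:         " + root.getPrefix());
        System.out.println("local part:     " + root.getLocalName());
        System.out.println("namespace URI:  " + root.getNamespaceURI());
    }
}
```

The call to setNamespaceAware(true) matters here: without it the parser treats J2EEBible:resume as one opaque name and does not report a separate local part or namespace URI.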
The URI that you choose is not necessarily a URL that can actually be typed into a browser; it is a way of uniquely identifying your namespace, just as you might use com.hungryminds.j2eebible to name a Java package. You can use more than one namespace in a document. You can also declare a default namespace, with which any element without a prefix is associated. You denote the default namespace using the following syntax:

xmlns="http://www.hungryminds.com/somedefaultnamespace/"

Note that there is no colon after xmlns, nor any prefix name. If you add the default namespace to your modified resume file, then <J2EEBible:address> refers to the element defined in our namespace, whereas <address> refers to the element defined in the default namespace.

Attributes

In addition to specifying the content between the start and end tags of an element, you can include attributes in the element's start tag itself. Inside the start tag you include an attribute as a name-value pair using the following syntax:

name="value"

The attribute value is enclosed in quotation marks. We've used double quotes here, but you can also use single quotes. The name of an attribute follows the same rules and guidelines as the name of an element.

Consider how namespaces affect attributes. When we specified the default namespace, the name of the attribute was xmlns, and the value was http://www.hungryminds.com/somedefaultnamespace/. When we specified the namespace J2EEBible, the name of the attribute was xmlns:J2EEBible, and the value was http://www.hungryminds.com/j2eebible/.

The biggest question is, "When should you use an attribute?" The issue is that, for the most part, any attribute could also have been created as a sub-element of the current element. The general rule of thumb is that attributes should contain metadata or system information. Elements should contain data that you may be presenting or working with.
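To see that namespace declarations really are attributes, and that an unprefixed element falls into the default namespace, a small experiment along the following lines may help. It is only a sketch: the class DefaultNamespaceDemo, the helper parseRoot, and the two-element document are invented for the illustration, and the parser is assumed to be the JAXP DOM implementation bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultNamespaceDemo {

    static final String DEFAULT_NS = "http://www.hungryminds.com/somedefaultnamespace/";
    static final String J2EE_NS = "http://www.hungryminds.com/j2eebible/";

    // Made-up fragment with one <address> from each namespace. The second
    // declaration uses single quotes to show that both quote styles are legal.
    static final String XML =
        "<resume xmlns=\"" + DEFAULT_NS + "\" xmlns:J2EEBible='" + J2EE_NS + "'>"
        + "<address/>"
        + "<J2EEBible:address/>"
        + "</resume>";

    /** Parses the XML with namespace processing on and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML);

        // The declarations are ordinary name-value pairs on the start tag.
        System.out.println("xmlns           = " + root.getAttribute("xmlns"));
        System.out.println("xmlns:J2EEBible = " + root.getAttribute("xmlns:J2EEBible"));

        // "*" matches any namespace, so both <address> elements are returned.
        NodeList addresses = root.getElementsByTagNameNS("*", "address");
        for (int i = 0; i < addresses.getLength(); i++) {
            Element address = (Element) addresses.item(i);
            System.out.println(address.getTagName() + " -> " + address.getNamespaceURI());
        }
    }
}
```

Both child elements share the local part address, yet the parser reports a different namespace URI for each, which is what keeps the two definitions apart.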
The guidelines for choosing between attributes and elements are not always cut and dried, however. Take a look at a snippet from the JavaBeans example earlier in this chapter:

<java version="1.4.0-beta" class="java.beans.XMLDecoder">
  <object class="javax.swing.JFrame">
    <void property="bounds">
      <object class="java.awt.Rectangle">
        <int>0</int>
        <int>0</int>
        <int>200</int>
        <int>300</int>
      </object>
    </void>
    <void property="defaultCloseOperation">
      <int>3</int>
    </void>

The attributes associated with the java element and the first object element aren't too controversial. In the java element, attributes are being used to specify the version and the class that can interpret this element. The first object element has the attribute class, which points to the class that you are instantiating. You could have viewed the bounds of the JFrame as an attribute. Similarly, you could have written the defaultCloseOperation in many ways, including the following:

<void property="defaultCloseOperation" value="3" />
<void defaultCloseOperation="3" />
<defaultCloseOperation value="3" />

If you were just inventing the tags you'd use in an application, none of these choices would be wrong. The actual code given in the example above was chosen over these alternatives to conform with the specification outlined in JSR-57, and this solution is best for bean persistence across IDEs.

When you are designing your own XML documents, you will have to make your own decisions about what is an attribute and what is an element. Follow the rough rule of thumb about usage and rest assured that whichever choice you make for the remaining cases, lots of people will feel that you're wrong.

One limitation may influence your decision about whether something should be represented as an element or as an attribute.
The following version of setting the bounds of the JFrame would not be legal:

<void property="bounds">
  <object class="java.awt.Rectangle" int="0" int="0" int="200" int="300" />
</void>

This code is illegal because you can't use the same name for two different attributes of one element. This wasn't a problem with elements. In the original version you had four ints: each was a different element contained between the object start and end tags. It would be legal to code this example as follows:

<void property="bounds">
  <object class="java.awt.Rectangle" xTopLeft="0" yTopLeft="0" xBottomRight="200" yBottomRight="300" />
</void>

This code may seem more descriptive than the original, but you have to remember what this XML document is being used for. You want to define the bounds of your JFrame by passing in a Rectangle. The Rectangle is constructed from four int primitives. The original code clearly conveyed this information to a Java developer. It was also generated automatically from the Java code that specified the bounds of the JFrame.

Summary

In this chapter you've been introduced to XML from the perspective of a Java developer. So far you have learned the following:

• Fundamentally, XML is a format that represents data along with tags that describe that data. This "self-describing" document is both human- and machine-readable. Binary files that use proprietary formats are not easily read by people or by other applications, and HTML produces content that humans can read, but that means little to machines. XML provides a robust format for both humans and machines.
• To display XML in a user-friendly form you have to use some companion technology. You can convert XML to HTML or another format using XSLT, or you can treat it as you do HTML and use it with Cascading Style Sheets. We'll further explore the first option in Chapter 14.
• When documents are represented using XML instead of HTML, the different parts become more accessible.
  You can more easily manipulate the document and pull out the content you are looking for.
• To standardize configuration files, a movement has sprung up in favor of using XML. You've already seen this use of XML in the web.xml configuration files for Tomcat and Enterprise JavaBeans.
• XML is used to persist data about JavaBeans and to aid development across many IDEs. The file is generated and read by the XMLEncoder and XMLDecoder classes along with helper classes that were added to the java.beans package in JDK 1.4.
• Elements must have properly nested start and end tags. An element may have an empty tag that is basically both a start and an end tag. When choosing names for elements, remember that XML is case-sensitive.
• Attributes are useful for including meta-information. Data that won't be rendered for the client, and that are system information, are often better represented as attributes than as elements. You can't, however, repeat an attribute name the way you can repeat an element name.

Chapter 11: Describing Documents with DTDs and Schemas

Overview

Good programming practices in Java stress separating the interface from the implementation. If you know the interface for a class, then you know how to write applications that use the methods in that class. You don't care about the implementation. Similarly, in an XML document, if you know how the data are structured, you can write Java applications that extract, create, and manipulate the document.

Currently, the most popular way to specify the structure of an XML file is to use a Document Type Definition (DTD). XML Schema is an XML technology that enables you to constrain an XML document using an XML document. In this chapter you'll begin by reading through a DTD to get a feel for the syntax. You'll then be able to use a Web resource to validate an XML document against that DTD.
After that, you'll be ready to write your own DTD, one that enforces the rules you need to enforce in our running résumé example. Finally, you'll see how you can constrain the same document using XML Schema. We won't show you every aspect of constructing a DTD or a schema, but you'll learn enough that you'll be able to consult the specs for the rest of the details.

DTDs and XML Schema are not the only systems for constraining XML. The Schematron is a Structural Schema Language for constraining XML using patterns in trees. You can find out more at the Academia Sinica Computing Centre's Web site, http://www.ascc.net/xml/resource/schematron/schematron.html. The Regular Language description for XML (RELAX) is currently working its way through the ISO. You can find a tutorial in English or Japanese, examples, and links to software at the RELAX homepage at http://www.xml.gr.jp/relax/.

Producing Valid XML Documents

In Chapter 10, we began to show you what XML documents are. We considered some examples and showed you some of the basic rules of producing well-formed XML. These were basically grammatical rules. As long as the syntax was OK, we were satisfied that the XML document could be parsed by an XML parser so that you could process the information using a Java application.

Consider, for example, the following sentence:

My ele dri brok phantenves ice 7cream.

It's hard to make sense of it. Perhaps the silent 7 at the beginning of cream doesn't help. It's also difficult because the words elephant, drives, and broken are not properly nested. The following sentence is easier to read, although it doesn't make much more sense:

My elephant drives broken ice cream.

Now the sentence is well formed. You can parse it and locate the subject, the verb, and the object. Depending on where and when you went to school, you may even be able to diagram it. You can alter the sentence in many ways so that it makes sense:

My elephant eats delicious ice cream.
My elephant drives large trucks.
My elephant likes broken ice cream cones.

If your task were to make sense out of "My elephant drives broken ice cream" then, even though it is well formed, you still would be out of luck. But what if you had to follow a rule like the following:

If verb="drives" the object must describe one or more vehicles.

Now you can go to town. Maybe you need to restrict the subject to being a human being, but you can see the improvement. The sentence begins to make some sort of sense. That is what you get when you provide a DTD or a schema for an XML document to follow. You are defining the structure of the document. If a document conforms to the specified DTD, it is said to be valid. Once you know that a document is valid according to a specific DTD, you know where to find the elements you're looking for. That's why it's a good idea to understand DTDs and schemas before you start parsing and working with XML documents.

Reading a DTD

Before we show you how to create a DTD, take a look at one that corresponds to the résumé document we looked at in Chapter 10. To remind you, here's the XML version of the résumé document:

<?xml version="1.0"?>
<resume>
  <name> A. Employee </name>
  <address>
    <street> 1234 My Street </street>
    <city> My City</city>
    <state> OH </state>
    <zip> 44120 </zip>
    <phone> (555) 555-5555 </phone>
  </address>
  <education>
    <school> Whatsamatta U.</school>
    <degree> B.S. </degree>
    <yeargraduated> 1920 </yeargraduated>
  </education>
</resume>

It was pretty easy to determine the structure of this document just by looking at it. Now the goal is to go in the other direction. Having a DTD enables you to specify the structure so that anyone who wants to create a résumé that conforms to our DTD knows which elements he or she can or must use, and the order in which those elements should go.
<!ELEMENT resume (name, address, education)>
<!ELEMENT address (street, city, state, zip, phone)>
<!ELEMENT education (school, degree, yeargraduated)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT school (#PCDATA)>
<!ELEMENT degree (#PCDATA)>
<!ELEMENT yeargraduated (#PCDATA)>

Without knowing the DTD syntax, you can figure out that the first element is called resume and consists of the elements name, address, and education. You might even assume, correctly, that there can be only one of each of those elements and that they appear in the given order. Similarly, the address element is made up of one of each of the elements street, city, state, zip, and phone, and the education element consists of one each of the elements school, degree, and yeargraduated.

The remaining elements are somehow different. Each consists of #PCDATA. This indicates that you can think of these elements as being the fundamental building blocks of the other elements. In other words, address and education are both made up of these fundamental building blocks, which in turn consist of nothing more than parsed character data.

Connecting the document and the DTD

At this point you have an XML file and a DTD but nothing that ties them to each other. You follow the same basic rules you would follow in tying a CSS (Cascading Style Sheet) to an HTML document.
For example, to indicate that this XML file references that particular DTD, you can just include the DTD in the XML file, as shown in the following example:

<?xml version="1.0"?>
<!DOCTYPE resume [
  <!ELEMENT resume (name, address, education)>
  <!ELEMENT address (street, city, state, zip, phone)>
  <!ELEMENT education (school, degree, yeargraduated)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT school (#PCDATA)>
  <!ELEMENT degree (#PCDATA)>
  <!ELEMENT yeargraduated (#PCDATA)>
]>
<resume>
  <name> A. Employee </name>
  <address>
    <street> 1234 My Street </street>
    <city> My City</city>
    <state> OH </state>
    <zip> 44120 </zip>
    <phone> (555) 555-5555 </phone>
  </address>
  <education>
    <school> Whatsamatta U.</school>
    <degree> B.S. </degree>
    <yeargraduated> 1920 </yeargraduated>
  </education>
</resume>

The portion <!DOCTYPE resume [ ]> is the document type declaration. It specifies that the root element is of type resume and then includes the DTD between square brackets. The XML declaration <?xml version="1.0"?> and the document type declaration are not elements and so do not need matching closing tags.

It would be inefficient and overly restrictive for every XML file to include the DTD (or DTDs) it uses. Instead, suppose that you save this particular DTD in a file called resume.dtd in the same directory that contains your XML file. Then you can reference the DTD using the following document type declaration instead:

<!DOCTYPE resume SYSTEM "resume.dtd">

Here you don't include the DTD in the document type declaration but rather point to it. You can place it in another directory and use a relative URL, or you can provide an absolute URI that points to the document on your machine or another machine. Take a look at the /lib/dtds directory in your J2EE distribution.
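The SYSTEM form of the document type declaration can be tried out from Java with a validating parser. The sketch below is one possible arrangement, not the book's code: the class ExternalDtdDemo and its validate method are hypothetical names, and instead of reading resume.dtd from disk it assumes an EntityResolver that hands the parser the DTD text from a string, so that the example runs without any files.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ExternalDtdDemo {

    // Stand-in for the contents of resume.dtd.
    static final String RESUME_DTD =
        "<!ELEMENT resume (name, address, education)>"
        + "<!ELEMENT address (street, city, state, zip, phone)>"
        + "<!ELEMENT education (school, degree, yeargraduated)>"
        + "<!ELEMENT name (#PCDATA)><!ELEMENT street (#PCDATA)>"
        + "<!ELEMENT city (#PCDATA)><!ELEMENT state (#PCDATA)>"
        + "<!ELEMENT zip (#PCDATA)><!ELEMENT phone (#PCDATA)>"
        + "<!ELEMENT school (#PCDATA)><!ELEMENT degree (#PCDATA)>"
        + "<!ELEMENT yeargraduated (#PCDATA)>";

    static final String HEADER =
        "<?xml version=\"1.0\"?><!DOCTYPE resume SYSTEM \"resume.dtd\">";
    static final String ADDRESS =
        "<address><street>1234 My Street</street><city>My City</city>"
        + "<state>OH</state><zip>44120</zip><phone>(555) 555-5555</phone></address>";
    static final String EDUCATION =
        "<education><school>Whatsamatta U.</school><degree>B.S.</degree>"
        + "<yeargraduated>1920</yeargraduated></education>";

    static final String VALID =
        HEADER + "<resume><name>A. Employee</name>" + ADDRESS + EDUCATION + "</resume>";
    // Same document without <education>, which the DTD requires.
    static final String INVALID =
        HEADER + "<resume><name>A. Employee</name>" + ADDRESS + "</resume>";

    /** Returns the validity problems the parser reports; an empty list means valid. */
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Supply the DTD text whenever the parser asks for resume.dtd;
            // returning null leaves any other entity to default resolution.
            builder.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("resume.dtd")
                    ? new InputSource(new StringReader(RESUME_DTD))
                    : null);
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("valid document:   " + validate(VALID));
        System.out.println("invalid document: " + validate(INVALID));
    }
}
```

In a real project the resolver would usually be unnecessary, because the parser can load resume.dtd relative to the location of the XML file; it is used here only to keep the sketch self-contained.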
The /lib/dtds directory contains various DTDs for use in enterprise applications. By storing your DTDs in this location, you can reference them from any XML document that needs to be validated against them. The web.xml document that you used as a config file for Tomcat had the following document type declaration:

<!DOCTYPE web-app PUBLIC
  "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
  "http://java.sun.com/j2ee/dtds/web-app_2_3.dtd">

Here the DTD is declared to be PUBLIC instead of SYSTEM. The idea is that you aren't just using a DTD for your own idea of what a résumé should look like; this DTD will be used by tons of people customizing the web.xml file to configure their servlet containers. The validator will first try to use the first address that follows the word PUBLIC. In this case that address signifies that no standards body has approved this DTD, that it is owned by Sun, and that it describes Web Applications version 2.3 in English. The second address indicates the URI where the DTD can be found.

Note: Sun has moved the address for all its J2EE DTDs to the URL http://java.sun.com/dtd/. The document type declaration in the current Tomcat config will most likely have been updated by the time you read this. You should install the latest version so that the changes are reflected. You will also have a local copy of these files in your J2EE SDK distribution version 1.3 or higher, in the directory /lib/dtds/.

Take a look at the web-app DTD. It includes a lot of documentation to help you understand what each element is designed to handle. Here's the specification for the web-app element.
<!ELEMENT web-app (icon?, display-name?, description?, distributable?,
  context-param*, filter*, filter-mapping*, listener*, servlet*,
  servlet-mapping*, session-config?, mime-mapping*, welcome-file-list?,
  error-page*, taglib*, resource-env-ref*, resource-ref*,
  security-constraint*, login-config?, security-role*, env-entry*, ejb-ref*)>

From your experience so far you can figure out that the list in parentheses is an ordered list of elements the web-app contains. But now each name is followed by a ? or a *. As you'll see in the following section, the ? indicates that the element may or may not be included, and the * indicates that the element may appear any number of times, including not at all.

Writing Document Type Definitions (DTDs)

In the previous section you saw a couple of examples of DTDs and got a feel for the basic syntax. In this section we'll run through the most common constructs used to specify elements and attributes. For more information on DTDs you should consult a book devoted to XML, such as the second edition of Elliotte Rusty Harold's XML Bible (Hungry Minds, 2001).

Declaring elements

From our examples, you've probably figured out that the syntax for declaring an element is the following:

<!ELEMENT element-name (what it contains)>

In Chapter 10, we covered restrictions on the name of the element. Now take a look at what an element can contain.

Nothing at all

In the resume example, let's say that the employer belongs to a secret club and wishes to give preferential treatment to others in the same club. This club membership indicator may appear in an element that contains information but doesn't appear on the page. For example, the resume may be adjusted as follows:

<resume>
  <name> A. Employee </name>
  <knowsSecretHandshake />
  <address>

You should adjust the DTD to indicate that there is now an empty element called knowsSecretHandshake.
Of course, you have to adjust the resume element declaration in the DTD as well, in addition to adding the following entry:

<!ELEMENT knowsSecretHandshake EMPTY>

Nothing but text

The fundamental building blocks of the resume contain nothing but #PCDATA. This parsed character data is just text. You could have declared street as consisting of a streetNumber and a streetName. You didn't. It is declared as follows:

<!ELEMENT street (#PCDATA)>

So the contents of street can't meaningfully be further parsed by an XML parser.

Other elements

Now the fun begins. An element can contain one or more other elements. It may seem a bit silly to have it contain only one, but you can. If the parent element contains nothing but what is in the child, and only a single child element exists, then there should be a good reason for this additional layer. In any case, here's how you would declare it:

<!ELEMENT parent (child)>

You've already seen the case of a parent containing more than one child. For example, you declared the education element in the resume example as follows:

<!ELEMENT education (school, degree, yeargraduated)>

It is possible that your candidate never went to school. You can indicate that the resume element may contain one or no education elements by using a ? after the word education:

<!ELEMENT resume (name, address, education?)>

You'll notice that no symbols follow name or address. This indicates that these elements must occur exactly once each.

On second thought, your candidate may never have graduated from school, or may have graduated from one or more schools. You can indicate that an element may occur zero or more times by using a *. In this example, the resume element would be declared as follows:

<!ELEMENT resume (name, address, education*)>

Your candidate may have more than one address, and you don't want to allow the candidate to have no address or you won't be able to contact him or her.
You can't, therefore, just use the * and hope that it is used correctly. You use the symbol + to indicate that an element will appear one or more times. The following example shows what this symbol looks like applied to the address element:

<!ELEMENT resume (name, address+, education)>

It is possible that your candidate has more than one degree from the same school. You can group elements to expand your options in specifying the number of degrees. Here's how you'd specify that a candidate can have one or more degrees from the same school:

<!ELEMENT education (school, (degree, yeargraduated)+)>

The element yeargraduated is grouped with the element degree so you know the year associated with each degree earned.

Finally, you may want to present options. You may want to indicate that an element can contain either a certain element (or group of elements) or another one. You can do this with the | symbol. Here's how you indicate that an address consists either of a street, city, state, and zip or of a phone:

<!ELEMENT address ((street, city, state, zip) | phone)>

Mixed content

Sometimes you want to include text without having to create a whole new element that represents this text. For example, this is an XML version of the nonsense example from the beginning of the chapter:

<nonsense> My <animal> Elephant </animal> drives <vehicles> large trucks </vehicles>. </nonsense>

The corresponding DTD entry is the following:

<!ELEMENT nonsense (#PCDATA | animal | vehicles)*>

The format of this entry differs slightly from the format of those you saw when including other elements. #PCDATA is now an allowable entry, but it must come first, the alternatives are separated by |, and the whole group is followed by *. A mixed-content declaration can therefore say which child elements may appear among the text, but not their order or how often each one occurs.
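The effect of the ?, *, + and | symbols, and of a mixed-content declaration of the form (#PCDATA | a | b)*, is easiest to see by validating a few small documents. The sketch below assumes the JDK's validating DOM parser; the class ContentModelDemo, the helper isValid, the sample documents (including the second degree), and the decision to combine address+ with education* in one resume declaration are all made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ContentModelDemo {

    // One or more addresses, any number of educations, one or more
    // degree/year pairs per school, and an address that is either a
    // postal address or a phone number.
    static final String RESUME_DTD =
        "<!ELEMENT resume (name, address+, education*)>"
        + "<!ELEMENT education (school, (degree, yeargraduated)+)>"
        + "<!ELEMENT address ((street, city, state, zip) | phone)>"
        + "<!ELEMENT name (#PCDATA)><!ELEMENT street (#PCDATA)>"
        + "<!ELEMENT city (#PCDATA)><!ELEMENT state (#PCDATA)>"
        + "<!ELEMENT zip (#PCDATA)><!ELEMENT phone (#PCDATA)>"
        + "<!ELEMENT school (#PCDATA)><!ELEMENT degree (#PCDATA)>"
        + "<!ELEMENT yeargraduated (#PCDATA)>";

    // Mixed content: text interleaved with animal and vehicles elements.
    static final String NONSENSE_DTD =
        "<!ELEMENT nonsense (#PCDATA | animal | vehicles)*>"
        + "<!ELEMENT animal (#PCDATA)><!ELEMENT vehicles (#PCDATA)>";

    static final String NAME = "<name>A. Employee</name>";
    static final String PHONE_ADDRESS = "<address><phone>(555) 555-5555</phone></address>";
    static final String TWO_DEGREES =
        "<education><school>Whatsamatta U.</school>"
        + "<degree>B.S.</degree><yeargraduated>1920</yeargraduated>"
        + "<degree>M.S.</degree><yeargraduated>1922</yeargraduated></education>";

    /** Wraps body in a document whose internal subset is dtd and validates it. */
    static boolean isValid(String root, String dtd, String body) {
        String xml = "<?xml version=\"1.0\"?><!DOCTYPE " + root + " [" + dtd + "]>" + body;
        final boolean[] valid = { true };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity errors arrive here
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false; // not even well formed, or the parser could not be set up
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println("phone only, no education: "
            + isValid("resume", RESUME_DTD, "<resume>" + NAME + PHONE_ADDRESS + "</resume>"));
        System.out.println("no address at all:        "
            + isValid("resume", RESUME_DTD, "<resume>" + NAME + "</resume>"));
        System.out.println("two degrees, one school:  "
            + isValid("resume", RESUME_DTD,
                "<resume>" + NAME + PHONE_ADDRESS + TWO_DEGREES + "</resume>"));
        System.out.println("mixed content:            "
            + isValid("nonsense", NONSENSE_DTD,
                "<nonsense>My <animal>Elephant</animal> drives "
                + "<vehicles>large trucks</vehicles>.</nonsense>"));
    }
}
```

Changing a single symbol in RESUME_DTD, for example address+ to address*, and rerunning the checks is a quick way to confirm what each occurrence indicator permits.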