
Java & XML 2nd Edition: solutions to real world problems, part 3 (PDF)




DOCUMENT INFORMATION

Basic information

Pages: 42
Size: 710.88 KB

Content

Java & XML, 2nd Edition 81

chain, or pipeline, of events. To understand what I mean by a pipeline, here's the normal flow of a SAX parse:

• Events in an XML document are passed to the SAX reader.
• The SAX reader and registered handlers pass events and data to an application.

What developers started realizing, though, is that it is simple to insert one or more additional links into this chain:

• Events in an XML document are passed to the SAX reader.
• The SAX reader performs some processing and passes information to another SAX reader.
• Repeat until all SAX processing is done.
• Finally, the SAX reader and registered handlers pass events and data to an application.

It's the middle steps that introduce a pipeline, where one reader that performed specific processing passes its information on to another reader, repeatedly, instead of having to lump all code into one reader. When this pipeline is set up with multiple readers, modular and efficient programming results. And that's what the XMLFilter interface allows for: chaining of XMLReader implementations through filtering. Enhancing this even further is the class org.xml.sax.helpers.XMLFilterImpl, which provides a helpful implementation of XMLFilter. It is the convergence of an XMLFilter and the DefaultHandler class I showed you in the last section; the XMLFilterImpl class implements XMLFilter, ContentHandler, ErrorHandler, EntityResolver, and DTDHandler, providing pass-through versions of each method of each handler. In other words, it sets up a pipeline for all SAX events, allowing your code to override any methods that need to insert processing into the pipeline.

Let's use one of these filters. Example 4-5 is a working, ready-to-use filter. You're past the basics, so we will move through this rapidly.

Example 4-5. NamespaceFilter class

```java
package javaxml2;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class NamespaceFilter extends XMLFilterImpl {

    /** The old URI, to replace */
    private String oldURI;

    /** The new URI, to replace the old URI with */
    private String newURI;

    public NamespaceFilter(XMLReader reader,
                           String oldURI, String newURI) {
        super(reader);
        this.oldURI = oldURI;
        this.newURI = newURI;
    }

    public void startPrefixMapping(String prefix, String uri)
        throws SAXException {

        // Change URI, if needed
        if (uri.equals(oldURI)) {
            super.startPrefixMapping(prefix, newURI);
        } else {
            super.startPrefixMapping(prefix, uri);
        }
    }

    public void startElement(String uri, String localName,
                             String qName, Attributes attributes)
        throws SAXException {

        // Change URI, if needed
        if (uri.equals(oldURI)) {
            super.startElement(newURI, localName, qName, attributes);
        } else {
            super.startElement(uri, localName, qName, attributes);
        }
    }

    public void endElement(String uri, String localName, String qName)
        throws SAXException {

        // Change URI, if needed
        if (uri.equals(oldURI)) {
            super.endElement(newURI, localName, qName);
        } else {
            super.endElement(uri, localName, qName);
        }
    }
}
```

I start out by extending XMLFilterImpl, so I don't have to worry about any events that I don't explicitly need to change; the XMLFilterImpl class takes care of them by passing on all events unchanged unless a method is overridden. I can get down to the business of what I want the filter to do; in this case, that's changing a namespace URI from one to another. If this task seems trivial, don't underestimate its usefulness. Many times in the last several years, the URI of a namespace for a specification (such as XML Schema or XSLT) has changed. Rather than having to hand-edit all of my XML documents or write code for XML that I receive, this NamespaceFilter takes care of the problem for me.
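For readers who want to try the filter idea without the rest of SAXTreeViewer, here is a minimal self-contained sketch. It assumes the JDK's built-in JAXP parser in place of vendorParserClass, and RenameFilter and NamespaceFilterSketch are hypothetical names; RenameFilter mirrors Example 4-5 but compares with oldURI.equals(uri), so a null URI cannot cause a NullPointerException. It records the namespace URI each element carries after it has passed through the filter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class NamespaceFilterSketch {

    // Hypothetical compact stand-in for Example 4-5: rewrites one namespace URI.
    static class RenameFilter extends XMLFilterImpl {
        private final String oldURI;
        private final String newURI;

        RenameFilter(XMLReader parent, String oldURI, String newURI) {
            super(parent);
            this.oldURI = oldURI;
            this.newURI = newURI;
        }

        private String map(String uri) {
            return oldURI.equals(uri) ? newURI : uri;
        }

        @Override
        public void startPrefixMapping(String prefix, String uri) throws SAXException {
            super.startPrefixMapping(prefix, map(uri));
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            super.startElement(map(uri), localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(map(uri), localName, qName);
        }
    }

    // Parses xml through the filter and records "localName=namespaceURI" per element.
    static List<String> elementURIs(String xml, String oldURI, String newURI)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for real URIs and local names
        XMLReader reader = factory.newSAXParser().getXMLReader();

        RenameFilter filter = new RenameFilter(reader, oldURI, newURI);
        List<String> seen = new ArrayList<>();
        // Handlers are registered on the filter, and parse() is called on the filter
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                seen.add(localName + "=" + uri);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book xmlns='urn:old'><title>Java and XML</title>"
                   + "<ora:note xmlns:ora='urn:ora'/></book>";
        System.out.println(elementURIs(xml, "urn:old", "urn:new"));
    }
}
```

Run as-is, main( ) should print [book=urn:new, title=urn:new, note=urn:ora]: only elements in the old namespace are rewritten, and the ora element passes through untouched.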
Passing an XMLReader instance to the constructor sets that reader as the filter's parent. When you call parse( ) on the filter, it registers itself as the parent's handler and asks the parent to parse, so events flow from the parent reader into the filter, which passes them on to the handlers registered on the filter (all events, by virtue of the XMLFilterImpl class, unless the NamespaceFilter class overrides that behavior). By supplying two URIs, the original and the URI to replace it with, you set this filter up. The three overridden methods handle any needed interchanging of that URI. Once you have a filter like this in place, you supply a reader to it, and then operate upon the filter, not the reader.

Going back to contents.xml and SAXTreeViewer, suppose that O'Reilly has informed me that my book's online URL is no longer http://www.oreilly.com/javaxml2, but http://www.oreilly.com/catalog/javaxml2. Rather than editing all my XML samples and uploading them, I can just use the NamespaceFilter class:

```java
public void buildTree(DefaultTreeModel treeModel,
                      DefaultMutableTreeNode base, String xmlURI)
    throws IOException, SAXException {

    // Create instances needed for parsing
    XMLReader reader =
        XMLReaderFactory.createXMLReader(vendorParserClass);
    NamespaceFilter filter = new NamespaceFilter(reader,
        "http://www.oreilly.com/javaxml2",
        "http://www.oreilly.com/catalog/javaxml2");
    ContentHandler jTreeContentHandler =
        new JTreeContentHandler(treeModel, base, reader);
    ErrorHandler jTreeErrorHandler = new JTreeErrorHandler();

    // Register content handler
    filter.setContentHandler(jTreeContentHandler);

    // Register error handler
    filter.setErrorHandler(jTreeErrorHandler);

    // Register entity resolver
    filter.setEntityResolver(new SimpleEntityResolver());

    // Parse
    InputSource inputSource = new InputSource(xmlURI);
    filter.parse(inputSource);
}
```

Notice, as I said, that all operation occurs upon the filter, not the reader instance. With this filtering in place, you can compile both source files (NamespaceFilter.java and SAXTreeViewer.java) and run the viewer on the contents.xml file.
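The parent relationship is easy to get backwards, so here is a small sketch that logs the order in which each startElement( ) event visits the links of a two-filter chain. It is an illustration only: LoggingFilter and FilterChainOrder are hypothetical names, and the JDK's built-in parser stands in for vendorParserClass.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChainOrder {

    // Hypothetical pass-through filter that logs each element before forwarding it.
    static class LoggingFilter extends XMLFilterImpl {
        private final String label;
        private final List<String> log;

        LoggingFilter(XMLReader parent, String label, List<String> log) {
            super(parent);
            this.label = label;
            this.log = log;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            log.add(label + ":" + localName);
            super.startElement(uri, localName, qName, atts); // pass it downstream
        }
    }

    static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        List<String> log = new ArrayList<>();
        LoggingFilter first = new LoggingFilter(reader, "first", log);  // parent: reader
        LoggingFilter second = new LoggingFilter(first, "second", log); // parent: first

        // Register the application handler on the outermost filter...
        second.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                log.add("handler:" + localName);
            }
        });
        // ...and parse through it; it wires itself to 'first', which wires itself to 'reader'
        second.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a><b/></a>"));
    }
}
```

For <a><b/></a>, the log should read [first:a, second:a, handler:a, first:b, second:b, handler:b]: each event leaves the reader, passes through the filter whose parent is the reader, then through the outer filter, and only then reaches the handler registered on that outer filter.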
You'll see that the O'Reilly namespace URI for my book is changed in every occurrence, as shown in Figure 4-2.

Figure 4-2. SAXTreeViewer on contents.xml with NamespaceFilter in place

Of course, you can chain these filters together as well, and use them as standard libraries. When I'm dealing with older XML documents, I often create several of these with old XSL and XML Schema URIs and put them in place so I don't have to worry about incorrect URIs:

```java
XMLReader reader = XMLReaderFactory.createXMLReader(vendorParserClass);
NamespaceFilter xslFilter = new NamespaceFilter(reader,
    "http://www.w3.org/TR/XSL",
    "http://www.w3.org/1999/XSL/Transform");
NamespaceFilter xsdFilter = new NamespaceFilter(xslFilter,
    "http://www.w3.org/TR/XMLSchema",
    "http://www.w3.org/2001/XMLSchema");
// Register handlers on, and call parse() on, xsdFilter: the last link in the chain
```

Here, I'm building a longer pipeline to ensure that no old namespace URIs sneak by and cause my applications any trouble. Be careful not to build too long a pipeline; each new link in the chain adds some processing time. All the same, this is a great way to build reusable components for SAX.

4.3.2 XMLWriter

Now that you understand how filters work in SAX, I want to introduce you to a specific filter, XMLWriter. This class, as well as a subclass of it, DataWriter, can be downloaded from David Megginson's SAX site at http://www.megginson.com/SAX. XMLWriter extends XMLFilterImpl, and DataWriter extends XMLWriter. Both of these filter classes are used to output XML, which may seem a bit at odds with what you've learned so far about SAX. However, just as you could write to a Java Writer from within your own SAX callbacks, so can this class. I'm not going to spend a lot of time on this class, because it's not really the way you want to be outputting XML in the general sense; it's much better to use DOM, JDOM, or another XML API if you want mutability. However, the XMLWriter class offers a valuable way to inspect what's going on in a SAX pipeline.
Inserted between other filters and readers in your pipeline, an XMLWriter outputs a snapshot of your data at whatever point it resides in your processing chain. For example, in the case where I'm changing namespace URIs, you might want to store the XML document with the new namespace URI (be it a modified O'Reilly URI, an updated XSL one, or the XML Schema one) for later use. This becomes a piece of cake with the XMLWriter class. Since you've already got SAXTreeViewer using the NamespaceFilter, I'll use that as an example. First, add import statements for java.io.FileWriter (for output) and the com.megginson.sax.XMLWriter class. Once that's in place, you'll need to insert an instance of XMLWriter into the chain after the NamespaceFilter, so that the writer's parent is the filter. Because events flow from a parent into the filter built on top of it, this ordering means output will occur after namespaces have been changed but before the events reach the tree-building content handler. Change your code as shown here:

```java
public void buildTree(DefaultTreeModel treeModel,
                      DefaultMutableTreeNode base, String xmlURI)
    throws IOException, SAXException {

    // Create instances needed for parsing
    XMLReader reader =
        XMLReaderFactory.createXMLReader(vendorParserClass);
    NamespaceFilter filter = new NamespaceFilter(reader,
        "http://www.oreilly.com/javaxml2",
        "http://www.oreilly.com/catalog/javaxml2");
    // The writer sits downstream of the filter, so it sees the rewritten URIs
    XMLWriter writer = new XMLWriter(filter, new FileWriter("snapshot.xml"));
    ContentHandler jTreeContentHandler =
        new JTreeContentHandler(treeModel, base, reader);
    ErrorHandler jTreeErrorHandler = new JTreeErrorHandler();

    // Register content handler on the last link in the chain: the writer
    writer.setContentHandler(jTreeContentHandler);

    // Register error handler
    writer.setErrorHandler(jTreeErrorHandler);

    // Register entity resolver
    writer.setEntityResolver(new SimpleEntityResolver());

    // Parse through the writer, which drives the filter and the reader
    InputSource inputSource = new InputSource(xmlURI);
    writer.parse(inputSource);
}
```

Be sure you set the parent of the XMLWriter instance to be the NamespaceFilter, not the XMLReader, and that you call parse( ) on the writer. If the writer sits upstream of the filter, it records the original URIs; if you parse through the filter alone, the writer never sees any events, and no output will actually occur.
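If Megginson's classes aren't at hand, the JDK's identity Transformer can take a comparable snapshot, because a SAXSource accepts any XMLReader, including an XMLFilter, as its event source. This is a swapped-in technique, not the XMLWriter API: SnapshotSketch and RenameFilter (a stand-in for NamespaceFilter) are hypothetical names, the JDK's built-in parser replaces vendorParserClass, and the result goes to a StringWriter rather than snapshot.xml. Unlike XMLWriter, the transformer is the end of the chain, so nothing is passed on to a tree-building handler.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class SnapshotSketch {

    // Hypothetical stand-in for NamespaceFilter: swaps one namespace URI for another.
    static class RenameFilter extends XMLFilterImpl {
        private final String oldURI;
        private final String newURI;

        RenameFilter(XMLReader parent, String oldURI, String newURI) {
            super(parent);
            this.oldURI = oldURI;
            this.newURI = newURI;
        }

        private String map(String uri) {
            return oldURI.equals(uri) ? newURI : uri;
        }

        @Override
        public void startPrefixMapping(String prefix, String uri) throws SAXException {
            super.startPrefixMapping(prefix, map(uri));
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            super.startElement(map(uri), localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(map(uri), localName, qName);
        }
    }

    // Serializes whatever the filter emits: reader -> RenameFilter -> identity Transformer.
    static String snapshot(String xml, String oldURI, String newURI) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        XMLReader filter = new RenameFilter(reader, oldURI, newURI);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                           new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(snapshot(
            "<book xmlns='urn:old'><title>Java and XML</title></book>",
            "urn:old", "urn:new"));
    }
}
```

Because the serializer consumes the filter's output, the snapshot declares urn:new and never mentions urn:old, which is the same placement rule that applies to XMLWriter.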
Once you've got these changes compiled in, run the example. You should get a snapshot.xml file created in the directory you're running the example from; an excerpt from that document is shown here:

```xml
<?xml version="1.0" standalone="yes"?>
<book xmlns="http://www.oreilly.com/catalog/javaxml2">
  <title ora:series="Java" xmlns:ora="http://www.oreilly.com">Java and XML</title>
  <contents>
    <chapter title="Introduction" number="1">
      <topic name="XML Matters"></topic>
      <topic name="What's Important"></topic>
      <topic name="The Essentials"></topic>
      <topic name="What's Next?"></topic>
    </chapter>
    <chapter title="Nuts and Bolts" number="2">
      <topic name="The Basics"></topic>
      <topic name="Constraints"></topic>
      <topic name="Transformations"></topic>
      <topic name="And More "></topic>
      <topic name="What's Next?"></topic>
    </chapter>
    <!-- Other content -->
  </contents>
</book>
```

Notice that the namespace, as changed by NamespaceFilter, is modified here. Snapshots like this, created by XMLWriter instances, can be great tools for debugging and logging of SAX events. Both XMLWriter and DataWriter offer a lot more in terms of methods to output XML, both in full and in part, and you should check out the Javadoc included with the downloaded package. I do not encourage you to use these classes for general output. In my experience, they are most useful in the case demonstrated here.

4.4 Even More Handlers

Now I want to show you two more handler classes that SAX offers. Both of these interfaces sit outside the core SAX API, in the org.xml.sax.ext package, to indicate that they are extensions to SAX. However, most parsers (such as Apache Xerces) include these two interfaces for use. Check your vendor documentation, and if you don't have these classes, you can download them from the SAX web site.
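One way to find out at runtime whether your parser supports these extensions is to try registering a handler through the standard SAX property IDs and catch the two exceptions SAX defines for that case. This sketch assumes the JDK's built-in parser and uses org.xml.sax.ext.DefaultHandler2 (present in the JDK since Java 5), which implements both LexicalHandler and DeclHandler; ExtensionCheck is a hypothetical name.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class ExtensionCheck {

    // Standard SAX 2 property IDs for the two extension handlers
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";
    static final String DECL_HANDLER = "http://xml.org/sax/properties/declaration-handler";

    // Returns true if a fresh reader accepts the handler for the given property.
    static boolean supports(String propertyId, Object handler)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        try {
            reader.setProperty(propertyId, handler);
            return true;
        } catch (SAXNotRecognizedException e) {
            return false; // the reader doesn't know this property at all
        } catch (SAXNotSupportedException e) {
            return false; // the property is known, but this value can't be set
        }
    }

    public static void main(String[] args) throws Exception {
        DefaultHandler2 handler = new DefaultHandler2();
        System.out.println("LexicalHandler: " + supports(LEXICAL_HANDLER, handler));
        System.out.println("DeclHandler:    " + supports(DECL_HANDLER, handler));
    }
}
```

SAXNotRecognizedException means the reader has never heard of the property; SAXNotSupportedException means it knows the property but won't accept the value you offered (for example, a value of the wrong type).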
I warn you that not all SAX drivers support these extensions, so if your vendor doesn't include them, you may want to find out why, and see if an upcoming version of the vendor's software will support the SAX extensions.

4.4.1 LexicalHandler

The first of these two handlers is the most useful: org.xml.sax.ext.LexicalHandler. This handler provides methods that receive notification of several lexical events: comments, the start and end of entities, the start and end of the DTD declaration, and CDATA sections. In ContentHandler, these lexical events are essentially ignored, and you just get the data and declarations without notification of when or how they were provided.

This is not really a general-use handler, as most applications don't need to know if text was in a CDATA section or not. However, if you are working with an XML editor, serializer, or other component that must know the exact format of the input document, not just its contents, the LexicalHandler can really help you out.

To see this guy in action, you first need to add an import statement for org.xml.sax.ext.LexicalHandler to your SAXTreeViewer.java source file. Once that's done, you can add LexicalHandler to the implements clause in the nonpublic class JTreeContentHandler in that source file:

```java
class JTreeContentHandler implements ContentHandler, LexicalHandler {
    // Callback implementations
}
```

Because the same class now implements both interfaces, the lexical callbacks can operate upon the same JTree the content callbacks build, giving a visual display of these lexical events. So now you need to add implementations for all the methods defined in LexicalHandler.
Those methods are as follows:

```java
public void startDTD(String name, String publicID, String systemID)
    throws SAXException;
public void endDTD() throws SAXException;
public void startEntity(String name) throws SAXException;
public void endEntity(String name) throws SAXException;
public void startCDATA() throws SAXException;
public void endCDATA() throws SAXException;
public void comment(char[] ch, int start, int length)
    throws SAXException;
```

To get started, let's look at the first lexical event that might happen in processing an XML document: the start and end of a DTD reference or declaration. That triggers the startDTD( ) and endDTD( ) callbacks, shown here:

```java
public void startDTD(String name, String publicID, String systemID)
    throws SAXException {

    DefaultMutableTreeNode dtdReference =
        new DefaultMutableTreeNode("DTD for '" + name + "'");
    if (publicID != null) {
        DefaultMutableTreeNode publicIDNode =
            new DefaultMutableTreeNode("Public ID: '" + publicID + "'");
        dtdReference.add(publicIDNode);
    }
    if (systemID != null) {
        DefaultMutableTreeNode systemIDNode =
            new DefaultMutableTreeNode("System ID: '" + systemID + "'");
        dtdReference.add(systemIDNode);
    }
    current.add(dtdReference);
}

public void endDTD() throws SAXException {
    // No action needed here
}
```

This adds a visual cue when a DTD is encountered, along with its public ID and system ID, if present. Continuing on, there is a pair of similar methods for entity references, startEntity( ) and endEntity( ). These are triggered before and after (respectively) processing entity references.
You can add a visual cue for this event as well, using the code shown here:

```java
public void startEntity(String name) throws SAXException {
    DefaultMutableTreeNode entity =
        new DefaultMutableTreeNode("Entity: '" + name + "'");
    current.add(entity);
    current = entity;
}

public void endEntity(String name) throws SAXException {
    // Walk back up the tree
    current = (DefaultMutableTreeNode)current.getParent();
}
```

This ensures that the content of, for example, the OReillyCopyright entity reference is included within an "Entity" tree node. Simple enough.

Because the next lexical event is a CDATA section, and there aren't any currently in the contents.xml document, you may want to make the following change to that document (the CDATA allows the ampersand in the title element's content):

```xml
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "DTD/JavaXML.dtd">

<!-- Java and XML Contents -->
<book xmlns="http://www.oreilly.com/javaxml2"
      xmlns:ora="http://www.oreilly.com"
>
  <title ora:series="Java"><![CDATA[Java & XML]]></title>

  <!-- Other content -->
</book>
```

With this change, you are ready to add code for the CDATA callbacks. Add the following methods to the JTreeContentHandler class:

```java
public void startCDATA() throws SAXException {
    DefaultMutableTreeNode cdata =
        new DefaultMutableTreeNode("CDATA Section");
    current.add(cdata);
    current = cdata;
}

public void endCDATA() throws SAXException {
    // Walk back up the tree
    current = (DefaultMutableTreeNode)current.getParent();
}
```

This is old hat by now; the title element's content now appears as the child of a CDATA node. And with that, only one method is left, the one that receives comment notification:

```java
public void comment(char[] ch, int start, int length)
    throws SAXException {

    String comment = new String(ch, start, length);
    DefaultMutableTreeNode commentNode =
        new DefaultMutableTreeNode("Comment: '" + comment + "'");
    current.add(commentNode);
}
```

This method receives its text the same way the characters( ) and ignorableWhitespace( ) methods do: as a character array, a start offset, and a length.
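This excerpt doesn't show the lexical handler being registered, and that step is easy to miss: unlike ContentHandler, a LexicalHandler has no setter method on XMLReader and is instead set as the property http://xml.org/sax/properties/lexical-handler. (On a filter, XMLFilterImpl simply forwards setProperty( ) to its parent, so lexical events go straight from the parser to the handler.) Here is a self-contained sketch, assuming the JDK's built-in parser; Recorder and LexicalSketch are hypothetical names, and Recorder extends DefaultHandler2 rather than implementing LexicalHandler directly so that it only overrides the callbacks it needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalSketch {

    // Records a few lexical events; DefaultHandler2 supplies no-op defaults for the rest.
    static class Recorder extends DefaultHandler2 {
        final List<String> events = new ArrayList<>();
        private final StringBuilder cdata = new StringBuilder();
        private boolean inCDATA;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            events.add("startDTD:" + name);
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            events.add("comment:" + new String(ch, start, length)); // record the comment body
        }

        @Override
        public void startCDATA() {
            inCDATA = true;
            cdata.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The CDATA text itself still arrives through ContentHandler.characters()
            if (inCDATA) {
                cdata.append(ch, start, length);
            }
        }

        @Override
        public void endCDATA() {
            inCDATA = false;
            events.add("cdata:" + cdata);
        }
    }

    static List<String> record(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Recorder recorder = new Recorder();
        reader.setContentHandler(recorder);
        // LexicalHandler is registered as a property, not through a setter method
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0'?>"
                   + "<!DOCTYPE title [<!ELEMENT title (#PCDATA)>]>"
                   + "<!--Java and XML Contents-->"
                   + "<title><![CDATA[Java & XML]]></title>";
        System.out.println(record(xml));
    }
}
```

For the sample document in main( ), record( ) should return [startDTD:title, comment:Java and XML Contents, cdata:Java & XML]. The DTD here is an internal subset, so no [dtd] pseudo-entity is reported.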
Keep in mind that only the text of the comment is reported to comment( ), not the surrounding <!-- and --> delimiters. With these changes in place, you can compile the example program and run it. You should get output similar to that shown in Figure 4-3.

Figure 4-3. Output with LexicalHandler implementation in place

You'll notice one oddity, though: an entity named [dtd]. This occurs whenever the DOCTYPE declaration references an external DTD, because SAX reports the external DTD subset as a pseudo-entity with that name. You probably don't want it present, and it can be removed with a simple clause in the startEntity( ) and endEntity( ) methods:

```java
public void startEntity(String name) throws SAXException {
    if (!name.equals("[dtd]")) {
        DefaultMutableTreeNode entity =
            new DefaultMutableTreeNode("Entity: '" + name + "'");
        current.add(entity);
        current = entity;
    }
}

public void endEntity(String name) throws SAXException {
    if (!name.equals("[dtd]")) {
        // Walk back up the tree
        current = (DefaultMutableTreeNode)current.getParent();
    }
}
```

This clause removes the offending entity. That's really about all that there is to say about LexicalHandler. Although I've filed it under advanced SAX, it's pretty straightforward.

4.4.2 DeclHandler

The last handler to deal with is the DeclHandler. This interface defines methods that receive notification of specific events within a DTD, such as element and attribute declarations. This is another item only good for very specific cases; again, XML editors and components that must know the exact lexical structure of documents and their DTDs come to mind. I'm not going to show you an example of using the DeclHandler; at this point you know more than you'll probably ever need to about handling callback methods. Instead, I'll just give you a look at the interface, shown in Example 4-6.

Example 4-6. The DeclHandler interface

```java
package org.xml.sax.ext;

import org.xml.sax.SAXException;

public interface DeclHandler {

    // mode is "#IMPLIED", "#REQUIRED", "#FIXED", or null;
    // value is the attribute's default value, or null if there is none
    public void attributeDecl(String eltName, String attName,
                              String type, String mode, String value)
        throws SAXException;

    public void elementDecl(String name, String model)
        throws SAXException;

    public void externalEntityDecl(String name, String publicID,
                                   String systemID)
        throws SAXException;

    public void internalEntityDecl(String name, String value)
        throws SAXException;
}
```

This example is fairly self-explanatory. The first two methods handle the <!ATTLIST> and <!ELEMENT> constructs, respectively. The third, externalEntityDecl( ), reports entity declarations (through <!ENTITY>) that refer to external resources. The final method, internalEntityDecl( ), reports entities defined inline. That's all there is to it.

And with that, I've given you everything that there is to know about SAX. Well, that's probably an exaggeration, but you certainly have plenty of tools to start you on your way. Now you just need to get coding to build up your own set of tools and tricks. Before closing the book on SAX, though, I want to cover a few common mistakes in dealing with SAX.

4.5 Gotcha!

As you get into the more advanced features of SAX, you certainly don't reduce the number of problems you can get yourself into. However, these problems often become more subtle, which makes for some tricky bugs to track down. I'll point out a few of these common problems.

Posted: 12/08/2014, 19:21
