PART V
Programming

CHAPTER 11: Event-Driven Programming
CHAPTER 12: LINQ to XML

11
Event-Driven Programming

WHAT YOU WILL LEARN IN THIS CHAPTER:

➤ Necessity of XML data access methods: SAX and .NET’s XmlReader
➤ Why SAX and XmlReader are considered event-driven methods
➤ How to use SAX and XmlReader
➤ The right time to choose one of these methods to process your XML

There are many ways to extract information from an XML document. You’ve already seen how to use the Document Object Model (DOM) and XPath; both of these methods can be used to find any relevant item of data. Additionally, in Chapter 12 you’ll meet LINQ to XML, Microsoft’s latest attempt to incorporate XML data retrieval in its universal data access strategy. Given the wide variety of methods already available, you may be wondering why you need more, and in particular why you need event-driven methods.
The main answer is memory limitations. Other XML processing methods require that the whole XML document be loaded into memory (that is, RAM) before any processing can take place. Because XML documents typically use up to four times more RAM than the size of the file containing the document, some documents can take up more RAM than is available on a computer; it is therefore necessary to find an alternative method to extract data. This is where event-driven paradigms come into play. Instead of loading the complete file into memory, the file is processed in sequence. There are two ways to do this: SAX and .NET’s XmlReader. Both are covered in this chapter.

UNDERSTANDING SEQUENTIAL PROCESSING

There are two main ways of processing a file sequentially. The first relies on events being fired whenever specific items are found; whether you respond to these events is up to you. For example, say an event is fired when the opening tag of the root element is encountered, and the name of this element is passed to the event handler. Any time textual content is found after this, another event is fired. In this scenario there would also be events that capture the closing of any elements, with the final event being fired when the closing tag of the root element is encountered.

The second method is slightly different in that you tell the processor what sort of content you are interested in. For example, you may want to read an attribute on the first child under the root element. To do so, you instruct the XML reader to move to the root element and then to its first child. You would then read the attributes until you get to the one you need.

Both of these methods are similar conceptually, and both cope admirably with the problem of larger memory usage posed by the DOM, which requires the whole XML document to be loaded into
memory before being processed.

Processing files in a sequential fashion has a couple of downsides, however. The first is that you can’t revisit content. If you read an element and then move on to one of its siblings or children, you can’t then go back and examine one of its attributes without starting from the beginning again. You need to plan carefully what information you’ll need.

The second problem is validation. Imagine you receive a document whose elements contain the following text:

Here is some data.
Here is some more data.

This document is well-formed, but what if its schema states that a particular element must follow all of those elements? The processor will report the elements and text content that it encounters, but won’t complain that the document is not valid until it reaches the relevant point. You may not care about the extra element, in which case you can just extract whatever you need, but if you want to validate before processing begins, this usually involves reading the document twice. This is the price you pay for not needing to load the full document into memory.

In the following sections you’ll examine the two methods in more detail. The pure event-driven method is called SAX and is commonly used with Java, although it can be used from any language that supports events. The second is specific to .NET and uses the System.Xml.XmlReader class.

USING SAX IN SEQUENTIAL PROCESSING

SAX stands for the Simple API for XML, and arose out of discussions on the XML-DEV list in the late 1990s.

NOTE The archives for the XML-DEV list are available at http://lists.xml.org/archives/xml-dev/. The list is still very active and any XML-related problems are usually responded to within hours, if not minutes.

Back then people were having problems because different parsers were incompatible. David Megginson took on the job of coordinating the process of specifying a new API
with the group. On May 11, 1998, the SAX 1.0 specification was completed. A whole series of SAX 1.0–compliant parsers then began to emerge, both from large corporations, such as IBM and Sun, and from enterprising individuals, such as James Clark. All of these parsers were freely available for public download.

Eventually, a number of shortcomings in the specification became apparent, and David Megginson and his colleagues got back to work, finally producing the SAX 2.0 specification on May 5, 2000. The improvements centered on added support for namespaces and tighter adherence to the XML specification. Several other enhancements were made to expose additional information in the XML document, but the core of SAX was very stable. On April 27, 2004, these changes were finalized and released as version 2.0.2.

SAX is specified as a set of Java interfaces, which initially meant that if you were going to do any serious work with it, you were looking at doing some Java programming using Java Development Kit (JDK) 1.1 or later. Now, however, a wide variety of languages have their own version of SAX, some of which you learn about later in the chapter. In deference to the SAX tradition, however, the examples in this chapter are written in Java.

All the latest information about SAX is at www.saxproject.org. It remains a public domain, open source project hosted by SourceForge. To download SAX, go to the homepage and browse for the latest version, or go directly to the SourceForge project page at http://sourceforge.net/projects/sax.

This is one of the extraordinary things about SAX: it isn’t owned by anyone. It doesn’t belong to any consortium, standards body, company, or individual. In other words, it doesn’t survive because some organization or government says that you must use it to comply with their standards, or because a specific company supporting it is dominant in the marketplace. It survives because it’s simple and it works.

Preparing to Run the Examples

The SAX specification does not
limit which XML parser you use with your document. SAX simply sits on top of the parser and reports what content it finds. A number of different parsers are available out in the wild, but these examples use the one that comes with the JDK. If you don’t have the JDK already installed, perform the following steps to do so:

1. Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html.
2. Download the latest version under the SE section. These examples use 1.6, but 1.7 is the latest available version and will work just as well.
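As a rough sketch of the first, purely event-driven approach described under Understanding Sequential Processing, the following Java program uses the SAX parser bundled with the JDK to record each event as it is fired. The class name SaxEventSketch, the RecordingHandler helper, and the sample document (including its root and item element names) are assumptions made for illustration; they do not come from the chapter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventSketch {

    /** Hypothetical handler that records each event in the order it is fired. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<String>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Fired when an opening tag is encountered; the element name is passed in.
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Fired for textual content. A parser is allowed to deliver one run of
            // text in several chunks; this sketch simply records each non-blank chunk.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Fired when a closing tag is encountered.
            events.add("end:" + qName);
        }
    }

    /** Parses the given XML sequentially and returns the events that were fired. */
    static List<String> collectEvents(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordingHandler handler = new RecordingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        String xml = "<root><item>Here is some data.</item></root>";
        for (String event : collectEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

The handler never sees the document as a whole: it only reacts to each event as the parser reaches it, which is why memory use stays low and also why earlier content cannot be revisited without parsing again.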