XML in a Nutshell, 3rd Edition
By Elliotte Rusty Harold, W. Scott Means
Publisher: O'Reilly
Pub Date: September 2004
ISBN: 0-596-00764-7
Pages: 712

There's a lot to know about XML, and it's constantly evolving. But you don't need to commit every syntax, API, or XSLT transformation to memory; you only need to know where to find it. And if it's a detail that has to do with XML or its companion standards, you'll find it clear, concise, useful, and well-organized in the updated third edition of XML in a Nutshell.

Table of Contents

Copyright
Preface
    What This Book Covers
    What's New in the Third Edition
    Organization of the Book
    Request for Comments
    Conventions Used in This Book
    Acknowledgments

Part I: XML Concepts
Chapter 1 Introducing XML
    Section 1.1 The Benefits of XML
    Section 1.2 What XML Is Not
    Section 1.3 Portable Data
    Section 1.4 How XML Works
    Section 1.5 The Evolution of XML
Chapter 2 XML Fundamentals
    Section 2.1 XML Documents and XML Files
    Section 2.2 Elements, Tags, and Character Data
    Section 2.3 Attributes
    Section 2.4 XML Names
    Section 2.5 References
    Section 2.6 CDATA Sections
    Section 2.7 Comments
    Section 2.8 Processing Instructions
    Section 2.9 The XML Declaration
    Section 2.10 Checking Documents for Well-Formedness
Chapter 3 Document Type Definitions (DTDs)
    Section 3.1 Validation
    Section 3.2 Element Declarations
    Section 3.3 Attribute Declarations
    Section 3.4 General Entity Declarations
    Section 3.5 External Parsed General Entities
    Section 3.6 External Unparsed Entities and Notations
    Section 3.7 Parameter Entities
    Section 3.8 Conditional Inclusion
    Section 3.9 Two DTD Examples
    Section 3.10 Locating Standard DTDs
Chapter 4 Namespaces
    Section 4.1 The Need for Namespaces
    Section 4.2 Namespace Syntax
    Section 4.3 How Parsers Handle Namespaces
    Section 4.4 Namespaces and DTDs
Chapter 5 Internationalization
    Section 5.1 Character-Set Metadata
    Section 5.2 The Encoding Declaration
    Section 5.3 Text Declarations
    Section 5.4 XML-Defined Character Sets
    Section 5.5 Unicode
    Section 5.6 ISO Character Sets
    Section 5.7 Platform-Dependent Character Sets
    Section 5.8 Converting Between Character Sets
    Section 5.9 The Default Character Set for XML Documents
    Section 5.10 Character References
    Section 5.11 xml:lang

Part II: Narrative-Like Documents
Chapter 6 XML as a Document Format
    Section 6.1 SGML's Legacy
    Section 6.2 Narrative Document Structures
    Section 6.3 TEI
    Section 6.4 DocBook
    Section 6.5 OpenOffice
    Section 6.6 WordprocessingML
    Section 6.7 Document Permanence
    Section 6.8 Transformation and Presentation
Chapter 7 XML on the Web
    Section 7.1 XHTML
    Section 7.2 Direct Display of XML in Browsers
    Section 7.3 Authoring Compound Documents with Modular XHTML
    Section 7.4 Prospects for Improved Web Search Methods
Chapter 8 XSL Transformations (XSLT)
    Section 8.1 An Example Input Document
    Section 8.2 xsl:stylesheet and xsl:transform
    Section 8.3 Stylesheet Processors
    Section 8.4 Templates and Template Rules
    Section 8.5 Calculating the Value of an Element with xsl:value-of
    Section 8.6 Applying Templates with xsl:apply-templates
    Section 8.7 The Built-in Template Rules
    Section 8.8 Modes
    Section 8.9 Attribute Value Templates
    Section 8.10 XSLT and Namespaces
    Section 8.11 Other XSLT Elements
Chapter 9 XPath
    Section 9.1 The Tree Structure of an XML Document
    Section 9.2 Location Paths
    Section 9.3 Compound Location Paths
    Section 9.4 Predicates
    Section 9.5 Unabbreviated Location Paths
    Section 9.6 General XPath Expressions
    Section 9.7 XPath Functions
Chapter 10 XLinks
    Section 10.1 Simple Links
    Section 10.2 Link Behavior
    Section 10.3 Link Semantics
    Section 10.4 Extended Links
    Section 10.5 Linkbases
    Section 10.6 DTDs for XLinks
    Section 10.7 Base URIs
Chapter 11 XPointers
    Section 11.1 XPointers on URLs
    Section 11.2 XPointers in Links
    Section 11.3 Shorthand Pointers
    Section 11.4 Child Sequences
    Section 11.5 Namespaces
    Section 11.6 Points
    Section 11.7 Ranges
Chapter 12 XInclude
    Section 12.1 The include Element
    Section 12.2 Including Text Files
    Section 12.3 Content Negotiation
    Section 12.4 Fallbacks
    Section 12.5 XPointers
Chapter 13 Cascading Style Sheets (CSS)
    Section 13.1 The Levels of CSS
    Section 13.2 CSS Syntax
    Section 13.3 Associating Stylesheets with XML Documents
    Section 13.4 Selectors
    Section 13.5 The Display Property
    Section 13.6 Pixels, Points, Picas, and Other Units of Length
    Section 13.7 Font Properties
    Section 13.8 Text Properties
    Section 13.9 Colors
Chapter 14 XSL Formatting Objects (XSL-FO)
    Section 14.1 XSL Formatting Objects
    Section 14.2 The Structure of an XSL-FO Document
    Section 14.3 Laying Out the Master Pages
    Section 14.4 XSL-FO Properties
    Section 14.5 Choosing Between CSS and XSL-FO
Chapter 15 Resource Directory Description Language (RDDL)
    Section 15.1 What's at the End of a Namespace URL?
    Section 15.2 RDDL Syntax
    Section 15.3 Natures
    Section 15.4 Purposes

Part III: Record-Like Documents
Chapter 16 XML as a Data Format
    Section 16.1 Why Use XML for Data?
    Section 16.2 Developing Record-Like XML Formats
    Section 16.3 Sharing Your XML Format
Chapter 17 XML Schemas
    Section 17.1 Overview
    Section 17.2 Schema Basics
    Section 17.3 Working with Namespaces
    Section 17.4 Complex Types
    Section 17.5 Empty Elements
    Section 17.6 Simple Content
    Section 17.7 Mixed Content
    Section 17.8 Allowing Any Content
    Section 17.9 Controlling Type Derivation
Chapter 18 Programming Models
    Section 18.1 Common XML Processing Models
    Section 18.2 Common XML Processing Issues
    Section 18.3 Generating XML Documents
Chapter 19 Document Object Model (DOM)
    Section 19.1 DOM Foundations
    Section 19.2 Structure of the DOM Core
    Section 19.3 Node and Other Generic Interfaces
    Section 19.4 Specific Node-Type Interfaces
    Section 19.5 The DOMImplementation Interface
    Section 19.6 DOM Level 3 Interfaces
    Section 19.7 Parsing a Document with DOM
    Section 19.8 A Simple DOM Application
Chapter 20 Simple API for XML (SAX)
    Section 20.1 The ContentHandler Interface
    Section 20.2 Features and Properties
    Section 20.3 Filters

Part IV: Reference
Chapter 21 XML Reference
    Section 21.1 How to Use This Reference
    Section 21.2 Annotated Sample Documents
    Section 21.3 XML Syntax
    Section 21.4 Constraints
    Section 21.5 XML 1.0 Document Grammar
    Section 21.6 XML 1.1 Document Grammar
Chapter 22 Schemas Reference
    Section 22.1 The Schema Namespaces
    Section 22.2 Schema Elements
    Section 22.3 Built-in Types
    Section 22.4 Instance Document Attributes
Chapter 23 XPath Reference
    Section 23.1 The XPath Data Model
    Section 23.2 Data Types
    Section 23.3 Location Paths
    Section 23.4 Predicates
    Section 23.5 XPath Functions
Chapter 24 XSLT Reference
    Section 24.1 The XSLT Namespace
    Section 24.2 XSLT Elements
    Section 24.3 XSLT Functions
    Section 24.4 TrAX
Chapter 25 DOM Reference
    Section 25.1 Object Hierarchy
    Section 25.2 Object Reference
Chapter 26 SAX Reference
    Section 26.1 The org.xml.sax Package
    Section 26.2 The org.xml.sax.helpers Package
    Section 26.3 SAX Features and Properties
    Section 26.4 The org.xml.sax.ext Package
Chapter 27 Character Sets
    Section 27.1 Character Tables
    Section 27.2 HTML4 Entity Sets
    Section 27.3 Other Unicode Blocks
Colophon
Index

Copyright

Copyright © 2004, 2002, 2001 O'Reilly Media, Inc. All rights reserved. Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. The In a Nutshell series designations, XML in a Nutshell, the image of a peafowl, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Preface

In the last few years, XML has been adopted in fields as diverse as law, aeronautics, finance, insurance, robotics, multimedia, hospitality, travel, art, construction, telecommunications, software, agriculture, physics, journalism, theology, retail, and comics. XML has become the syntax of choice for newly designed document formats across almost all computer applications. It's used on Linux, Windows, Macintosh, and many other computer platforms. Mainframes on Wall Street trade stocks with one another by exchanging XML documents. Children playing games on their home PCs save their documents in XML. Sports
fans receive real-time game scores on their cell phones in XML. XML is simply the most robust, reliable, and flexible document syntax ever invented.

XML in a Nutshell is a comprehensive guide to the rapidly growing world of XML. It covers all aspects of XML, from the most basic syntax rules, to the details of DTD and schema creation, to the APIs you can use to read and write XML documents in a variety of programming languages.

What This Book Covers

There are thousands of formally established XML applications from the W3C and other standards bodies, such as OASIS and the Object Management Group. There are even more informal, unstandardized applications from individuals and corporations, such as Microsoft's Channel Definition Format and John Guajardo's Mind Reading Markup Language. This book cannot cover them all, any more than a book on Java could discuss every program that has ever been or might ever be written in Java. This book focuses primarily on XML itself. It covers the fundamental rules that all XML documents and authors must adhere to, from a web designer who uses SMIL to add animations to web pages to a C++ programmer who uses SOAP to exchange serialized objects with a remote database.

This book also covers generic supporting technologies that have been layered on top of XML and are used across a wide range of XML applications. These technologies include:

XLink
    An attribute-based syntax for hyperlinks between XML and non-XML documents that provides the simple, one-directional links familiar from HTML, multidirectional links between many documents, and links between documents to which you don't have write access

XSLT
    An XML application that describes transformations from one document to another in either the same or different XML

Module-based XHTML (W3C specification)
XHTML Basic and RDDL
xi:fallback element
xi:include element
XInclude
    alternate content for missing documents
    content negotiation
    include element
    text files, including
xlink:actuate attribute
xlink:arcrole attribute
    rddl:resource element
        purpose names
xlink:from attribute (arc elements)
xlink:href attribute
    for locators
    rddl:resource element
xlink:label attribute
    local resources and
xlink:role attribute
    locator elements
    rddl:resource element
        natures in
    resource elements
xlink:show attribute
xlink:title attribute
    arc elements
    locator elements
    resource elements
xlink:to attribute (arc elements)
xlink:type attribute
    arc
    extended
    locator
    possible values of
    resource
    simple
    title
XLinks
    definition of
    DTDs for
    embedding non-XML content in XML documents
    extended links
        arcs
        local resources
        locator elements
    link behavior
        xlink:actuate attribute
        xlink:show attribute
    link semantics
    linkbases
    simple links
    web browsers' support of
    XPointers, use in
XML
    benefits of
    case-sensitivity in
    character sets
    comments
    communications protocols and
    as data format [See data format, XML as]
    data format
    default character set for documents
    direct display in browsers
        alternative approaches
        Internet Explorer
        xml-stylesheet processing instruction
    documents [See documents, XML]
    elements
    enterprise applications and
    evolution of
    files
    how it works
    inability to use IDs in documents without DTDs
    invalid documents
    language reference
    name tokens
    names
    namespaces
    parsers [See parsers]
    processing instructions
    schema languages
    SGML and
    storing in a database
    trees
    valid documents
    Version 1.1
        IRI, use for namespaces
        namespaces support by DOM Level 3
    Versions 1.0 and 1.1, W3C recommendations for
    what it isn't
XML applications
    Robin Cover's list of
XML Canonicalization
XML declaration
    web browser problems with
XML Encryption
XML Information Set
XML Schema Language (W3C)
    reference
XML Signature
XML Validation Form (Brown University)
XML-RPC
xml-stylesheet processing instructions [See also processing instructions]
    pseudo-attributes in
    XSLT stylesheets and
xml:base attributes
xml:lang attribute
    ATTLIST declarations of
    language codes
    subcodes for regions
xml:space attribute
XMLCounter class (SAX, example class)
xmlEncoding attribute (DOM)
XMLFilter interface (SAX)
XMLFilterImpl class (SAX)
    UpperCaseFilter class (example)
xmllint
    valid flag
    xinclude option
xmlns attribute
    SAX core feature
    setting default namespaces with
xmlns( ) scheme, XInclude processors and
xmlns:prefix attribute
xmlns:xsl attribute
XMLReader class (Microsoft .NET)
XMLReader interface (SAX)
    counting elements/attributes in a document
    filters and
    getFeature( )
    methods called in ContentHandler
    setFeature( )
    validating parsers
XMLReaderAdapter class (SAX)
XMLReaderFactory class (SAX)
XMLSchema-instance namespace
xmlVersion attribute (DOM)
XMPP (Extensible Messaging and Presence Protocol)
XOM's nu.xom.xinclude package
XPath
    arithmetic operators in
    calculating string value of an expression
    data model
    data types
        node sets
        numbers
        strings
    expressions [See also location paths]
        Booleans
        data types for
        numbers
        strings
    functions
        Boolean
        node-set
        numeric
        string
    location paths
        abbreviated syntax
        axes
        child element location steps
        compound
        node tests in
        predicates in
        root
    predicates in
    predicates in location steps
    relational operators
    unabbreviated location paths
    XPointer extensions to
XPath module, DOM
xpointer attribute (xi:include)
xpointer( ) scheme, XInclude processors and
XPointers
    child sequences
    escaping characters not allowed in URIs
    href attributes in URLs
    in links
    namespaces and
    points in
    ranges in
        here( ) function
        origin( ) function
        range( ) function
        range-inside( ) function
        range-to( ) function
        relative XPointers
        string-range( ) function
    shorthand pointers
    syntax of
    on URLs
xs:all element
xs:annotation element
xs:any element
xs:anyAttribute element
xs:anySimpleType type
xs:anyURI type
xs:appinfo element
xs:attribute element
xs:attributeGroup element
xs:base64Binary type
xs:boolean type
xs:byte type
xs:choice element
xs:complexContent element
xs:complexType element
    mixed attribute
xs:date type
xs:dateTime type
xs:decimal type
xs:documentation element
xs:double type
xs:duration type
xs:element element
xs:ENTITIES type
xs:ENTITY type
xs:enumeration facet element
xs:extension element
    deriving new type from
xs:field element
xs:float type
xs:fractionDigits facet element
xs:gDay type
xs:gMonth type
xs:gMonthDay type
xs:group element
xs:gYear type
xs:gYearMonth type
xs:hexBinary type
xs:ID type
xs:IDREF type
xs:IDREFS type
xs:import element
xs:include element
xs:int type
xs:integer type
xs:key element
xs:keyref element
xs:language type
xs:long type
xs:maxExclusive facet element
xs:maxInclusive facet element
xs:maxLength facet element
xs:minExclusive facet element
xs:minInclusive facet element
xs:minLength facet element
xs:Name type
xs:NCName type
xs:negativeInteger type
xs:NMTOKEN type
xs:NMTOKENS type
xs:nonNegativeInteger type
xs:nonPositiveInteger type
xs:normalizedString type
xs:notation element
xs:NOTATION type
xs:pattern facet element
xs:positiveInteger type
xs:QName type
xs:redefine element
xs:restriction element
xs:schema element
    targetNamespace attribute
xs:selector element
xs:sequence element
xs:short type
xs:simpleContent element
xs:simpleType element
    pattern facet, using
xs:string type
xs:time type
xs:token type
xs:totalDigits facet element
xs:union element
xs:unique element
xs:unsignedByte type
xs:unsignedInt type
xs:unsignedLong type
xs:unsignedShort type
xs:whiteSpace facet element
xsi prefix
xsi:nil attribute
xsi:noNamespaceSchemaLocation attribute
xsi:schemaLocation attribute
xsi:type attribute
XSL (Extensible Stylesheet Language)
    XSLT and XSL-FO
XSL Formatting Objects Composer (IBM)
xsl prefix
XSL-FO (XSL Formatting Objects)
    applied to XML document (example)
    boxes in
    choosing between CSS and
    CSS vs.
    generating
    laying out master pages
        flowing content into pages
        generating finished document
    programs for working with
    properties
    XSLT to XSL-FO transform
    structure of documents
xsl:apply-imports element
xsl:apply-templates element
    mode attribute
xsl:attribute element
xsl:attribute-set element
xsl:call-template element
xsl:choose element
xsl:comment element
xsl:copy element
xsl:copy-of element
xsl:decimal-format element
xsl:element element
xsl:fallback element
xsl:for-each element
xsl:if element
xsl:import element
xsl:include element
xsl:key element
xsl:message element
xsl:namespace-alias element
xsl:number element
xsl:otherwise element
xsl:output element
xsl:param element
xsl:preserve-space element
xsl:processing-instruction element
xsl:sort element
xsl:strip-space element
xsl:stylesheet element
xsl:template element
    mode attribute
xsl:text element
xsl:transform element
xsl:value-of element
xsl:variable element
xsl:when element
xsl:with-param element
XSLT
    applying templates with xsl:apply-templates
    attribute value templates
    calculating element value with xsl:value-of
    elements
    flowing content into pages
    format-number( ) function
    functions
    input document, example of
    Internet Explorer and
    modes, applying different templates with
    namespaces
    other features
    RDDL nature URL for
    rddl:resource element pointing to stylesheet
    stylesheet processors
        built-in template rules
        problems with incorrect namespace URIs
    stylesheet using unabbreviated XPath syntax
    template rules, built-in
        for comment and processing instruction nodes
        for element and root nodes
        for namespace nodes
        for text and attribute nodes
    templates and template rules
    transforming documents into XSL-FO
    transforming XML documents
    TrAX (Transformations API for XML)
    type pseudo-attribute, specifying with
    Version 1.0
    xsl:decimal-format element
XSLT stylesheets
    XSL-FO and
Yiddish language, Unicode block for

zero-width nonbreaking space, UCS-2

... obscures the reality, XML is not a database. You're not going to replace an Oracle or MySQL server with XML. A database can contain XML data, either as a VARCHAR or a BLOB or as some custom XML data type, but the database itself is not an XML...

These tag sets are called XML applications. An XML application is not a software application that uses XML, such as Mozilla or Microsoft Word. Rather, it's an application of XML in a particular domain, such as vector...

custom XML data type, but the database itself is not an XML document. You can store XML data in a database on a server or retrieve data from a database in an XML format, but to do this, you need to be running software written in a real programming