Addison-Wesley, Effective XML: 50 Specific Ways to Improve Your XML, Oct 2003, ISBN 0321150406


Effective XML: 50 Specific Ways to Improve Your XML
By Elliotte Rusty Harold

Publisher: Addison Wesley
Pub Date: September 22, 2003
ISBN: 0-321-15040-6
Pages: 336

"This is an excellent collection of XML best practices: essential reading for any developer using XML. This book will help you avoid common pitfalls and ensure your XML applications remain practical and interoperable for as long as possible."
- Edd Dumbill, Managing Editor, XML.com and Program Chair, XML Europe

"A collection of useful advice about XML and related technologies. Well worth reading before, during, and after XML application development."
- Sean McGrath, CTO, Propylon

If you want to become a more effective XML developer, you need this book. You will learn which tools to use when in order to write legible, extensible, maintainable and robust XML code.

How do you write DTDs that are independent of namespace prefixes? What do parsers reliably report and what don't they? Which schema language is the right one for your job? Which API should you choose for maximum speed and minimum size? What can you do to ensure fast, reliable access to DTDs and schemas without making your document less portable? Is XML too verbose for your application?
Elliotte Rusty Harold provides you with 50 practical rules of thumb based on real-world examples and best practices. His engaging writing style is easy to understand and illustrates how you can save development time while improving your XML code. Learn to write XML that is easy to edit, simple to process, and is fully interoperable with other applications and code. Understand how to design and document XML vocabularies so they are both descriptive and extensible. After reading this book, you'll be ready to choose the best tools and APIs for both large-scale and small-scale processing jobs. Elliotte provides you with essential information on building services such as verification, compression, authentication, caching, and content management. If you want to design, deploy, or build better systems that utilize XML, then buy this book and get going!

Table of Contents

Copyright
Praise for Effective XML
Effective Software Development Series
    Titles in the Series
Preface
    Acknowledgments
Introduction
    Element versus Tag
    Attribute versus Attribute Value
    Entity versus Entity Reference
    Entity Reference versus Character Reference
    Children versus Child Elements versus Content
    Text versus Character Data versus Markup
    Namespace versus Namespace Name versus Namespace URI
    XML Document versus XML File
    XML Application versus XML Software
    Well-Formed versus Valid
    DTD versus DOCTYPE
    XML Declaration versus Processing Instruction
    Character Set versus Character Encoding
    URI versus URI Reference versus IRI
    Schemas versus the W3C XML Schema Language
Part 1: Syntax
    Item 1 Include an XML Declaration
        The version Info
        The encoding Declaration
        The standalone Declaration
    Item 2 Mark Up with ASCII if Possible
    Item 3 Stay with XML 1.0
        New Characters in XML Names
        C0 Control Characters
        C1 Control Characters
        NEL Used as a Line Break
        Unicode Normalization
        Undeclaring Namespace Prefixes
    Item 4 Use Standard Entity References
    Item 5 Comment DTDs Liberally
        The Header Comment
        Declarations
    Item 6 Name Elements with Camel Case
    Item 7 Parameterize DTDs
        Parameterizing Attributes
        Parameterizing Namespaces
        Full Parameterization
        Conditional Sections
    Item 8 Modularize DTDs
    Item 9 Distinguish Text from Markup
    Item 10 White Space Matters
        The xml:space Attribute
        Ignorable White Space
        Tags and White Space
        White Space in Attributes
        Schemas
Part 2: Structure
    Item 11 Make Structure Explicit through Markup
        Tag Each Unit of Information
        Avoid Implicit Structure
        Where to Stop?
    Item 12 Store Metadata in Attributes
    Item 13 Remember Mixed Content
    Item 14 Allow All XML Syntax
    Item 15 Build on Top of Structures, Not Syntax
        Empty-Element Tags
        CDATA Sections
        Character and Entity References
    Item 16 Prefer URLs to Unparsed Entities and Notations
    Item 17 Use Processing Instructions for Process-Specific Content
        Style Location
        Overlapping Markup
        Page Formatting
        Out-of-Line Markup
        Misuse of Processing Instructions
    Item 18 Include All Information in the Instance Document
    Item 19 Encode Binary Data Using Quoted Printable and/or Base64
        Quoted Printable
        Base64
    Item 20 Use Namespaces for Modularity and Extensibility
        Choosing a Namespace URI
        Validation and Namespaces
    Item 21 Rely on Namespace URIs, Not Prefixes
    Item 22 Don't Use Namespace Prefixes in Element Content and Attribute Values
    Item 23 Reuse XHTML for Generic Narrative Content
    Item 24 Choose the Right Schema Language for the Job
        The W3C XML Schema Language
        Document Type Definitions
        RELAX NG
        Schematron
        Java, C#, Python, and Perl
        Layering Schemas
    Item 25 Pretend There's No Such Thing as the PSVI
    Item 26 Version Documents, Schemas, and Stylesheets
    Item 27 Mark Up According to Meaning
Part 3: Semantics
    Item 28 Use Only What You Need
    Item 29 Always Use a Parser
    Item 30 Layer Functionality
    Item 31 Program to Standard APIs
        SAX
        DOM
        JDOM
    Item 32 Choose SAX for Computer Efficiency
    Item 33 Choose DOM for Standards Support
    Item 34 Read the Complete DTD
    Item 35 Navigate with XPath
    Item 36 Serialize XML with XML
    Item 37 Validate Inside Your Program with Schemas
        Xerces-J
        DOM Level 3 Validation
Part 4: Implementation
    Item 38 Write in Unicode
        Choosing an Encoding
        A char Is Not a Character
        Normalization Forms
        Sorting
    Item 39 Parameterize XSLT Stylesheets
    Item 40 Avoid Vendor Lock-In
    Item 41 Hang On to Your Relational Database
    Item 42 Document Namespaces with RDDL
        Natures
        Purposes
    Item 43 Preprocess XSLT on the Server Side
        Servlet-Based Solutions
        Apache
        IIS
    Item 44 Serve XML+CSS to the Client
    Item 45 Pick the Correct MIME Media Type
    Item 46 Tidy Up Your HTML
        MIME Type
        HTML Tidy
        Older Browsers
    Item 47 Catalog Common Resources
        Catalog Syntax
        Using Catalog Files
    Item 48 Verify Documents with XML Digital Signatures
        Digital Signature Syntax
        Digital Signature Tools
    Item 49 Hide Confidential Data with XML Encryption
        Encryption Syntax
        Encryption Tools
    Item 50 Compress if Space Is a Problem
Recommended Reading

Copyright

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers discounts on this book when ordered in quantity for bulk purchases and special sales. For more information, please
contact:

U.S. Corporate and Government Sales
(800) 382-3419
corpsales@pearsontechgroup.com

For sales outside of the United States, please contact:

International Sales
(317) 581-3793
international@pearsontechgroup.com

Visit Addison-Wesley on the Web: www.awprofessional.com

Library of Congress Cataloging-in-Publication Data

Harold, Elliotte Rusty.
Effective XML : 50 specific ways to improve your XML / Elliotte Rusty Harold.
p. cm.
Includes bibliographical references and index.
ISBN 0-321-15040-6 (alk. paper)
1. XML (Document markup language) I. Title.
QA76.76.H94H334 2003
005.7'2--dc21
2003056257

© 2004 by Elliotte Rusty Harold

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. Published simultaneously in Canada.

For information on obtaining permission for use of material from this work, please submit a written request to:

Pearson Education, Inc.
Rights and Contracts Department
75 Arlington Street, Suite 300
Boston, MA 02116
Fax: (617) 848-7047

Text printed on recycled paper

10--CRS--0706050403

First printing, September 2003

Praise for Effective XML

"This is an excellent collection of XML best practices: essential reading for any developer using XML. This book will help you avoid common pitfalls and ensure your XML applications remain practical and interoperable for as long as possible."
- Edd Dumbill, Managing Editor, XML.com and Program Chair, XML Europe

"A collection of useful advice about XML and related technologies. Well worth reading both before, during, and after XML application development."
- Sean McGrath, CTO, Propylon

"A book on many best practices for XML that we have been eagerly waiting for."
- Akmal B. Chaudhri, Editor, IBM developerWorks

"The fifty easy-to-read [items] cover many aspects of XML, ranging from how to use markup effectively to what schema language is best for what task. Sometimes controversial, but always relevant, Elliotte Rusty Harold's book provides best practices for working with XML that every user and implementer of XML should be aware of."
- Michael Rys, Ph.D., Program Manager, SQL Server XML Technologies, Microsoft Corporation

"Effective XML is an excellent book with perfect timing. Finally, an XML book everyone needs to read! Effective XML is a fount of XML best practices and solid advice. Whether you read Effective XML cover to cover or randomly one section at a time, its clear writing and insightful recommendations enlighten, entertain, educate, and ultimately improve the effectiveness of even the most expert XML developer. I'll tell you what I tell all my coworkers and customers: You need this book."
- Michael Brundage, Technical Lead, XML Query Processing, Microsoft WebData XML Team

"This book provides great insight for all developers who write XML software, regardless of whether the software is a trivial application-specific XML processor or a full-blown W3C XML Schema Language validator. Mr. Harold covers everything from a very important high-level terminology discussion to details about parsed XML nodes. The well-researched comparisons of currently available XML-related software products, as well as the key criteria for selecting between XML technologies, exemplify the thoroughness of this book."
- Cliff Binstock, Author, The XML Schema Complete Reference

Effective Software Development Series

Scott Meyers, Consulting Editor

The Effective Software Development Series provides expert advice on all aspects of modern software development. Books in the series are well written, technically sound, of lasting value, and tractable length. Each describes the critical things the experts almost always do (or almost always avoid doing) to produce outstanding software.

Scott Meyers (author of the Effective C++ books and CD) conceived of the series and acts as its consulting editor. Authors in the series work with Meyers and with Addison-Wesley Professional's editorial staff to create essential reading for software developers of every stripe.

Titles in the Series

Elliotte Rusty Harold, Effective XML: 50 Specific Ways to Improve Your XML, 0321150406
Diomidis Spinellis, Code Reading: The Open Source Perspective, 0201799405

For more information on books in this series please see www.awprofessional.com/esds

Preface

Learning the fundamentals of XML might take a programmer a week. Learning how to use XML effectively might take a lifetime. While many books have been written that teach developers how to use the basic syntax of XML, this is the first one that really focuses on how to use XML well.

This book is not a tutorial. It is not going to teach you what a tag is or how to write a DTD. I assume you know these things. Instead it's going to tell you when, why, where, and how to use such tools effectively (and, perhaps equally importantly, when not to use them).

This book derives directly from my own experiences teaching and writing about XML. Over the last five years, I've written several books and taught numerous courses about XML. Increasingly I'm finding that audiences are already familiar with the basics of XML. They know what a tag is, how to validate a document against a DTD, and how to transform a document with an XSLT stylesheet. The question of what XML is and why to use it has been sufficiently well evangelized. The essential syntax and supporting technologies are reasonably well understood. However, although most developers know what a CDATA section is, they are not sure what to use one for. Although programmers know how to add attribute and child nodes to elements, they
are not certain which one to use when. Although programmers know what a schema is, they don't know which schema language to choose.

Since XML has become a fundamental underpinning of new software systems, it becomes important to begin asking new questions, not just about what XML is but also how to use it effectively. Which techniques work and which don't? Less obviously, which techniques appear to work at first but fail to scale as systems are further developed?

When I teach programming at my university, one of the first things I tell my students is that it is not enough to write programs that compile and produce the expected results. It is as important (indeed more important) to write code that is extensible, legible, and maintainable. XML can be used to produce robust, extensible, maintainable, comprehensible systems; or it can be used to create masses of unmaintainable, illegible, fragile, closed code. In the immortal words of Eric Clapton, "It's In The Way That You Use It."
XML is not a programming language. It is a markup language, but it is being successfully used by many programmers. There have been markup languages before, but in the developer community XML is far and away the most successful. However, the newness and unfamiliarity of markup languages have meant that many developers are using it less effectively than they could. Many programmers are hacking together systems that work but are not as robust, extensible, or portable as XML promises. This is to be expected. Programmers working with XML are pioneers exploring new territory, opening up new vistas in software, and accomplishing things that could not easily be accomplished just a few years ago. However, more than a few XML pioneers have returned from the frontier with arrows in their backs.

Five years after the initial release of XML into the world, certain patterns and antipatterns for the proper design of XML applications are becoming apparent. All of us in the XML community have made mistakes while exploring this new territory, the author of this book prominently among them. However, we've learned from those mistakes, and we're beginning to develop some principles that may help those who follow in our footsteps to avoid making the same mistakes we did. It is time to put up some caution signs in the road. We may not exactly say "Here there be dragons," but we can at least say, "That road is a lot rockier than it looks at first glance, and you might really want to take this slightly less obvious but much smoother path off to the left."
This book is divided into four parts, beginning with the lowest layer of XML and gradually working up to the highest.

Part 1 covers XML syntax, those aspects of XML that don't really affect the information content of an XML document but may have large impact on how easy or hard those documents are to edit and process.

Part 2 looks at XML structures, the general organization and annotation of information in an XML document.

Part 3 discusses the various techniques and APIs available for processing XML with languages such as C++, C#, Java, Python, and Perl and thus attaching local semantics to the labeled structures of XML.

Part 4 explores effective techniques for systems built around XML documents, rather than looking at individual documents in isolation.

Although this is how I've organized the book, you should be able to begin reading at essentially any chapter. This book makes an excellent bathroom reader. You may wish to read the introduction first, which defines a number of key terms used throughout the book that are frequently misused or confused. However, after that feel free to pick and choose from the topics as your interest and needs dictate. I've made liberal use of cross-references throughout to direct you along other paths through the book that may be of interest.

I hope this book is a beginning, not an end. It's still early in the life of XML, and much remains to be discovered and invented. You may well develop best practices of your own that are not mentioned here. If you do, I'd love to hear about them. You may also take issue with some of the principles stated here. I'd like to hear about that too. Discussion of many of the guidelines identified here has taken place on the xml-dev mailing list and seems likely to continue in the future. If you're interested in further discussion of the issues raised in this book, I recommend that you subscribe and participate there. Complete details can be found at http://lists.xml.org/. On the
other hand, if you find outright mistakes in this book (the ID attribute value is missing a closing quote; the word "cat" is misspelled), you can write me directly at elharo@metalab.unc.edu. I maintain a Web page that lists known errata for this book, as well as any updates, at http://www.cafeconleche.org/books/effectivexml/.

Finally, I hope this book makes your use of XML both more effective and more enjoyable.

Acknowledgments

For me, this book is the culmination of more than five years of debate, argument, and discussion about XML with numerous people. Some of this took place in the hallways at conferences such as Software Development and XMLOne. Some of it took place on mailing lists like xml-dev. Along the way a few names kept popping up. Sometimes I agreed with what those folks said, sometimes I didn't, but their conversations and thoughts were always illuminating and helped clarify my own thinking about XML. These gurus include Tim Berners-Lee, Tim Bray, Claude Len Bullard, Mike Champion, James Clark, John Cowan, Roy Fielding, Rick Jelliffe, Michael Kay, Murata Makoto, Uche Ogbuji, Walter Perry, Paul Prescod, Jonathan Robie, and Simon St. Laurent. I doubt any of them agree with everything I've written here. In fact, I suspect a couple of them may violently disagree with most of it. However, as I look at this book, I see their influences everywhere. If they hadn't written what they've written, I couldn't have written this book.

Many people helped out in more direct ways with comments, corrections, and suggestions. Alex Blewitt, Janek Bogucki, Lars Gregori, Gareth Jenkins, Alexander Rankine, Clint Shank, and Wayne Tanner submitted numerous helpful corrections for the draft of the manuscript I posted at my web site. Mike Blackstone deserves special thanks for his copious notes. Mike Champion, Martin Gudgin, Sean McGrath, and Tim Bray did yeomanlike service as technical reviewers. Scott
Meyers both founded the series and helped me keep the focus squarely on track. Their comments all substantially improved the book.

As always, the folks at the Studio B literary agency were extremely helpful at all steps of the process. David Rogelberg, Sherry Rogelberg, and Stacey Barone should be called out for particular commendation. On the publisher's side at Addison-Wesley, Mary T. O'Brien shepherded this book from contract to completion. Chrysta Meadowbrooke performed the single most pleasant copy edit I've ever experienced. I would also like to thank the people who worked on the production of the book, Patrick Cash-Peterson for coordinating this book through production, Stratford Publishing Services for layout and design, Sharon Hilgenberg for the index, and Diane Freed for proofing.

Finally, as always, my biggest thanks are due to my wife, Beth, without whose love and understanding this book could never have been completed.


Using Catalog Files

The details of how to load a catalog file when processing a document vary from parser to parser and tool to tool. Not all XML processors support catalogs, but most of the important ones do. The Gnome Project's xsltproc, Michael Kay's Saxon, and the XML Apache Project's Xerces-J and Xalan-J all support catalogs. Notably lacking from this list are the C++ versions of Xerces and Xalan as well as Microsoft's MSXML. Of course, it isn't hard to integrate catalog processing into your own applications with just a little bit of open source code.

libxml2

Daniel Veillard's libxml2 XML parser for C supports catalogs, as does his libxslt processor that sits on top of libxml2. libxml2 reads the catalog location from the XML_CATALOG_FILES environment variable, which contains a white-space-separated list of file names. This can be set in all the usual ways. For example, in bash or other Bourne shell derivatives, to specify that libxml should use the catalog file found at /opt/xml/catalog.xml you would simply type the following:

    % XML_CATALOG_FILES=/opt/xml/catalog.xml
    % export XML_CATALOG_FILES

In Windows, you'd set this environment variable in the System control panel.

This property can also be set to a white-space-separated list of file names to indicate that libxml should try several different catalogs in sequence. For example, the setting below requests that libxml first look in the file catalog.xml in the current working directory and then in the file /opt/xml/docbook/docbook.cat:

    % XML_CATALOG_FILES="catalog.xml /opt/xml/docbook/docbook.cat"
    % export XML_CATALOG_FILES

If you expect to use the same catalog file consistently, you could set XML_CATALOG_FILES in your .bashrc or .cshrc file.

Once this environment variable is set, libxml will consult the catalog for all documents it loads, whether you're calling the library from your own C++ source code, calling it from the XSLT stylesheet with the document() function, or using the command line tools xmllint and xsltproc.

If you're having trouble with the catalog, you can put libxml in debug mode by setting the XML_DEBUG_CATALOG environment variable. (No value is required. It just needs to be set.)
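
If a Java program launches the libxml2 command line tools itself, the same two variables can be set for just that child process rather than for the whole login session. The following is a minimal sketch under stated assumptions, not something taken from the libxml2 documentation: the class name CatalogEnv and the method xmllintWithCatalogs are hypothetical, the file names are the ones used in the shell examples, and xmllint is assumed to be on the PATH. The sketch only builds the process; actually starting it is left commented out.

```java
import java.util.List;
import java.util.Map;

public class CatalogEnv {

    // Hypothetical helper: builds (but does not start) an xmllint process
    // whose environment carries the catalog list in XML_CATALOG_FILES.
    public static ProcessBuilder xmllintWithCatalogs(
            List<String> catalogs, String document, boolean debug) {
        // --valid asks xmllint to validate against the DTD; --noout suppresses the output tree.
        ProcessBuilder pb = new ProcessBuilder("xmllint", "--noout", "--valid", document);
        Map<String, String> env = pb.environment();

        // The variable holds a white-space-separated list of catalog files.
        env.put("XML_CATALOG_FILES", String.join(" ", catalogs));

        if (debug) {
            // Only the presence of the variable matters; "1" is an arbitrary value.
            env.put("XML_DEBUG_CATALOG", "1");
        }
        pb.inheritIO(); // let xmllint's messages appear on this program's console
        return pb;
    }

    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = xmllintWithCatalogs(
                List.of("catalog.xml", "/opt/xml/docbook/docbook.cat"),
                "chapter1.xml",
                true);
        System.out.println("XML_CATALOG_FILES=" + pb.environment().get("XML_CATALOG_FILES"));
        // pb.start().waitFor(); // uncomment on a machine where xmllint is installed
    }
}
```

Setting the variables on the ProcessBuilder leaves the parent program's own environment untouched, which can be handy when different documents need different catalogs.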
libxml will then tell you when it recognizes a catalog entry and what it's actually loading when. I often find this useful for discovering small, nonobvious mismatches between the IDs used in the instance documents and those used in the catalog. For instance, when I was writing this item, libxml helped me uncover a mismatch between the public ID in the catalog and the documents. The catalog was using -//OASIS//DTD DocBook XML V4.2.0//EN and the source documents were using -//OASIS//DTD DocBook XML V4.2//EN. The strings really have to match exactly: 4.2 is not the same as 4.2.0 when resolving public IDs.

Saxon, Xalan, and Other Java-Based XSLT Processors

Most XML parsers and XSLT processors written in Java can use Norm Walsh's catalog library (now donated to the XML Apache Project). You can download it from http://xml.apache.org/dist/commons/. Download the file resolver-1.0.jar (the version number may have changed) and add it to your classpath. Next create a CatalogManager.properties file in a directory that is included in your classpath. The resolver will look in this file to determine the locations of the catalog files. Example 47-2 shows a properties file that loads the catalog named catalog.xml from the current working directory and the standard DocBook catalog from the absolute path /opt/xml/docbook/docbook.cat.

Example 47-2 A CatalogManager.properties File for Norm Walsh's Catalog Resolver

    catalogs=catalog.xml;/opt/xml/docbook/docbook.cat
    relative-catalogs=true
    static-catalog=yes
    catalog-class-name=org.apache.xml.resolver.Resolver
    verbosity=1

If you're having trouble, turn up the verbosity to provide more detailed error messages about exactly which files the resolver is loading when.

You tell Saxon to use the Apache Commons catalog with several command line options, as shown below.

    % java com.icl.saxon.StyleSheet -x org.apache.xml.resolver.tools.ResolvingXMLReader -y org.apache.xml.resolver.tools.ResolvingXMLReader -r org.apache.xml.resolver.tools.CatalogResolver chapter1.xml docbook.xsl

Xalan is similar.

    % java org.apache.xalan.xslt.Process -ENTITYRESOLVER org.apache.xml.resolver.tools.CatalogResolver -URIRESOLVER org.apache.xml.resolver.tools.CatalogResolver -in chapter1.xml -xsl docbook.xsl

jd.xslt works the same except that it uses lowercase argument names.

    % java jd.xml.xslt.Stylesheet -entityresolver org.apache.xml.resolver.tools.CatalogResolver -uriresolver org.apache.xml.resolver.tools.CatalogResolver chapter1.xml docbook.xsl

In all three cases, what you're really doing is telling the processor where to find an instance of the SAX EntityResolver interface and the TrAX URIResolver interface. The org.apache.xml.resolver.tools.CatalogResolver class can also be used for these purposes in your own SAX and TrAX programs.

TrAX

In TrAX both the Transformer and TransformerFactory classes have setURIResolver methods that allow you to provide a resolver that's used to look up URIs used by the document function and the xsl:import and xsl:include elements. Setting the URIResolver for a Transformer just changes that one Transformer object. Setting the URIResolver for a TransformerFactory sets the default resolver for all Transformer objects created by that factory. To use catalogs, just pass in an instance of the org.apache.xml.resolver.tools.CatalogResolver class.

    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.URIResolver;
    import org.apache.xml.resolver.tools.CatalogResolver;

    URIResolver resolver = new CatalogResolver();
    TransformerFactory factory = TransformerFactory.newInstance();
    factory.setURIResolver(resolver);

The location of the catalog file is determined by a CatalogManager.properties file as shown in Example 47-2. You will of course need to add the resolver.jar file to your classpath to make this work.

SAX

SAX programs access catalogs through the EntityResolver interface, which, conveniently, org.apache.xml.resolver.tools.CatalogResolver also implements. To add catalog support to your own application just pass an instance of this class to the setEntityResolver method of the
XMLReader class before beginning to parse a document.

    import org.apache.xml.resolver.tools.CatalogResolver;
    import org.xml.sax.EntityResolver;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    EntityResolver resolver = new CatalogResolver();
    XMLReader parser = XMLReaderFactory.createXMLReader();
    parser.setEntityResolver(resolver);

That's all there is to it. From here on, you just parse the XML as usual. Whenever the XMLReader loads a DTD fragment from either a system or a public ID it will first consult the catalogs identified by the CatalogManager.properties file.


Item 48 Verify Documents with XML Digital Signatures

XML documents are used on Wall Street for financial transactions totaling hundreds of millions of dollars per day. Worldwide, the figure's even larger. There's significant incentive for criminals to modify XML documents moving from one system to another. XML digital signatures can help ensure that XML documents have not been tampered with in transit. Not all documents need to be digitally signed, but those that need to be signed need it badly.

The basic process of signing something digitally involves a keyed hash function. A hash function converts documents into numbers, generally smaller than the document itself. Feed a document into the hash function; get a 128-byte number out. A very simple hash function might count up the number of bits in the document and take the remainder when dividing by 256. This would give 256 possible different hash codes between 0 and 255. Essentially this is a one-byte hash function. Real-world hash functions used for signatures are much larger, much more complex, and much more secure.

To be useful for digital signatures, the hash function is keyed. One key is used to sign the document, and a separate, related key is used to compare the document against its signature and verify that they match. The signing key is kept private while the verification key is published. Since only the holder of the signing key can create a hash code for a document, if you receive a signed document whose signature matches when
computed with Alice's public key, you have a fair amount of confidence that Alice signed the document.

For purposes of security, it's important that the hash codes generated by the signing process be widely dispersed. That is, changing even one bit of the document should result in a completely different hash code when the document is signed. Otherwise, it would be possible for a forger to make small changes to the document until he or she got a verifiable signature.

Most algorithms for digital signatures are based around signing an entire file, which is treated as just a sequence of bytes. However, XML signatures are a little more complicated than that. XML allows you to sign only certain elements or document fragments, rather than entire documents. Furthermore, the signatures can be embedded in the documents they sign. And finally, not all details of XML are necessarily relevant to the signature. For instance, it doesn't matter whether a check approval is written as [markup lost in extraction] or [markup lost in extraction]; but it's very important that it not be changed to [markup lost in extraction]. XML digital signatures generally sign a fragment of a document that is identified by an XPath expression. The actual data to be signed is calculated by canonicalizing that part of the XML document before signing its bytes.

Digital Signature Syntax

I'm going to give you just a brief overview of what a digitally signed XML document looks like. The arithmetic is far too complex for most humans to do by hand (even programmers). It's virtually certain that you'll use some software application or library to sign and verify your documents. XML documents aren't signed by hand.

There are three basic kinds of signatures. An enveloping signature contains the data it signs. An enveloped signature is contained inside the document it signs. A detached signature signs data external to the document identified by a URL.

Before any XML document can be signed, it needs to be transformed into a canonical form that
norm alizes syntactically irrelevant details lik e attribute order and the am ount of white space inside tags For ex am ple, let us suppose we have a docum ent that represents order and paym ent inform ation, as shown in Ex am ple 481 Example 48-1 An Order Document Fables 2 DC Gen 13 46 Wildstorm Elliotte Rusty Harold 5555 3142 2718 2998 06 2006 Suppose the com ic shop wants to verify that I actually sent this order before charging m y credit card The store could require that I sign the docum ent with m y private k ey, which they would then verify with m y public k ey The m ost com m on way to this is with an enveloping signature This includes the docum ent being signed inside the signature Ex am ple 48-2 dem onstrates, using the order docum ent from Ex am ple 48-1 The root elem ent is now Signature instead of Order However, the last child elem ent of the Signature elem ent is a dsig:Object elem ent that contains the root Order elem ent of the original docum ent This is what has been signed After verifying the signature, you can ex tract the original elem ent using any of the usual techniques A tree-based API such as JDO M, XO M, or DO M is probably the sim plest approach here Example 48-2 An Enveloping Signature tRJGGSB544BQ1CVyj9UdR3+8/PE= GzgtyIj1DYTBX1idqH0wjae7U2lUBCXaAkuvBKeVIUWkwWyGHqBXqQ==

/X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY 1Y+r/F9bow9subVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX /rfGG/g7V+fGqKYVDwT7g/bTxR7DAjVUE1oWkTL2dfOuK2HXKu/yIg MZndFIAcc=

l2BQjxUjC8yykrmCouuEC/BYHPU= 9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ ZxBxCBgLRJFnEj6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWR bqN/C/ohNWLx+2J6ASQ7zKTxvqhRkImog9/hWuWfBpKLZl6Ae1UlZA FMO/7PSSo= 7bQ9Utz1cuAXbXGPwSC/v29fxGDiqXMO3nnyp3qvCzS351MWvYC3pf zW4KAqxEUdMeBzSpysBAhBW4IwEYSTRZ3RFtJUf2hjHhxo93oakMKZ /pfeg4MTPLM1rAQuTZ7tRI8jvXu/snhJknhhnGPGWGt1ZOePT24Mlx f+1hTGRck= CN=Elliotte Harold,OU=Metrotech,O=Polytechnic, L=Brooklyn, ST=New York,C=US 1046543415 CN=Elliotte Harold,OU=Metrotech,O=Polytechnic, L=Brooklyn,ST=New York,C=US MIIDJDCCAuECBD5g/DcwCwYHKoZIzjgEAwUAMHcxCzAJBgNVBAYTAlVTMREwDwYD VQQIEwhOZXcgWW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0 ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxGDAWBgNVBAMTD0VsbGlvdHRlIEhh cm9sZDAeFw0wMzAzMDExODMwMTVaFw0wMzA1MzAxODMwMTVaMHcxCzAJBgNVBAYT AlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNV BAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxGDAWBgNVBAMTD0Vs bGlvdHRlIEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OBHXUSKVLf Spwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4Ad NG/yZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQT WhaRMvZ1864rYdcq7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGB APfhoIXWmz3ey7yrXDa4V7l5lK+7+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0 SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4rs6Z1kW6jfwv6ITVi8ftiegEk O8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKBgQDttD1S3PVy 4BdtcY/BIL+/b1/EYOKpcw7eefKneq8LNLfnUxa9gLel/NbgoCrERR0x4HNKnKwE CEFbgjARhJNFndEW0lR/aGMeHGj3ehqQwpn+l96DgxM8szWsBC5Nnu1EjyO9e7+y eEmSeGGcY8ZYa3Vk549PbgyXF/7WFMZFyTALBgcqhkjOOAQDBQADMAAwLQIVAIQs 71E6P19ImxGIwBQfmB9ov0HTAhRtlgIWB6YUqt7ilNcSxfbHWOMKLA== Fables 2 DC Gen 13 46 Wildstorm Elliotte Rusty Harold 5555 3142 2718 2998 06 2006

Do not concern yourself excessively with the detailed syntax of this example. Even if the XML structure is intelligible to a person, the mathematics required to produce the Base64-encoded signature really isn't. I suppose it's theoretically possible that an arithmetical
savant could do this by hand, but in practice it's always done by computer. You don't need to worry about the details unless you're writing the software to generate and verify digital signatures. Most programmers just use a library written by somebody else such as XML-Security from the Apache XML Project (http://xml.apache.org/security/) or XSS4J from IBM (http://www.alphaworks.ibm.com/tech/xmlsecuritysuite).

You should also not worry about the size. Since the original example was quite small, the signature markup forms a large part of the signed document. However, the size of the signature markup is almost constant. You could sign a multimegabyte document with the same number of bytes used here. The size of the signature is independent of the document signed and only lightly coupled to the size of the key or the algorithm used.

Sometimes it may be more convenient to keep the same root element but add the Signature element inside that document. This is a little tricky because verification needs to be careful to verify the document without considering the signature to be part of it. Still, although this caused a little extra work for the designers of the XML digital signature specification, the details are now encapsulated in the different libraries you might use, so it's not really any extra work for your code. Example 48-3 shows a version of Example 48-1 that contains an enveloped signature.

Example 48-3 An Enveloped Signature

Fables 2 DC Gen 13 46 Wildstorm Elliotte Rusty Harold 5555 3142 2718 2998 06 2006 pCD81qloCPf9UBbJ1CnTwMh+Wo4= dguuK7RO1THsftPd/yHJK+1ImHYd8dAy8mGk7GzAH/vVFxFkysJplQ==

/X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY 1Y+r/F9bow9subVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX /rfGG/g7V+fGqKYVDwT7g/bTxR7DAjVUE1oWkTL2dfOuK2HXKu/yIg MZndFIAcc=

l2BQjxUjC8yykrmCouuEC/BYHPU= 9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ ZxBxCBgLRJFnEj6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWR bqN/C/ohNWLx+2J6ASQ7zKTxvqhRkImog9/hWuWfBpKLZl6Ae1UlZA FMO/7PSSo= 7bQ9Utz1cuAXbXGPwSC/v29fxGDiqXMO3nnyp3qvCzS351MWvYC3pf zW4KAqxEUdMeBzSpysBAhBW4IwEYSTRZ3RFtJUf2hjHhxo93oakMKZ /pfeg4MTPLM1rAQuTZ7tRI8jvXu/snhJknhhnGPGWGt1ZOePT24Mlx f+1hTGRck= CN=Elliotte Harold,OU=Metrotech,O=Polytechnic, L=Brooklyn,ST=New York,C=US 1046543415 CN=Elliotte Harold,OU=Metrotech,O=Polytechnic, L=Brooklyn,ST=New York,C=US MIIDJDCCAuECBD5g/DcwCwYHKoZIzjgEAwUAMHcxCzAJBgNVBAYTAlVTMREwDwYD VQQIEwhOZXcgWW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0 ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxGDAWBgNVBAMTD0VsbGlvdHRlIEhh cm9sZDAeFw0wMzAzMDExODMwMTVaFw0wMzA1MzAxODMwMTVaMHcxCzAJBgNVBAYT AlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNV BAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxGDAWBgNVBAMTD0Vs bGlvdHRlIEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OBHXUSKVLf Spwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4Ad NG/yZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQT WhaRMvZ1864rYdcq7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGB APfhoIXWmz3ey7yrXDa4V7l5lK+7+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0 SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4rs6Z1kW6jfwv6ITVi8ftiegEk O8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKBgQDttD1S3PVy 4BdtcY/BIL+/b1/EYOKpcw7eefKneq8LNLfnUxa9gLel/NbgoCrERR0x4HNKnKwE CEFbgjARhJNFndEW0lR/aGMeHGj3ehqQwpn+l96DgxM8szWsBC5Nnu1EjyO9e7+y eEmSeGGcY8ZYa3Vk549PbgyXF/7WFMZFyTALBgcqhkjOOAQDBQADMAAwLQIVAIQs 71E6P19ImxGIwBQfmB9ov0HTAhRtlgIWB6YUqt7ilNcSxfbHWOMKLA==

A detached signature neither contains nor is contained in the document it signs. Instead it points to the document being signed with a URI. This allows it to sign things besides XML documents such as JPEG images and Microsoft Word files. The object signed is identified by the URI attribute of a Reference element. Example 48-4 is a detached signature for
the order document shown in Example 48-1.

Example 48-4 A Detached Signature

J4qs6XERp3S9frY9Je3IiZL2yvs= TIptdglMXBgmHWFm1jOygQiMr4JJGGPAMW8XR65mGpjNeV469EiieQ==

/X9TgR11EilS30qcLuzk5/YRt1I870QAwx4/gLZRJmlFXUAiUftZPY 1Y+r/F9bow9subVWzXgTuAHTRv8mZgt2uZUKWkn5/oBHsQIsJPu6nX /rfGG/g7V+fGqKYVDwT7g/bTxR7DAjVUE1oWkTL2dfOuK2HXKu/yIg MZndFIAcc=

l2BQjxUjC8yykrmCouuEC/BYHPU= 9+GghdabPd7LvKtcNrhXuXmUr7v6OuqC+VdMCz0HgmdRWVeOutRZT+ ZxBxCBgLRJFnEj6EwoFhO3zwkyjMim4TwWeotUfI0o4KOuHiuzpnWR bqN/C/ohNWLx+2J6ASQ7zKTxvqhRkImog9/hWuWfBpKLZl6Ae1UlZA FMO/7PSSo= 7bQ9Utz1cuAXbXGPwSC/v29fxGDiqXMO3nnyp3qvCzS351MWvYC3pf zW4KAqxEUdMeBzSpysBAhBW4IwEYSTRZ3RFtJUf2hjHhxo93oakMKZ /pfeg4MTPLM1rAQuTZ7tRI8jvXu/snhJknhhnGPGWGt1ZOePT24Mlx f+1hTGRck= CN=Elliotte Harold,OU=Metrotech,O=Polytechnic, L=Brooklyn,ST=New York,C=US 1046543415 CN=Elliotte Harold,OU=Metrotech,O=Polytechnic, L=Brooklyn,ST=New York,C=US MIIDJDCCAuECBD5g/DcwCwYHKoZIzjgEAwUAMHcxCzAJBgNVBAYTAlVTMREwDwYD VQQIEwhOZXcgWW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNVBAoTC1BvbHl0 ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxGDAWBgNVBAMTD0VsbGlvdHRlIEhh cm9sZDAeFw0wMzAzMDExODMwMTVaFw0wMzA1MzAxODMwMTVaMHcxCzAJBgNVBAYT AlVTMREwDwYDVQQIEwhOZXcgWW9yazERMA8GA1UEBxMIQnJvb2tseW4xFDASBgNV BAoTC1BvbHl0ZWNobmljMRIwEAYDVQQLEwlNZXRyb3RlY2gxGDAWBgNVBAMTD0Vs bGlvdHRlIEhhcm9sZDCCAbgwggEsBgcqhkjOOAQBMIIBHwKBgQD9f1OBHXUSKVLf Spwu7OTn9hG3UjzvRADDHj+AtlEmaUVdQCJR+1k9jVj6v8X1ujD2y5tVbNeBO4Ad NG/yZmC3a5lQpaSfn+gEexAiwk+7qdf+t8Yb+DtX58aophUPBPuD9tPFHsMCNVQT WhaRMvZ1864rYdcq7/IiAxmd0UgBxwIVAJdgUI8VIwvMspK5gqLrhAvwWBz1AoGB APfhoIXWmz3ey7yrXDa4V7l5lK+7+jrqgvlXTAs9B4JnUVlXjrrUWU/mcQcQgYC0 SRZxI+hMKBYTt88JMozIpuE8FnqLVHyNKOCjrh4rs6Z1kW6jfwv6ITVi8ftiegEk O8yk8b6oUZCJqIPf4VrlnwaSi2ZegHtVJWQBTDv+z0kqA4GFAAKBgQDttD1S3PVy 4BdtcY/BIL+/b1/EYOKpcw7eefKneq8LNLfnUxa9gLel/NbgoCrERR0x4HNKnKwE CEFbgjARhJNFndEW0lR/aGMeHGj3ehqQwpn+l96DgxM8szWsBC5Nnu1EjyO9e7+y eEmSeGGcY8ZYa3Vk549PbgyXF/7WFMZFyTALBgcqhkjOOAQDBQADMAAwLQIVAIQs 71E6P19ImxGIwBQfmB9ov0HTAhRtlgIWB6YUqt7ilNcSxfbHWOMKLA==

If you're signing non-XML data, you must use a detached signature. If you're signing XML data, you should use either an enveloped or enveloping signature because they ignore XML-insignificant details like white space in tags and whether empty elements are represented with one tag or two. Whether you use enveloped or enveloping signatures depends mainly on which seems simpler to you. Most tools and class libraries for generating and verifying signatures work equally well with either.

Digital Signature Tools

I'm not aware that digital signature software is restricted or forbidden by law anywhere. However, the mathematics and basic algorithms for digital signatures are essentially the same as those used for some forms of cryptography. The most common signature algorithms are essentially public key cryptography algorithms run in reverse; that is, signatures are encrypted with private keys and decrypted with public keys. Consequently, the software is less available than it should be and often excessively difficult to install or configure. Vendors have to jump through hoops to be allowed to publish, sell, and export their products. The exact number of hoops varies a lot from one jurisdiction to the next. Thus, unfortunately, XML digital signature tools and libraries are somewhat sparser than they otherwise would be.

Possibly the most advanced open source library at the time of this writing is XML-Security from the Apache XML Project. This is a Java class library that runs on top of Java 1.3.1 and later. [1] It relies on Sun's Java Cryptography Extension for its mathematics. The preferred implementation of this API is from the Legion of the Bouncy Castle which, being based in Australia, doesn't have to submit to U.S. export laws. The Apache XML Project can't legally ship the Bouncy Castle JCE with their software, but you can grab it yourself from http://www.bouncycastle.org/.

[1] It may run on earlier versions, but the lead developer wasn't sure if it did when I asked him. Even if you can get the current version to run on a pre-1.3 VM, there's no guarantee future releases will.

XML-Security also depends on Xalan and Xerces. These products also need to be installed in your classpath. Sun ships a buggy, beta version
of Xalan with Java 1.4, so if you're using Java 1.4 you'll need to put the Xalan jar archive in your jre/lib/endorsed directory rather than the jre/lib/ext directory. [2] Otherwise XML-Security will fail with strange error messages. Once you've done that, using this package to digitally sign DOM documents is not too difficult. Numerous samples are included with the package. However, the user interface is nonexistent.

[2] Shortly before we went to press Sun posted a beta of Java 1.4.2 that includes a much more current version of Xalan. If you're using Java 1.4.2 or later, you're good to go.

Slightly less advanced in the API department but slightly more advanced when it comes to user interface is IBM's XSS4J. This includes a couple of sample command line applications for signing documents. First you'll need to use the keytool bundled with the JDK to create a key based on a password.

    C:\> keytool -genkey -dname "CN=Elliotte Harold, OU=Metrotech, O=Polytechnic, L=Brooklyn, S=New York, C=US" -alias elharo -storepass mystorepassword -keypass mykeypassword

(For various technical reasons the password can't be used as the key directly. It needs to be transformed into a more random sequence of bits.) Next you can run the program dsig.SampleSign2 across the document to sign it.

    C:\> java dsig.SampleSign2 elharo mystorepassword mykeypassword -ext file:///home/elharo/books/effectivexml/examples/order.xml > signed_order.xml
    Key store: file:///home/elharo/.keystore
    Sign: 703ms

This is how I produced the enveloping and detached examples earlier in this chapter. (XSS4J does not yet support enveloped signatures.)
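To give a feel for what signing from your own code involves, here is a minimal sketch of an enveloped signature. It rests on several assumptions that go beyond this item: it uses the javax.xml.crypto.dsig API (JSR 105) bundled with later JDKs, assumed here to be Java 8 or newer, rather than the XML-Security or XSS4J classes; it generates a throwaway RSA key pair in memory instead of reading one from a keytool keystore; and the class name EnvelopedSignatureSketch, its method names, and the tiny Order document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.keyinfo.KeyInfo;
import javax.xml.crypto.dsig.keyinfo.KeyInfoFactory;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class; not part of any library named in this item.
final class EnvelopedSignatureSketch {

    // Algorithm URI for RSA with SHA-256, spelled out as a string.
    static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    // Parses a string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
    }

    // Appends a Signature element as the last child of the root element.
    static void signEnveloped(Document doc, KeyPair keys) throws Exception {
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");

        // URI "" means the whole document; the enveloped transform
        // excludes the Signature element itself from the digest.
        Reference ref = fac.newReference("",
            fac.newDigestMethod(DigestMethod.SHA256, null),
            Collections.singletonList(
                fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
            null, null);

        SignedInfo signedInfo = fac.newSignedInfo(
            fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                (C14NMethodParameterSpec) null),
            fac.newSignatureMethod(RSA_SHA256, null),
            Collections.singletonList(ref));

        // Bundle the public key by value so the signature is self-describing.
        KeyInfoFactory kif = fac.getKeyInfoFactory();
        KeyInfo keyInfo = kif.newKeyInfo(
            Collections.singletonList(kif.newKeyValue(keys.getPublic())));

        DOMSignContext context = new DOMSignContext(keys.getPrivate(), doc.getDocumentElement());
        fac.newXMLSignature(signedInfo, keyInfo).sign(context);
    }

    // Validates the first Signature element found, using a key the caller trusts.
    static boolean verify(Document doc, PublicKey trustedKey) throws Exception {
        Node signatureNode = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
        DOMValidateContext context = new DOMValidateContext(trustedKey, signatureNode);
        XMLSignature signature = XMLSignatureFactory.getInstance("DOM").unmarshalXMLSignature(context);
        return signature.validate(context);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keys = generator.generateKeyPair();

        // Stand-in document; the real order document has more structure.
        Document order = parse("<Order><Item>Fables</Item></Order>");
        signEnveloped(order, keys);
        System.out.println("Signature valid: " + verify(order, keys.getPublic()));
    }
}
```

In a real application you would load the private key from the keystore created with keytool rather than generating one on the fly. Note also that verify is handed a key the caller already trusts; accepting whatever public key arrives inside KeyInfo would prove nothing about who signed the document.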
However, more commonly you'll want to integrate digital signatures into your own application, and XML-Security has a comprehensive API that allows you to do this.

There are also several commercial offerings for Java. The first is Baltimore Technologies' KeyTools XML (http://www.baltimore.com/keytools/xml/index.asp). Phaos has released a commercial XML Security Suite for Java (http://phaos.com/products/category/xml.html) that supports XML encryption and XML digital signatures. Both of these products rely on the JCE to do the math.

Beyond Java, the pickings are very slim at this time. The only C/C++ library I've been able to locate is Infomosaic's payware SecureXML (http://www.infomosaic.net/). The System.Security.Cryptography.Xml package in the .NET Framework provides complete support for signing and verifying XML digital signatures. I haven't seen any libraries or tools in Perl, Python, or other languages. But this is all still pretty bleeding-edge stuff; 2004 should see many more options developed and released.

Item 49. Hide Confidential Data with XML Encryption

As web services based on SOAP, REST, and XML-RPC explode in popularity, more and more sensitive data is passed around the Internet as XML documents. This includes data thieves might want to use for illicit financial gain, such as credit card numbers, social security numbers, account numbers, and more. It includes data governments might want to use to attack opponents, such as names, addresses, political beliefs, donor lists, and so forth. It includes data users might simply wish to keep private for its own sake, such as medical records and sexual preferences. There are large incentives for bad people to try to read XML documents moving from one system to another. XML encryption can help prevent this. Not all documents need to be encrypted, but those that need encryption need it badly.

To some extent, standard encryption technologies like PGP and HTTPS
can render some assistance. These protocols, programs, and algorithms are for the most part format-neutral. They can encrypt any sequence of bytes into another sequence of bytes. Naturally, they can encrypt an XML file just as easily as an HTML file, a Word document, a JPEG image, or any other computer data; and sometimes this suffices. However, none of these generic encryption tools retains any of the advantages of the XML nature of the original file. The documents they produce are binary, not text. They cannot be processed with standard XML tools.

XML encryption is a technology more geared to the specific needs of encrypting XML documents. It allows some parts of a document to be encrypted while other parts are left in plain text. It can encrypt different parts of a document in different ways. For example, a customer can submit an order to a merchant in which the product ordered and the shipping address are encrypted with the merchant's public key, but the credit card information is encrypted with the credit card company's public key. The merchant can easily extract the information needed and forward the rest to the credit card company for approval or rejection. The merchant has no way of knowing or storing the user's credit card data and thus could not at a later time charge the customer for products he or she hadn't ordered nor expose the data to hackers.

Encryption Syntax

I'm going to give you just a brief overview of what an encrypted XML document looks like. As with digital signatures (which use a lot of the same math), the arithmetic is far too complex for most humans to do by hand. You'll always use a software application or library to encrypt documents. Encrypted XML isn't intended to be authored by a text editor like normal XML.

When a document or portion of a document is encrypted, that part is replaced by an EncryptedData element like the one that follows.

Base64-encoded, encrypted key value
Name of the key used to encrypt this data Where to find the key Base64-encoded, encrypted data or

Each EncryptedData element represents one chunk of encrypted XML. This can decrypt to plain text, to a single element, to several elements, or to mixed content. The result of this replacement must be well-formed. That is, you cannot encrypt an attribute alone, or the start-tag of an element but not the end-tag. This is all sensible. It just means that the structures you encrypt are the structures found in the XML document.

The Type attribute indicates what was encrypted. It can have the following values.

http://www.w3.org/2001/04/xmlenc#Element: A single element was encrypted.

http://www.w3.org/2001/04/xmlenc#Content: A sequence of XML nodes was encrypted, potentially including any number of elements, text nodes, comments, and processing instructions in any order and combination.

At a minimum, the EncryptedData element has a CipherData child element. This contains either a CipherValue or a CipherReference. A CipherValue contains the encrypted data encoded in Base64. A CipherReference points to the encrypted data using its URI attribute. In that case the data is not included with the document.

For example, consider the comic book order from Item 48, which is repeated in Example 49-1.

Example 49-1 An Order Document

Fables 2 DC Gen 13 46 Wildstorm Elliotte Rusty Harold 5555 3142 2718 2998 06 2006

If I encrypted the content of the CreditCard element, the result would look something like Example 49-2 (depending on the choice of key and algorithm, of course).

Example 49-2 Encrypting the Content of an Element

Fables 2 DC Gen 13 46 Wildstorm ZPbIV3QYoAK/m1c81yu+37mylmmvFocDas7BxR94FA0qjm/ 6u0GY59lluoclaLiq/fGHXS8P69YShwIaehDGG2n56JS8B0/h3m1AHf5Ozm9zUop gyqn7k8HcXAkB7oAFLiKvHc/R+ZjU8XpVJdCFfTjaJ3Jy4bQNR3TWrbmCTPK5//C WedrnLuebpq2r88/y

This would allow a process that did not have the key to know that the encrypted data is credit card information for a VISA card. However, it
would not know the card number, the expiration date, or the cardholder's name.

If I encrypted the entire CreditCard element, the result would look like Example 49-3. Now you don't know for sure that the encrypted data is credit card information unless you know the decryption key.

Example 49-3 Encrypting a Single Element

Fables 2 DC Gen 13 46 Wildstorm ZPbIV3QYoAK/m1c81yu+37mylmmvFocDas7BxR94FA0qjm/ 6u0GY59lluoclaLiq/fGHXS8P69YShwIaehDGG2n56JS8B0/h3m1AHf5Ozm9zUop gyqn7k8HcXAkB7oAFLiKvHc/R+ZjU8XpVJdCFfTjaJ3Jy4bQNR3TWrbmCTPK5//C WedrnLuebpq2r88/y

In some cases, it may be useful to include additional information beyond the encrypted data itself. An empty EncryptionMethod element specifies the algorithm that was used to encrypt the data so that it can more easily be decrypted. The Algorithm attribute contains a URI identifying the algorithm. There's no exhaustive list of these because new algorithms continue to be invented, but some common ones include the following:

Triple DES: http://www.w3.org/2001/04/xmlenc#tripledes-cbc

AES 128 bit: http://www.w3.org/2001/04/xmlenc#aes128-cbc

AES 256 bit: http://www.w3.org/2001/04/xmlenc#aes256-cbc

AES 192 bit: http://www.w3.org/2001/04/xmlenc#aes192-cbc

Depending on the algorithm, it may be useful to include either the actual key used or the name of the key. If the name of the key is included, presumably the recipient knows how to find the value of that key in some central repository. The actual value of the encryption key may be included for public key/private key systems since knowing the encryption key doesn't help you decrypt the message. Alternately, because public key cryptography is relatively slow, the actual message may be encoded using a symmetric cipher such as DES using a randomly chosen key. The random key is then encoded using the recipient's public key and stored in the key info.

None of this information is required for XML encryption. All of it is allowed if you find
it useful. If present, such information is stored in a KeyInfo element in the http://www.w3.org/2000/09/xmldsig# namespace. As the URI suggests, this is the same KeyInfo element used in XML digital signatures. (See Item 48.) It can provide keys by name, reference, or value. Example 49-4 includes the RSA (public) key used to encrypt the data, encoded by both name and value. If you have the private key that matches this public key, you can decrypt the information. Nobody else should be able to, at least not easily.

Example 49-4 Bundling Key Info with the Encrypted Data

Fables 2 DC Gen 13 46 Wildstorm Bob V5foK5hhmbktQhyNdy/6LpQRhDUDsTvK+g9Ucj47es9AQJ3U xA7SEU+e0yQH5rm9kbCDN9o3aPIo7HbP7tX6WOocLZAtNfyx SZDU16ksL6WjubafOqNEpcwR3RdFsT7bCqnXPBe5ELh5u4VE y19MzxkXRgrMvavzyBpVRgBUwUl= AQAB ZPbIV3QYoAK/m1c81yu+37mylmmvFocDas7BxR94FA0qjm/6u0GY59l luoclaLiq/fGHXS8P69YShwIaehDGG2n56JS8B0/h3m1AHf5Ozm9zUo vHc/R+ZjU8XpVJdCFfTjaJ3Jy4bQNR3TWrbmCTPK5//CWedrnLuebpq 2r88/y

For a symmetric key, you'd normally just use the name you had previously agreed on for the key with the recipient. Exactly how keys are named is beyond the scope of the XML Encryption specification.

Encryption Tools

Encryption software, whether for XML or otherwise, is restricted by law in many jurisdictions, including the United States. Consequently, encryption software is less available than it should be, and it is often excessively difficult to install or configure. Vendors have to jump through hoops to be allowed to publish, sell, and export their products. The exact number of hoops varies a lot from one jurisdiction to the next. Thus, unfortunately, XML encryption tools and libraries are less advanced than they otherwise would be.

Almost all implementations of XML encryption at the current time seem to be Java class libraries, although that's likely to change in the future. The only non-Java library I've found so far is Aleksey Sanin's XMLSec (http://www.aleksey.com/xmlsec), an open source implementation of XML Encryption for C and C++ that sits on top of the Gnome project's libxml and libxslt.

Moving into the Java realm, there are a lot more choices. Baltimore Technologies' KeyTools XML (http://www.baltimore.com/keytools/xml/index.asp) is a commercial offering written in Java that supports both XML encryption and digital signatures on top of the Java Cryptography Extension (JCE). Phaos has released a commercial XML Security Suite (http://phaos.com/products/category/xml.html) for Java that also supports encryption and digital signatures.

Possibly the most advanced open source offering at the time of this writing is XML-Security (http://xml.apache.org/security/) from the Apache XML Project. This is the same library discussed in Item 48 for producing digital signatures. It is a Java class library that runs on top of Java 1.3.1 and later. It relies on Sun's Java Cryptography Extension to perform the necessary math. The preferred implementation of this API is from the Legion of the Bouncy Castle, which, being based in Australia, doesn't have to submit to U.S. export laws. The Apache XML Project can't legally ship the Bouncy Castle JCE with its software, but the Ant build file will download it for you automatically.

IBM's XSS4J also implements various XML encryption algorithms and has a slightly better user interface than XML-Security (that is, it has a user interface). It was used to encrypt the examples shown in this chapter. XSS4J prefers different implementations of the JCE. It can run with Sun's own JCE, but it wants the IBM (http://www7b.boulder.ibm.com/wsdd/wspvtdevkit-info.html) or IAIK (http://jce.iaik.tugraz.at/products/01_jce/) implementations, especially if you want to use RSA encryption or key exchange.

The complexity of the JCE has made most implementations noninteroperable at the API level. However, at the XML document level, matters are much better. Encrypted XML produced
by one tool can be read by different tools, provided they support the same algorithms. If you stick to the required algorithms (basically AES and Triple DES for encryption, RSA for key exchange, SHA-1 for message digest, and Base64 for encoding), anyone who knows the right key should be able to encrypt and decrypt your documents easily.

Item 50. Compress if Space Is a Problem

Verbosity is a common criticism of XML. However, in practice, most developers' intuitions about the verbosity of XML are wrong. XML documents are almost always smaller than the equivalent binary file format. The sad truth is that most modern software pays little to no attention to optimizing documents for space. However, if your XML documents are so big or your available space so small that size is a real issue, you can simply gzip (or zip or bzip or compress) the XML documents.

For example, consider Microsoft Word. A 70-page chapter including about a dozen screen shots and diagrams from one of my previous books occupied 6.7MB. Opening that document in OpenOffice 1.0 and immediately resaving it into OpenOffice's native compressed XML format reduced the file's size to 522K, a savings of more than 90%. I unzipped the OpenOffice document into its component parts, and the resulting directory was also 6.7MB, almost exactly the same size as the original binary file format. Most of that space was taken up by the pictures.

For another example, consider a typical database. One of the fundamental principles of a modern DBMS is that the physical storage is decoupled from the logical representation. This allows the database to optimize performance by carefully deciding where to place which fields on the disk. Holes are left in the files to allow for insertion of additional data in the future. Indexes are created across the data. Some data may even be duplicated in multiple places if that helps optimize performance. But one
thing that is not optimized is storage space. A typical relational database uses several times to several dozen times the space that would be required purely to store the data without worrying about optimization.

As an experiment, I took a small FileMaker Pro database containing information about 650 books and exported it to XML. The original database was 1.5MB. The exported XML document was only 1.0MB, a savings of 33%. This is actually on the small side of the savings you can expect by moving to XML, mostly because FileMaker does a better than average job of cramming data into limited space. It's not uncommon to produce XML documents that are as small as 10% of the size of the original database.

Information theory tells us that given a perfectly efficient compression algorithm, two documents containing the same information will compress to the same final size, regardless of format. Reasonably fast compression algorithms like gzip and bzip2 aren't perfectly efficient. Nonetheless, in actual tests when I've compared gzipped XML documents to the gzipped binary equivalents, most files were within 10% of each other in size. Whether the gzipped binary file is 10% smaller or 10% larger than the gzipped XML equivalent seems unpredictable. Sometimes it's one way, sometimes the other; but at this point the differences are too small to care about.

Java includes built-in support for zip, gzip, and inflate/deflate algorithms in the java.util.zip package. These are all implemented as filter streams, so it's straightforward to hook one up to your original source of data and then pass it to a parser or serializer that reads from or writes to the stream as normal. For example, suppose you've built up a DOM Document object named doc in memory and you want to serialize it into a file named data.xml.gz in the current working directory. The data in the file will be gzipped. First open a FileOutputStream to the file, chain this to a GZIPOutputStream, and
then write the document onto the OutputStream as normal. For example, the following code uses Xerces's XMLSerializer class to write a DOM Document object into a compressed file.

    import java.io.*;
    import java.util.zip.GZIPOutputStream;
    import org.apache.xml.serialize.OutputFormat;
    import org.apache.xml.serialize.XMLSerializer;
    import org.w3c.dom.Document;

    // doc is a DOM Document that has already been loaded or built in memory
    try (OutputStream fout = new FileOutputStream("data.xml.gz");
         OutputStream out = new GZIPOutputStream(fout)) {
      OutputFormat format = new OutputFormat(doc);
      XMLSerializer output = new XMLSerializer(out, format);
      output.serialize(doc);
      // closing out at the end of the try block writes the gzip trailer
    }
    catch (IOException ex) {
      System.err.println(ex);
    }

From this point forward you neither need to know nor care that the data is compressed. It's all done behind the scenes automatically.

Input is equally easy. For example, suppose later you want to read data.xml.gz back into your program. Decompression adds just one line of code to hook up the GZIPInputStream.

    InputStream fin = new FileInputStream("data.xml.gz");
    InputStream in = new GZIPInputStream(fin);
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder parser = factory.newDocumentBuilder();
    Document doc = parser.parse(in);
    // work with the document

Of course, the same techniques work if you need to read or write from the network instead of a file. You'll just hook up the filter streams to network streams rather than file streams.

Similar techniques are available for C and C++. Although compression is not a standard part of the C or C++ libraries, Greg Roelofs, Mark Adler, and Jean-loup Gailly's zlib library (http://www.gzip.org/zlib/) should satisfy most needs. zlib is available in source and binary forms for pretty much all modern platforms. Indeed, the java.util.zip package is just a wrapper around calls to this library. Python includes the GzipFile class for convenient access to this library. The Compress::Zlib module available from CPAN performs the same task for Perl. .NET aficionados can use Mike Krueger's open source #ziplib (http://www.icsharpcode.net/OpenSource/SharpZipLib/) instead.

Finally, if you're serving
data over the Web, modern web servers and browsers have built-in support for compression. They can transparently compress and decompress documents as necessary before transmitting them. Since bandwidth tends to be a lot more expensive and limited on both ends than CPU speed, this is normally a win-win proposition.

By no means should you let fear of fatness stop you from using XML file formats. Most of the time the fear is unfounded. Even in those rare cases where it isn't, standard compression algorithms neatly solve the problem.

Recommended Reading

Bray, Tim (ed.). Internet Media Type registration, consistency of use. World Wide Web Consortium, September 4, 2002. Available online at http://www.w3.org/2001/tag/2002/0129-mime.

Dürst, Martin, and Asmus Freytag. Unicode in XML and Other Markup Languages. Unicode Consortium and World Wide Web Consortium, February 2002. Available online at http://www.w3.org/TR/unicode-xml/.

Dürst, Martin, Asmus Freytag, Richard Ishida, Tex Texin, Misha Wolf, and François Yergeau (eds.). Character Model for the World Wide Web 1.0. World Wide Web Consortium, April 2002. Available online at http://www.w3.org/TR/charmod/.

Hollenbeck, Scott, Larry Masinter, and Marshall Rose. Guidelines for the Use of Extensible Markup Language (XML) within IETF Protocols. Internet Engineering Task Force, January 2003. Available online at http://www.ietf.org/rfc/rfc3470.txt.

Jelliffe, Rick. The XML and SGML Cookbook. Upper Saddle River, NJ: Prentice Hall, 1999.

Kohn, Dan, Murata Makoto, Simon St. Laurent, and E. Whitehead. XML Media Types. Internet Engineering Task Force, January 2001. Available online at http://www.ietf.org/rfc/rfc3023.txt.

Megginson, David. Structuring XML Documents. Upper Saddle River, NJ: Prentice Hall, 1998.

The MITRE Corporation and Members of the xml-dev Mailing List. XML Schemas: Best Practices. Available online at http://www.xfront.com/BestPracticesHomepage.html.

Spencer, Paul (ed.). e-Government Schema Guidelines for XML. December 2002. Available online at http://www.e-envoy.gov.uk/oee/oee.nsf/sections/guidelinestop/$file/guidelines_index.htm.

The Unicode Consortium. The Unicode Standard, Version 3.0. Boston, MA: Addison-Wesley, 2000.

Walsh, Norman (ed.). Using Qualified Names (QNames) as Identifiers in Content. World Wide Web Consortium, July 25, 2002. Available online at http://www.w3.org/2001/tag/doc/qnameids.html.
