XML™ Bible

Elliotte Rusty Harold

IDG Books Worldwide, Inc.
An International Data Group Company
Foster City, CA ✦ Chicago, IL ✦ Indianapolis, IN ✦ New York, NY

3236-7 FM.F.qc 6/30/99 2:59 PM Page iii

XML™ Bible

Published by
IDG Books Worldwide, Inc.
An International Data Group Company
919 E. Hillsdale Blvd., Suite 400
Foster City, CA 94404
www.idgbooks.com (IDG Books Worldwide Web site)

Copyright © 1999 IDG Books Worldwide, Inc. All rights reserved. No part of this book, including interior design, cover design, and icons, may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording, or otherwise) without the prior written permission of the publisher.

ISBN: 0-7645-3236-7

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

1O/QV/QY/ZZ/FC

Distributed in the United States by IDG Books Worldwide, Inc.

Distributed by CDG Books Canada Inc. for Canada; by Transworld Publishers Limited in the United Kingdom; by IDG Norge Books for Norway; by IDG Sweden Books for Sweden; by IDG Books Australia Publishing Corporation Pty. Ltd. for Australia and New Zealand; by TransQuest Publishers Pte Ltd. for Singapore, Malaysia, Thailand, Indonesia, and Hong Kong; by Gotop Information Inc. for Taiwan; by ICG Muse, Inc. for Japan; by Norma Comunicaciones S.A. for Colombia; by Intersoft for South Africa; by Eyrolles for France; by International Thomson Publishing for Germany, Austria and Switzerland; by Distribuidora Cuspide for Argentina; by Livraria Cultura for Brazil; by Ediciones ZETA S.C.R. Ltda. for Peru; by WS Computer Publishing Corporation, Inc., for the Philippines; by Contemporanea de Ediciones for Venezuela; by Express Computer Distributors for the Caribbean and West Indies; by Micronesia Media Distributor, Inc. for Micronesia; by Grupo Editorial Norma S.A. for Guatemala; by Chips Computadoras S.A. de C.V. for Mexico; by Editorial Norma de Panama S.A. for Panama; by American Bookshops for Finland.
Authorized Sales Agent: Anthony Rudkin Associates for the Middle East and North Africa.

For general information on IDG Books Worldwide’s books in the U.S., please call our Consumer Customer Service department at 800-762-2974.

For reseller information, including discounts and premium sales, please call our Reseller Customer Service department at 800-434-3422.

For information on where to purchase IDG Books Worldwide’s books outside the U.S., please contact our International Sales department at 317-596-5530 or fax 317-596-5692.

For consumer information on foreign language translations, please contact our Customer Service department at 800-434-3422, fax 317-596-5692, or e-mail rights@idgbooks.com.

For information on licensing foreign or domestic rights, please phone +1-650-655-3109.

For sales inquiries and special prices for bulk quantities, please contact our Sales department at 650-655-3200 or write to the address above.

For information on using IDG Books Worldwide’s books in the classroom or for ordering examination copies, please contact our Educational Sales department at 800-434-2086 or fax 317-596-5499.

For press review copies, author interviews, or other publicity information, please contact our Public Relations department at 650-655-3000 or fax 650-655-3299.

For authorization to photocopy items for corporate, personal, or educational use, please contact Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, or fax 978-750-4470.

Library of Congress Cataloging-in-Publication Data

Harold, Elliotte Rusty.
XML bible / Elliotte Rusty Harold.
p. cm.
ISBN 0-7645-3236-7 (alk. paper)
1. XML (Document markup language) I. Title.
QA76.76.H94H34 1999
99-31021
005.7’2 dc21
CIP

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND AUTHOR HAVE USED THEIR BEST EFFORTS IN PREPARING THIS BOOK.
THE PUBLISHER AND AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS BOOK AND SPECIFICALLY DISCLAIM ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. THERE ARE NO WARRANTIES WHICH EXTEND BEYOND THE DESCRIPTIONS CONTAINED IN THIS PARAGRAPH. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES REPRESENTATIVES OR WRITTEN SALES MATERIALS. THE ACCURACY AND COMPLETENESS OF THE INFORMATION PROVIDED HEREIN AND THE OPINIONS STATED HEREIN ARE NOT GUARANTEED OR WARRANTED TO PRODUCE ANY PARTICULAR RESULTS, AND THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY INDIVIDUAL. NEITHER THE PUBLISHER NOR AUTHOR SHALL BE LIABLE FOR ANY LOSS OF PROFIT OR ANY OTHER COMMERCIAL DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR OTHER DAMAGES.

Trademarks: All brand names and product names used in this book are trade names, service marks, trademarks, or registered trademarks of their respective owners. IDG Books Worldwide is not associated with any product or vendor mentioned in this book.

is a registered trademark or trademark under exclusive license to IDG Books Worldwide, Inc. from International Data Group, Inc. in the United States and/or other countries.

Eighth Annual Computer Press Awards 1992
Ninth Annual Computer Press Awards 1993
Tenth Annual Computer Press Awards 1994
Eleventh Annual Computer Press Awards 1995

IDG is the world’s leading IT media, research and exposition company. Founded in 1964, IDG had 1997 revenues of $2.05 billion and has more than 9,000 employees worldwide. IDG offers the widest range of media options that reach IT buyers in 75 countries representing 95% of worldwide IT spending. IDG’s diverse product and services portfolio spans six key areas including print publishing, online publishing, expositions and conferences, market research, education and training, and global marketing services.
More than 90 million people read one or more of IDG’s 290 magazines and newspapers, including IDG’s leading global brands: Computerworld, PC World, Network World, Macworld and the Channel World family of publications. IDG Books Worldwide is one of the fastest-growing computer book publishers in the world, with more than 700 titles in 36 languages. The “For Dummies®” series alone has more than 50 million copies in print. IDG offers online users the largest network of technology-specific Web sites around the world through IDG.net (http://www.idg.net), which comprises more than 225 targeted Web sites in 55 countries worldwide. International Data Corporation (IDC) is the world’s largest provider of information technology data, analysis and consulting, with research centers in over 41 countries and more than 400 research analysts worldwide. IDG World Expo is a leading producer of more than 168 globally branded conferences and expositions in 35 countries including E3 (Electronic Entertainment Expo), Macworld Expo, ComNet, Windows World Expo, ICE (Internet Commerce Expo), Agenda, DEMO, and Spotlight. IDG’s training subsidiary, ExecuTrain, is the world’s largest computer training company, with more than 230 locations worldwide and 785 training courses. IDG Marketing Services helps industry-leading IT companies build international brand recognition by developing global integrated marketing programs via IDG’s print, online and exposition products worldwide. Further information about the company can be found at www.idg.com.

1/24/99

Welcome to the world of IDG Books Worldwide.

IDG Books Worldwide, Inc., is a subsidiary of International Data Group, the world’s largest publisher of computer-related information and the leading global provider of information services on information technology. IDG was founded more than 30 years ago by Patrick J. McGovern and now employs more than 9,000 people worldwide. IDG publishes more than 290 computer publications in over 75 countries.
More than 90 million people read one or more IDG publications each month.

Launched in 1990, IDG Books Worldwide is today the #1 publisher of best-selling computer books in the United States. We are proud to have received eight awards from the Computer Press Association in recognition of editorial excellence and three from Computer Currents’ First Annual Readers’ Choice Awards. Our best-selling For Dummies® series has more than 50 million copies in print with translations in 31 languages. IDG Books Worldwide, through a joint venture with IDG’s Hi-Tech Beijing, became the first U.S. publisher to publish a computer book in the People’s Republic of China. In record time, IDG Books Worldwide has become the first choice for millions of readers around the world who want to learn how to better manage their businesses.

Our mission is simple: Every one of our books is designed to bring extra value and skill-building instructions to the reader. Our books are written by experts who understand and care about our readers. The knowledge base of our editorial staff comes from years of experience in publishing, education, and journalism, experience we use to produce books to carry us into the new millennium. In short, we care about books, so we attract the best people. We devote special attention to details such as audience, interior design, use of icons, and illustrations. And because we use an efficient process of authoring, editing, and desktop publishing our books electronically, we can spend more time ensuring superior content and less time on the technicalities of making books.

You can count on our commitment to deliver high-quality books at competitive prices on topics you want to read about. At IDG Books Worldwide, we continue in the IDG tradition of delivering quality for more than 30 years. You’ll find no better book on a subject than one from IDG Books Worldwide.

John Kilcullen
Chairman and CEO
IDG Books Worldwide, Inc.

Steven Berkowitz
President and Publisher
IDG Books Worldwide, Inc.

Credits

Acquisitions Editor
John Osborn

Development Editor
Terri Varveris

Contributing Writer
Heather Williamson

Technical Editor
Greg Guntle

Copy Editors
Amy Eoff
Amanda Kaufman
Nicole LeClerc
Victoria Lee

Production
IDG Books Worldwide Production

Proofreading and Indexing
York Production Services

About the Author

Elliotte Rusty Harold is an internationally respected writer, programmer, and educator both on the Internet and off. He got his start by writing FAQ lists for the Macintosh newsgroups on Usenet, and has since branched out into books, Web sites, and newsletters. He lectures about Java and object-oriented programming at Polytechnic University in Brooklyn. His Cafe con Leche Web site at http://metalab.unc.edu/xml/ has become one of the most popular independent XML sites on the Internet.

Elliotte is originally from New Orleans where he returns periodically in search of a decent bowl of gumbo. However, he currently resides in the Prospect Heights neighborhood of Brooklyn with his wife Beth and cats Charm (named after the quark) and Marjorie (named after his mother-in-law). When not writing books, he enjoys working on genealogy, mathematics, and quantum mechanics. His previous books include The Java Developer’s Resource, Java Network Programming, Java Secrets, JavaBeans, XML: Extensible Markup Language, and Java I/O.

For Ma, a great grandmother

Preface

Welcome to the XML Bible. After reading this book I hope you’ll agree with me that XML is the most exciting development on the Internet since Java, and that it makes Web site development easier, more productive, and more fun.

This book is your introduction to the exciting and fast-growing world of XML.
In this book, you’ll learn how to write documents in XML and how to use style sheets to convert those documents into HTML so legacy browsers can read them. You’ll also learn how to use document type definitions (DTDs) to describe and validate documents. This will become increasingly important as more and more browsers like Mozilla and Internet Explorer 5.0 provide native support for XML.

About You the Reader

Unlike most other XML books on the market, the XML Bible covers XML not from the perspective of a software developer, but rather that of a Web-page author. I don’t spend a lot of time discussing BNF grammars or parsing element trees. Instead, I show you how you can use XML and existing tools today to more efficiently produce attractive, exciting, easy-to-use, easy-to-maintain Web sites that keep your readers coming back for more.

This book is aimed directly at Web-site developers. I assume you want to use XML to produce Web sites that are difficult to impossible to create with raw HTML. You’ll be amazed to discover that in conjunction with style sheets and a few free tools, XML enables you to do things that previously required either custom software costing hundreds to thousands of dollars per developer, or extensive knowledge of programming languages like Perl. None of the software in this book will cost you more than a few minutes of download time. None of the tricks require any programming.

What You Need to Know

XML does build on HTML and the underlying infrastructure of the Internet. To that end, I will assume you know how to ftp files, send email, and load URLs in your Web browser of choice. I will also assume you have a reasonable knowledge of HTML at about the level supported by Netscape 1.1. On the other hand, when I discuss newer aspects of HTML that are not yet in widespread use like cascading style sheets, I will cover them in depth.
To be more specific, in this book I assume that you can:

✦ Write a basic HTML page including links, images, and text using a text editor.
✦ Place that page on a Web server.

On the other hand, I do not assume that you:

✦ Know SGML. In fact, this preface is almost the only place in the entire book you’ll see the word SGML used. XML is supposed to be simpler and more widespread than SGML. It can’t be that if you have to learn SGML first.
✦ Are a programmer, whether of Java, Perl, C, or some other language. XML is a markup language, not a programming language. You don’t need to be a programmer to write XML documents.

What You’ll Learn

This book has one primary goal: to teach you to write XML documents for the Web. Fortunately, XML has a decidedly flat learning curve, much like HTML (and unlike SGML). As you learn a little you can do a little. As you learn a little more, you can do a little more. Thus the chapters in this book build steadily on each other. They are meant to be read in sequence. Along the way you’ll learn:

✦ How an XML document is created and delivered to readers.
✦ How semantic tagging makes XML documents easier to maintain and develop than their HTML equivalents.
✦ How to post XML documents on Web servers in a form everyone can read.
✦ How to make sure your XML is well-formed.
✦ How to use international characters like _ and _ in your documents.
✦ How to validate documents with DTDs.
✦ How to use entities to build large documents from smaller parts.
✦ How attributes describe data.
✦ How to work with non-XML data.
✦ How to format your documents with CSS and XSL style sheets.
✦ How to connect documents with XLinks and XPointers.
✦ How to merge different XML vocabularies with namespaces.
✦ How to write metadata for Web pages using RDF.
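One of the skills listed above, making sure your XML is well-formed, can be checked mechanically. What follows is only a minimal sketch, assuming Python and its standard xml.etree.ElementTree module; neither is a tool this book relies on, and the is_well_formed helper and the greeting element are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser raises ParseError on any well-formedness violation,
        # such as a missing end tag or improperly nested elements.
        return False


good = "<greeting>Hello XML!</greeting>"
bad = "<greeting>Hello XML!"  # the end tag is missing

print(is_well_formed(good))  # prints True
print(is_well_formed(bad))   # prints False
```

The check only reports whether a document follows XML's syntax rules. It says nothing about whether the document uses the right elements, which is the separate question of validity.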
In the final section of this book, you’ll see several practical examples of XML being used for real-world applications including:

✦ Web Site Design
✦ Push
✦ Vector Graphics
✦ Genealogy

How the Book Is Organized

This book is divided into five parts and includes three appendixes:

I. Introducing XML
II. Document Type Definitions
III. Style Languages
IV. Supplemental Technologies
V. XML Applications

By the time you’re finished reading this book, you’ll be ready to use XML to create compelling Web pages. The five parts and the appendixes are described below.

Part I: Introducing XML

Part I consists of Chapters 1 through 7. It begins with the history and theory behind XML and the goals XML is trying to achieve, and shows you how the different pieces of the XML equation fit together to create and deliver documents to readers. You’ll see several compelling examples of XML applications to give you some idea of the wide applicability of XML, including the Vector Markup Language (VML), the Resource Description Framework (RDF), the Mathematical Markup Language (MathML), the Extensible Forms Description Language (XFDL), and many others. Then you’ll learn by example how to write XML documents with tags you define that make sense for your document. You’ll see how to edit them in a text editor, attach style sheets to them, and load them into a Web browser like Internet Explorer 5.0 or Mozilla. You’ll even learn how you can write XML documents in languages other than English, even languages that aren’t written remotely like English, such as Chinese, Hebrew, and Russian.

Part II: Document Type Definitions

Part II consists of Chapters 8 through 11, all of which focus on document type definitions (DTDs). An XML document may optionally contain a DTD that specifies which elements are and are not allowed in that document. The DTD specifies the exact context and structure of those elements.
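To make the idea concrete, here is a minimal sketch of a document that carries its own small DTD. The greeting element, the book entity, and the use of Python's standard xml.etree.ElementTree module are illustrative assumptions, not examples taken from this book. This particular parser reads the DTD and expands the entity it declares, but it does not check the document against the element declarations.

```python
import xml.etree.ElementTree as ET

# A document with an internal DTD subset. The DTD declares one element,
# greeting, which may contain only text, and one entity, book.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY book "XML Bible">
]>
<greeting>Hello from the &book;!</greeting>
"""

root = ET.fromstring(DOCUMENT)

# The entity reference &book; is replaced by the text declared in the DTD.
print(root.tag)
print(root.text)
```

Checking a document against its declarations requires a validating parser, which Python's standard library does not provide.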
A validating parser can read a document and compare it to its DTD, and report any mistakes it finds. This enables document authors to make sure that their work meets any necessary criteria.

In Part II, you’ll learn how to attach a DTD to a document, how to validate your documents against their DTDs, and how to write your own DTDs that solve your own problems. You’ll learn the syntax for declaring elements, attributes, entities, and notations. You’ll see how you can use entity declarations and entity references to build both a document and its DTD from multiple, independent pieces. This allows you to make long, hard-to-follow documents much simpler by separating them into related modules and components. And you’ll learn how to integrate other forms of data like raw text and GIF image files in your XML document.

Part III: Style Languages

Part III consists of Chapters 12 through 15. XML markup only specifies what’s in a document. Unlike HTML, it does not say anything about what that content should look like. Information about an XML document’s appearance when printed, viewed in a Web browser, or otherwise displayed is stored in a style sheet. Different style sheets can be used for the same document. You might, for instance, want to use a style sheet that specifies small fonts for printing, another one that uses larger fonts for on-screen use, and a third with absolutely humongous fonts to project the document on a wall at a seminar. You can change the appearance of an XML document by choosing a different style sheet without touching the document itself.

Part III describes in detail the two style sheet languages in broadest use on the Web, Cascading Style Sheets (CSS) and the Extensible Style Language (XSL).

CSS is a simple style-sheet language originally designed for use with HTML. CSS exists in two versions: CSS Level 1 and CSS Level 2.
CSS Level 1 provides basic information about fonts, color, positioning, and text properties, and is reasonably well supported by current Web browsers for HTML and XML. CSS Level 2 is a more recent standard that adds support for aural style sheets, user interface styles, international and bi-directional text, and more. CSS is a relatively simple standard that applies fixed style rules to the contents of particular elements.

XSL, by contrast, is a more complicated and more powerful style language that can not only apply styles to the contents of elements but can also rearrange elements, add boilerplate text, and transform documents in almost arbitrary ways. XSL is divided into two parts: a transformation language for converting XML trees to alternative trees, and a formatting language for specifying the appearance of the elements of an XML tree. Currently, the transformation language is better supported by most tools [...]

... 3: Your First XML Document
Hello XML
Creating a Simple XML Document
Saving the XML File
Loading the XML File into a Web Browser
Exploring the Simple XML Document
Assigning Meaning to XML Tags
Writing a Style Sheet for an XML Document
Attaching a Style Sheet to an XML Document

Chapter 4: Structuring Data...
Organization of the XML Data
XMLizing the Data
Starting the Document: XML Declaration and Root Element
XMLizing League, Division, and Team Data
XMLizing Player Data
XMLizing Player Statistics
Putting the XML Document Back Together Again
The Advantages of the XML Format
Preparing a Style Sheet for Document Display ...
Part I: Introducing XML
Chapter 1: An Eagle’s Eye View of XML
Chapter 2: An Introduction to XML Applications
Chapter 3: Your First XML Document
Chapter 4: Structuring Data
Chapter 5: Attributes, Empty Tags, and XSL
Chapter 6: Well-Formed XML Documents
Chapter 7: Foreign Languages and Non-Roman Text
Part II: Document Type Definitions...

XML. It explains in general what XML is and how it is used. It shows you how the different pieces of the XML equation fit together, and how an XML document is created and delivered to readers.

What Is XML?

XML stands for Extensible Markup Language (often written as eXtensible Markup Language to justify the acronym). XML is a set of rules for defining semantic tags that break a document into parts and identify...

categories

What is XML?
Why are developers excited about XML?
The life of an XML document
Related technologies

The tags you create can be documented in a Document Type Definition (DTD). You’ll learn more about DTDs in Part II of this book. For now, think of a DTD as a vocabulary and a syntax for certain kinds of documents. For example,...

(RDF) is an XML application used to embed meta-data in XML and HTML documents. Meta-data is information about a document, such as the author, date, and title of a work, rather than the work itself. All of these can be added to your own XML-based markup languages to extend their power and utility.

Part V: XML Applications

Part V, which consists of Chapters 20–23, shows you four practical uses of XML in different...

Character System
How to Write XML in Unicode
Inserting Characters in XML Files with Character References
Converting to and from Unicode
How to Write XML in Other Character Sets

Part II: Document Type Definitions
Chapter 8: Document Type Definitions and Validity
Document Type Definitions
Document Type Declarations...
VoxML
Open Financial Exchange
Extensible Forms Description Language
Human Resources Markup Language
Resource Description Framework
XML for XML
XSL
XLL
DCD
Behind-the-Scene Uses of XML
Chapter 3: Your First XML Document...

Contents

Preface
Acknowledgments
Part I: Introducing XML
Chapter 1: An Eagle’s Eye View of XML
What Is XML?
XML Is a Meta-Markup Language
XML Describes Structure and Semantics, Not Formatting
Why Are Developers Excited about XML?
Design of Domain-Specific Markup Languages
Self-Describing Data ...

XSL Style Sheet Templates
The Body of the Document
The Title
Leagues, Divisions, and Teams
Players
Separation of Pitchers and Batters
CSS or XSL?
Chapter 6: Well-Formed XML Documents
#1: The XML declaration must begin the document
#2: Use Both Start and End Tags in Non-Empty Tags