XML Basics
Although the name Extensible Markup Language (XML) sounds a bit cryptic, don't
worry: the format itself is actually quite easy to understand. In a nutshell, XML provides
a way of formatting and structuring information so that receiving applications can easily
interpret and use that data when it's moved from place to place. Although you may not
realize it, you already have plenty of experience structuring and organizing information.
Consider the following example.
Suppose you want to write a letter to a friend. You structure your thoughts (information)
in a format you know your friend will recognize. You begin by writing words on a piece
of paper, starting in the upper-left corner, and breaking your thoughts into paragraphs,
sentences, and words. You could use images to convey your thoughts, or write your
words in a circle, but that probably would confuse your friend. By writing your letter in a
format familiar to your friend, you can be confident that your message will be
conveyed—that is, you will have transferred your thoughts (data/information) to the
letter's recipient.
You can use XML in much the same way—as a format for conveying information. For
example, if you want to send data out of Flash for processing by a Web server, you
format that data as XML. The server then interprets the XML-formatted data and uses it
in the manner intended. Without XML, you could send chunks of data to a server, but the
server probably wouldn't know what to do with the first chunk or the second, or even how
the first chunk related to the second. XML gives meaning to these disparate bits of data
so the server can work with them in an organized and intelligent manner.
XML's simple syntax resembles HTML in that it employs tags, attributes, and values—
but the similarity ends there. Where HTML uses predefined tags (for example, <body>,
<head>, and <html>), in XML you create your own tags—that is, you don't pull them
from an existing library of tag names. Look at the following simple XML document:
<MyFriends>
    <Name Gender="female">Kelly Makar</Name>
    <Name Gender="male">Mike Grundvig</Name>
    <Name Gender="male">Free Makar</Name>
</MyFriends>
Each complete tag (such as <Name></Name>) in XML is called a node, and any XML-
formatted data is called an XML document. Each XML document can contain only one
root node; the document just shown has a root node called MyFriends, which in turn has
three child nodes. The first child node has a node name of Name and a node value of
Kelly Makar. The word Gender in each child node is an attribute. Attributes are optional,
and each node can have an unlimited number of attributes. You'll typically use attributes
to store small bits of information that are not necessarily displayed onscreen—for
example, a user identification number.
The tags in this example (which we made up and defined) give meaning to the bits of
information shown (Kelly Makar, Mike Grundvig, and Free Makar).
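To see how these terms map onto a parsed document, here is a minimal sketch. It is written in Python rather than Flash, and the choice of Python's standard xml.etree.ElementTree module is our assumption for illustration only. It reads the MyFriends document and lists each child's node name, node value, and Gender attribute.

```python
import xml.etree.ElementTree as ET

# The MyFriends document from the text.
DOCUMENT = """<MyFriends>
    <Name Gender="female">Kelly Makar</Name>
    <Name Gender="male">Mike Grundvig</Name>
    <Name Gender="male">Free Makar</Name>
</MyFriends>"""

# Parsing returns the single root node of the document.
root = ET.fromstring(DOCUMENT)

# Collect (node name, node value, Gender attribute) for each child node.
friends = [(child.tag, child.text, child.get("Gender")) for child in root]

print("Root node:", root.tag)
for name, value, gender in friends:
    print(name, "=", value, "| Gender:", gender)
```

The tag names carry no built-in meaning for the parser; the code that reads the document decides what a Name node or a Gender attribute stands for.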
The next XML document shows a more extended use of XML:
<AddressBook>
    <Person>
        <Name>Kelly Makar</Name>
        <Street>121 Baker Street</Street>
        <City>Some City</City>
        <State>North Carolina</State>
    </Person>
    <Person>
        <Name>Tripp Carter</Name>
        <Street>777 Another Street</Street>
        <City>Elizabeth City</City>
        <State>North Carolina</State>
    </Person>
</AddressBook>
This example shows how the data in an address book would be formatted as XML. If
there were 600 people listed in the address book, the Person node would appear 600 times
with the same structure.
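Because every Person node has the same structure, a receiving program can process all of them with one loop, whether there are 2 entries or 600. The following is a rough sketch in Python (our choice for illustration, not the Flash code used in this book); the function name read_people is hypothetical.

```python
import xml.etree.ElementTree as ET

# The AddressBook document from the text.
ADDRESS_BOOK = """<AddressBook>
    <Person>
        <Name>Kelly Makar</Name>
        <Street>121 Baker Street</Street>
        <City>Some City</City>
        <State>North Carolina</State>
    </Person>
    <Person>
        <Name>Tripp Carter</Name>
        <Street>777 Another Street</Street>
        <City>Elizabeth City</City>
        <State>North Carolina</State>
    </Person>
</AddressBook>"""


def read_people(xml_text):
    """Return one dict per Person node, keyed by child node name."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall("Person"):
        # Each child of Person (Name, Street, ...) becomes a key/value pair.
        people.append({field.tag: field.text for field in person})
    return people


people = read_people(ADDRESS_BOOK)
for entry in people:
    print(entry["Name"], "lives in", entry["City"])
```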
So how do you create your own nodes and structure? How does the destination (ASP
page, socket, and so on) know how the document is formatted? And how does it know
what to do with each piece of information? The simple answer is that this intelligence has
to be built into your destination. Thus, if you were planning to build an address book in
Flash and wanted the information it contained to be saved in a database, you would send
an XML-formatted version of that data to an ASP page (or another scripted page of
choice), which would then parse that information and insert it into the appropriate fields
in a database. The important thing to remember is that the ASP page must be designed to
deal with data in this way. Because XML is typically used to transfer rather than store
information, the address book data would be stored as disparate information in database
fields, rather than stored as XML. When needed again, that information could be
extracted from the database, formatted as XML by a scripted page, and sent along to
Flash or any other application that requested it.
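The round trip just described can be sketched as follows. This is only an illustration under several assumptions: Python stands in for the ASP page, an in-memory SQLite database stands in for the real database, and the table name people and the function names are made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Child node names of each Person, in the order they are stored.
FIELDS = ("Name", "Street", "City", "State")

# XML as it might arrive from Flash.
INCOMING = """<AddressBook>
    <Person>
        <Name>Kelly Makar</Name>
        <Street>121 Baker Street</Street>
        <City>Some City</City>
        <State>North Carolina</State>
    </Person>
    <Person>
        <Name>Tripp Carter</Name>
        <Street>777 Another Street</Street>
        <City>Elizabeth City</City>
        <State>North Carolina</State>
    </Person>
</AddressBook>"""


def save_address_book(conn, xml_text):
    """Parse the XML and insert each Person into ordinary database fields."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people "
        "(name TEXT, street TEXT, city TEXT, state TEXT)"
    )
    root = ET.fromstring(xml_text)
    for person in root.findall("Person"):
        row = [person.findtext(field, default="") for field in FIELDS]
        conn.execute("INSERT INTO people VALUES (?, ?, ?, ?)", row)
    conn.commit()


def load_address_book(conn):
    """Read the rows back and format them as an AddressBook XML string."""
    root = ET.Element("AddressBook")
    query = "SELECT name, street, city, state FROM people ORDER BY rowid"
    for row in conn.execute(query):
        person = ET.SubElement(root, "Person")
        for field, value in zip(FIELDS, row):
            ET.SubElement(person, field).text = value
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
save_address_book(conn, INCOMING)   # XML in, plain fields stored
outgoing = load_address_book(conn)  # plain fields out, XML rebuilt
print(outgoing)
```

Note that the database never holds XML. The knowledge of which node maps to which field lives entirely in the receiving code, which is the "intelligence built into your destination" described above.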
Web pages often use text files that contain XML-formatted information—for example, a
static XML file for storing information about which ASP pages to call, or what port and
IP to connect to when attempting to connect with a socket server.
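As a rough illustration of such a settings file, the sketch below reads a socket server's IP and port plus a list of page names. Everything specific here is hypothetical: the Settings, SocketServer, and Page tags, the attribute names, the address, and the page file names were invented for this example, and Python is used in place of Flash.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of a static settings file; all names are made up.
SETTINGS = """<Settings>
    <SocketServer IP="127.0.0.1" Port="9999"></SocketServer>
    <Page Name="save">saveAddress.asp</Page>
    <Page Name="load">loadAddress.asp</Page>
</Settings>"""


def read_settings(xml_text):
    """Return (ip, port, pages) taken from the settings document."""
    root = ET.fromstring(xml_text)
    server = root.find("SocketServer")
    # Attribute values are always text, so the port is converted to a number.
    ip = server.get("IP")
    port = int(server.get("Port"))
    pages = {page.get("Name"): page.text for page in root.findall("Page")}
    return ip, port, pages


ip, port, pages = read_settings(SETTINGS)
print("Connect to", ip, "on port", port)
print("Page used for saving:", pages["save"])
```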
Now that you know the basics of the XML format, here are some rules you need to
follow when you begin using it:
• You cannot begin node names with the letters XML, in any mix of uppercase and
lowercase. The XML specification reserves such names, and many XML parsers
break when they see XML at the beginning of a node name.
• You must properly terminate every node—for example, you would terminate
<Name> with </Name>. The slash (/) inside the final tag indicates that a node is
completed (terminated).
• You must encode all special characters, such as the less-than sign (<) and the
ampersand (&). In Flash, you can URL-encode text by using the escape() function.
Many parsers interpret certain unencoded characters as the start of a new node
that is not terminated properly (because it wasn't a node in the first place), and
an XML document with non-terminated nodes won't pass through an XML parser
completely. Attributes are less forgiving than text nodes: an attribute value can
fail to pass through the parser on characters such as a carriage return or an
ampersand. If you URL-encode the text, you won't experience this trouble.
• XML is case sensitive, and most parsers enforce this, which means that the
opening and closing tags of a node must use the same case. If you start a node
with <Name> and terminate it with </name>, you're asking for trouble.
• You can have only one root node.
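Several of these rules can be checked by handing small documents to a parser and seeing which ones it rejects. The sketch below is an illustration in Python, not Flash: the helper is_well_formed is hypothetical, and Python's urllib.parse.quote serves only as a rough stand-in for Flash's escape() function. The rule about node names that begin with XML is left out because its effect depends on the parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# URL-encode the text before placing it inside a node.
encoded = quote("Kelly & Mike")

checks = {
    "terminated node": "<Name>Kelly Makar</Name>",
    "unterminated node": "<Name>Kelly Makar",
    "mismatched case": "<Name>Kelly Makar</name>",
    "raw ampersand": "<Name>Kelly & Mike</Name>",
    "URL-encoded ampersand": "<Name>" + encoded + "</Name>",
    "two root nodes": "<Name>Kelly</Name><Name>Mike</Name>",
}

results = {label: is_well_formed(text) for label, text in checks.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```

The receiving side has to reverse the URL-encoding (in Python, with urllib.parse.unquote) to recover the original text.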
One more thing to note before you begin working with XML is that the clean, indented
layout shown in these examples is not required. The carriage returns and tabs are
there to make it easier for us to read. These tabs and carriage returns are called white
space, and you can add or delete white space between nodes without affecting the
overall structure.
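A quick way to convince yourself of this is to parse the same data with and without white space and compare the results. The sketch below does so in Python (our choice for illustration); the helper name structure is hypothetical. Some parsers report the white space between nodes as extra text, so the helper trims it before comparing.

```python
import xml.etree.ElementTree as ET

# The same data, once laid out for readability and once on a single line.
PRETTY = """<MyFriends>
    <Name Gender="female">Kelly Makar</Name>
    <Name Gender="male">Mike Grundvig</Name>
</MyFriends>"""

COMPACT = (
    '<MyFriends><Name Gender="female">Kelly Makar</Name>'
    '<Name Gender="male">Mike Grundvig</Name></MyFriends>'
)


def structure(xml_text):
    """Describe a document as (node name, attributes, trimmed text) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (node.tag, dict(node.attrib), (node.text or "").strip())
        for node in root.iter()
    ]


same = structure(PRETTY) == structure(COMPACT)
print("Same structure:", same)
```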