XML Mini-Tutorial
Michael I. Schwartzbach
Copyright © 2000 BRICS, University of Aarhus
http://www.brics.dk/~mis/ITU/XML/

What is XML?
HTML vs. XML
A conceptual view of XML
A concrete view of XML
Applications of XML
XML technologies
Namespaces
The recipe example
Schema languages
A schema for recipes
XLink, XPointer, and XPath
Pointing at recipes
XML-QL
Querying the recipes
XSLT
A style sheet for recipes
Exercises

XML Mini-Tutorial http://www.brics.dk/~mis/ITU/XML/ [18/09/2000 14:24:26]

HTML, JavaScript, and XML Mini-Tutorials
Michael I. Schwartzbach
Copyright © 2000 BRICS, University of Aarhus
http://www.brics.dk/~mis/ITU/

These mini-tutorials are created as part of the course Internet Programming at the IT-University of Copenhagen.

What is XML?

XML is a framework for defining markup languages:
● there is no fixed collection of markup tags;
● each XML language is targeted at a different application domain;
● the languages will share many features;
● there is a common set of tools for processing such languages.

XML is not a replacement for HTML:
● HTML should ideally be just another XML language;
● in fact, XHTML is just that;
● XHTML is a (very popular) XML language for hypertext markup.

XML is designed to:
● separate syntax from semantics;
● support internationalization (Unicode) and platform independence;
● be the future of structured information, including databases.

HTML vs. XML

Consider the following recipe collection published in HTML:

    <h1>Rhubarb Cobbler</h1>
    <h2>Maggie.Herrick@bbs.mhv.net</h2>
    <h3>Wed, 14 Jun 95</h3>
    Rhubarb Cobbler made with bananas as the main sweetener.
    It was delicious. Basically it was
    <table>
    <tr><td> 2 1/2 cups <td> diced rhubarb (blanched with boiling water, drain)
    <tr><td> 2 tablespoons <td> sugar
    <tr><td> 2 <td> fairly ripe bananas sliced 1/4" round
    <tr><td> 1/4 teaspoon <td> cinnamon
    <tr><td> dash of <td> nutmeg
    </table>
    Combine all and use as cobbler, pie, or crisp.
    Related recipes: <a href="#GardenQuiche">Garden Quiche</a>

There are many problems with this approach:
● the semantics is encoded into text formatting tags;
● there is no means of checking that a recipe is encoded correctly;
● it is difficult to change the layout of recipes (CSS is not enough).

It would be much better to invent a special recipe markup language:

    <recipe id="117" category="dessert">
      <title>Rhubarb Cobbler</title>
      <author><email>Maggie.Herrick@bbs.mhv.net</email></author>
      <date>Wed, 14 Jun 95</date>
      <description>
        Rhubarb Cobbler made with bananas as the main sweetener.
        It was delicious.
      </description>
      <ingredients>
        ...
      </ingredients>
      <preparation>
        Combine all and use as cobbler, pie, or crisp.
      </preparation>
      <related url="#GardenQuiche">Garden Quiche</related>
    </recipe>

This example illustrates:
● the markup tags are chosen purely for logical structure;
● this is just one choice of markup detail level;
● we need a kind of "grammar" for XML recipe collections;
● we need a stylesheet to define presentation semantics.

A conceptual view of XML

An XML document is a labeled tree.
● a leaf node is
  ❍ character data (a text string) - the actual data,
  ❍ a processing instruction - annotations for various processors, typically in the document header,
  ❍ a comment - never any semantics attached,
  ❍ an entity declaration - simple macros.
● an internal node is an element, which is labeled with
  ❍ a name, and
  ❍ a set of attributes, each consisting of a name and a value.
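The tree view can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree module, which is not part of the tutorial, and uses a shortened copy of the recipe above (the ingredient list and description are left out). The helper name outline is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe from the text; the ingredient list is omitted.
RECIPE = """<recipe id="117" category="dessert">
  <title>Rhubarb Cobbler</title>
  <author><email>Maggie.Herrick@bbs.mhv.net</email></author>
  <date>Wed, 14 Jun 95</date>
  <preparation>Combine all and use as cobbler, pie, or crisp.</preparation>
  <related url="#GardenQuiche">Garden Quiche</related>
</recipe>"""


def outline(element, depth=0):
    """Return one line per element: its name, its attributes, its character data."""
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(dict(element.attrib))
    text = (element.text or "").strip()
    if text:
        line += " : " + text
    lines = [line]
    for child in element:          # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(RECIPE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Note that ElementTree stores character data in the text field of the enclosing element rather than as separate leaf nodes, so this is one possible encoding of the labeled tree, not the only one.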
Often, comments and entity declarations are not explicitly represented in the tree.

A concrete view of XML

An XML document is a (Unicode) text with markup tags and other meta-information.

Markup tags denote elements:

    ...<foo attr="val" ...>...</foo>...
       |    |              |  |
       |    |              |  a matching element end tag
       |    |              the contents of the element
       |    an attribute with name attr and value val, values enclosed by ' or "
       an element start tag with name foo

There is a short-hand notation for empty elements:

    ...<foo attr="val" .../>...

Note: XML is case sensitive!

An XML document must be well-formed:
● start and end tags must match;
● element tags must be properly nested;
● and some more subtle syntactical requirements.

Special characters can be escaped using Unicode character references:
● &#38; yields &;
● &#60; and &#x3C; both yield <.

CDATA Sections are an alternative to escaping many characters:
● <![CDATA[<greeting>Hello, world!</greeting>]]>

The strange syntax is a legacy from SGML.

The original web page provides a service that checks well-formedness of an XML document (given a full URL).

Applications of XML

There are already hundreds of serious applications of XML.

XHTML
W3C's XMLization of HTML 4.0. Example XHTML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
      <head><title>Hello world!</title></head>
      <body><p>foobar</p></body>
    </html>

CML
Chemical Markup Language. Example CML document snippet:

    <molecule id="METHANOL">
      <atomArray>
        <stringArray builtin="elementType">C O H H H H</stringArray>
        <floatArray builtin="x3" units="pm">
          -0.748 0.558 -1.293 -1.263 -0.699 0.716
        </floatArray>
      </atomArray>
    </molecule>

WML
Wireless Markup Language for WAP services:

    <?xml version="1.0"?>
    <wml>
      <card id="Card1" title="Wap-UK.com">
        <p>
          Hello World
        </p>
      </card>
    </wml>

There is a long list of many other XML applications.

XML technologies

Just a notation for trees is not enough:
● the real force of XML is generic languages and tools!

The XML vision offers:
● namespaces - to avoid name clashes when a document uses several "sub-languages";
● schemas - grammars to define classes of documents;
● linking between documents - a generalization of HTML anchors and links;
● addressing parts of documents - it is not enough that only the author can place anchors;
● transformation - conversion from one document class to another;
● querying - extraction of information.

The site www.xmlsoftware.com has a comprehensive list of available XML tools.

Namespaces

Consider an XML language WidgetML which uses XHTML as a sublanguage for help messages:

    <widget type="gadget">
      <head size="medium"/>
      <big><subwidget ref="gizmo"/></big>
      <info>
        <head>
          <title>Description of gadget</title>
        </head>
        <body>
          <h1>Gadget</h1>
          A gadget contains a big gizmo
        </body>
      </info>
    </widget>

We have some problems here:
● the meaning of head and big depends on the context;
● this complicates things for processors and might even cause ambiguities;
● the root of the problem is: one common name-space.
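The clash can be observed directly. The sketch below assumes Python's standard xml.etree.ElementTree module (not something the tutorial itself uses) and searches the WidgetML document above for elements named head. A processor that looks only at names cannot tell the widget part from the XHTML document header.

```python
import xml.etree.ElementTree as ET

# The WidgetML example from the text, without any namespace declarations.
WIDGET = """<widget type="gadget">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info>
    <head>
      <title>Description of gadget</title>
    </head>
    <body>
      <h1>Gadget</h1>
      A gadget contains a big gizmo
    </body>
  </info>
</widget>"""

root = ET.fromstring(WIDGET)

# Collect every element called "head", wherever it occurs in the tree.
heads = list(root.iter("head"))

for head in heads:
    # Same name, different roles: one carries a size attribute,
    # the other contains an XHTML title element.
    print(head.tag, dict(head.attrib), [child.tag for child in head])
```

Both matches carry the identical name head, so telling them apart requires inspecting their context by hand.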
The solution is to introduce explicit namespace declarations:

    <widget xmlns="http://www.widget.org"
            xmlns:xhtml="http://www.w3.org/TR/xhtml1"
            type="gadget">
      <head size="medium"/>
      <big><subwidget ref="gizmo"/></big>
      <info>
        <xhtml:head>
          <xhtml:title>Description of gadget</xhtml:title>
        </xhtml:head>
        <xhtml:body>
          <xhtml:h1>Gadget</xhtml:h1>
          A gadget contains a big gizmo
        </xhtml:body>
      </info>
    </widget>

Do not be confused by the use of URIs for namespaces:
● they are not supposed to point to anything;
● it is simply the cheapest way of getting unique names;

[...]... the XML tree

XML-QL

XML-QL is a query language for XML documents:
● XML documents can be seen as generalizations of database relations;
● XML-QL is a similar generalization of SQL;
● it can extract data from existing XML documents and construct new XML documents.

Relations are special, restricted cases of XML trees:
● XML... released until 2001

Querying the recipes

The following XML-QL queries extract information from the XML recipe document:

    WHERE $t IN "karoline.xml" CONSTRUCT $t

    <?xml version="1.0"?> Filokurve med tigerrejer...

... collection as an XML document

Schema languages

The syntax of a new XML language must be formalized:
● this is similar to the formal syntax of a programming language;
● however, usual context-free grammars are not expressive enough;
● XML languages are described using schemas.

A modern schema language:
● is itself an XML language...
farsbrød og agurkesalat </XML>

    CONSTRUCT { WHERE $t IN "karoline.xml" CONSTRUCT $t }

    <?xml version="1.0"?> ... navn="laks"> Laksemousse </XML>

XSLT

An XSLT style sheet transforms an XML document into another:
● if the target language is XHTML, then this is similar to a CSS style sheet;
● however, often the target language is really another XML language.

An XSLT style sheet:
● uses pattern matching... URL):
● http://www.brics.dk/~mis/ITU/XML/karoline.xml

A style sheet for recipes

The following XSLT style sheet illustrates many features (the two namespaces are in different colors): ...

Exercises

1. Browse through the collection of XML applications.
2. Add the recipe for tigerrejer to the XML recipe collection (save as file). Check that the result is well-formed XML.
3. Apply the given style sheet to this extended collection.
4. Add the necessary HTML to the following style sheet: Islagkage med chokolade
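For the well-formedness check in exercise 2, a local test is one option. The sketch below is a minimal illustration that assumes Python's standard xml.etree.ElementTree module; the function name is_well_formed and the sample strings are invented for this example. It tests only well-formedness (matching tags, proper nesting, case sensitivity), not validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return False, str(error)
    return True, None


# Hypothetical samples, loosely based on the recipe markup in the text.
good = '<recipe id="117"><title>Rhubarb Cobbler</title></recipe>'
crossed = "<recipe><title>Rhubarb Cobbler</recipe></title>"    # improperly nested
mismatched_case = "<Recipe>Rhubarb Cobbler</recipe>"           # XML is case sensitive
html_style = "<table><tr><td>2 tablespoons<td>sugar</table>"   # unclosed HTML-style tags

for sample in (good, crossed, mismatched_case, html_style):
    ok, message = is_well_formed(sample)
    print("well-formed" if ok else "NOT well-formed", "|", sample, "|", message)
```

To check a saved file instead of a string, ET.parse(path) can be wrapped in the same try/except.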