<?xml version="1.0" encoding="UTF-16"?>
<vu:instructors
xmlns:vu="http://www.vu.com/empDTD"
xmlns="http://www.gu.au/empDTD"
xmlns:uky="http://www.uky.edu/empDTD">
<uky:faculty
uky:title="assistant professor"
uky:name="John Smith"
uky:department="Computer Science"/>
<academicStaff title="lecturer"
name="Mate Jones"
school="Information Technology"/>
</vu:instructors>
2.5 Addressing and Querying XML Documents
In relational databases, parts of a database can be selected and retrieved using query languages such as SQL. The same is true for XML documents, for which there exist a number of proposals for query languages, such as XQL, XML-QL, and XQuery.
The central concept of XML query languages is a path expression that specifies how a node, or a set of nodes, in the tree representation of the XML document can be reached. We introduce path expressions in the form of XPath because they can be used for purposes other than querying, namely, for transforming XML documents.
XPath is a language for addressing parts of an XML document. It operates on the tree data model of XML and has a non-XML syntax. The key concepts are path expressions. They can be
• Absolute (starting at the root of the tree); syntactically they begin with the symbol /, which refers to the root of the document, situated one level above the root element of the document;
• Relative to a context node.
Consider the following XML document:
<?xml version="1.0" encoding="UTF-16"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library location="Bremen">
<author name="Henry Wise">
<book title="Artificial Intelligence"/>
<book title="Modern Web Services"/>
<book title="Theory of Computation"/>
</author>
<author name="William Smart">
<book title="Artificial Intelligence"/>
</author>
<author name="Cynthia Singleton">
<book title="The Semantic Web"/>
<book title="Browser Technology Revised"/>
</author>
</library>
Its tree representation is shown in figure 2.2.
Figure 2.2 Tree representation of a library document
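The tree view of the library document can be reproduced programmatically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name describe and the indentation scheme are illustrative choices, not part of XPath, and the XML declaration and DOCTYPE are omitted so that the snippet needs no external DTD.

```python
import xml.etree.ElementTree as ET

# The library document from the text, without XML declaration and DOCTYPE.
LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""


def describe(element, depth=0):
    """Return indented lines, one per element node and one per attribute node."""
    lines = ["  " * depth + element.tag]
    for name, value in element.attrib.items():
        # Attribute nodes are listed one level below their element, marked with @.
        lines.append("  " * (depth + 1) + f"@{name} = {value}")
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(LIBRARY_XML)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each element and each attribute appears as one line, so the output has the same nodes as the tree in figure 2.2, apart from the document root that sits above the library element.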
In the following we illustrate the capabilities of XPath with a few examples of path expressions.
1. Address all author elements.
/library/author
This path expression addresses all author elements that are children of the library element node, which resides immediately below the root.
Using a sequence /t_1/.../t_n, where each t_(i+1) is a child node of t_i, we define a path through the tree representation.
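A path built from child steps can be tried out as follows. This is a sketch, assuming Python's xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to a context element. The absolute path /library/author is therefore written as the relative path author, evaluated at the library element; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The library document from the text, without XML declaration and DOCTYPE.
LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)  # root is the <library> element

# One child step: corresponds to /library/author.
author_names = [a.get("name") for a in root.findall("author")]

# Two child steps: corresponds to /library/author/book.
book_titles = [b.get("title") for b in root.findall("author/book")]

print(author_names)
print(len(book_titles), "books reached via author/book")
```

Each name in the path moves one level down the tree, so author reaches the three author elements and author/book reaches the six book elements below them.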
2. An alternative solution for the previous example is
//author
Here // says that we should consider all elements in the document and check whether they are of type author. In other words, this path expression addresses all author elements anywhere in the document. Because of the specific structure of our XML document, this expression and the previous one lead to the same result; in general, however, they may lead to different results.
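To see how the two expressions can differ, consider a variant of the document in which one author element is nested more deeply. The sketch below assumes Python's xml.etree.ElementTree; the section element is hypothetical and does not occur in the library document of the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant: one author is wrapped in a <section> element,
# so it is a descendant but not a child of <library>.
NESTED_XML = """
<library location="Bremen">
  <author name="Henry Wise"/>
  <section name="Reference">
    <author name="Cynthia Singleton"/>
  </section>
</library>
"""

root = ET.fromstring(NESTED_XML)

# Corresponds to /library/author: only children of <library>.
children_only = [a.get("name") for a in root.findall("author")]

# Corresponds to //author: author elements at any depth.
anywhere = [a.get("name") for a in root.findall(".//author")]

print(children_only)
print(anywhere)
```

The child path finds only the author directly below library, whereas the descendant path also finds the nested one.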
3. Address the location attribute nodes within library element nodes.
/library/@location
The symbol @ is used to denote attribute nodes.
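Python's xml.etree.ElementTree cannot return attribute nodes from a path, so the sketch below, under that assumption, first addresses the element and then reads the attribute from it. This yields the value that the attribute node carries rather than the node itself.

```python
import xml.etree.ElementTree as ET

# Reduced library document; only the root element's attribute matters here.
LIBRARY_XML = """
<library location="Bremen">
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)  # the <library> element

# Value of the attribute addressed by /library/@location.
location = root.get("location")

# All attributes of the element, as a name-to-value mapping.
library_attributes = dict(root.attrib)

print(location)
```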
4. Address all title attribute nodes within book elements anywhere in the document, which have the value “Artificial Intelligence” (see figure 2.3).
//book/@title[.="Artificial Intelligence"]
5. Address all books with title “Artificial Intelligence” (see figure 2.4).
//book[@title="Artificial Intelligence"]
We call a test within square brackets a filter expression. It restricts the set of addressed nodes.
Note the difference between this expression and the one in query 4. Here we address book elements whose title satisfies a certain condition.
In query 4 we collected title attribute nodes of book elements. A comparison of figures 2.3 and 2.4 illustrates the difference.
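The effect of a filter expression can be checked with a short sketch, assuming Python's xml.etree.ElementTree, which supports attribute tests of the form [@name='value']. It returns the book elements of query 5. The attribute values read from those elements are only a stand-in for the attribute nodes of query 4, which this library cannot address directly.

```python
import xml.etree.ElementTree as ET

# The library document from the text, without XML declaration and DOCTYPE.
LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)

# Query 5: book elements whose title attribute passes the filter.
ai_books = root.findall(".//book[@title='Artificial Intelligence']")

# Stand-in for query 4: the title values carried by those elements.
ai_titles = [b.get("title") for b in ai_books]

# The same filter, applied relative to each author as context node.
ai_authors = [
    a.get("name")
    for a in root.findall("author")
    if a.find("book[@title='Artificial Intelligence']") is not None
]

print([b.tag for b in ai_books], ai_titles, ai_authors)
```

Of the six book elements, only two pass the filter, and they belong to two different authors.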
6. Address the first author element node in the XML document.
//author[1]
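A positional filter can be tried out in the same way. The sketch below assumes Python's xml.etree.ElementTree, which counts the position among the children of each parent, as XPath does. Because all author elements in the library document share one parent, the filter yields a single author. The same filter on book elements is included only for contrast and yields the first book of every author.

```python
import xml.etree.ElementTree as ET

# The library document from the text, without XML declaration and DOCTYPE.
LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)

# Corresponds to //author[1]: the first author child of each parent.
first_authors = [a.get("name") for a in root.findall(".//author[1]")]

# Corresponds to //book[1]: the first book child of each author.
first_books = [b.get("title") for b in root.findall(".//book[1]")]

print(first_authors)
print(first_books)
```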
Figure 2.3 Tree representation of query 4
Figure 2.4 Tree representation of query 5