Aggregating RSS Feeds Using XSL

Excerpt from Pro PHP XML and Web Services, part 5 (pages 41-49)

This example demonstrates how to use the XSL extensions and some of the XSLT functionality using RSS for the source data, since it is a data source everyone should be able to access. I will not explain the structure and workings of RSS (covered in detail in Chapter 14) in this example because the focus is on using the extension and XSLT functionality.

This example shows how to combine a couple of the PHP news feeds into a single XML data source and store the result locally. The feeds to be accessed are identified in a configuration file named siteconfig.xml, which contains the following document:

<?xml version="1.0"?>

<sites>

<site>

<name>PHP General</name>

<url>http://news.php.net/group.php?group=php.general&amp;format=rss</url>

</site>

<site>

<name>PHP Pear Dev</name>

<url>http://news.php.net/group.php?group=php.pear.dev&amp;format=rss</url>

</site>

</sites>

Two sites have been configured. The first one is the feed for the PHP General newsgroup, and the second is the feed for the PHP PEAR Dev group. The url element for each group points to the location of its RSS feed, from which you will pull the XML data.
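For orientation, the same configuration can also be read directly from PHP. The following is a minimal sketch, assuming the SimpleXML extension is available. The inline copy of the document and the $sites array are illustrative only and are not part of the example's script.

```php
<?php
// Inline copy of siteconfig.xml so the sketch is self-contained.
$config = <<<'XML'
<?xml version="1.0"?>
<sites>
  <site>
    <name>PHP General</name>
    <url>http://news.php.net/group.php?group=php.general&amp;format=rss</url>
  </site>
  <site>
    <name>PHP Pear Dev</name>
    <url>http://news.php.net/group.php?group=php.pear.dev&amp;format=rss</url>
  </site>
</sites>
XML;

// Map each site name to its feed URL ($sites is a hypothetical helper array).
$sites = [];
foreach (simplexml_load_string($config)->site as $site) {
    // Casting to string yields the text content; &amp; is decoded to &.
    $sites[(string) $site->name] = (string) $site->url;
}

foreach ($sites as $name => $url) {
    echo $name, ' => ', $url, "\n";
}
```

Note that the ampersand in each URL must be written as &amp; inside the XML file; the parser hands the decoded URL to your code.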

This data is used by a style sheet, identified by the file rsscache.xsl, to transform the data from each feed into a single document, which is then stored locally in the file named rsscache.xml.

This local file works as a cache; here it must be updated manually, but it is possible to have this file automatically update on some specified schedule.
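One possible way to automate the refresh is to rebuild the cache whenever the file is older than some limit. The following is only a sketch: cacheIsStale() and the one-hour lifetime are assumptions made for illustration and do not appear in the example's script. A scheduled job, such as cron, that runs the build step would be another option.

```php
<?php
/**
 * Hypothetical helper (not in the example's script): reports whether the
 * cache file is missing or older than $maxAge seconds.
 */
function cacheIsStale(string $file, int $maxAge, ?int $now = null): bool
{
    if (!is_file($file)) {
        return true; // no cache has been built yet
    }
    $now = $now ?? time();
    return ($now - filemtime($file)) > $maxAge;
}

// Possible use near the top of the page script (assumed one-hour lifetime):
// if (cacheIsStale($rsscache, 3600)) { buildCache(); }
```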

The style sheet that performs the transformation is as follows:

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:php="http://php.net/xsl" version="1.0">

<xsl:output method="xml" indent="yes" />

<xsl:template match="/">

<xsl:element name="channels">

<xsl:apply-templates select="/sites/site"/>

</xsl:element>

</xsl:template>

<xsl:template match="site">

<xsl:variable name="siteurl" select="url" />

<xsl:apply-templates select="php:functionString('retrieveRSS',

$siteurl)/channel">

<xsl:with-param name="sitename" select="name" />

</xsl:apply-templates>

</xsl:template>

<xsl:template match="channel">

<xsl:param name="sitename" />

<xsl:element name="channel">

<xsl:element name="title">

<xsl:copy-of select="$sitename" />

</xsl:element>

<xsl:copy-of select="link" />

<xsl:apply-templates select="item"/>

</xsl:element>

</xsl:template>

<xsl:template match="item">

<xsl:element name="item">

<xsl:copy-of select="title" />

<xsl:copy-of select="link" />

<xsl:copy-of select="pubDate" />

<xsl:element name="timestamp">

<xsl:value-of select="php:functionString('strtotime', pubDate)" />

</xsl:element>

</xsl:element>

</xsl:template>

</xsl:stylesheet>

Looking at this file, you will notice that the http://php.net/xsl namespace has been added. This allows PHP functions to be called, assuming the processor calling the style sheet has enabled the use of PHP function calls. The output method has been set to xml, because the output is to be a locally cached XML document, and indenting has been enabled, which makes the resulting rsscache.xml file easier to read if you happen to open it in an editor.
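To see the mechanism in isolation, the following minimal sketch calls a built-in PHP function from a tiny style sheet. The greeting document and the choice of strtoupper() are made up for illustration. Passing an array to registerPHPFunctions() limits which functions a style sheet may call; this restriction is an addition here, since the example's script registers all functions.

```php
<?php
// A tiny style sheet that declares the php namespace and calls strtoupper().
$xslt = <<<'XSL'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:php="http://php.net/xsl" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/greeting">
    <xsl:value-of select="php:functionString('strtoupper', .)"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslt);

$doc = new DOMDocument();
$doc->loadXML('<greeting>hello</greeting>');

$proc = new XSLTProcessor();
// Without this call, php:functionString() is not available to the style sheet.
$proc->registerPHPFunctions(['strtoupper']);
$proc->importStylesheet($xsl);

$result = trim($proc->transformToXml($doc));
echo $result, "\n";
```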

Upon processing the configuration data, each of the site elements is selected for further processing, matching on site. The template handling these nodes retrieves the remote RSS data. Rather than using the XSLT document() function, this calls a PHP function. This PHP function has been defined within the encompassing script and is written as follows:

function retrieveRSS($url) {

$doc = new DOMDocument();

if ($doc->load($url)) {

return $doc->documentElement;

}

return 0;

}

It accepts a single argument, $url, which is then loaded using a DOMDocument object. Looking at the template making this call, a variable has been used and is set to the content of the url element. This value is then passed to the retrieveRSS() function, which returns the document element of the resulting XML document, or 0 upon failure.

Note: The variable is not needed in this instance, because passing url rather than $siteurl would also work; a variable was used here for demonstration.
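When a feed is unreachable or malformed, DOMDocument::load() reports libxml warnings, which may end up in the rendered page. The following variant is a sketch of one way to keep them quiet while preserving the same return values. The name retrieveRSSQuiet() is hypothetical and is not part of the example's script.

```php
<?php
/**
 * Variant of retrieveRSS() that keeps libxml parse errors out of the page.
 * retrieveRSSQuiet() is an illustrative name, not part of the original script.
 */
function retrieveRSSQuiet($url) {
    $previous = libxml_use_internal_errors(true); // collect errors silently
    $doc = new DOMDocument();
    $loaded = $doc->load($url);
    libxml_clear_errors();                        // discard collected errors
    libxml_use_internal_errors($previous);        // restore the old setting
    return $loaded ? $doc->documentElement : 0;
}
```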

Assuming the document element was returned from the function, the template then calls xsl:apply-templates, selecting the channel element from the returned node set. By "node set," I am simply referring to the document element. This occurs once for each of the site elements from the configuration file, but note that each time the retrieveRSS() function is called, any returned node set is processed by the templates before the next call to the function. When applying the templates to the channel element, a parameter is also passed: the content of the name element for the specific site being processed is passed using the sitename parameter. The reason for this will be shown in the template matching on the channel element.
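Parameter passing between templates can be tried on its own with a small sketch such as the following. The feed element and the sample document are invented for illustration. The point to notice is that the receiving template declares the parameter with xsl:param before using it.

```php
<?php
$xslt = <<<'XSL'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="site">
    <xsl:apply-templates select="feed">
      <xsl:with-param name="sitename" select="name"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="feed">
    <!-- The receiving template declares the parameter it expects. -->
    <xsl:param name="sitename"/>
    <xsl:value-of select="concat($sitename, ': ', .)"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslt);

// Hypothetical input: one site with a name and a made-up feed element.
$doc = new DOMDocument();
$doc->loadXML('<site><name>PHP General</name><feed>items</feed></site>');

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

$result = trim($proc->transformToXml($doc));
echo $result, "\n";
```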

Upon matching a channel element, a new channel element is created to encapsulate the data you want pulled into the new XML document. The reason xsl:element is used instead of literal <channel> and </channel> tags lies in the namespace handling. The style sheet declares the php namespace so that it can call the external PHP function, and a literal result element carries the namespace declarations that are in scope in the style sheet into the output, so the php namespace declaration would be added to the element. The resulting document does not need any namespace information for this example, so by using xsl:element, the channel element is specifically created with no namespace information. This is also where the parameter passed from the previous template comes into play. Rather than use the channel title from the RSS feed, the new document uses the name you defined for the channel in the configuration file. You do not need to do anything special with the link, so it is simply copied, using xsl:copy-of, into the new document. The template then applies templates to the item elements from the channel.
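The namespace effect can be observed with a short sketch that runs two nearly identical style sheets, one creating channel as a literal element and one using xsl:element. The renderChannel() helper, the $template string, and the empty sites input are assumptions made for this illustration.

```php
<?php
// Both style sheets declare the php prefix; only the way <channel> is
// created differs. %s is replaced with the body of the root template.
$template = <<<'XSL'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:php="http://php.net/xsl" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/">%s</xsl:template>
</xsl:stylesheet>
XSL;

/* Hypothetical helper: transforms an empty <sites/> document. */
function renderChannel(string $xslt): string
{
    $xsl = new DOMDocument();
    $xsl->loadXML($xslt);
    $doc = new DOMDocument();
    $doc->loadXML('<sites/>');
    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    return trim($proc->transformToXml($doc));
}

$literal = renderChannel(sprintf($template, '<channel/>'));
$created = renderChannel(sprintf($template, '<xsl:element name="channel"/>'));

echo $literal, "\n"; // carries the xmlns:php declaration
echo $created, "\n"; // no namespace declaration
```

An alternative that keeps literal elements is to add exclude-result-prefixes="php" to the xsl:stylesheet element, which tells the processor not to copy that declaration into the output.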

The template matching on item just pulls a few elements from the RSS feed for each item. They are still contained within an item element, but the nodes being copied are only title, link, and pubDate. When the data from this resulting document is finally transformed into HTML, it would be nice to be able to do some sorting using the XSLT sorting functionality. XSLT 1.0 has no native way to sort by date, so a timestamp element is added. Its value is produced by passing the date specified by pubDate to the PHP strtotime() function, which converts the date into a Unix time stamp. Because this value is purely numeric, you can then use it to perform date sorting.
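The difference between sorting the raw pubDate text and sorting the converted time stamps can be seen in plain PHP. The dates below are made up, written in the RFC 822 style that RSS feeds use.

```php
<?php
// Made-up pubDate values in the RFC 822 style used by RSS.
$pubDates = [
    'Fri, 13 Jun 2003 04:00:00 GMT',
    'Mon, 09 Jun 2003 22:15:00 GMT',
    'Wed, 11 Jun 2003 09:30:00 GMT',
];

// Sorting the raw strings orders by weekday name, not by time.
$asText = $pubDates;
sort($asText, SORT_STRING);

// Sorting by the Unix time stamp gives true chronological order.
$byTime = $pubDates;
usort($byTime, function ($a, $b) {
    return strtotime($b) <=> strtotime($a); // newest first
});

print_r($asText);
print_r($byTime);
```

Here the text sort yields Friday, Monday, Wednesday, whereas the time stamp sort yields June 13, June 11, June 9.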

The local rsscache.xml file is created from the previous XML data and style sheet by the following script. The whole example lives in a single script, so the function you are interested in here is buildCache(). This function uses a generic function to load the DOMDocument objects, create the XSLT processor, transform the data, and save the result document to a file.

The entire script for this example, referenced by the filename rssrender.php, appears as follows:

<?php

/* The configuration file storing the sites to pull RSS data from.

It must be readable by the Web server */

$site_config = 'siteconfig.xml';

/* Template used to render the cached RSS */

$render_xsl = 'itemrender.xsl';

/* This file stores the summarized RSS information.

It must be read/writable by the Web server */

$rsscache = 'rsscache.xml';

/* Template used to build the RSS cache */

$rsscache_xsl = 'rsscache.xsl';

/* function called from the $rsscache_xsl template */

function retrieveRSS($url) {

$doc = new DOMDocument();

if ($doc->load($url)) {

return $doc->documentElement;

}

return 0;

}

/* Generic function to transform XML data using XSL extension */

function genericProcess($xmlfile, $xslfile, $params=NULL, $outputfile=NULL) {

$doc = new DOMDocument();

$doc->load($xmlfile);

$xsl = new DOMDocument();

$xsl->load($xslfile);

$proc = new XSLTProcessor();

$proc->registerPHPFunctions();

$proc->importStylesheet($xsl);

if (is_array($params)) {

foreach ($params AS $key=>$value) {

/* An empty string is passed for the namespace argument */

$proc->setParameter('', $key, $value);

}

}

if ($outputfile == NULL) {

if ($outdoc = $proc->transformToDoc($doc)) {

$outdoc->formatOutput = TRUE;

return $outdoc->saveXML();

}

} else {

return $proc->transformToURI($doc, $outputfile);

}

}

/* Build the RSS Cache file */

function buildCache() {

genericProcess($GLOBALS['site_config'], $GLOBALS['rsscache_xsl'], NULL,

$GLOBALS['rsscache']);

}

$xslparams = NULL;

$cacheBuilt = FALSE;

$sorted = NULL;

/* Perform actions based on HTML form submissions */

if (isset($_POST['buildcache']) && ! empty($_POST['buildcache'])) {

buildCache();

} elseif (isset($_POST['sortit']) && ! empty($_POST['sortit']) &&

isset($_POST['sort']) && ! empty($_POST['sort'])) {

$sorted = $_POST['sort'];

$xslparams = array('sortparam'=>$_POST['sort']);

}

if (file_exists($rsscache)) {

$cacheBuilt = TRUE;

}

?>

<html>

<body>

<b>RSS Items:</b><br>

<form method="post">

<table>

<tr>

<td><input type="submit" name="buildcache" value="Update Cache">

&nbsp;&nbsp;&nbsp;&nbsp;</td>

<?php if ($cacheBuilt) { ?>

<td>

<select name="sort">

<option value="">Published Date</option>

<option value="channel" <?php if ($sorted == "channel")

print "selected"; ?>>Channel</option>

<option value="title" <?php if ($sorted == "title")

print "selected"; ?>>Item Title</option>

</select>&nbsp;&nbsp;

<input type="submit" name="sortit" value="Sort">

</td>

<?php } ?>

</tr>

</table>

</form><br><br>

<?php

if ($cacheBuilt) {

print genericProcess($rsscache, $render_xsl, $xslparams);

} else {

print "Cache not built. Please update Cache.";

} ?>

</body>

</html>

The style sheet that renders the cached data as HTML, referenced in the script as itemrender.xsl, is as follows:

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:param name="sortparam" select="'datetime'" />

<xsl:output method="html"/>

<xsl:template match="/">

<table>

<xsl:apply-templates select="//channel/item">

<xsl:sort select="../title[$sortparam='channel']" order="ascending" />

<xsl:sort select="./title[$sortparam='title']" order="ascending"

case-order="lower-first" />

<xsl:sort select="./timestamp" data-type="number" order="descending" />

</xsl:apply-templates>

</table>

</xsl:template>

<xsl:template match="channel">

<xsl:element name="channel">

<xsl:copy-of select="title" />

<xsl:copy-of select="link" />

<xsl:apply-templates select="item"/>

</xsl:element>

</xsl:template>

<xsl:template match="item">

<tr>

<td colspan="3">

<a>

<xsl:attribute name="href">

<xsl:value-of select="link" />

</xsl:attribute>

<xsl:value-of select="title"/>

</a>

</td>

</tr><tr>

<!-- Insert some non-breaking spaces.

Rather than adding a DTD that defines nbsp, numeric character references are used. -->

<td>&#160;&#160;&#160;&#160;&#160;</td>

<td>Channel:

<a>

<xsl:attribute name="href">

<xsl:value-of select="../link" />

</xsl:attribute>

<xsl:value-of select="../title"/>

</a>

</td>

<td>Published: <xsl:copy-of select="pubDate" /></td>

</tr>

</xsl:template>

</xsl:stylesheet>

The only part of this template that probably needs explaining is the sortparam parameter and how sorting takes place. The sortparam parameter is passed into the style sheet from the XSLT processor based upon the form submission. This allows you to choose how the sorting should be performed in the resulting HTML output. The problem you run into is that the xsl:sort element can be used only within the context of the xsl:apply-templates and xsl:for-each elements. The possible solutions would be to create multiple templates to process item elements and call them based on the type of sorting that needs to be performed, use xsl:choose and select the xsl:apply-templates call based on the value of sortparam, or find a way to work around the issue, as in this example.

You can request three types of sorting. The default is a simple datetime sort: the items are ordered by their pubDate in descending order. This was the reason the additional timestamp element was added; the sorting is actually performed on that element. The second sorting method is by channel title, using the name that came from the site's config file, in ascending order followed by pubDate in descending order. Lastly, the items can be sorted by the title of the item in ascending order followed by pubDate in descending order. Now the question you most likely have is, how can this work when all the xsl:sort elements are called within the scope of the same xsl:apply-templates call?

The first xsl:sort element defined performs a sort based on the title element from the parent of the current item. The qualifier for the select, however, tests the equality of the sortparam. Unless the value channel was passed from the XSLT processor, the qualifier fails, resulting in nothing being selected for this sort. The same trick is used in the second xsl:sort element, but this time it checks for the value title. If the value matches, then sorting is performed on the title element of the item. The last xsl:sort has no such qualifier. If you remember the sort ordering, the last sort key for every sorting is the datetime. This key will always be used and thus is never invalidated by a qualifier.
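The effect of the qualifiers can be checked with a reduced version of the style sheet run against a few made-up items. In this sketch the sample data, the sortedTitles() helper, and the comma-separated text output are all assumptions chosen to keep the result easy to read; the three xsl:sort keys follow the same pattern as in the style sheet above.

```php
<?php
// Illustrative data only: two channels with made-up titles and time stamps.
$xml = <<<'XML'
<channels>
  <channel><title>B</title>
    <item><title>zeta</title><timestamp>300</timestamp></item>
    <item><title>alpha</title><timestamp>100</timestamp></item>
  </channel>
  <channel><title>A</title>
    <item><title>mid</title><timestamp>200</timestamp></item>
  </channel>
</channels>
XML;

$xslt = <<<'XSL'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:param name="sortparam" select="'datetime'"/>
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//channel/item">
      <!-- A failed qualifier selects nothing, so the key is empty for every item. -->
      <xsl:sort select="../title[$sortparam='channel']"/>
      <xsl:sort select="title[$sortparam='title']"/>
      <xsl:sort select="timestamp" data-type="number" order="descending"/>
      <xsl:value-of select="title"/>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

/* Hypothetical helper: returns the item titles in their sorted order. */
function sortedTitles(string $xml, string $xslt, ?string $sortparam = null): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xsl = new DOMDocument();
    $xsl->loadXML($xslt);
    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    if ($sortparam !== null) {
        $proc->setParameter('', 'sortparam', $sortparam);
    }
    return trim($proc->transformToXml($doc));
}

echo sortedTitles($xml, $xslt), "\n";            // time stamp only
echo sortedTitles($xml, $xslt, 'channel'), "\n"; // channel, then time stamp
echo sortedTitles($xml, $xslt, 'title'), "\n";   // item title
```

With this data the three calls give zeta,mid,alpha, then mid,zeta,alpha, then alpha,mid,zeta, because a key that selects nothing compares equal for every item and the next key decides the order.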

Note: Although this sorting trick does work, it will result in a slower transformation compared to using xsl:choose or defining multiple templates. It does, however, create a more compact style sheet. The performance issue really depends upon the amount of data being processed.

When running this example within a Web server, the cache can be created and/or updated by clicking the Update Cache button. Sorting is simply changed by selecting the desired sort option and clicking the Sort button. Figure 10-2 shows a rendered page that has been sorted by the item name.

Figure 10-2. Rendered HTML page
