CHAPTER 14 ■ CONTENT SYNDICATION: RSS AND ATOM

Table 14-6. Continued

| Element | Use | Description |
|---|---|---|
| author | Recommended | A Person construct providing information about the author of a feed. A feed element must contain one or more author elements unless every entry element contains at least one author element. |
| link | Recommended | A link, as defined in the “Common Constructs” section, to a related Web page. |
| category | Optional | Associates a category, as defined in the “Common Constructs” section, with the feed. A feed can have zero or more category elements. |
| contributor | Optional | A Person construct providing information about a contributor to the feed. You can use zero or more contributor elements. |
| generator | Optional | Identifies the agent used to create the feed. |
| icon | Optional | Identifies a small image, by means of a URL, for the feed. |
| logo | Optional | Identifies a larger image, by means of a URL, for the feed. |
| rights | Optional | A Text construct containing any rights, such as copyrights, for the feed. |
| subtitle | Optional | A Text construct containing a description or subtitle for the feed. |

A document using the metadata elements from Table 14-6 could look something like the one in Listing 14-4.

Listing 14-4. Sample Atom Feed Document Using Optional Elements

```xml
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>http://www.example.com/</id>
  <updated>2005-10-02T15:15:00Z</updated>
  <author>
    <name>John Smith</name>
  </author>
  <author>
    <name>Jane Doe</name>
  </author>
  <link rel="self" href="/atom/" />
  <category term="technology"/>
  <category term="PHP"/>
  <contributor>
    <name>John Doe</name>
  </contributor>
  <generator uri="/phpatomgen.php" version="1.0">
    Example PHP Atom Generator
  </generator>
  <icon>http://www.example.com/feedicon.jpg</icon>
  <logo>http://www.example.com/feedlogo.jpg</logo>
  <rights> © 2005 John Smith </rights>
  <subtitle>Description of Example Atom Feed</subtitle>
  <!-- Zero or more entry elements -->
</feed>
```

entry Element

Atom does not require a feed to contain any entry elements; this is similar to RSS 2.0, which does not require items. Using the Atom format, however, an entry element can be part of a feed and can also be its own document. This section covers the structure of an entry element, because it is the same whether it is used as a child of a feed element or stand-alone as the document element of an Atom entry document. The only difference concerns the namespace: because Atom elements must live within the Atom namespace, an entry element used as an Atom entry document must declare the namespace, http://www.w3.org/2005/Atom, while an entry element within a feed is normally already within the scope of this namespace.

Many of the child elements of an entry element, shown in Table 14-7, are used in the same way as those of the feed element.

Table 14-7. Entry Child Elements

| Element | Use | Description |
|---|---|---|
| title | Required | A Text construct containing the title or name of the entry. |
| id | Required | A permanent and universally unique IRI. The value is an identifier only: even when it looks like a URL, it is not dereferenced, and id values are compared on a character-by-character basis. |
| updated | Required | A Date construct indicating the date and time of the last significant modification. |
| author | Recommended | A Person construct providing information about the author of the entry. An entry element must contain at least one author element unless one is contained by the feed or is provided within a source element for the current entry. |
| content | Recommended | Contains, or links to, the complete content of the entry, as defined in the “Common Constructs” section. This element must be provided if the entry does not contain an alternate link and should be provided if there is no summary. |
| link | Recommended | A link, as defined in the “Common Constructs” section, to a related Web page. An alternate link must be used if the entry does not contain a content element. |
| summary | Recommended | A Text construct that provides a short summary or description of the entry. A summary element is recommended when no content element is used, and it is required when the content is remote (uses an src attribute) or is Base64-encoded. |
| category | Optional | Associates a category, as defined in the “Common Constructs” section, with the entry. An entry can have zero or more category elements. |
| contributor | Optional | A Person construct providing information about a contributor to the entry. You can use zero or more contributor elements. |
| published | Optional | A Date construct containing the initial creation date and time of the entry. |
| source | Optional | A source element is used when an entry is copied from another feed. I will explain this element in further detail following this table. |
| rights | Optional | A Text construct containing any rights, such as copyrights, for the entry. |

I have explained each of the elements in Table 14-7 elsewhere in the chapter. The only element that needs more clarification is the source element. You use a source element when an entry is copied from another feed. Its children can be any of the child elements of the entry's original parent feed element except entry elements; it is especially useful for carrying feed metadata that the entry does not already contain.
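The author row of Table 14-7 describes a rule that is easy to check mechanically. As a sketch, assuming the entry has already been loaded with DOM, a hypothetical entryHasAuthor() helper (not part of any library or of this chapter's classes) can use DOMXPath to look in the three places an author may come from. It checks only this one rule, not the rest of the entry.

```php
<?php
// Hypothetical helper: checks the Atom author rule for a single entry.
// An entry needs its own author element unless its source element or,
// for an entry inside a feed document, the parent feed supplies one.
function entryHasAuthor(DOMElement $entry) {
    $atomNS = 'http://www.w3.org/2005/Atom';
    $xpath = new DOMXPath($entry->ownerDocument);
    $xpath->registerNamespace('a', $atomNS);

    // 1. An author element directly on the entry
    if ($xpath->query('a:author', $entry)->length > 0) {
        return TRUE;
    }
    // 2. An author element preserved in the entry's source element
    if ($xpath->query('a:source/a:author', $entry)->length > 0) {
        return TRUE;
    }
    // 3. An author element on the parent feed element
    $parent = $entry->parentNode;
    if ($parent instanceof DOMElement && $parent->namespaceURI === $atomNS
        && $parent->localName === 'feed') {
        return $xpath->query('a:author', $parent)->length > 0;
    }
    return FALSE;
}

// A stand-alone entry document: the namespace is declared on entry itself
$doc = new DOMDocument();
$doc->loadXML('<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Article 1</title>
  <source><author><name>Rob Richards</name></author></source>
</entry>');
var_dump(entryHasAuthor($doc->documentElement));   // bool(true)
```

Registering a prefix for the Atom namespace is necessary because the documents use it as the default namespace, and XPath 1.0 has no way to match default-namespace elements without a prefix.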
For example, if you used an entry from Listing 14-3 to create an Atom entry document, it could look like the following:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Article 1</title>
  <link href="http://www.example.com/pub/article1.html"/>
  <id>http://www.example.com/pub/article1.html</id>
  <updated>2005-10-02T11:35:27Z</updated>
  <summary>This is the description for article 1.</summary>
  <source>
    <link href="http://www.example.com/"/>
    <author>
      <name>Rob Richards</name>
      <email>rrichards@php.net</email>
    </author>
  </source>
</entry>
```

If you look at the source element, you will see that it uses the link and author elements from the original feed. This applies both to Atom entry documents and to entries from one feed that are incorporated into another feed. The original feed information for the entry is maintained with the entry, keeping it separate from the current feed while still allowing the entry to reference its original feed. The author, contributor, rights, and category elements are among the elements to preserve from the original feed, because they carry the most important information about the origins and rights of the entry.

Choosing a Format

With three competing technologies, how do you choose one to use? If you are going to be subscribing to a feed, the answer is simple: you use what is offered and what your reader supports.

The hard part comes when you are the one creating the feed. Personally, when faced with a decision like this, I often check around to see what the big corporations are doing. If several of them are using the same technology, it is normally a safe bet that good support exists for it. Of course, big companies also have a decent amount of resources behind them, so even if the support is not there, it usually arrives quickly.
In my opinion, RSS 2.0 looks like a safe bet, although I am not ruling out the others. A quick look at some RSS 2.0 implementers turns up names such as Yahoo, the Wall Street Journal, MSNBC, and IBM, and that does not even include those providing podcasts. This, however, doesn't mean you have to use RSS 2.0 or even select just a single format. If you look at the open source community, it is not surprising to find sites providing feeds in all three formats. Unlike a company, which normally mandates how its information is accessed, open source sites tend to lean toward freedom of choice: no matter what aggregator or reader you are using, as long as it's compatible with at least one of the formats, you will be able to access the information.

Comparing the three formats, my first choice is RSS 2.0. It is simple to use and widely adopted. Second on my list is Atom, which I consider a wildcard format. It has a great structure and offers more flexibility than RSS 2.0, but it does not yet have the user base RSS 2.0 does. Remember, Atom was created as a competing format because of all the problems between the two RSS camps. So, unlike the RSS branches that already had user bases (though divided), Atom started from the bottom. I consider it a wildcard because it still has the potential to gain more widespread usage. RSS 1.0 is my least favorite. I think the structure is a bit awkward, and the use of namespaces a bit extensive for my liking. You should also take into account that RSS 1.0 is built on RDF technology, which in my opinion just overcomplicates things.

In the end, the choice is up to you. Everything here has been my opinion, not the voice of the Great Oz. Only you understand who your audience is and what your users need. You know the type of content your feed will be supplying, and you will be the one who has to support it. The advice offered here should help you decide which format (or formats) best suits your needs.
Seeing Some Examples in Action

Content syndication varies depending upon the technologies you are comparing. For this reason, the examples in the following sections are deliberately simple rather than attempts to demonstrate the complete functionality of each format. I will demonstrate a simple API for creating minimal RSS 1.0, RSS 2.0, and Atom feeds using DOM; a simple RSS 2.0 parser using SimpleXML; and a simple Atom parser using XMLReader. You could extend each of these examples to create much more feature-rich applications.

Creating Simple Feeds Using DOM

Depending upon the type of feed and the features being added to it, building a feed manually using DOM can become complex, especially when trying to support multiple formats. This example demonstrates how to use DOM to create feeds in multiple formats while supporting the minimal requirements of each format.

The code is split into four classes. The Syndicator class is the base class; it provides the bulk of the functionality for building a feed and is not instantiated directly. The remaining classes extend the Syndicator class and are the ones you instantiate to create a feed in a specific format: the RSS1 class supports RSS 1.0 feeds, the RSS2 class supports RSS 2.0 feeds, and the Atom class supports Atom 1.0 feed documents.

Syndicator Class

The Syndicator class is the base class and provides the majority of the functionality for creating a feed. Because the feed formats differ, much has been generalized in this class, with the specifics provided by the extending classes. This class is not meant to be directly instantiated.
In actuality, this class should be made abstract, but in the event you are not fluent with OOP or some of the newer aspects of PHP 5, I have written it as a regular class:

```php
class Syndicator {
   protected $rssDoc = NULL;
   protected $docElement = NULL;
   protected $root = NULL;
   protected $items = NULL;
   protected $hasChannel = TRUE;
   protected $tagMap = array('item'=>'item', 'feeddesc'=>'description',
                             'itemdesc'=>'description');

   const ITEM = 0;
   const FEED = 1;
```

All class properties are protected because they are not meant to be accessed from outside an instantiated object. The first three properties are needed because the formats have differing structures. The rssDoc property holds the DOMDocument object you are using to create the feed. The docElement property holds the DOMElement object to which item or entry elements are added. This is normally the document element, except in the case of RSS 2.0, where the item elements are added to the channel element, which is a child of the document element. The docElement property therefore acts as a pseudo-document element, so you can add item and entry elements using common functionality. The root property holds the DOMElement to which you add the metadata for the feed. Again, this varies depending upon the format you are using. For an Atom feed, the value of this property is the feed element, which is the document element. For an RSS 1.0 or RSS 2.0 feed, the value of this property is the channel element. I will show how to use the remaining properties later in the example; their defaults, however, are for the RSS 1.0 and 2.0 feeds.
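To illustrate the remark that Syndicator should really be abstract, here is a minimal sketch of that variant, trimmed to a constructor and a tag lookup. The names BaseFeed, MinimalRss2, MinimalAtom, getShell(), and getTag() are hypothetical, and replacing the SHELL property with an abstract getShell() method is a swapped-in design choice, not the book's code.

```php
<?php
// Hypothetical, trimmed-down variant of the base class declared abstract.
abstract class BaseFeed {
    protected $rssDoc = NULL;
    protected $tagMap = array('item' => 'item', 'feeddesc' => 'description',
                              'itemdesc' => 'description');

    public function __construct() {
        $this->rssDoc = new DOMDocument();
        // loadXML() returns FALSE on failure, so check it explicitly
        if (! $this->rssDoc->loadXML($this->getShell())) {
            throw new Exception("Unable to Create Object");
        }
    }

    // Stands in for the SHELL property: every concrete subclass must
    // implement it, or PHP refuses to declare that subclass.
    abstract protected function getShell();

    // Look up a format-specific element name, e.g. 'item' => 'entry' for Atom
    public function getTag($key) {
        return $this->tagMap[$key];
    }

    public function dump() {
        return $this->rssDoc->saveXML();
    }
}

class MinimalRss2 extends BaseFeed {
    protected function getShell() {
        return '<rss version="2.0" />';
    }
}

class MinimalAtom extends BaseFeed {
    // Overridden map, as the Atom class does with tagMap
    protected $tagMap = array('item' => 'entry', 'feeddesc' => 'subtitle',
                              'itemdesc' => 'content');

    protected function getShell() {
        return '<feed xmlns="http://www.w3.org/2005/Atom" />';
    }
}

$rss = new MinimalRss2();
echo $rss->getTag('item'), "\n";    // item
$atom = new MinimalAtom();
echo $atom->getTag('item'), "\n";   // entry
// new BaseFeed(); would fail: abstract classes cannot be instantiated
```

With the base declared abstract, `new BaseFeed()` is a fatal error in PHP 5 and throws an Error in PHP 7 and later, so the "not meant to be instantiated" rule is enforced by the language rather than by convention.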
```php
   /* Common element creation function that handles namespace creation properly */
   protected function createSyndElement($namespace, $name, $value=NULL) {
      if (is_null($namespace)) {
         $element = $this->rssDoc->createElement($name);
      } else {
         $element = $this->rssDoc->createElementNS($namespace, $name);
      }
      /* Add the value as a text node: the $value argument of createElement()
         and createElementNS() is not escaped, so an & in a URL or description
         would otherwise be treated as the start of an entity reference */
      if (! is_null($value)) {
         $element->appendChild($this->rssDoc->createTextNode($value));
      }
      return $element;
   }

   /* Default link element creation function as Atom has a different format */
   protected function createLink($parent, $url) {
      $link = $this->createSyndElement($this->NS, 'link', $url);
      $parent->appendChild($link);
   }
```

The next method, createRSSNode(), adds a title, link, and description to the element passed as its $parent parameter. It also creates the updated and id elements when values for them are passed, which is needed for an Atom feed. Links in Atom feeds are created differently than in RSS 1.0 and RSS 2.0 feeds, so the method delegates to a createLink() method. As you will see in the Atom class, that method is overridden so the element is created in the proper format.

The $type parameter indicates the type of element for which these child elements are being created. It determines the element name used for the description: RSS 1.0 and RSS 2.0 use description for both the channel and item elements, whereas Atom uses subtitle for the feed element and content for the entry element. Based on the type, the proper name is taken from the tagMap array, which the Atom class also overrides.
```php
   /* Generic method to create appropriate title, link, and description for an element */
   protected function createRSSNode($type, $parent, $title, $url, $description,
                                    $pubDate = NULL, $id=NULL) {
      $this->createLink($parent, $url);
      $title = $this->createSyndElement($this->NS, 'title', $title);
      $parent->appendChild($title);

      /* Pick the description element name: description, subtitle, or content */
      if ($type == Syndicator::ITEM) {
         $desctag = $this->tagMap['itemdesc'];
      } else {
         $desctag = $this->tagMap['feeddesc'];
      }
      $description = $this->createSyndElement($this->NS, $desctag, $description);
      $parent->appendChild($description);
```

The remaining functionality of the createRSSNode() method is specific to Atom. The corresponding elements could be supported for both RSS 1.0 and 2.0 with additional coding, but that is out of the scope of this example; for RSS 1.0, it would require supporting extension modules, the Dublin Core module in particular. Because these elements are required for a valid Atom feed, they currently work properly only for that format.

```php
      /* id elements and updated elements are specific to Atom -
         corresponding elements from other formats not currently supported */
      if (! is_null($id)) {
         $idnode = $this->createSyndElement($this->NS, 'id', $id);
         $parent->appendChild($idnode);
      }
      if (! is_null($pubDate)) {
         $datenode = $this->createSyndElement($this->NS, 'updated', $pubDate);
         $parent->appendChild($datenode);
      }
   }
```

The constructor performs all the initial setup for the feed. Each extending class defines a SHELL property, which is simply a template for the document; it makes it easy to create a document with the initial namespaces declared properly. The hasChannel property is set to FALSE in the Atom class because Atom is the only format that does not use a channel element.
Once the object is instantiated, the constructor will have set up the properties mentioned earlier and added the initial metadata to either the feed element or the channel element, based on the values passed to the constructor.

```php
   function __construct($title, $url, $description, $pubDate = NULL, $id=NULL) {
      try {
         $this->rssDoc = new DOMDocument();
         /* loadXML() returns FALSE on failure rather than throwing */
         if (! $this->rssDoc->loadXML($this->SHELL)) {
            throw new Exception("Unable to Create Object");
         }
         $this->docElement = $this->rssDoc->documentElement;
         if ($this->hasChannel) {
            $root = $this->createSyndElement($this->NS, 'channel');
            $this->root = $this->docElement->appendChild($root);
         } else {
            $this->root = $this->docElement;
         }
         $this->createRSSNode(Syndicator::FEED, $this->root, $title, $url,
                              $description, $pubDate, $id);
      } catch (DOMException $e) {
         throw new Exception($e->getMessage());
      }
   }
```

The addItem() method is pretty simple. It creates an element using the name pulled from the tagMap, which is entry for Atom and item for RSS 1.0 and 2.0. The new element is then appended to the node held by the docElement property. Finally, the createRSSNode() method is called with the Syndicator::ITEM constant as the type, which creates the title, link, and description elements on the new element, plus the id and updated elements when values for them are supplied.
```php
   public function addItem($title, $link, $description=NULL,
                           $pubDate = NULL, $id=NULL) {
      $item = $this->createSyndElement($this->NS, $this->tagMap['item']);
      if ($this->docElement->appendChild($item)) {
         $this->createRSSNode(Syndicator::ITEM, $item, $title, $link,
                              $description, $pubDate, $id);
         return TRUE;
      }
      return FALSE;
   }

   /* Method used as a holder and is overridden in the Atom class */
   public function addAuthor($name) {
      trigger_error("Function not yet implemented");
      return FALSE;
   }

   /* Simple method to return the formatted XML document as a string */
   function dump() {
      if ($this->rssDoc) {
         $this->rssDoc->formatOutput = TRUE;
         return $this->rssDoc->saveXML();
      }
      return "";
   }
}
```

RSS1 Class

The RSS1 class is the class to instantiate when creating an RSS 1.0 feed. Its format differs considerably from RSS 2.0 and Atom, so it must override some methods to support its structure properly. The first area to look at is the constant and the properties it defines. The RDFNS constant is used only within this class. It holds the rdf namespace, which is needed for a few elements and attributes specific to RSS 1.0; because the namespace URI is quite long, the constant also makes it easier to use. The NS property sets the common namespace used within the Syndicator class. Using the property allows the Syndicator class to use generalized code, shared among the classes, when creating elements.

```php
class RSS1 extends Syndicator {
   const RDFNS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
   protected $NS = 'http://purl.org/rss/1.0/';

   /* Following is formatted for readability */
   protected $SHELL = '<rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns="http://purl.org/rss/1.0/" />';
```

The addToItems() method is unique to this class. RSS 1.0 requires items to be referenced within the channel element.
The items property, which you saw defined in the Syndicator class, holds the DOMElement to which the rdf:li elements are added. When the first item is added, the structure is set up: the items element and the rdf:Seq element, which is the parent of the rdf:li elements. This method is never called publicly, hence the private visibility. Instead, it is called by the overridden addItem() method in this class.

```php
   private function addToItems($url) {
      if (is_null($this->items)) {
         $container = $this->createSyndElement($this->NS, 'items');
         $this->root->appendChild($container);
         $this->items = $this->rssDoc->createElementNS(self::RDFNS, 'Seq');
         $container->appendChild($this->items);
      }
      $item = $this->rssDoc->createElementNS(self::RDFNS, 'li');
      $this->items->appendChild($item);
      $item->setAttribute("resource", $url);
   }
```

The only reason the addItem() method has been overridden is to support the creation of the rdf:li elements. This method first calls the parent addItem() method and then calls the internal addToItems() method.

```php
   public function addItem($title, $link, $description=NULL,
                           $pubDate = NULL, $id=NULL) {
      if (parent::addItem($title, $link, $description, $pubDate, $id)) {
         $this->addToItems($link);
         return TRUE;
      }
      return FALSE;
   }
```

As you probably recall from the RSS 1.0 section, the channel and item elements must contain an rdf:about attribute. The createRSSNode() method is overridden to create this attribute before the createRSSNode() method from the Syndicator class is called. The override declares the parent's optional $id parameter so that the two signatures stay compatible (PHP 8 and later treat an override that drops a parameter as a fatal error), but it does not pass the value on, because id elements are not supported for RSS 1.0.

```php
   protected function createRSSNode($type, $parent, $title, $url,
                                    $description, $pubDate = NULL, $id=NULL) {
      $parent->setAttributeNS(self::RDFNS, 'rdf:about', $url);
      /* $id is accepted for signature compatibility but ignored for RSS 1.0 */
      parent::createRSSNode($type, $parent, $title, $url, $description, $pubDate);
   }
}
```

RSS2 Class

The RSS2 class is instantiated to create an RSS 2.0 document. This class is extremely simple.
RSS 2.0 does not use a namespace, so the NS property is set to NULL, and the template is simply the rss element with a version attribute. The structure of an RSS 2.0 feed differs from that of RSS 1.0: in RSS 2.0, all elements, including the items, reside within the channel element. The constructor has therefore been overridden so that, once the constructor from the Syndicator class has been called, the docElement property can be set to point to the proper node. In this case, both the root and docElement properties point to the channel element.

```php
class RSS2 extends Syndicator {
   protected $NS = NULL;
   protected $SHELL = '<rss version="2.0" />';

   function __construct($title, $url, $description, $pubDate = NULL, $id=NULL) {
      try {
         parent::__construct($title, $url, $description, $pubDate, $id);
         /* Items are added to the channel element rather than to rss */
         $this->docElement = $this->root;
      } catch (Exception $e) {
         throw new Exception($e->getMessage());
      }
   }
}
```

Atom Class

The Atom class, used to create an Atom 1.0 feed, is not much more complicated than the RSS2 class. Its NS property is set to the Atom namespace, and its SHELL property is set to the initial feed element. The hasChannel property is set to FALSE in this case, so when the constructor is called, no channel element is created, and both the root and docElement properties point to the feed element. The class also defines a custom tagMap, because Atom element names differ slightly from the RSS 1.0 and 2.0 names.

```php
class Atom extends Syndicator {
   protected $NS = 'http://www.w3.org/2005/Atom';
   protected $SHELL = '<feed xmlns="http://www.w3.org/2005/Atom" />';
   protected $hasChannel = FALSE;
   protected $tagMap = array('item'=>'entry', 'feeddesc'=>'subtitle',
                             'itemdesc'=>'content');
```

Atom has a different syntax for a link element.
This method overrides the default method so that the link is created in the proper format, with the URL in an href attribute rather than as the element's content:

```php
   protected function createLink($parent, $url) {
      $link = $this->rssDoc->createElementNS($this->NS, 'link');
      $parent->appendChild($link);
      $link->setAttribute('href', $url);
   }
```

Atom also requires that the feed and entry elements contain id and updated elements. If no values are passed for these parameters to the constructor and addItem() methods, the values are populated automatically: the id is set to the URL, and the pubDate is set to the current date and time.

■Note If you are not familiar with the value c passed to the date() function, it is a format character new as of PHP 5 that formats dates in ISO 8601 format (for example, 2005-10-02T15:15:00+00:00). This format is compatible with the Atom Date construct.

[...]
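As a minimal stand-alone sketch of that defaulting behavior, and not the book's Atom class code, the hypothetical createAtomFeed() function below builds a bare feed element with DOM, falls back to the URL for the id and to date('c') for updated, and writes the values as text nodes. The result is not a complete valid feed; it has no author, for example.

```php
<?php
// Hypothetical stand-alone helper illustrating Atom's id/updated defaults:
// id falls back to the feed URL, updated to the current date and time.
function createAtomFeed($title, $url, $pubDate = NULL, $id = NULL) {
    $atomNS = 'http://www.w3.org/2005/Atom';
    if (is_null($id)) {
        $id = $url;
    }
    if (is_null($pubDate)) {
        $pubDate = date('c');   // ISO 8601, e.g. 2005-10-02T15:15:00+00:00
    }

    $doc = new DOMDocument();
    $feed = $doc->appendChild($doc->createElementNS($atomNS, 'feed'));

    // Atom links carry the URL in an href attribute
    $link = $feed->appendChild($doc->createElementNS($atomNS, 'link'));
    $link->setAttribute('href', $url);

    // Text nodes keep characters such as & properly escaped
    $values = array('title' => $title, 'id' => $id, 'updated' => $pubDate);
    foreach ($values as $name => $value) {
        $node = $feed->appendChild($doc->createElementNS($atomNS, $name));
        $node->appendChild($doc->createTextNode($value));
    }
    return $doc;
}

$doc = createAtomFeed('Example Feed', 'http://www.example.com/');
$doc->formatOutput = TRUE;
echo $doc->saveXML();
```

Passing explicit values, as in `createAtomFeed('T', $url, '2005-10-02T15:15:00Z', 'urn:example:1')`, skips both fallbacks, which matters if the id must stay stable while the feed's URL changes.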