Plug-in PHP: 100 Power Solutions
Chapter 7: The Internet

FIGURE 7-10 With this plug-in you can make the busiest of web pages load quickly on a mobile browser.

FIGURE 7-11 This is the original Yahoo! home page before the plug-in is applied.

About the Plug-in

This plug-in accepts a string containing the HTML to be converted, along with other required arguments, and returns a properly formatted HTML document with various formatting elements removed. It takes these arguments:

• $html The HTML to convert
• $url The URL of the page being converted
• $style If “yes”, style and JavaScript elements are retained, otherwise they are stripped out
• $images If “yes”, images are kept, otherwise they are removed

Variables, Arrays, and Functions

$dom                  Document object of $html
$xpath                XPath object for traversing $dom
$hrefs                Object containing all a href= link elements in $dom
$links                Array of all the links discovered in $url
$to                   Array containing the version of what each $link should be changed to in order to ensure it is absolute
$count                Integer containing the number of elements in $to
$link                 Each link in turn extracted from $links
$j                    Integer counter for iterating through $to
PIPHP_RelToAbsURL()   Plug-in 21: This function converts a relative URL to absolute.

How It Works

This function starts off by creating a DOM object that is loaded with the HTML from $html. Then an XPath object is created from this, with which all a href= tags are extracted and placed in the object $hrefs.

After initializing the arrays $links and $to, which will contain the links before and after converting to absolute format, all occurrences of &amp; are converted to & symbols, and then all & symbols to the token !!**1**!!, to avoid the suspected str_replace() bug that doesn’t handle & symbols well.
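The first steps described above can be tried in isolation. The following is a minimal sketch, not the plug-in itself: the sample markup in $html is hypothetical and chosen only to show what the DOM and XPath calls return and what the two str_replace() calls do to the raw HTML.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<html><body><a href="/news?a=1&amp;b=2">News</a> ' .
        '<a href="about.html">About</a></body></html>';

// Load the HTML into a DOM object; @ suppresses warnings from malformed markup.
$dom = new DOMDocument();
@$dom->loadHTML($html);

// Use XPath to fetch every <a> element inside the body.
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

// Pull the href attribute out of each element.
$links = array();
for ($j = 0 ; $j < $hrefs->length ; ++$j)
   $links[] = $hrefs->item($j)->getAttribute('href');

// Decode &amp; to &, then hide every & behind the token.
$html = str_replace('&amp;', '&', $html);
$html = str_replace('&', '!!**1**!!', $html);

print_r($links);     // the two hrefs: /news?a=1&b=2 and about.html
echo $html . "\n";   // the first href now reads /news?a=1!!**1**!!b=2
```

Note that the DOM parser returns attribute values with entities already decoded, so the first entry in $links contains a plain & symbol, while the raw HTML now holds the token in the same position.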
Next the link parts of the tags are pulled out from $hrefs and placed into the array $links using a for loop, and all duplicate links are removed from the array, which is then sorted.

After this, the technique used in plug-ins 46 and 48 is implemented to swap all links in $html with numbered tokens. This ensures that multiple replaces don’t interfere with each other. First, each link has any !!**1**!! tokens changed back to & symbols and is run through PIPHP_RelToAbsURL() to ensure it is absolute, and the URL-encoded result is stored in the $to array. This makes sure that legal URLs will be substituted when the tokens are later changed back.

To be flexible, the plug-in supports three types of links (double quoted, single quoted, and unquoted), each case being handled by one of the str_replace() calls. Each call substitutes the link within $html with the token !!$count!!. This means that the first link becomes !!0!!, the second !!1!!, and so on, as $count is incremented at each pass.

With all the tokens having been substituted, they can now be swapped with their associated links from the $to array. This is achieved using the for loop that follows the foreach loop. Then, any remaining occurrences of the URL-encoded format http%3A%2F%2F are rectified to http://, and any !!**1**!! tokens are returned to being & symbols.

Next, if $style does not have the value “yes”, then runs of whitespace are collapsed to single spaces, and styling and JavaScript are removed from $html.

After this, $images is also tested and if it’s equal to “yes”, then images are allowed to remain in place. This is achieved, along with removing all remaining tags, by appending the tag <img> to the list of allowed tags in $allowed, which is then passed to the strip_tags() function, along with $html. If $images is not equal to “yes”, then the <img> tag will not be appended to $allowed, and consequently all image tags will also be removed by this function.
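The two-pass token technique is easier to follow on a small input. The sketch below makes several assumptions: demo_to_absolute() is a hypothetical, much simplified stand-in for PIPHP_RelToAbsURL(), the base URL and markup are invented, only double-quoted links are handled, and the urlencode() step is left out for brevity.

```php
<?php
// Hypothetical stand-in for PIPHP_RelToAbsURL(); it covers only the
// simple cases needed for this demonstration.
function demo_to_absolute($base, $link)
{
   if (preg_match('/^https?:\/\//i', $link)) return $link;
   return rtrim($base, '/') . '/' . ltrim($link, '/');
}

$base  = 'http://example.com';
$html  = '<a href="a.html">A</a> <a href="a.html?x=1">B</a>';
$links = array('a.html', 'a.html?x=1');   // already unique and sorted
$to    = array();
$count = 0;

// Pass 1: swap each double-quoted link for a numbered token and
// remember the absolute URL that the token stands for.
foreach ($links as $link)
{
   $to[$count] = demo_to_absolute($base, $link);
   $html = str_replace("href=\"$link\"", "href=\"!!$count!!\"", $html);
   ++$count;
}

// Pass 2: swap each token for its absolute URL.
for ($j = 0 ; $j < $count ; ++$j)
   $html = str_replace("!!$j!!", $to[$j], $html);

echo $html . "\n";
// <a href="http://example.com/a.html">A</a> <a href="http://example.com/a.html?x=1">B</a>
```

Because no absolute URL is written into $html until every original link has been turned into a token, a later search can never match text that an earlier replacement inserted.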
Upon completing all the processing, the result (in $html) is returned.

How to Use It

To convert HTML to a format more suitable for mobile browsers, use the plug-in like this:

   $url    = "http://yahoo.com";
   $html   = file_get_contents($url);
   $style  = "no";
   $images = "no";
   echo PIPHP_HTMLToMobile($html, $url, $style, $images);

This loads in the HTML from the index page at www.yahoo.com and then passes it to the plug-in with both $style and $images set to “no”. This means that neither styling nor JavaScript will be allowed in the converted HTML, and neither will images.

If $style is set to “yes”, then style tags and JavaScript are retained in the HTML. If $images is also equal to “yes”, then some images will be retained, but not all, due to a lot of the page’s content being removed.

If you play with this plug-in you’ll find that often you can set both $style and $images to “yes” and many web pages will still return a lot less information because the strip_tags() function removes plenty of HTML not strictly needed to use a web page.

Remember that this plug-in relies on plug-in 21, PIPHP_RelToAbsURL(). Therefore, you must also copy it into your program or otherwise include it.
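To see the effect of the $images setting on its own, you can call strip_tags() directly. This is a small sketch on hypothetical sample markup. It uses the same starting list of allowed tags as the plug-in, first without and then with the <img> tag appended.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<div><p>Hello <b>world</b></p><img src="logo.png"><span>!</span></div>';

// The same list of allowed tags that the plug-in starts with.
$allowed = "<a><p><h><i><b><u><s>";

// Equivalent to $images = "no": every tag not in the list is removed.
$without = strip_tags($html, $allowed);

// Equivalent to $images = "yes": <img> is appended to the list first.
$with = strip_tags($html, $allowed . "<img>");

echo $without . "\n";   // <p>Hello <b>world</b></p>!
echo $with . "\n";      // <p>Hello <b>world</b></p><img src="logo.png">!
```

In both cases the <div> and <span> tags disappear while their text content stays, which is why a converted page keeps its wording but loses most of its layout.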
The Plug-in

   function PIPHP_HTMLToMobile($html, $url, $style, $images)
   {
      // Parse the page and collect every <a> element in the body
      $dom = new DOMDocument();
      @$dom->loadHTML($html);   // @ suppresses warnings from malformed markup
      $xpath = new DOMXPath($dom);
      $hrefs = $xpath->evaluate("/html/body//a");
      $links = array();
      $to    = array();
      $count = 0;

      // Decode &amp; to &, then protect every & with a token
      $html = str_replace('&amp;', '&', $html);
      $html = str_replace('&', '!!**1**!!', $html);

      // Build a sorted list of unique links
      for ($j = 0 ; $j < $hrefs->length ; ++$j)
         $links[] = $hrefs->item($j)->getAttribute('href');

      $links = array_unique($links);
      sort($links);

      // Swap each link for a numbered token, remembering its absolute URL
      foreach ($links as $link)
      {
         if ($link != "")
         {
            $temp = str_replace('!!**1**!!', '&', $link);

            $to[$count] = urlencode(PIPHP_RelToAbsURL($url, $temp));

            // Double-quoted, single-quoted, and unquoted links
            $html = str_replace("href=\"$link\"", "href=\"!!$count!!\"", $html);
            $html = str_replace("href='$link'", "href='!!$count!!'", $html);
            $html = str_replace("href=$link", "href=!!$count!!", $html);
            ++$count;
         }
      }

      // Swap each token for its absolute URL
      for ($j = 0 ; $j < $count ; ++$j)
         $html = str_replace("!!$j!!", $to[$j], $html);

      // Restore the scheme prefix and the protected & symbols
      $html = str_replace('http%3A%2F%2F', 'http://', $html);
      $html = str_replace('!!**1**!!', '&', $html);

      // Unless styling is wanted, collapse whitespace and drop scripts and styles
      if (strtolower($style) != "yes")
      {
         $html = preg_replace('/[\s]+/', ' ', $html);
         $html = preg_replace('/<script[^>]*>.*?<\/script>/i', '', $html);
         $html = preg_replace('/<style[^>]*>.*?<\/style>/i', '', $html);
      }

      // Strip every tag that is not in the allowed list
      $allowed = "<a><p><h><i><b><u><s>";

      if (strtolower($images) == "yes")
         $allowed .= "<img>";

      return strip_tags($html, $allowed);
   }

CHAPTER 8
Chat and Messaging