
Plug-in PHP: 100 Power Solutions


So, having got the & problem out of the way, a new document object is created in $dom and the document in $contents is loaded into it. This makes the whole HTML page easily searchable using the $xpath object, which is created from $dom. Next, the $xpath object is used to search for five types of tag: a href=, img src=, iframe src=, script src=, and link href=. The matching elements for each are placed in the node-list objects $hrefs, $sources, $iframes, $scripts, and $css.

The reason for this is that all links within a page must be absolute, so that the page this plug-in returns can be served from any server. By grabbing all the links, it becomes possible to perform a relative-to-absolute conversion on each. To facilitate this, each of the separate objects is traversed in turn, and the link attribute of every element in it is extracted into the array $links. Then, to ensure that no link is converted twice, the array_unique() function is called to remove all duplicates, and the resulting set of unique URLs is saved back into $links, which is then sorted with sort().

Now that the entire set of links from the document is in the $links array, a foreach loop is used to iterate through them. The first part of the loop checks that a URL was actually supplied in the link before continuing. If so, the string variable $temp is assigned the contents of the link, but with any !!**1**!! tokens changed back into & symbols. This is so that the array $to can be assigned an untokenized URL in the next step, in which the value /$redirect?u= is assigned to the current element of $to, as indexed by $count, which is incremented after each insertion. The URL itself is then appended to the element, after first running it through the PIPHP_RelToAbsURL() function to ensure it is absolute, and then through urlencode() so it can be passed safely in a query string.
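To see what the XPath step collects, here is a minimal sketch that runs the same five queries against a small HTML string and gathers the link attributes into one array. The function name collect_links_sketch() is hypothetical, and it reads href from link elements, as the plug-in's code does; it is an illustration of the technique, not the plug-in itself.

```php
<?php
// Hypothetical sketch: gather the link attributes the plug-in looks for
// (a href, img src, iframe src, script src, link href) from an HTML string.
function collect_links_sketch(string $html): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html);                 // suppress warnings about sloppy HTML
    $xpath = new DOMXPath($dom);

    // XPath expression => attribute to read, mirroring the plug-in's queries
    $queries = [
        '/html/body//a'      => 'href',
        '/html/body//img'    => 'src',
        '/html/body//iframe' => 'src',
        '/html//script'      => 'src',
        '/html/head/link'    => 'href',
    ];

    $links = [];
    foreach ($queries as $expr => $attr) {
        foreach ($xpath->query($expr) as $node) {
            $value = $node->getAttribute($attr);   // "" when the attribute is absent
            if ($value !== '') $links[] = $value;
        }
    }
    return array_values(array_unique($links));    // drop duplicates, reindex
}
```

Given a page that links to /news/ twice, the function returns that link only once, because array_unique() removes the repeat.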
For example, if the value in $redirect is webproxy.php and the link to add is http://google.com, then $to[$count] will be assigned the string /webproxy.php?u=http%3A%2F%2Fgoogle.com (that is, /webproxy.php?u= followed by the URL-encoded form of http://google.com).

Now it's time to make the link replacements within the document itself which, as you'll recall, is stored in $contents. This is done by two sets of three str_replace() calls, covering the three ways an attribute value can be written in an HTML document: (1) double quoted, (2) single quoted, and (3) without quotes. To do this, all href="link", href='link', and href=link statements are replaced with a unique token comprising the value of $count surrounded by two pairs of exclamation marks: the first link is replaced with !!0!!, the second with !!1!!, and so on. Again, I chose this format as being unlikely to appear within an HTML document. The process is then repeated for all occurrences of src="link", src='link', and src=link. At the end of the loop there will be no URLs remaining in the document, only the exclamation mark tokens representing them.

There's a very good reason for all these shenanigans: the final part of this plug-in needs to convert all the links to absolute ones, and if it tried to do this with the links still in place, it would seriously mess up. To see why, imagine that the server being proxied is http://server.com, and therefore all occurrences of /news/index.html must be replaced with http://server.com/news/index.html. This is all fine and dandy, but what if all occurrences of /news/ need changing too? The second replacement will also affect the result of the first, because the newly converted http://server.com/news/index.html strings contain /news/ and will be changed to http://server.comhttp://server.com/news/index.html. Do you see the problem? The changes will get changed. This is why all the links that need converting are first pre-processed into tokens. The tokens can then be safely replaced with the absolute URLs, without new changes modifying previous ones. And that's what the final part of the plug-in does.
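The clash described above is easy to reproduce. This sketch, with hypothetical function names and a made-up map of relative-to-absolute replacements, compares rewriting every occurrence in place with a two-pass approach like the plug-in's: first swap each quoted href value for a !!n!! token, and only then swap the tokens for URLs.

```php
<?php
// Naive approach: replace every occurrence of each link in turn.
// Later passes can match text produced by earlier ones.
function rewrite_naive(string $doc, array $map): string
{
    foreach ($map as $from => $to) {
        $doc = str_replace($from, $to, $doc);
    }
    return $doc;
}

// Two-pass approach: links become !!n!! tokens first, then tokens become URLs.
function rewrite_with_tokens(string $doc, array $map): string
{
    $targets = [];
    $count = 0;
    foreach ($map as $from => $to) {
        // Pass 1: match the whole quoted attribute, as the plug-in does
        $doc = str_replace("href=\"$from\"", "href=\"!!$count!!\"", $doc);
        $targets[$count++] = $to;
    }
    for ($j = 0; $j < $count; ++$j) {
        // Pass 2: tokens contain no URL text, so replacements cannot collide
        $doc = str_replace("!!$j!!", $targets[$j], $doc);
    }
    return $doc;
}
```

With the map ['/news/index.html' => 'http://server.com/news/index.html', '/news/' => 'http://server.com/news/'], the naive version produces the doubled http://server.comhttp://server.com/news/index.html, while the token version produces the intended links.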
The plug-in's final for loop iterates through all the entries in $to (the absolute, proxied URLs), replacing each token in turn with the matching value in $to.

Chapter 7: The Internet

Once all that has been achieved, all the links in the document will be in absolute format, so it's safe to make a final conversion, changing any remaining !!**1**!! tokens back into & symbols. The result is then returned by the plug-in.

How to Use It

At its simplest, all you need to use this plug-in is a program, perhaps called webproxy.php, that looks like this (it must also include the functions PIPHP_SimpleWebProxy() and PIPHP_RelToAbsURL()):

    $url = $_GET['u'] ?? '';
    echo PIPHP_SimpleWebProxy($url, "webproxy.php");

This program should be saved in the document root of your server, because the rewritten links all begin with /webproxy.php?u=, which points at the root. The first line simply extracts the contents following the ?u= part of the GET request (the query string) into the variable $url; PHP has already URL-decoded values in $_GET, so no further urldecode() call is needed. The second line makes the call to the plug-in.

You can call up the web proxy by typing an address such as the following into your browser's address bar (making sure you always enter the http:// part of the URL, or the program won't work):

    webproxy.php?u=http://google.com

Or, more likely, if your server domain is myserver.com:

    http://myserver.com/webproxy.php?u=http://google.com

Your new web proxy will now work, including sending images, because each link in a document has been converted to run through the web proxy, and therefore all images do so, too. However, to make the program work as well as possible, you will probably want to support all the content types checked for near the start of the plug-in, and send the correct header for each before sending the data.
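Because file_get_contents() also opens local files, a request such as webproxy.php?u=/etc/passwd could expose files on your server. The following sketch, using hypothetical helper names, accepts only absolute http or https URLs, and builds a proxied link with urlencode() in the same form the plug-in uses. The test decodes the link with parse_str(), which applies the same decoding PHP uses when it fills $_GET.

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs, so the proxy
// cannot be asked to read local paths such as /etc/passwd or file:// URLs.
function is_proxiable_url(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host   = parse_url($url, PHP_URL_HOST);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true)
        && is_string($host) && $host !== '';
}

// Hypothetical helper: build a proxied link in the plug-in's
// "/$redirect?u=" . urlencode(url) form.
function proxied_link(string $redirect, string $absUrl): string
{
    return "/$redirect?u=" . urlencode($absUrl);
}
```

In webproxy.php, a line such as if (!is_proxiable_url($url)) die("Invalid URL"); placed before the call to PIPHP_SimpleWebProxy() would reject such requests. It still allows requests to hosts on your own network, which a publicly reachable proxy may also need to block.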
A fuller version of webproxy.php, which sends the right header for each content type, might look like this (not forgetting to add the two plug-ins it relies on):

    $url    = $_GET['u'] ?? '';
    $result = PIPHP_SimpleWebProxy($url, "webproxy.php");

    switch(strtolower(substr($url, -4)))
    {
       case ".jpg": header("Content-type: image/jpeg");   die($result);
       case ".gif": header("Content-type: image/gif");    die($result);
       case ".png": header("Content-type: image/png");    die($result);
       case ".ico": header("Content-type: image/x-icon"); die($result);
       case ".css": header("Content-type: text/css");     die($result);
       case ".xml": header("Content-type: text/xml");     die($result);
       case ".htm":
       case "html":
       case ".php": header("Content-type: text/html");    die($result);
       default:
          if (strtolower(substr($url, -3)) == ".js")
             header("Content-type: application/x-javascript");
          die($result);
    }

In the preceding code a switch statement uses the last four characters of the URL to determine the current file type. The appropriate header for each type is then sent to the browser, followed by the contents, as returned in $result. These are sent using the die() function, since it combines an echo and an exit statement in one. In the case of HTML files, the extensions .htm and .php, as well as .html, are allowed (the case "html" catches .html, whose last four characters are html). Under the default section, .js JavaScript files are caught; they are handled separately because their extensions are only two characters long instead of three. Finally, if nothing else matches, $result is simply sent without a header, in the hope that this will be good enough (generally it is).

Not including whitespace and comments, you now have a web proxy program in under 100 lines of code that will work quite well, but you should realize that it only likes properly formed pages and is not forgiving of badly formatted HTML. Therefore some pages will display strangely, if at all.
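One possible alternative to the switch statement is a lookup table keyed on the extension of the URL's path. In this sketch (content_type_for() is a hypothetical name), parse_url() and pathinfo() extract the extension, so two-character extensions such as .js and URLs with query strings such as logo.png?v=2 are handled without special cases. It returns null when no header should be sent, mirroring the default case.

```php
<?php
// Hypothetical table-driven replacement for the switch statement.
function content_type_for(string $url): ?string
{
    static $types = [
        'jpg'  => 'image/jpeg',
        'gif'  => 'image/gif',
        'png'  => 'image/png',
        'ico'  => 'image/x-icon',
        'css'  => 'text/css',
        'xml'  => 'text/xml',
        'htm'  => 'text/html',
        'html' => 'text/html',
        'php'  => 'text/html',
        'js'   => 'application/x-javascript',
    ];

    // Use only the path, so "logo.png?v=2" is still recognized as a PNG
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) return null;

    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $types[$ext] ?? null;     // null: send no header, like the default case
}
```

A caller would then send header("Content-type: $type") only when the returned $type is not null, and finish with die($result) as before.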
Now that you know how it all works, you can easily tweak the code to your preferences. Remember that plug-in 21, PIPHP_RelToAbsURL(), also needs to be copied into, or otherwise included in, your program.

The Plug-in

    function PIPHP_SimpleWebProxy($url, $redirect)
    {
       $contents = @file_get_contents($url);
       if (!$contents) return NULL;

       // Pass non-HTML resources straight through. ".js" is only three
       // characters long, so it is tested separately from the others.
       if (strtolower(substr($url, -3)) == ".js") return $contents;

       switch(strtolower(substr($url, -4)))
       {
          case ".jpg": case ".gif": case ".png": case ".ico":
          case ".css": case ".xml":
             return $contents;
       }

       // Swap every & (including &amp;) for a token; restored at the end
       $contents = str_replace('&amp;', '&', $contents);
       $contents = str_replace('&', '!!**1**!!', $contents);

       $dom = new DOMDocument();
       @$dom->loadHTML($contents);
       $xpath   = new DOMXPath($dom);
       $hrefs   = $xpath->evaluate("/html/body//a");
       $sources = $xpath->evaluate("/html/body//img");
       $iframes = $xpath->evaluate("/html/body//iframe");
       $scripts = $xpath->evaluate("/html//script");
       $css     = $xpath->evaluate("/html/head/link");
       $links   = array();

       // Collect the link attribute of every matching element
       for ($j = 0 ; $j < $hrefs->length ; ++$j)
          $links[] = $hrefs->item($j)->getAttribute('href');
       for ($j = 0 ; $j < $sources->length ; ++$j)
          $links[] = $sources->item($j)->getAttribute('src');
       for ($j = 0 ; $j < $iframes->length ; ++$j)
          $links[] = $iframes->item($j)->getAttribute('src');
       for ($j = 0 ; $j < $scripts->length ; ++$j)
          $links[] = $scripts->item($j)->getAttribute('src');
       for ($j = 0 ; $j < $css->length ; ++$j)
          $links[] = $css->item($j)->getAttribute('href');

       $links = array_unique($links);
       $to    = array();
       $count = 0;
       sort($links);

       // Pass 1: replace each link with a !!n!! token, recording its
       // proxied absolute URL in $to[n]
       foreach ($links as $link)
       {
          if ($link != "")
          {
             $temp = str_replace('!!**1**!!', '&', $link);
             $to[$count] = "/$redirect?u=" .
                urlencode(PIPHP_RelToAbsURL($url, $temp));

             $contents = str_replace("href=\"$link\"", "href=\"!!$count!!\"", $contents);
             $contents = str_replace("href='$link'",   "href='!!$count!!'",   $contents);
             $contents = str_replace("href=$link",     "href=!!$count!!",     $contents);
             $contents = str_replace("src=\"$link\"",  "src=\"!!$count!!\"",  $contents);
             $contents = str_replace("src='$link'",    "src='!!$count!!'",    $contents);
             $contents = str_replace("src=$link",      "src=!!$count!!",      $contents);
             ++$count;
          }
       }

       // Pass 2: replace each token with its proxied absolute URL
       for ($j = 0 ; $j < $count ; ++$j)
          $contents = str_replace("!!$j!!", $to[$j], $contents);

       return str_replace('!!**1**!!', '&', $contents);
    }

Page Updated? (plug-in 47)

If you want to allow your users to be notified whenever one of your pages is updated, or perhaps you would like to be informed when a web page that interests you has been changed, all you need is this plug-in. For example, Figure 7-7 shows the index page at www.pluginphp.com being monitored for changes.

About the Plug-in

This plug-in accepts the URL of a web page to monitor and lets you know whether it has been changed. It returns 1 if the page has changed, 0 if it is unchanged, -1 if the page is a new one not yet in the datafile, or -2 if the page was inaccessible.
It takes these arguments:

• $url The URL to check
• $datafile The name of the data file that stores each monitored URL with its checksum

Variables, Arrays, and Functions

$contents  String containing the contents of $url
$checksum  String containing the result of passing $contents through the md5() function
$rawfile  String containing the contents of $datafile
$data  Array containing the lines extracted from $rawfile
$left  Array of all the left halves of $data
$right  Array of all the right halves of $data
$exists  Integer pointer to the location in $left of $url if it is already in the datafile
$j  Integer counter for iterating through $left
PIPHP_PU_F1()  Function to extract the left half of a supplied string
PIPHP_PU_F2()  Function to extract the right half of a supplied string

FIGURE 7-7 Monitoring changes to web pages is automatic with this plug-in.

How It Works

This plug-in loads the contents of $url into $contents, returning the value FALSE if it could not be fetched. Otherwise, an md5() checksum is made of the page's contents. This is a one-way function that creates a 32-character hexadecimal string which, for practical purposes, is unique to that content. Should even one letter change on a web page, the resulting md5() string will be substantially different, so it's an ideal way to detect changes in a web page.

Next a check is made to see whether $datafile already exists. If it does, its contents are loaded into $rawfile, which is then split line by line into the array $data using the explode() function, splitting on the \n linefeeds in the file. Then, instead of
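The checksum comparison at the heart of this plug-in can be sketched as follows. This is not the book's own code, which the excerpt does not reach; the function name page_changed_sketch() and the data file format (one tab-separated URL and md5() checksum per line) are assumptions. It returns the codes the plug-in documents: 1 changed, 0 unchanged, -1 new, and -2 inaccessible. Recording the latest checksum after a change or a new page is also an assumption of this sketch.

```php
<?php
// Hypothetical sketch of md5-based change detection (not the book's code).
// Assumed data file format: one "url<TAB>checksum" record per line.
function page_changed_sketch(string $url, string $datafile): int
{
    $contents = @file_get_contents($url);
    if ($contents === false) return -2;              // page inaccessible

    $checksum = md5($contents);                      // 32 hexadecimal characters

    // Load existing records into an array of url => checksum
    $records = [];
    if (file_exists($datafile)) {
        $lines = file($datafile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        foreach ($lines as $line) {
            [$left, $right] = array_pad(explode("\t", $line, 2), 2, '');
            $records[$left] = $right;
        }
    }

    if (!array_key_exists($url, $records)) {
        $status = -1;                                // new page, not yet recorded
    } elseif ($records[$url] === $checksum) {
        return 0;                                    // unchanged, nothing to save
    } else {
        $status = 1;                                 // contents have changed
    }

    // Remember the latest checksum for the next comparison
    $records[$url] = $checksum;
    $out = '';
    foreach ($records as $left => $right) {
        $out .= "$left\t$right\n";
    }
    file_put_contents($datafile, $out);
    return $status;
}
```

Calling it twice on an unchanged page returns -1 and then 0; once the page is edited, the next call returns 1.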

Posted: 07/07/2014, 08:20
