Document: Advanced PHP Programming - P6

DOCUMENT INFORMATION

Basic information

Format
Pages	50
Size	510.94 KB

Content

228 Chapter 9 External Performance Tunings

Pre-Fork, Event-Based, and Threaded Process Architectures

The three main architectures used for Web servers are pre-fork, event-based, and threaded models.

In a pre-fork model, a pool of processes is maintained to handle new requests. When a new request comes in, it is dispatched to one of the child processes for handling. A child process usually serves more than one request before exiting. Apache 1.3 follows this model.

In an event-based model, a single process serves requests in a single thread, utilizing nonblocking or asynchronous I/O to handle multiple requests very quickly. This architecture works very well for handling static files but not terribly well for handling dynamic requests (because you still need a separate process or thread to handle the dynamic part of each request). thttpd, a small, fast Web server written by Jef Poskanzer, utilizes this model.

In a threaded model, a single process uses a pool of threads to service requests. This is very similar to a pre-fork model, except that because it is threaded, some resources can be shared between threads. The Zeus Web server utilizes this model. Even though PHP itself is thread-safe, it is difficult, if not impossible, to guarantee that third-party libraries used in extension code are also thread-safe. This means that even in a threaded Web server, it is often necessary to avoid a threaded PHP build and instead run PHP in separate processes via the FastCGI or CGI implementations.

Apache 2 uses a drop-in process architecture (its Multi-Processing Modules) that allows it to be configured as a pre-fork, threaded, or hybrid architecture, depending on your needs.

In contrast to the amount of configuration needed in the proxy Apache instance, the PHP server's setup is very similar to the way it was before. The only change to its configuration is to add the following to its httpd.conf file:

    Listen localhost:80

This binds the PHP instance exclusively to the loopback address.
Now if you want to access the Web server, you must contact it by going through the proxy server.

Benchmarking the effect of these changes is difficult. Because these changes mainly reduce the overhead associated with handling clients over high-latency links, their effects are hard to measure on a local or high-speed network. In a real-world setting, I have seen a reverse-proxy setup cut the number of Apache children necessary to support a site from 100 to 20.

Operating System Tuning for High Performance

There is a strong argument that if you do not want to perform local caching, then using a reverse proxy is overkill. A way to get a similar effect without running a separate server is to allow the operating system itself to buffer all the data. In the discussion of reverse proxies earlier in this chapter, you saw that a major component of the network wait time is the time spent blocking between data packets to the client. The application is forced to send multiple packets because the operating system has a limit on how much information it can buffer to send over a TCP socket at one time. Fortunately, this is a setting that you can tune.

On FreeBSD, you can adjust the TCP buffers via the following:

    # sysctl -w net.inet.tcp.sendspace=131072
    # sysctl -w net.inet.tcp.recvspace=8192

On Linux, you do this:

    # echo "131072" > /proc/sys/net/core/wmem_max

The FreeBSD commands set the outbound TCP buffer space to 128KB (131,072 bytes) and the inbound buffer space to 8KB (because you receive small inbound requests and make large outbound responses); the Linux command raises the maximum send buffer size to 128KB. This assumes that the maximum page size you will be sending is 128KB. If your page sizes differ from that, you need to change the tunings accordingly. In addition, on FreeBSD you might need to tune kern.ipc.nmbclusters to allocate sufficient memory for the new large buffers.
(See your friendly neighborhood systems administrator for details.)

After adjusting the operating system limits, you need to instruct Apache to use the large buffers you have provided. For this, you just add the following directive to your httpd.conf file:

    SendBufferSize 131072

Finally, you can eliminate the network lag on connection close by installing the lingerd patch to Apache. When a network connection is finished, the sender sends the receiver a FIN packet to signify that the connection is complete. The sender must then wait for the receiver to acknowledge the receipt of this FIN packet before closing the socket to ensure that all data has in fact been transferred successfully. After the FIN packet is sent, Apache does not need to do anything with the socket except wait for the FIN-ACK packet and close the connection. lingerd improves the efficiency of this operation by handing the socket off to an external daemon (lingerd), which just sits around waiting for FIN-ACKs and closing sockets.

For high-volume Web servers, lingerd can provide significant performance benefits, especially when coupled with increased write buffer sizes. lingerd is incredibly simple to compile. It consists of a patch to Apache (which allows Apache to hand off file descriptors for closing) and a daemon that performs those closes. lingerd is in use by a number of major sites, including Sourceforge.com, Slashdot.org, and LiveJournal.com.

Proxy Caches

Even better than having a low-latency connection to a content server is not having to make the request at all. HTTP takes this into account. HTTP caching exists at many levels:

- Caches are built into reverse proxies.
- Proxy caches exist at the end user's ISP.
- Caches are built into the user's Web browser.
Figure 9.5 shows a typical reverse proxy cache setup. When a user makes a request to www.example.foo, the DNS lookup actually points the user to the proxy server. If the requested entry exists in the proxy's cache and is not stale, the cached copy of the page is returned to the user without the Web server ever being contacted; otherwise, the connection is proxied to the Web server as in the reverse proxy situation discussed earlier in this chapter.

Figure 9.5 A request through a reverse proxy. [Diagram: clients send requests over high-latency Internet links to the reverse proxy, which checks "Is content cached?"; if yes, it returns the cached page; if no, it fetches the page from the PHP Web server over a low-latency connection.]

Many of the reverse proxy solutions, including Squid, mod_proxy, and mod_accel, support integrated caching. Using a cache that is integrated into the reverse proxy server is an easy way of extracting extra value from the proxy setup. Having a local cache guarantees that all cacheable content will be aggressively cached, reducing the workload on the back-end PHP servers.

Cache-Friendly PHP Applications

To take advantage of caches, PHP applications must be made cache friendly. A cache-friendly application understands how the caching policies in browsers and proxies work and how cacheable its own data is. The application can then be set to send appropriate cache-related directives to browsers and proxies to achieve the desired results.

There are four HTTP headers that you need to be conscious of in making an application cache friendly:

- Last-Modified
- Expires
- Pragma: no-cache
- Cache-Control

The Last-Modified HTTP header is a keystone of the HTTP 1.0 cache negotiation ability.
Last-Modified is the Coordinated Universal Time (UTC; formerly GMT) date of last modification of the page. When a cache attempts a revalidation, it sends the Last-Modified date as the value of its If-Modified-Since header field so that it can let the server know what copy of the content it should be revalidated against.

The Expires header field is the nonrevalidation component of HTTP 1.0 cache control. The Expires value consists of a GMT date after which the contents of the requested document should no longer be considered valid.

Many people also view Pragma: no-cache as a header that should be set to avoid objects being cached. Although there is nothing to be lost by setting this header, the HTTP specification does not define its meaning in responses, so its usefulness rests on its being a de facto standard implemented in many HTTP 1.0 caches.

In the late 1990s, when many clients spoke only HTTP 1.0, the cache negotiation options for applications were rather limited. It used to be standard practice to add the following headers to all dynamic pages:

    function http_1_0_nocache_headers()
    {
        // Use the current time for both dates, so the page is already expired
        $pretty_modtime = gmdate('D, d M Y H:i:s') . ' GMT';
        header("Last-Modified: $pretty_modtime");
        header("Expires: $pretty_modtime");
        header("Pragma: no-cache");
    }

This effectively tells all intervening caches that the data is not to be cached and always should be refreshed.

When you look over the possibilities given by these headers, you see that there are some glaring deficiencies:

- Setting expiration time as an absolute timestamp requires that the client and server system clocks be synchronized.
- The cache in a client's browser is quite different from the cache at the client's ISP. A browser cache could conceivably cache personalized data on a page, but a proxy cache shared by numerous users cannot.
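The first deficiency can be made concrete with a little arithmetic. The following PHP sketch uses a simplified freshness calculation (not the full RFC 2616 age algorithm), hypothetical function names, and arbitrary example timestamps. It compares how much freshness a cache believes it has left when its clock runs two minutes fast: judged against an absolute Expires date, the response is stale on arrival, while a relative lifetime measured on the cache's own clock (the approach HTTP 1.1's max-age takes) is unaffected.

```php
<?php
// Simplified freshness arithmetic (not the full RFC 2616 age calculation),
// showing why an absolute Expires date depends on synchronized clocks.
// All function names here are hypothetical.

// Format a Unix timestamp as an HTTP date, as the header functions above do.
function http_date(int $ts): string
{
    return gmdate('D, d M Y H:i:s', $ts) . ' GMT';
}

// Seconds of freshness left, judged against an absolute Expires date
// using the cache's own (possibly skewed) clock.
function remaining_from_expires(string $expires, int $cache_now): int
{
    return strtotime($expires) - $cache_now;
}

// Seconds of freshness left for a relative lifetime (max-age style),
// measured from when this same cache received the response.
function remaining_from_max_age(int $max_age, int $received_at, int $cache_now): int
{
    return $max_age - ($cache_now - $received_at);
}

$server_now = 1000000000;          // server's clock at response time
$cache_now  = $server_now + 120;   // the cache's clock runs 2 minutes fast

$expires = http_date($server_now + 60);  // server intends "fresh for 60s"
echo remaining_from_expires($expires, $cache_now), "\n";        // -60: stale on arrival
echo remaining_from_max_age(60, $cache_now, $cache_now), "\n";  // 60: unaffected by skew
```

The design point is that a relative lifetime only ever compares two readings of the same clock, so skew between machines cancels out.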
These deficiencies were addressed in the HTTP 1.1 specification, which added the Cache-Control directive set to tackle these problems. The possible values for a Cache-Control response header are set in RFC 2616 and are defined by the following syntax (simplified here):

    Cache-Control = "Cache-Control" ":" 1#cache-response-directive
    cache-response-directive =
          "public"
        | "private"
        | "no-cache"
        | "no-store"
        | "no-transform"
        | "must-revalidate"
        | "proxy-revalidate"
        | "max-age" "=" delta-seconds
        | "s-maxage" "=" delta-seconds

The Cache-Control directive specifies the cacheability of the document requested. According to RFC 2616, all caches and proxies must obey these directives, and the headers must be passed along through all proxies to the browser making the request.

To specify whether a response is cacheable, you can use the following directives:

- public: The response can be cached by any cache.
- private: The response may be cached only in a nonshared cache. This means that it is to be cached only by the requestor's browser and not by any intervening caches.
- no-cache: A cache must not use the stored response to satisfy a later request without first revalidating it with the origin server.
- no-store: The information being transmitted is sensitive and must not be stored in nonvolatile storage.

If an object is cacheable, the remaining directives control how it is revalidated and how long it may be stored in a cache:

- must-revalidate: Once a cached entry is stale, every cache must revalidate it with the origin server before using it. During revalidation, the browser sends an If-Modified-Since header in the request. If the server determines that the cached page is still the most current copy, it should return a 304 Not Modified response to the client. Otherwise, it should send back the requested page in full.
- proxy-revalidate: This directive is like must-revalidate, except that only shared caches are required to revalidate their contents.
- max-age: The time in seconds for which an entry is considered fresh, that is, usable without revalidation.
- s-maxage: The maximum time that an entry should be considered valid in a shared cache.

Note that according to the HTTP 1.1 specification, if max-age or s-maxage is specified, it overrides any expiration set via an Expires header (s-maxage applies only to shared caches).

The following function handles setting pages that are always to be revalidated for freshness by any cache:

    function validate_cache_headers($my_modtime)
    {
        $pretty_modtime = gmdate('D, d M Y H:i:s', $my_modtime) . ' GMT';
        // The client's If-Modified-Since request header appears in $_SERVER
        // as HTTP_IF_MODIFIED_SINCE.
        if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
            $_SERVER['HTTP_IF_MODIFIED_SINCE'] == $pretty_modtime) {
            header('HTTP/1.1 304 Not Modified');
            exit;
        } else {
            // max-age=0 makes the copy stale at once; must-revalidate
            // forbids serving it stale without checking back.
            header('Cache-Control: max-age=0, must-revalidate');
            header("Last-Modified: $pretty_modtime");
        }
    }

It takes as a parameter the last modification time of a page, and it then compares that time with the If-Modified-Since header sent by the client browser. If the two times are identical, the cached copy is good, so a status code 304 is returned to the client, signifying that the cached copy can be used; otherwise, the Last-Modified header is set, along with a Cache-Control header that mandates revalidation.

To utilize this function, you need to know the last modification time for a page. For a static page (such as an image or a "plain" nondynamic HTML page), this is simply the modification time on the file. For a dynamically generated page (PHP or otherwise), the last modification time is the last time that any of the data used to generate the page was changed.

Consider a Web log application that displays on its main page all the recent entries:

    $dbh = new DB_MySQL_Prod();
    // Assumes the timestamp column holds a Unix timestamp, because
    // validate_cache_headers() passes it straight to gmdate().
    $result = $dbh->execute("SELECT max(timestamp) FROM weblog_entries");
    if ($result) {
        list($ts) = $result->fetch_row();
        validate_cache_headers($ts);
    }

The last modification time for this page is the timestamp of the latest entry.
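The decision that validate_cache_headers() makes can also be written as a pure function that returns a status code and a header list instead of calling header() directly, which makes it easy to exercise without a Web server. This is a sketch under stated assumptions: the name conditional_get() and its [status, headers] return shape are hypothetical, the modification time is taken to be a Unix timestamp, and it deliberately departs from the exact string comparison above by parsing If-Modified-Since with strtotime() and treating any client date at or after the last modification as current.

```php
<?php
// Sketch of the conditional-GET decision as a pure function, so it can be
// exercised without a Web server. The name conditional_get() and its return
// shape ([status, headers]) are hypothetical.
function conditional_get(?string $if_modified_since, int $mtime): array
{
    $last_modified = gmdate('D, d M Y H:i:s', $mtime) . ' GMT';

    if ($if_modified_since !== null) {
        // Compare parsed timestamps instead of raw strings, so a client copy
        // dated at or after the last modification counts as current.
        $since = strtotime($if_modified_since);
        if ($since !== false && $since >= $mtime) {
            return [304, []];
        }
    }

    // max-age=0 marks the copy stale immediately, and must-revalidate forbids
    // serving it stale, so caches check back on every request.
    return [200, [
        'Cache-Control: max-age=0, must-revalidate',
        "Last-Modified: $last_modified",
    ]];
}

// Possible use inside a page (not executed here):
// [$status, $headers] = conditional_get($_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null, $ts);
// if ($status === 304) { http_response_code(304); exit; }
// foreach ($headers as $h) { header($h); }
```

Separating the decision from the header() calls keeps the HTTP logic testable and leaves the page free to decide when output actually starts.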
If you know that a page is going to be valid for a period of time and you're not concerned about it occasionally being stale for a user, you can drop the must-revalidate directive and set an explicit Expires value. The understanding that the data will be somewhat stale is important: When you tell a proxy cache that the content you served it is good for a certain period of time, you have lost the ability to update it for that client in that time window. This is okay for many applications.

Consider, for example, a news site such as CNN's. Even with breaking news stories, having the splash page be up to one minute stale is not unreasonable. To achieve this, you can set headers in a number of ways.

If you want to allow a page to be cached by shared proxies for one minute, you could call a function like this:

    function cache_novalidate($interval = 60)
    {
        $now = time();
        $pretty_lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
        $pretty_extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
        // Backwards compatibility for HTTP/1.0 clients
        header("Last-Modified: $pretty_lmtime");
        header("Expires: $pretty_extime");
        // HTTP/1.1 support
        header("Cache-Control: public,max-age=$interval");
    }

If instead you have a page that has personalization on it (say, for example, the splash page contains local news as well), you can set a copy to be cached only by the browser:

    function cache_browser($interval = 60)
    {
        $now = time();
        $pretty_lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
        $pretty_extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
        // Backwards compatibility for HTTP/1.0 clients
        header("Last-Modified: $pretty_lmtime");
        header("Expires: $pretty_extime");
        // HTTP/1.1 support: browser may cache, shared caches may not
        header("Cache-Control: private,max-age=$interval,s-maxage=0");
    }

Finally, if you want to try as hard as possible to keep a page from being cached anywhere, the best you can do is this:

    function cache_none()
    {
        // Backwards compatibility for HTTP/1.0 clients
        header("Expires: 0");
        header("Pragma: no-cache");
        // HTTP/1.1 support
        header("Cache-Control: no-cache,no-store,max-age=0,s-maxage=0,must-revalidate");
    }

The PHP session extension actually sets no-cache headers like these when session_start() is called. If you feel you know your session-based application better than the extension authors, you can simply reset the headers you want after the call to session_start(), or choose a different policy beforehand with session_cache_limiter().

The following are some caveats to remember in using external caches:

- Pages that are requested via the POST method cannot be cached with this form of caching.
- This form of caching does not mean that you will serve a page only once. It just means that you will serve it only once to a particular proxy during the cacheability time period.
- Not all proxy servers are RFC compliant. When in doubt, you should err on the side of caution and render your content uncacheable.

Content Compression

HTTP 1.0 introduced the concept of content encodings, allowing a client to indicate to a server that it is able to handle content passed to it in certain encoded forms. Compressing content renders the content smaller. This has two effects:

- Bandwidth usage is decreased because the overall volume of transferred data is lowered. In many companies, bandwidth is the number-one recurring technology cost.
- Network latency can be reduced because the smaller content can fit into fewer network packets.
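To make the negotiation concrete, here is a minimal PHP sketch of roughly what a compressing output handler has to decide: look for gzip among the codings a client lists in its Accept-Encoding request header and, if present, compress the body with zlib's gzencode() and label it with Content-Encoding: gzip. The helper name maybe_gzip() is hypothetical, it requires the zlib extension, and its parsing is simplified (quality values such as gzip;q=0 are ignored). In practice, zlib.output_compression or the ob_gzhandler output callback does this work for you.

```php
<?php
// Minimal gzip negotiation sketch; requires the zlib extension.
// maybe_gzip() is a hypothetical helper; q-values like "gzip;q=0" are ignored.
function maybe_gzip(string $body, string $accept_encoding): array
{
    // "gzip, deflate" -> ["gzip", "deflate"]; drop any ";q=..." parameters.
    $codings = array_map(
        fn(string $c): string => strtolower(trim(explode(';', $c)[0])),
        explode(',', $accept_encoding)
    );

    // Vary tells shared caches that the response depends on Accept-Encoding,
    // so a compressed copy is not served to a client that cannot decode it.
    if (in_array('gzip', $codings, true)) {
        return [gzencode($body), ['Content-Encoding: gzip', 'Vary: Accept-Encoding']];
    }
    return [$body, ['Vary: Accept-Encoding']];
}

$page = str_repeat("<p>Repetitive markup compresses well.</p>\n", 200);
[$out, $headers] = maybe_gzip($page, 'gzip, deflate');
printf("%d bytes -> %d bytes\n", strlen($page), strlen($out));
```

Highly repetitive markup like the sample page shrinks dramatically; real pages compress less, which is why measured savings such as the 30% figure below are more representative.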
These benefits are offset by the CPU time necessary to perform the compression. In a real-world test of content compression (using the mod_gzip solution), I found that not only did I get a 30% reduction in the amount of bandwidth utilized, but I also got an overall performance benefit: approximately 10% more pages/second throughput than without content compression. Even if I had not gotten the overall performance increase, the cost savings of reducing bandwidth usage by 30% was amazing.

When a client browser makes a request, it sends headers that specify what type of browser it is and what features it supports. In these request headers, the browser announces the content compression methods it accepts, like this:

    Accept-Encoding: gzip, deflate

There are a number of ways in which compression can be achieved. If PHP has been compiled with zlib support (the --with-zlib option at compile time), the easiest way by far is to use the built-in gzip output handler. You can enable this feature by setting the php.ini parameter, like so:

    zlib.output_compression = On

When this option is set, the capabilities of the requesting browser are automatically determined through header inspection, and the content is compressed accordingly.

The single drawback to using PHP's output compression is that it gets applied only to pages generated with PHP. If your server serves only PHP pages, this is not a problem. Otherwise, you should consider using a third-party Apache module (such as mod_deflate or mod_gzip) for content compression.

Further Reading

This chapter introduces a number of new technologies, many of which are too broad to cover in any real depth here. The following sections list resources for further investigation.

RFCs

It's always nice to get your news from the horse's mouth.
Protocols used on the Internet are defined in Request for Comments (RFC) documents maintained by the Internet Engineering Task Force (IETF). RFC 2616 covers the header additions to HTTP 1.1 and is the authoritative source for the syntax and semantics of the various header directives. You can download RFCs from a number of places on the Web. I prefer the IETF RFC archive: www.ietf.org/rfc.html.

Compiler Caches

You can find more information about how compiler caches work in Chapter 21 and Chapter 24.

Nick Lindridge, author of the ionCube accelerator, has a nice white paper on the ionCube accelerator's internals. It is available at www.php-accelerator.co.uk/PHPA_Article.pdf.

APC source code is available in PEAR's PECL repository for PHP extensions.

The ionCube Accelerator binaries are available at www.ioncube.com.

The Zend Accelerator is available at www.zend.com.

Proxy Caches

Squid is available from www.squid-cache.org. The site also makes available many excellent resources regarding configuration and usage. A nice white paper on using Squid as an HTTP accelerator is available from ViSolve at http://squid.visolve.com/white_papers/reverseproxy.htm. Some additional resources for improving Squid's performance as a reverse proxy server are available at http://squid.sourceforge.net/rproxy.

mod_backhand is available from www.backhand.org.

The usage of mod_proxy in this chapter is very basic. You can achieve extremely versatile request handling by exploiting the integration of mod_proxy with mod_rewrite. See the Apache project Web site (http://www.apache.org) for additional details. A brief example of mod_rewrite/mod_proxy integration is shown in my presentation "Scalable Internet Architectures" from ApacheCon 2002. Slides are available at http://www.omniti.com/~george/talks/LV736.ppt.

mod_accel is available at http://sysoev.ru/mod_accel.
Unfortunately, most of the documentation is in Russian. An English how-to by Phillip Mak for installing both mod_accel and mod_deflate is available at http://www.aaanime.net/pmak/apache/mod_accel.

Content Compression

mod_deflate is available for Apache version 1.3.x at http://sysoev.ru/mod_deflate. This has nothing to do with the Apache 2.0 mod_deflate. Like the documentation for mod_accel, this project's documentation is almost entirely in Russian.

mod_gzip was developed by Remote Communications, but it now has a new home, at Sourceforge: http://sourceforge.net/projects/mod-gzip.

[...]

Chapter 10 Data Component Caching

... fault), the lock held by that process is released, preventing a deadlock from occurring. PHP opts for whole-file locking with its flock() function. Ironically, on most systems, this is actually implemented internally by using fcntl. Here is the caching example reworked to use file locking:

    <?php
    $file = $_SERVER['PHP_SELF'];
    $cachefile = "$file.cache";
    $lockfp = @fopen($cachefile, "a");
    if(filesize($cachefile) ...

... the process holding them exits. Many operating systems have had bugs that, under certain rare circumstances, could cause locks to not be released on process death. Many of the PHP SAPIs (including mod_php, the traditional way for running PHP on Apache) are not single-request execution architectures. This means that if you leave a lock lying around at request shutdown, ...
... to a swapping implementation is simple:

    <?php
    $cachefile = "{$_SERVER['PHP_SELF']}.cache";
    if(file_exists($cachefile)) {
        include($cachefile);
        return;
    } else {
        $cachefile_tmp = $cachefile.".".getmypid();
        $cachefp = fopen($cachefile_tmp, "w");
        ob_start();
    }
    ?>
    Today is ...

Cookie-Based Caching

    <?php for ($i=1; $i ...

    ... <?php
    echo "Hello World";
    header("Content-Type: text/plain");
    ?>

You get this error:

    Cannot add header information - headers already sent

In an HTTP response, all the headers must be sent at the beginning of the response, before any content (hence the name headers). Because PHP by default sends out content as it comes in, when you send headers ...

... programmers coming from Java or mod_perl. In PHP, all user data structures are destroyed at request shutdown. This means that, with the exception of resources (such as persistent database connections), any objects you create will not be available in subsequent requests. Although in many ways this lack of cross-request persistence is lamentable, it has the effect of making PHP an incredibly sand-boxed language, ...

... object because a DBM file can only store strings. (Actually, it can store arbitrary contiguous binary structures, but PHP sees them as strings.) If there is existing data under the key foo, it is replaced. Some DBM drivers (DB4, for example) can support multiple data values for a given key, but PHP does not yet support this. To get a previously stored value, you use the get() method to look up the data by ...

... System V resources all come from a global pool, so even an occasional lost segment can cause you to quickly run out of available segments. Even if PHP implemented shared memory segment reference counting for you (which it doesn't), this would still be an issue if PHP or the server it is running on crashed unexpectedly. In a perfect world this would never happen, but occasional segmentation faults are not ...
... fragment that you can blindly include in the navigation bar. With the tools you've created, the personalized navigation bar code looks like this:

    <?php
    $userid = $_COOKIE['MEMBERID'];
    $user = new User($userid);
    if(!$user->name) {
        header("Location: /login.php");
        exit;  // stop rendering the page after issuing the redirect
    }
    $navigation = $user->get_interests();
    ?>
    ...'s Home ...

... start actually generating the page:

    <?php ob_start(); ?>

This turns on output buffering support. All output henceforth is stored in an internal buffer. Then you add the page code exactly as you would in a regular script:

    Today is ...

After all the content is generated, you grab the content and flush it:

    <?php
    $output = ob_get_contents();
    ob_end_flush();
    ...
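The fragments above describe two related techniques: capturing a page's output with ob_start() and writing cached copies under a temporary, per-process file name so that a half-written cache file is never read. A minimal sketch combining them might look like the following. The helper name render_cached() and its callable-based structure are assumptions for illustration, not the book's code, and the final step (using rename() to swap the finished file into place) is one plausible way to complete the swapping approach rather than a quotation of it.

```php
<?php
// Sketch of an output-buffered page cache with a swap-by-rename update.
// render_cached() is a hypothetical helper; $render is the page code.
function render_cached(string $cachefile, callable $render): string
{
    if (is_file($cachefile)) {
        return file_get_contents($cachefile);  // cache hit: skip page generation
    }

    ob_start();                   // buffer everything the page code prints
    $render();
    $output = ob_get_clean();     // take the buffer and stop buffering

    // Write to a per-process temporary file first, then rename() it into
    // place; on POSIX filesystems the rename replaces the target atomically,
    // so concurrent readers see either no file or the complete new one.
    $tmp = $cachefile . '.' . getmypid();
    file_put_contents($tmp, $output);
    rename($tmp, $cachefile);

    return $output;
}

$cache = sys_get_temp_dir() . '/render_cached_demo.cache';
echo render_cached($cache, function (): void {
    echo 'Today is ', gmdate('Y-m-d'), "\n";
});
```

Because each process writes its own temporary file, two processes regenerating the same page at once never interleave their writes; the last rename() simply wins.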

Posted: 26/01/2014, 09:20
