Chapter 9 External Performance Tunings
Pre-Fork, Event-Based, and Threaded Process Architectures
The three main architectures used for Web servers are pre-fork, event-based, and threaded models.
In a pre-fork model, a pool of processes is maintained to handle new requests. When a new request comes
in, it is dispatched to one of the child processes for handling. A child process usually serves more than one
request before exiting. Apache 1.3 follows this model.
In an event-based model, a single process serves requests in a single thread, utilizing nonblocking or asyn-
chronous I/O to handle multiple requests very quickly. This architecture works very well for handling static
files but not terribly well for handling dynamic requests (because you still need a separate process or thread
to handle the dynamic part of each request). thttpd, a small, fast Web server written by Jef Poskanzer, utilizes
this model.
In a threaded model, a single process uses a pool of threads to service requests. This is very similar to a pre-
fork model, except that because it is threaded, some resources can be shared between threads. The Zeus
Web server utilizes this model. Even though PHP itself is thread-safe, it is difficult to impossible to guaran-
tee that third-party libraries used in extension code are also thread-safe. This means that even in a threaded
Web server, it is often necessary to not use a threaded PHP, but to use a forked process execution via the
fastcgi or cgi implementations.
Apache 2 uses a drop-in process architecture that allows it to be configured as a pre-fork, threaded, or
hybrid architecture, depending on your needs.
In contrast to the amount of configuration inside Apache, the PHP setup is very similar
to the way it was before. The only change to its configuration is to add the following to
its httpd.conf file:
Listen localhost:80
This binds the PHP instance exclusively to the loopback address. Now if you want to
access the Web server, you must contact it by going through the proxy server.
Benchmarking the effect of these changes is difficult. Because these changes reduce
the overhead mainly associated with handling clients over high-latency links, it is difficult
to measure the effects on a local or high-speed network. In a real-world setting, I have
seen a reverse-proxy setup cut the number of Apache children necessary to support a site
from 100 to 20.
Operating System Tuning for High Performance
There is a strong argument that if you do not want to perform local caching, then using
a reverse proxy is overkill. A way to get a similar effect without running a separate server
is to allow the operating system itself to buffer all the data. In the discussion of reverse
proxies earlier in this chapter, you saw that a major component of the network wait time
is the time spent blocking between data packets to the client.
The application is forced to send multiple packets because the operating system has a
limit on how much information it can buffer to send over a TCP socket at one time.
Fortunately, this is a setting that you can tune.
On FreeBSD, you can adjust the TCP buffers via the following:
# sysctl -w net.inet.tcp.sendspace=131072
# sysctl -w net.inet.tcp.recvspace=8192
On Linux, you do this:
# echo "131072" > /proc/sys/net/core/wmem_max
When you make these changes, you set the outbound TCP buffer space to 128KB; on FreeBSD,
the second command also sets the inbound buffer space to 8KB (because you receive small
inbound requests and make large outbound responses). The Linux command shown raises only
the ceiling on the outbound buffer. This assumes that the maximum page size you will be
sending is 128KB. If your page sizes differ from that, you need to change the tunings
accordingly. In addition, on FreeBSD you might need to tune kern.ipc.nmbclusters to
allocate sufficient memory for the new large buffers. (See your friendly neighborhood
systems administrator for details.)
After adjusting the operating system limits, you need to instruct Apache to use the
large buffers you have provided. For this you just add the following directive to your
httpd.conf file:
SendBufferSize 131072
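If your page sizes differ from the 128KB assumed here, one rough way to pick a value is to round the largest response you expect up to the next power of two. The following is only a sketch: suggest_send_buffer() and its 16KB floor are hypothetical names and values chosen for illustration, not part of Apache, PHP, or the operating system.

```php
<?php
// Hypothetical helper (not from Apache or the OS): suggest a send-buffer size
// by rounding the largest observed response up to the next power of two.
function suggest_send_buffer(array $responseSizes, int $floor = 16384): int
{
    // With no samples, fall back to the assumed floor.
    $largest = $responseSizes ? max($floor, max($responseSizes)) : $floor;

    $size = 1;
    while ($size < $largest) {
        $size *= 2;
    }
    return $size;
}

// Example: three sampled response sizes in bytes (made-up numbers).
$suggested = suggest_send_buffer([42000, 97000, 130000]);
```

With these made-up samples the helper lands on 131072, the value used in the settings above. Real sizing should come from your own response-size measurements.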
Finally, you can eliminate the network lag on connection close by installing the lingerd
patch to Apache. When a network connection is finished, the sender sends the receiver a
FIN packet to signify that the connection is complete. The sender must then wait for the
receiver to acknowledge the receipt of this FIN packet before closing the socket to
ensure that all data has in fact been transferred successfully. After the FIN packet is sent,
Apache does not need to do anything with the socket except wait for the FIN-ACK
packet and close the connection. The lingerd process improves the efficiency of this
operation by handing the socket off to an exterior daemon (lingerd), which just sits
around waiting for FIN-ACKs and closing sockets.
For high-volume Web servers, lingerd can provide significant performance benefits,
especially when coupled with increased write buffer sizes. lingerd is incredibly simple
to compile. It is a patch to Apache (which allows Apache to hand off file descriptors for
closing) and a daemon that performs those closes. lingerd is in use by a number of
major sites, including Sourceforge.com, Slashdot.org, and LiveJournal.com.
Proxy Caches
Even better than having a low-latency connection to a content server is not having to
make the request at all. HTTP takes this into account.
HTTP caching exists at many levels:
• Caches are built into reverse proxies.
• Proxy caches exist at the end user’s ISP.
• Caches are built into the user’s Web browser.
Figure 9.5 shows a typical reverse proxy cache setup. When a user makes a request to
www.example.foo, the DNS lookup actually points the user to the proxy server. If the
requested entry exists in the proxy’s cache and is not stale, the cached copy of the page is
returned to the user, without the Web server ever being contacted at all; otherwise, the
connection is proxied to the Web server as in the reverse proxy situation discussed earlier
in this chapter.
Figure 9.5 A request through a reverse proxy.
Many of the reverse proxy solutions, including Squid, mod_proxy, and mod_accel, support
integrated caching. Using a cache that is integrated into the reverse proxy server is
an easy way of extracting extra value from the proxy setup. Having a local cache guaran-
tees that all cacheable content will be aggressively cached, reducing the workload on the
back-end PHP servers.
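The decision a caching reverse proxy makes for each request can be sketched in a few lines. This is a toy model only: TinyProxyCache is a hypothetical in-memory class with a fixed lifetime per entry, whereas real proxies such as Squid honor the cache headers discussed later in this chapter and store entries on disk or in shared memory.

```php
<?php
// Toy model of a proxy cache: serve a fresh cached copy if one exists,
// otherwise ask the back-end server and remember the answer.
final class TinyProxyCache
{
    private $entries = [];
    private $ttl;

    public function __construct(int $ttlSeconds)
    {
        $this->ttl = $ttlSeconds;
    }

    // $backend is any callable that takes a URL and returns the page body.
    // $now is passed in explicitly so the behavior is easy to test.
    public function fetch(string $url, callable $backend, int $now): string
    {
        if (isset($this->entries[$url]) && $this->entries[$url]['expires'] > $now) {
            return $this->entries[$url]['body'];   // cached and not stale
        }
        $body = $backend($url);                    // proxy to the Web server
        $this->entries[$url] = ['body' => $body, 'expires' => $now + $this->ttl];
        return $body;
    }
}
```

Every request answered from the first branch is one the back-end PHP server never sees.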
[Figure 9.5 diagram: clients reach the reverse proxy over high-latency Internet traffic.
The proxy checks “Is content cached?”; if yes, it returns the cached page, and if no, it
forwards the request over a low-latency connection to the PHP Web server.]
Cache-Friendly PHP Applications
To take advantage of caches, PHP applications must be made cache friendly. A
cache-friendly application understands how the caching policies in browsers and proxies
work and how cacheable its own data is. The application can then be set to send appropriate
cache-related directives to browsers and proxies to achieve the desired results.
There are four HTTP headers that you need to be conscious of in making an appli-
cation cache friendly:
• Last-Modified
• Expires
• Pragma: no-cache
• Cache-Control
The Last-Modified HTTP header is a keystone of the HTTP 1.0 cache negotiation
ability. Last-Modified is the Coordinated Universal Time (UTC; formerly GMT) date
of last modification of the page. When a cache attempts a revalidation, it sends the
Last-Modified date as the value of its If-Modified-Since header field so that it can let the
server know what copy of the content it should be revalidated against.
The Expires header field is the nonrevalidation component of HTTP 1.0 cache
control. The Expires value consists of a GMT date after which the contents of the
requested document should no longer be considered valid.
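Both headers carry the same fixed date format, so it is convenient to produce it in one place. The sketch below uses PHP's gmdate(), which always formats in UTC; the function names http_date() and http_1_0_headers() are hypothetical helpers introduced here for illustration.

```php
<?php
// Format a Unix timestamp as the HTTP date used by Last-Modified and
// Expires. gmdate() formats in UTC regardless of the server's time zone.
function http_date(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Hypothetical helper: build both HTTP 1.0 header lines for a page last
// changed at $modified that may be reused for $lifetime seconds from $now.
function http_1_0_headers(int $modified, int $now, int $lifetime): array
{
    return [
        'Last-Modified: ' . http_date($modified),
        'Expires: ' . http_date($now + $lifetime),
    ];
}
```

Each returned string can be passed to header() as is.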
Many people also view Pragma: no-cache as a header that should be set to avoid
objects being cached. Although there is nothing to be lost by setting this header, the
HTTP specification does not define a meaning for this header in responses, so its
usefulness rests on its being a de facto standard implemented in many HTTP 1.0 caches.
In the late 1990s, when many clients spoke only HTTP 1.0, the cache negotiation
options for applications were rather limited. It used to be standard practice to add the
following headers to all dynamic pages:
function http_1_0_nocache_headers()
{
    $pretty_modtime = gmdate('D, d M Y H:i:s') . ' GMT';
    header("Last-Modified: $pretty_modtime");
    header("Expires: $pretty_modtime");
    header("Pragma: no-cache");
}
This effectively tells all intervening caches that the data is not to be cached and always
should be refreshed.
When you look over the possibilities given by these headers, you see that there are
some glaring deficiencies:
• Setting expiration time as an absolute timestamp requires that the client and server
system clocks be synchronized.
• The cache in a client’s browser is quite different than the cache at the client’s ISP.
A browser cache could conceivably cache personalized data on a page, but a proxy
cache shared by numerous users cannot.
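The first deficiency is easy to see with a little arithmetic. The sketch below compares how long a client believes a page is fresh when it trusts an absolute expiration date against a relative lifetime of the kind HTTP 1.1 provides, as described next. The two-minute clock error and both function names are assumptions made up for the illustration.

```php
<?php
// Seconds a client believes a response is still fresh when it trusts an
// absolute Expires timestamp and its own (possibly wrong) clock.
function lifetime_from_expires(int $expiresAt, int $clientClock): int
{
    return max(0, $expiresAt - $clientClock);
}

// Seconds of freshness left under a relative lifetime: only the age of
// the cached copy matters, not what time the client thinks it is.
function lifetime_from_relative(int $lifetime, int $ageOfCopy): int
{
    return max(0, $lifetime - $ageOfCopy);
}

$serverNow   = 1000000;           // server clock when the page is sent
$clientClock = $serverNow - 120;  // assumed: client clock runs 2 minutes slow

$viaExpires  = lifetime_from_expires($serverNow + 60, $clientClock); // 180
$viaRelative = lifetime_from_relative(60, 0);                        // 60
```

A page meant to live for one minute is kept for three by the client with the slow clock, while the relative lifetime is unaffected by the skew.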
These deficiencies were addressed in the HTTP 1.1 specification, which added the
Cache-Control directive set to tackle these problems. The possible values for a
Cache-Control response header are set in RFC 2616 and are defined by the following syntax:
Cache-Control = "Cache-Control" ":" 1#cache-response-directive
cache-response-directive =
      "public"
    | "private"
    | "no-cache"
    | "no-store"
    | "no-transform"
    | "must-revalidate"
    | "proxy-revalidate"
    | "max-age" "=" delta-seconds
    | "s-maxage" "=" delta-seconds
The Cache-Control directive specifies the cacheability of the document requested.
According to RFC 2616, all caches and proxies must obey these directives, and the head-
ers must be passed along through all proxies to the browser making the request.
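To make the syntax concrete, here is a minimal sketch of reading a Cache-Control value back into its directives. It is deliberately simplified: parse_cache_control() is a hypothetical helper, and it does not handle quoted arguments that themselves contain commas, which the full RFC 2616 grammar allows.

```php
<?php
// Parse a Cache-Control header value into an array of directives.
// Bare directives map to true; numeric arguments become integers.
function parse_cache_control(string $value): array
{
    $directives = [];
    foreach (explode(',', $value) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;                       // tolerate empty list elements
        }
        $pieces = explode('=', $part, 2);
        $name = strtolower(trim($pieces[0]));
        if (count($pieces) === 2) {
            $arg = trim($pieces[1], " \t\"");
            $directives[$name] = ctype_digit($arg) ? (int) $arg : $arg;
        } else {
            $directives[$name] = true;
        }
    }
    return $directives;
}
```

A helper like this is mostly useful in tests and debugging tools, where you want to check what policy a page is actually advertising.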
To specify whether a response is cacheable, you can use the following directives:
• public: The response can be cached by any cache.
• private: The response may be cached only in a nonshared cache. This means that the
response is to be cached only by the requestor’s browser and not by any intervening
caches.
• no-cache: The response must not be reused from any cache without first being
revalidated with the origin server. In practice, it is used to keep a response from being
served out of any level of caching.
• no-store: The information being transmitted is sensitive and must not be stored in
nonvolatile storage.
If an object is cacheable, the final directives allow specification of how long an object
may be stored in cache:
• must-revalidate: All caches must revalidate the page with the server before reusing
a copy that is no longer fresh. During verification, the browser sends an
If-Modified-Since header in the request. If the server validates that the page represents
the most current copy of the page, it should return a 304 Not Modified response to
the client. Otherwise, it should send back the requested page in full.
• proxy-revalidate: This directive is like must-revalidate, but with
proxy-revalidate, only shared caches are required to revalidate their contents.
• max-age: This is the time in seconds that an entry is considered to be cacheable
without revalidation.
• s-maxage: This is the maximum time in seconds that an entry should be considered
valid in a shared cache. Note that according to the HTTP 1.1 specification, if max-age
or s-maxage is specified, it overrides any expiration set via an Expires header.
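Because the directives combine freely, it can be less error-prone to assemble the header from an array than to write the string by hand each time. The following is a sketch under that assumption; build_cache_control() is a hypothetical helper, not a PHP built-in.

```php
<?php
// Hypothetical helper: assemble a Cache-Control header line from an array.
// Keys are directive names; true emits a bare directive, an integer emits
// name=delta-seconds, and false or null skips the directive.
function build_cache_control(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $value) {
        if ($value === false || $value === null) {
            continue;
        }
        $parts[] = ($value === true) ? $name : $name . '=' . (int) $value;
    }
    return 'Cache-Control: ' . implode(', ', $parts);
}
```

For example, passing ['private' => true, 'max-age' => 300, 's-maxage' => 0] yields a line that lets the browser keep the page for five minutes while asking shared caches not to reuse it. The result can be handed straight to header().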
The following function handles setting pages that are always to be revalidated for fresh-
ness by any cache:
function validate_cache_headers($my_modtime)
{
    $pretty_modtime = gmdate('D, d M Y H:i:s', $my_modtime) . ' GMT';
    // PHP exposes the If-Modified-Since request header under this key.
    if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
       && $_SERVER['HTTP_IF_MODIFIED_SINCE'] == $pretty_modtime) {
        header("HTTP/1.1 304 Not Modified");
        exit;
    }
    else {
        header("Cache-Control: must-revalidate");
        header("Last-Modified: $pretty_modtime");
    }
}
It takes as a parameter the last modification time of a page, and it then compares that
time with the If-Modified-Since header sent by the client browser. If the two times
are identical, the cached copy is good, so a status code 304 is returned to the client,
signifying that the cached copy can be used; otherwise, the Last-Modified header is set,
along with a Cache-Control header that mandates revalidation.
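The function above treats the two dates as strings and requires an exact match. A slightly more tolerant variant, sketched here as an alternative rather than as the approach used in this chapter, parses the client's date with strtotime() and compares timestamps. The name is_not_modified() is hypothetical.

```php
<?php
// Decide whether a 304 can be sent: true when the client's copy (the
// If-Modified-Since date) is at least as new as the page's modification time.
function is_not_modified(?string $ifModifiedSince, int $modtime): bool
{
    if ($ifModifiedSince === null) {
        return false;               // no conditional header was sent
    }
    $clientTime = strtotime($ifModifiedSince);
    if ($clientTime === false) {
        return false;               // unparseable date: send the full page
    }
    return $clientTime >= $modtime;
}
```

In a page script this could be called as is_not_modified($_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null, $ts) before any output is sent. Keeping the decision in a function with no side effects also makes it straightforward to unit test.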
To utilize this function, you need to know the last modification time for a page. For a
static page (such as an image or a “plain” nondynamic HTML page), this is simply the
modification time on the file. For a dynamically generated page (PHP or otherwise), the
last modification time is the last time that any of the data used to generate the page was
changed.
Consider a Web log application that displays on its main page all the recent entries:
$dbh = new DB_MySQL_Prod();
$result = $dbh->execute("SELECT max(timestamp)
                         FROM weblog_entries");
if($result) {
    // $ts is passed to gmdate(), so it must be a Unix timestamp
    list($ts) = $result->fetch_row();
    validate_cache_headers($ts);
}
The last modification time for this page is the timestamp of the latest entry.
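More generally, a page assembled from several sources is only as old as its newest input. The sketch below takes that rule literally, combining the modification times of any files involved (templates, includes) with the timestamps of the data behind the page. page_last_modified() is a hypothetical helper, and deciding which files and rows count as inputs is left to the application.

```php
<?php
// Hypothetical helper: the last modification time of a dynamic page is the
// newest of its source files and of the data timestamps behind it.
function page_last_modified(array $files, array $dataTimestamps): int
{
    $latest = 0;
    foreach ($files as $file) {
        $mtime = @filemtime($file);          // false if the file is missing
        if ($mtime !== false) {
            $latest = max($latest, $mtime);
        }
    }
    foreach ($dataTimestamps as $ts) {
        $latest = max($latest, (int) $ts);
    }
    return $latest;
}
```

The returned value can be handed to validate_cache_headers() in place of a single query result.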
If you know that a page is going to be valid for a period of time and you’re not
concerned about it occasionally being stale for a user, you can drop the must-revalidate
directive and set an explicit Expires value. The understanding that the data will be
somewhat stale is important: When you tell a proxy cache that the content you served it is
good for a certain period of time, you have lost the ability to update it for that client in
that time window. This is okay for many applications.
Consider, for example, a news site such as CNN’s. Even with breaking news stories,
having the splash page be up to one minute stale is not unreasonable. To achieve this, you
can set headers in a number of ways.
If you want to allow a page to be cached by shared proxies for one minute, you could
call a function like this:
function cache_novalidate($interval = 60)
{
    $now = time();
    $pretty_lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
    $pretty_extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
    // Backwards Compatibility for HTTP/1.0 clients
    header("Last-Modified: $pretty_lmtime");
    header("Expires: $pretty_extime");
    // HTTP/1.1 support
    header("Cache-Control: public,max-age=$interval");
}
If instead you have a page that has personalization on it (say, for example, the splash page
contains local news as well), you can set a copy to be cached only by the browser:
function cache_browser($interval = 60)
{
    $now = time();
    $pretty_lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
    $pretty_extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
    // Backwards Compatibility for HTTP/1.0 clients
    header("Last-Modified: $pretty_lmtime");
    header("Expires: $pretty_extime");
    // HTTP/1.1 support
    header("Cache-Control: private,max-age=$interval,s-maxage=0");
}
Finally, if you want to try as hard as possible to keep a page from being cached any-
where, the best you can do is this:
function cache_none($interval = 60)
{
    // Backwards Compatibility for HTTP/1.0 clients
    header("Expires: 0");
    header("Pragma: no-cache");
    // HTTP/1.1 support
    header("Cache-Control: no-cache,no-store,max-age=0,s-maxage=0,must-revalidate");
}
The PHP session extension actually sets no-cache headers like these when
session_start() is called. If you feel you know your session-based application better
than the extension authors, you can simply reset the headers you want after the call to
session_start().
The following are some caveats to remember in using external caches:
• Pages that are requested via the POST method cannot be cached with this form of
caching.
• This form of caching does not mean that you will serve a page only once. It just
means that you will serve it only once to a particular proxy during the cacheability
time period.
• Not all proxy servers are RFC compliant. When in doubt, you should err on the
side of caution and render your content uncacheable.
Content Compression
HTTP 1.0 introduced the concept of content encodings, allowing a client to indicate
to a server that it is able to handle content passed to it in certain encoded (for example,
compressed) forms.
Compressing content renders the content smaller. This has two effects:
• Bandwidth usage is decreased because the overall volume of transferred data is
lowered. In many companies, bandwidth is the number-one recurring technology
cost.
• Network latency can be reduced because the smaller content can be fit into fewer
network packets.
These benefits are offset by the CPU time necessary to perform the compression. In a
real-world test of content compression (using the mod_gzip solution), I found that not
only did I get a 30% reduction in the amount of bandwidth utilized, but I also got an
overall performance benefit: approximately 10% more pages/second throughput than
without content compression. Even if I had not gotten the overall performance increase,
the cost savings of reducing bandwidth usage by 30% were amazing.
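The packet-count effect can be estimated with simple arithmetic. The sketch below assumes a payload of 1,460 bytes per TCP segment (a common value on Ethernet, used here purely as an assumption) and a hypothetical 50,000-byte page; it applies the 30% figure from my test to a single page only for illustration, since real savings vary by content.

```php
<?php
// Number of TCP segments needed to carry a response body, assuming a
// payload of $mss bytes per segment (1460 is an assumption, not a
// measurement of any particular network).
function packets_needed(int $bytes, int $mss = 1460): int
{
    return (int) ceil($bytes / $mss);
}

$original   = 50000;                          // hypothetical page size in bytes
$compressed = (int) round($original * 0.70);  // 30% smaller, for illustration

$before = packets_needed($original);    // 35
$after  = packets_needed($compressed);  // 24
```

Fewer segments means fewer round-trips spent waiting on a slow client, which is the same wait the larger send buffers discussed earlier were meant to hide.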
When a client browser makes a request, it sends headers that specify what type of
browser it is and what features it supports. In these headers for the request, the browser
sends notice of the content compression methods it accepts, like this:
Accept-Encoding: gzip,deflate
There are a number of ways in which compression can be achieved. If PHP has been
compiled with zlib support (the --with-zlib option at compile time), the easiest way
by far is to use the built-in gzip output handler. You can enable this feature by setting
the php.ini parameter, like so:
zlib.output_compression = On
When this option is set, the capabilities of the requesting browser are automatically
determined through header inspection, and the content is compressed accordingly.
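The header inspection involved is not complicated. As a rough sketch of the idea (not the zlib extension's actual implementation), the function below checks whether a client's Accept-Encoding value lists a given coding with a nonzero quality value. accepts_encoding() is a hypothetical name, and the "*" wildcard is deliberately ignored to keep the example short.

```php
<?php
// Return true if an Accept-Encoding header value lists $coding with a
// nonzero quality value. Simplified: the "*" wildcard is not handled.
function accepts_encoding(string $acceptEncoding, string $coding): bool
{
    foreach (explode(',', $acceptEncoding) as $item) {
        $params = explode(';', $item);
        $name = strtolower(trim($params[0]));
        if ($name !== strtolower($coding)) {
            continue;
        }
        // Look for an optional q=value parameter; the default is 1.
        $q = 1.0;
        foreach (array_slice($params, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        return $q > 0;
    }
    return false;
}
```

A script that compresses by hand could call this with $_SERVER['HTTP_ACCEPT_ENCODING'] ?? '' and, if it returns true for gzip and the zlib extension is loaded, send a Content-Encoding: gzip header followed by gzencode($body). With zlib.output_compression enabled, none of this is needed.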
The single drawback to using PHP’s output compression is that it gets applied only to
pages generated with PHP. If your server serves only PHP pages, this is not a problem.
Otherwise, you should consider using a third-party Apache module (such as
mod_deflate or mod_gzip) for content compression.
Further Reading
This chapter introduces a number of new technologies, many of which are too broad
to cover in any real depth here. The following sections list resources for further
investigation.
RFCs
It’s always nice to get your news from the horse’s mouth. Protocols used on the Internet
are defined in Request for Comment (RFC) documents maintained by the Internet
Engineering Task Force (IETF). RFC 2616 covers the header additions to HTTP 1.1
and is the authoritative source for the syntax and semantics of the various header
directives. You can download RFCs from a number of places on the Web. I prefer the IETF
RFC archive: www.ietf.org/rfc.html.
Compiler Caches
You can find more information about how compiler caches work in Chapter 21 and
Chapter 24.
Nick Lindridge, author of the ionCube accelerator, has a nice white paper on the
ionCube accelerator’s internals. It is available at
www.php-accelerator.co.uk/PHPA_Article.pdf.
APC source code is available in PEAR’s PECL repository for PHP extensions.
The ionCube Accelerator binaries are available at www.ioncube.com.
The Zend Accelerator is available at www.zend.com.
Proxy Caches
Squid is available from www.squid-cache.org. The site also makes available many
excellent resources regarding configuration and usage. A nice white paper on using Squid as
an HTTP accelerator is available from ViSolve at
http://squid.visolve.com/white_papers/reverseproxy.htm. Some additional resources for
improving Squid’s performance as a reverse proxy server are available at
http://squid.sourceforge.net/rproxy.
mod_backhand is available from www.backhand.org.
The usage of mod_proxy in this chapter is very basic. You can achieve extremely
versatile request handling by exploiting the integration of mod_proxy with mod_rewrite.
See the Apache project Web site (http://www.apache.org) for additional details. A brief
example of mod_rewrite/mod_proxy integration is shown in my presentation “Scalable
Internet Architectures” from Apachecon 2002. Slides are available at
http://www.omniti.com/~george/talks/LV736.ppt.
mod_accel is available at http://sysoev.ru/mod_accel. Unfortunately, most of the
documentation is in Russian. An English how-to by Phillip Mak for installing both
mod_accel and mod_deflate is available at
http://www.aaanime.net/pmak/apache/mod_accel.
Content Compression
mod_deflate is available for Apache version 1.3.x at http://sysoev.ru/mod_deflate.
This has nothing to do with the Apache 2.0 mod_deflate. Like the documentation
for mod_accel, this project’s documentation is almost entirely in Russian.
mod_gzip was developed by Remote Communications, but it now has a new home,
at Sourceforge: http://sourceforge.net/projects/mod-gzip.