Queue Resource-intensive Operations

Excerpt from Pro PHP Security, part 9 (pages 28-38)

More common than administrative operations, at least in most PHP applications, are resource-intensive operations, which require a similar separation layer but for entirely different reasons. In these cases, the focus is on controlling the number of operations allowed at any one time. In other words, the fact that these processes are being initiated by nobody isn’t the problem; it’s that nobody is actually a front for the entire Internet-connected world, and that there may be tens or hundreds of webserver processes running as nobody at the same time on the same server.

Under normal circumstances, serving flat or even PHP-based dynamic web pages doesn’t require a large number of server cycles or huge amounts of memory. Even when a website experiences lots of traffic, the processes that handle web requests are so efficient that they will saturate a 100-megabit Ethernet connection before running out of other system resources. Properly tuned systems handle this sort of pounding day in and day out.

But, as we discussed earlier, some processes do indeed require an unusual amount of CPU time, memory, or other access to hardware. When PHP applications that rely on these operations are exposed to a sudden burst of web requests, the server will slow to a crawl unless a queue is employed to limit the damage. A queue is a first-in, first-out list of messages. For our purposes here, those messages will be requests for resource-intensive operations made by your application.

The queue, then, is a list of jobs to be done, in order, by some other process that specializes in such things and has the level of access required to consume system resources. This process, which runs in the background, is known as a batch processor because every time it runs it reads, executes, and removes the next batch of jobs waiting in the queue. In many cases, the batch size will only be one job at a time, but because the batch processor continues to act on job after job as they come to the front of the queue, the overall effect is the same.
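As a rough illustration of the idea, the following sketch models the queue with PHP’s SplQueue class and a batch processor that takes a fixed number of jobs from the front on each run. The function names enqueueJob() and processBatch(), and the layout of a job, are hypothetical and were invented for this example. A real queue would normally live in a database table or a spool directory so that it survives between requests.

```php
<?php
// Hypothetical sketch: an in-memory FIFO queue of job requests.
// A production queue would be persisted (database table, spool directory).

// Add a job request to the back of the queue.
function enqueueJob( SplQueue $queue, $operation, array $args ) {
  $queue->enqueue( array( 'operation' => $operation, 'args' => $args ) );
}

// Read, execute, and remove up to $batchSize jobs from the front of the queue.
// $handler is any callable that carries out a single job.
function processBatch( SplQueue $queue, $handler, $batchSize = 1 ) {
  $results = array();
  while ( $batchSize-- > 0 && !$queue->isEmpty() ) {
    $job = $queue->dequeue(); // first in, first out
    $results[] = call_user_func( $handler, $job );
  }
  return $results;
}

$queue = new SplQueue();
enqueueJob( $queue, 'resize', array( 'a.jpg' ) );
enqueueJob( $queue, 'resize', array( 'b.jpg' ) );
enqueueJob( $queue, 'encode', array( 'c.avi' ) );

// stand-in for real work: just describe the job
$describe = function ( $job ) {
  return $job['operation'] . ':' . $job['args'][0];
};

// take a batch of two jobs; the third stays in the queue for the next run
$firstBatch = processBatch( $queue, $describe, 2 );
print implode( ', ', $firstBatch ) . "\n";
```

With a batch size of one, the same loop simply handles the job at the front of the queue each time it is called.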

The Implications of Queuing

The batch processor that handles queued resource-intensive operations from your online PHP application could be a normal PHP script called periodically by cron and intended to take care of everything that is in the queue at that time. We suggest later in this chapter, in the “Using cron to Run a Script” section, that this is the most suitable solution for operations that are likely to take a considerable amount of time, or that can wait for some amount of time before being efficiently executed in a batch.

424 C H A P T E R 2 2 ■ S A F E L Y E X E C U T I N G S Y S T E M C O M M A N D S

The processor could alternatively be a daemon, a script running continuously in the background and constantly checking for new requests in the queue. We suggest later in this chapter, in the “A PHP Batch-processing Daemon” section, that this is the most suitable solution for resource-intensive operations that need to be carried out immediately, while the calling script is waiting.

By evaluating whether there are resources available to carry out the requested operation, both the cron script and the daemon allow you to achieve strict control over how often such processes run, and how many of them are allowed to run at any given time.

Unfortunately, this queuing involves a lot of extra work up front, work that has an impact on the flow of your application. In addition to building a script or daemon to carry out the batch processing, you must have some way of getting the results back to the user who made the request.

Because queued jobs may not be executed immediately, you may need to build a job-ticketing system into your application. A job-ticketing system associates each job in the queue with a PHP session, so that the user can check the progress of the job and obtain the results once the operations have been carried out by the batch processor. More likely, it is the user’s browser that does the actual checking via a meta-refresh tag, while displaying a “Please be patient; we’re working on it” notice, possibly along with a thermometer or scrolling bar symbol (which typically provides at best only the vaguest approximation of real progress, and at worst a completely fictitious version).
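A job-ticketing system of this kind might be sketched as follows. Everything here is an assumption made for illustration: the function names, the ticket fields, and the use of a plain array where a real application would use a database table shared by the web application and the batch processor.

```php
<?php
// Hypothetical sketch of a job-ticketing store. An array stands in for the
// database table a real application would use.

// Record a new ticket tying a queued job to the requesting user's session.
function createTicket( array &$tickets, $sessionId, $jobId ) {
  $tickets[$jobId] = array(
    'session' => $sessionId,
    'status'  => 'queued',
    'result'  => NULL,
  );
  return $jobId;
}

// Called by the batch processor when the job is done.
function completeTicket( array &$tickets, $jobId, $result ) {
  $tickets[$jobId]['status'] = 'done';
  $tickets[$jobId]['result'] = $result;
}

// Build the page shown to the user: a self-refreshing "please wait" notice
// until the job is done, then the result. A ticket is visible only to the
// session that created it.
function renderTicketPage( array $tickets, $sessionId, $jobId ) {
  if ( !isset( $tickets[$jobId] ) || $tickets[$jobId]['session'] !== $sessionId ) {
    return 'No such job.';
  }
  if ( $tickets[$jobId]['status'] !== 'done' ) {
    return '<meta http-equiv="refresh" content="5">'
         . 'Please be patient; we\'re working on it.';
  }
  return 'Finished: ' . htmlspecialchars( $tickets[$jobId]['result'] );
}

$tickets = array();
createTicket( $tickets, 'session-abc', 'job-1' );
$waiting = renderTicketPage( $tickets, 'session-abc', 'job-1' );
completeTicket( $tickets, 'job-1', '/downloads/job-1.png' );
$finished = renderTicketPage( $tickets, 'session-abc', 'job-1' );
```

In a live application the session identifier would come from session_id(), and the browser would keep reloading the page every five seconds until the status changes.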

For operations that could take a really long time to run, such as 3D rendering or video encoding, an email notification system is generally preferable to a session-based job-ticketing system. In this case, each job in the queue is associated with an email address (or IM or SMS account). When the job is completed, which may be long after the original session expired, the batch processor sends the user a message containing a link to the finished product.

Controlling Parallelization

The main reason for separating job requests from job execution is to control the number of resource-intensive jobs that are allowed to operate at any one time. The simultaneous execution of similar jobs is called parallelization. Depending on the kind of job for which execution is being requested, and the current load on your server, you may be able to allow some small number of simultaneous processes to work at one time. Or you may want to allow only one.

Your batch-processing scripts need a way to discover how many other operations are in progress, in order to determine whether resources are available for their own operations.
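One way to express that decision is sketched below. The canStartJob() helper and its thresholds are hypothetical; the only built-in used is sys_getloadavg(), which reports the system load averages on Unix-like systems. How the number of running jobs is counted is left open here, since it depends on where your application records them.

```php
<?php
// Hypothetical helper: decide whether another resource-intensive job may start.
//   $running  number of batch jobs currently in progress
//   $maxJobs  most jobs allowed to run at once (1 disables parallelization)
//   $load     current one-minute load average
//   $maxLoad  load at or above which no new job is started
function canStartJob( $running, $maxJobs, $load, $maxLoad ) {
  return ( $running < $maxJobs ) && ( $load < $maxLoad );
}

// sys_getloadavg() returns the 1-, 5-, and 15-minute load averages;
// fall back to 0.0 where it is unavailable or fails
$loads = function_exists( 'sys_getloadavg' ) ? sys_getloadavg() : FALSE;
$load = is_array( $loads ) ? $loads[0] : 0.0;

if ( canStartJob( 0, 2, $load, 4.0 ) ) {
  print "Resources available; starting next job\n";
}
else {
  print "Server busy; leaving job in the queue\n";
}
```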

The simplest way to prevent parallelization is to require that only one job can be run at a time. In this case, some sort of signal, typically either a file or a database flag, is set to indicate that a batch-processing operation is in progress, and that a new one should not be started at this time. Your batch-processing script would first, before initiating processing, check for the existence of a file, possibly located at something like /var/run/phpbatch. If the file exists, it will be taken as a sign that another job is executing, and your script will need to either exit or sleep for some period of time and try again. If the file doesn’t exist, the script will create one and then start the next job in the queue. Once all the queued jobs have been cleared, or the batch-processing script reaches the end of its life, the /var/run/phpbatch signal file is unlinked, allowing the next batch-processing script to take over when it runs.
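The signal-file technique could be sketched like this. The function names and the temporary path are hypothetical. Note one deliberate variation on the description above: rather than checking for the file and then creating it in two steps, the sketch opens it with fopen() mode 'x', which creates the file and fails if it already exists, so two scripts starting at the same moment cannot both claim it.

```php
<?php
// Hypothetical sketch of the signal-file technique. The function names and
// the path are invented for this example.

// Try to claim the right to run. Returns TRUE if the signal file was
// created, FALSE if another batch processor already holds it.
function claimSignalFile( $path ) {
  // mode 'x' creates the file and fails if it already exists, so the
  // existence check and the creation happen in a single step
  $fp = @fopen( $path, 'x' );
  if ( $fp === FALSE ) {
    return FALSE;
  }
  fwrite( $fp, (string) getmypid() ); // record who holds the signal file
  fclose( $fp );
  return TRUE;
}

// Remove the signal file so that the next batch processor can take over.
function releaseSignalFile( $path ) {
  return @unlink( $path );
}

$signalFile = sys_get_temp_dir() . '/phpbatch-demo-' . getmypid();

if ( claimSignalFile( $signalFile ) ) {
  // ... the queued jobs would be run here ...

  // a second processor starting now is turned away
  $secondClaim = claimSignalFile( $signalFile );

  releaseSignalFile( $signalFile );
}
```

A script that is turned away can either exit or sleep and try again, exactly as described above.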

Rather than use a file to indicate batch processing, you could use a PHP CLI binary compiled with the --enable-shmop configuration option, and store a flag in Unix shared memory. Each new process would check for this flag in a particular shared memory segment, and exit if it exists.

SnyderSouthwell_5084C22.fm Page 424 Thursday, July 28, 2005 3:00 PM

An introduction to PHP’s shared memory functions can be found at http://php.net/shmop. There is an inherent flaw in either system: if your batch-processing script exits prematurely for any reason (a fatal error, a kill signal, a power failure, and a system shutdown are all possibilities), then the signal file (or, in the less extreme cases, even the shared memory segment) will remain, preventing any subsequent processing scripts from starting up, even though no jobs are in progress.

Using Process Control In PHP Daemons

All operating systems include features that allow processes to spawn and control child processes. So if you have a daemon whose job is processing a queue of batch jobs, it can spawn a number of children to handle a sudden influx of jobs, and kill them again when things quiet down. Daemons written in PHP can take advantage of the Process Control functions described at http://php.net/ref.pcntl. The Process Control functions are not supported by default in the CGI and CLI versions of PHP, which must be compiled with the --enable-pcntl configuration option for that support to exist. Process control is not supported at all in Apache’s mod_php, and will cause “unexpected results,” according to the PHP Manual, when used in a webserver context. Since our goal is to move processing away from the webserver, this is certainly an acceptable limitation.

The fundamental difference between a PHP daemon and any other command-line PHP script is that the daemon is meant to run continuously in the background, and so is written on the one hand to conserve memory and resources, and on the other to handle the standard system signals.

A Brief Description of Signal Handling

Signals are a simple form of interprocess communication. There are some 32 different signals that can be sent to a process using the Unix kill command which, despite its name, can be used to send any defined signal. They range from the default TERM, which asks the process to terminate, to the user-defined signal USR1, which could be defined to mean anything at all.

Another common signal is HUP, or hang-up, which typically causes a daemon to relaunch using fresh configuration file values. Two particular signals, KILL and STOP, cannot be caught or ignored by a process. The rest can be caught and either handled in some way, or ignored completely. Each signal has a default action (usually “terminate the process”) that is carried out if the signal is not caught.

The two signals that are most important to a daemon are TERM and CHLD. Catching a TERM signal allows your daemon to close any existing connections and children, and exit gracefully. We will discuss the CHLD signal after introducing the notion of child processes in the next section.
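Before looking at a full daemon, it may help to see signal catching in isolation. The minimal sketch below assumes a command-line PHP with the pcntl and posix extensions available, and does nothing if they are missing. It installs a handler for the user-defined USR1 signal, sends that signal to itself with posix_kill(), and then asks PHP to run any pending handlers with pcntl_signal_dispatch(). The $received array is simply a name chosen for this example.

```php
<?php
// Minimal sketch; requires the pcntl and posix extensions (CLI only).
$received = array();

if ( function_exists( 'pcntl_signal' ) && function_exists( 'posix_kill' ) ) {
  // catch USR1 and record it; the handler is passed the signal number
  pcntl_signal( SIGUSR1, function ( $signo ) use ( &$received ) {
    $received[] = $signo;
  } );

  // send the signal to ourselves, as "kill -USR1 <pid>" would from a shell
  posix_kill( posix_getpid(), SIGUSR1 );

  // run the handlers for any signals that are pending
  pcntl_signal_dispatch();

  print 'Caught ' . count( $received ) . " signal(s)\n";
}
else {
  print "pcntl or posix extension not available\n";
}
```

Trying the same thing with SIGKILL or SIGSTOP would not work, since those two signals cannot be caught.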

Forking to Handle Simultaneous Requests

Very often a PHP daemon will need to respond to a number of simultaneous, or nearly simultaneous, requests. In this case, the daemon should act as a parent process, constantly looping and listening for requests. When a request is detected, the daemon creates a child process to handle the request. That way, if another request is received before the first process finishes, it can simply be handed off to another child. The PHP function used to create a child process is pcntl_fork() (see http://php.net/pcntl_fork for more information). This parent-child handoff is how Apache handles requests. The parent httpd listens for incoming messages on port 80, and either hands them off to an existing child or spawns a new child process to handle them.

When a program forks, the kernel creates an exact copy of it. To the child process, at that instant, (almost) everything looks identical to how it looks to the parent process. As both processes carry out execution of the script, parent and child diverge.

The only difference between parent and child at the time of forking is this one: the child has its own unique process ID, and has a parent process ID that is set to the parent’s PID.

Parent and child do not share the same memory (it is actually copied, not merely referenced), but the child does have a copy of all of its parent’s resource descriptors. So for instance, the child processes will possess any file handles that were held by the parent at the time of forking.

In addition to having an identical memory structure, parent and child both continue executing the script at the same point.

This leads, almost immediately, to the emergence of a second difference between parent and child. When the pcntl_fork() operation is complete, it returns a different value depending on whether it is returning to the child or the parent. To the forked child, it returns 0. To the parent, it returns the process ID of the child. Most scripts will use a conditional statement to test this return value, to determine whether the current process is still the parent (in which case the return value is the child’s PID), or if it has become a new child process (in which case the return value is 0), and then act accordingly.

When a child process is terminated, the parent automatically receives a CHLD signal, which means “child status has changed.” At this point, the child becomes a “zombie” process, hanging on until its parent acknowledges its termination. In PHP, the pcntl_wait() function is used to determine the PID of a terminated child, and to free the resources and eliminate the zombie.
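The fork-and-wait cycle just described can be tried out with a minimal sketch like the one below, again assuming a command-line PHP with the pcntl extension. The forkAndWait() helper is hypothetical. It forks a child that exits at once with a given exit code, then has the parent reap that child with pcntl_wait() and read the code back with pcntl_wexitstatus().

```php
<?php
// Minimal sketch; requires the pcntl extension (CLI only).
// forkAndWait() is a hypothetical helper written for this example.

// Fork a child that exits with $exitCode; return what the parent learns
// about it, or NULL if forking is unavailable or fails.
function forkAndWait( $exitCode ) {
  if ( !function_exists( 'pcntl_fork' ) ) {
    return NULL;
  }
  $fork = pcntl_fork();
  if ( $fork === -1 ) {
    return NULL; // could not fork
  }
  if ( $fork === 0 ) {
    // we are the child: real work would be done here, then we exit
    exit( $exitCode );
  }
  // we are the parent: $fork holds the child's PID.
  // pcntl_wait() blocks until a child terminates, and reaps the zombie.
  $status = 0;
  $reaped = pcntl_wait( $status );
  return array(
    'child'  => $fork,
    'reaped' => $reaped,
    'code'   => pcntl_wexitstatus( $status ),
  );
}

$info = forkAndWait( 3 );
if ( $info !== NULL ) {
  print "Child {$info['reaped']} exited with code {$info['code']}\n";
}
```

A daemon would not normally block in pcntl_wait() like this; as the demonstration that follows shows, it can instead call pcntl_wait() from its CHLD handler, when it already knows that a child has terminated.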

A Demonstration Daemon

We will demonstrate how to create a daemon with a moderately complex command-line PHP script that forks into a background process and then maintains a small pool of active children (at most three, as the code is written), killing the oldest and starting a new one every five seconds. The children each write something random to a log file every few seconds so that we know they are working. We will use this daemon as a pattern when implementing a more useful utility later in the chapter. This code can be found also as simpleDaemonDemo.php in the Chapter 22 folder of the downloadable archive of code for Pro PHP Security at http://www.apress.com.

#!/usr/local/bin/php
<?php

// functions

// dlog() function writes a message to a log file, with current PID
function dlog( $message ) {
  global $log,  // location of log file
         $dpid, // parent PID
         $cpid; // child PID
  if ( !empty( $cpid ) ) {
    // current process is a child
    $prefix = $cpid;
  }
  else {
    // current process is a parent
    $prefix = $dpid;
  }
  // get file handle to append to log file
  $cfp = fopen( $log, 'a' );
  // wait for an exclusive lock on the log
  flock( $cfp, LOCK_EX );
  // write the message
  fwrite( $cfp, "$prefix $message\r\n" );
  // release the lock
  flock( $cfp, LOCK_UN );
  // close the log file handle
  fclose( $cfp );
  // end of dlog() function
}

// sig_handler() function catches and handles signals
function sig_handler( $signo ) {
  global $child, $children;
  dlog( "Received signal $signo" );
  switch ( $signo ) {
    case SIGTERM:
      // handle shutdown tasks
      if ( !$child ) {
        // kill all child processes
        foreach ( $children AS $cpid ) {
          posix_kill( $cpid, SIGTERM );
          pcntl_wait( $status );
        }
        dlog( "Terminating " . print_r( $children, 1 ) );
      }
      else {
        dlog( "Terminating" );
      }
      exit();
      break;
    case SIGHUP:
      // handle restart requests
      if ( !$child ) {
        // kill all child processes
        foreach ( $children AS $cpid ) {
          posix_kill( $cpid, SIGTERM );
          pcntl_wait( $status );
        }
        // now launch a new simpleDaemonDemo.php
        dlog( "Restarting " . print_r( $children, 1 ) );
        shell_exec( 'php simpleDaemonDemo.php > /dev/null 2>&1' );
      }
      else {
        dlog( "Caught restart, waiting for TERM" );
      }
      exit();
      break;
    case SIGCHLD:
      // child status change - use pcntl_wait() to clean up zombie
      $cpid = pcntl_wait( $status );
      dlog( "Caught SIGCHLD from $cpid, status was $status" );
      break;
    default:
      // handle all other signals
      dlog( "which is an unhandled signal" );
  }
  // end of sig_handler() function
}

// schedule signal checking
declare( ticks = 1 );

// set up signal handlers
pcntl_signal( SIGTERM, 'sig_handler' );
pcntl_signal( SIGHUP, 'sig_handler' );
pcntl_signal( SIGCHLD, 'sig_handler' );

// open a log file resource
$log = 'daemon.log';
$fp = fopen( $log, 'w' );

// simpleDaemonDemo.php continues

The script begins with initialization details. The dlog() function does nothing more than write messages into a log file, locking it during writes to make sure that other daemons possibly running at the same time don’t step on what it is doing.

The sig_handler() function does the heavy work of handling signals, using a switch statement to decide exactly what to do, and using the dlog() function to write out informative messages to the log.

The declare construct tells the script to generate a tick for every statement of the script; each time a tick is generated, the process checks to see whether it has been sent a signal (see http://php.net/declare and http://php.net/pcntl for more information). The three invocations of the pcntl_signal() function register the sig_handler() function as the handler for the SIGTERM, SIGHUP, and SIGCHLD signals, contained in system constants. Finally, the file in which the daemon’s messages will be saved by dlog() is created and opened for writing.

// continues simpleDaemonDemo.php

print "Forking into the background now\r\n";

// create the daemon
$fork = pcntl_fork();
// the daemon now exists alongside the script; for it, $fork = 0
if ( $fork === -1 ) {
  exit( "Could not fork\r\n" );
}
elseif ( $fork ) {
  // the script exits
  exit( "Started background daemon with PID $fork\r\n" );
}

// the daemon gets its own PID
$dpid = posix_getpid();

// the daemon detaches from the controlling terminal
// (posix_setsid() returns -1 on failure)
if ( posix_setsid() === -1 ) {
  dlog( "Daemon could not detach" );
  exit();
}
sleep( 1 );

// prove that file descriptor is inherited by the daemon
fwrite( $fp, "File descriptor was inherited from original process\r\n" );
fclose( $fp );
dlog( "I am up and running as a daemon" );

// initialize
$children = array();
$child = FALSE;

// simpleDaemonDemo.php continues

The script prints an informative message on the console, forks a child process (which will continue to run as a daemon), and then (since its sole purpose was to create that daemon) exits, leaving the child process running to manage things in the future. This process proves that it has inherited its parent’s environment by writing directly to the log file, and then (after logging an informational message) initializes by setting two necessary variables, an array to hold a list of the children that it creates, and a $child flag used to differentiate itself from its children.

// continues simpleDaemonDemo.php

// loop forever (until SIGTERM is received)
while ( TRUE ) {
  // sleep for 5 seconds each loop
  sleep( 5 );
  if ( !$child ) {
    // kill oldest child
    if ( count( $children ) > 2 ) {
      $killpid = array_shift( $children );
      dlog( "Killing $killpid now" );
      posix_kill( $killpid, SIGTERM );
      sleep( 1 );
    }
    // create a child process
    $fork = pcntl_fork();
    // child process now exists
    if ( $fork === -1 ) {
      dlog( "Could not fork" );
    }
    elseif ( $fork ) {
      $children[] = $fork;
      sleep( 2 );
      dlog( "Added $fork to children" );
    }
    else {
      // $fork = 0; new child process executes here
      $child = TRUE;
      sleep( 1 );
      $cpid = posix_getpid();
      dlog( "Starting up as child" );
      // set nice value to 20 for lowest priority
      proc_nice( 20 );
    }
  }
  else {
    // existing children sleep for some random time
    $randomDelay = rand( 300, 3000 ) * 1000;
    usleep( $randomDelay );
    dlog( "Checking in after $randomDelay microseconds" );
  }
  // end while loop
}
?>

The daemon process (and each child that it forks) enters into an infinite loop. For the daemon, because the $child flag is FALSE, that loop consists of checking to see whether a child needs to be killed (and doing so if it needs to), and creating new children. For each child, the value of $fork is 0, and so that process goes immediately to the else clause, where it sets the $child flag to TRUE, obtains its own PID, and writes a message to the log. Every other time through, the child simply sleeps and checks in until it is killed. The daemon will continue running until it itself is killed from the console.

We show now the output from this script, which simply announces on the console that it is creating a child, reports that child’s PID, and then exits.

Forking into the background now
Started background daemon with PID 29617

From there, we must turn to the log file that is being generated by all those background calls to the dlog() function. A sample log generated by this daemon is reprinted, in part, here:

File descriptor was inherited from original process
29617 I am up and running as a daemon
29618 Starting up as child
29617 Added 29618 to children
29619 Starting up as child
29618 Checking in after 2484000 microseconds
29617 Added 29619 to children
29619 Checking in after 1085000 microseconds
29620 Starting up as child
29617 Added 29620 to children
29618 Checking in after 2850000 microseconds
29619 Checking in after 958000 microseconds
29617 Killing 29618 now
29618 Received signal 15
29618 Terminating
29617 Received signal 20
29617 Caught SIGCHLD from 29618, status was 0
...
29617 Received signal 15
29635 Received signal 15
29635 Terminating
29636 Received signal 15
29636 Terminating
29639 Received signal 15
29639 Terminating
29617 Terminating Array
(
    [0] => 29635
    [1] => 29636
    [2] => 29639
)

This log file output shows the first few children being created, and then the first expiring child being killed. After the ellipsis, the daemon reports catching a TERM signal sent with the console command kill 29617.

Using a Nice Value to Assign a Lower Priority

We have discussed using signals and children to control the execution of background jobs, but there is a third technique as well, one that allows you to change the relative priority of a background process. You may have noticed when reading through simpleDaemonDemo.php that each child calls a function named proc_nice() with a value of 20.
