594 5 Safety and Risk in Engineering Design

where:
RC = risk cost
C_0 = initial cost constant (set to zero for cost comparisons)
C_1 = cost constant multiplied with the CER variable of mass
C_2 = cost constant multiplied with the CER variable of material
C_s = cost variable for ensuring required reliability and safety.

The cost of ensuring the required reliability and safety relative to the selected attributes can be formulated as

\[ C_s = C_f R \tag{5.14} \]

where:
C_f = cost of failure relative to the selected attributes
R = risk of a failure incident occurring.

The risk of a failure incident occurring can be formulated as

\[ R = p \cdot c \tag{5.15} \]

where:
p = the probability of the event occurring
c = the consequence of the risk on the estimate.

5.2.2.2 Process Operational Risk Modelling

Complex process systems, especially complex integrations of systems, increasingly have to cope with risk in their operating environment. As a result, it is necessary and useful to develop a safety hypothesis, expressed as a risk equation, which relates system throughput capacity to risk. Such a risk equation has its roots in financial risk management and has been expanded to measure the mean expected loss risk, which is more suitable for process systems in general. Such a measure not only quantifies risk but also clarifies system safety principles during conceptual design. Early identification of specific risk costs and safety benefits of different design alternatives enables avoidance or mitigation of hazards that could result in operational losses.

a) Overview of the Risk Hypothesis and Risk Equation

From Eqs. (4.23) and (4.24) in Sect.
4.2.1.2, a process system is considered to be a functional unit that converts inputs to outputs, and which may be composed of sub-systems connected either in series or in parallel, enabling the system to convert a set of process inputs, I_p, to a set of process outputs, O_p, per unit time, so that O_p is equivalent to the system throughput, T_p, where the yield is 100%.

5.2 Theoretical Overview of Safety and Risk in Engineering Design 595

Equation (4.23) is reviewed here as the following expression

\[ \text{Process throughput } T_{C\,\mathrm{proc}} = \frac{\text{Material in process}}{\text{Processing time}} = \text{Rated capacity } (C_r) \tag{5.16} \]

The term throughput capacity relates engineering process throughput T_p to rated capacity C_r. If T_p is the maximum value for O_p, then T_p is seen as the throughput capacity of the system, measured as the units of output per unit time when the system is operating at rated capacity. In general, if the system is operating at a fraction f of throughput capacity T_p, due to process fluctuations, where f is an average constant (e.g. 0.95), then the reduced throughput, U, can be expressed as

\[ U = f \times T_p \tag{5.17} \]

In reality, the system will be exposed to unpredictable fluctuations in throughput capacity and, over a period of time t, the mean and, thus, expected throughput capacity will be T_p, where

\[ T_p = \sum_{t=0}^{n} U_t / n \tag{5.18} \]

where:
T_p = mean throughput capacity
n = number of time periods.

In real loss-deviation time periods, the actual capacity values can be expressed as the series

\[ S_{T_p} = \{ T_p - L_1,\; T_p - L_2,\; \ldots,\; T_p - L_n \} \tag{5.19} \]

where L_1, L_2, …, L_n are loss deviations from the average T_p. The expected or average T_p actually rarely occurs, if at all. In reality, it is the unpredictable sequence of losses (L_1, or L_2, …, L_n) with respect to an average or expected throughput capacity T_p, in a given time period, which is used in the measure of risk of loss of throughput.
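As a minimal sketch of Eqs. (5.17) to (5.19), the reduced throughput, the mean throughput capacity and the loss deviations can be computed as follows. The function names and the sample values are hypothetical, and the mean is taken as the plain arithmetic mean over the observed periods, which is an assumption about how the summation limits in Eq. (5.18) are meant to be read.

```python
from statistics import mean
from typing import List


def reduced_throughput(f: float, t_p: float) -> float:
    """Eq. (5.17): throughput when operating at a fraction f of capacity T_p."""
    return f * t_p


def mean_throughput(u_values: List[float]) -> float:
    """Mean of the observed per-period throughputs U_t (cf. Eq. 5.18).

    Assumption: a plain arithmetic mean over the observed periods.
    """
    return mean(u_values)


def loss_deviations(u_values: List[float]) -> List[float]:
    """Loss deviations L_k, defined so that U_k = T_p - L_k (cf. Eq. 5.19).

    A positive L_k is a shortfall below the mean; a negative L_k is a
    period that ran above the mean.
    """
    t_mean = mean_throughput(u_values)
    return [t_mean - u for u in u_values]


# Hypothetical throughputs observed over five time periods.
observed = [380.0, 400.0, 390.0, 410.0, 420.0]
t_mean = mean_throughput(observed)
losses = loss_deviations(observed)
```

By construction the loss deviations sum to zero about the mean, which is why the mean alone says nothing about risk and a spread measure is needed.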
Two meaningful measures of risk may be used: the traditional standard deviation measure and a new measure, the mean expected loss, which in many cases is more suitable for systems in general.

b) Risk Measures

Risk measures are statistical measures, such as the standard deviation risk (SD-risk) with respect to the mean throughput capacity T_p; if twice the standard deviation is used, then an even stronger risk measure is obtained, the two-standard deviations risk (2-SD-risk) measure. A new measure more suitable for process systems in general, termed the mean expected loss risk (MEL-risk) with respect to hazard-free T_p, is proposed (Bradley 2001).

In general, risk of loss L of throughput capacity has two components, namely the probability of a hazard occurring, and the size of the loss in throughput with respect to some standard level of throughput. A MEL-risk of loss L means that the average loss, with respect to the mean throughput capacity T_p in a period where the hazard does not occur, is exactly L.

The standard deviation measure of possible loss with respect to the mean throughput capacity, T_p, is the SD-risk measure. This measure is obtained by determining the standard deviation, s, of all the deviations (L_1, L_2, …, L_n) from the mean throughput capacity T_p. An SD-risk of s means that, in the next time unit, there is:
• a 50% chance or probability of a loss from the expected throughput capacity T_p,
• a 34.1% chance of a loss between 0 and s from the expected T_p,
• a 15.9% chance of a loss > s.

For a two-standard deviations measure, there is a 47.7% chance of a loss between 0 and 2s with respect to T_p. This implies that there is a 13.6% chance of a loss between s and 2s, and a 2.3% chance of a loss > 2s, both losses with reference to the mean throughput capacity T_p.
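The percentages quoted above can be checked with a short sketch, assuming the losses are normally distributed about T_p. The helper names are hypothetical; the standard normal cumulative distribution is built from `math.erf`.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def loss_band_probability(lower_sd: float, upper_sd: float) -> float:
    """Probability that a loss falls between lower_sd and upper_sd,
    both expressed in multiples of the standard deviation s."""
    return normal_cdf(upper_sd) - normal_cdf(lower_sd)


p_any_loss = 1.0 - normal_cdf(0.0)            # loss of any size: 0.5
p_0_to_1s = loss_band_probability(0.0, 1.0)   # about 0.341
p_above_1s = 1.0 - normal_cdf(1.0)            # about 0.159
p_0_to_2s = loss_band_probability(0.0, 2.0)   # about 0.477
p_1s_to_2s = loss_band_probability(1.0, 2.0)  # about 0.136
p_above_2s = 1.0 - normal_cdf(2.0)            # about 0.023
```

The same functions give different figures for any other band, but they remain valid only for as long as the normal assumption holds.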
In specifying an SD-risk, the standard deviation of the variations in throughput must be specified, as well as the standard level of throughput. A 2-SD-risk of 2s means that, in the next time unit, there is:
• a 50% chance or probability of a loss from the expected throughput capacity T_p,
• a 47.7% chance of a loss between 0 and 2s from the expected T_p,
• a 2.3% chance of a loss > 2s.

It is assumed that the losses in each time unit are distributed normally, and the percentages are obtained from a normal distribution function table. These percentages will inevitably be different if the distribution departs from normal. The SD-risk measure is widely used in financial risk analysis, particularly for stock and bond portfolio management, since stock and bond prices follow a random pattern that gives rise to a near-normal distribution of price changes (Beaumont 1986).

Where there is exposure to future loss, which can be made up of two loss components, namely a certain loss and a probable loss, the SD-risk measure considers only the probable loss, which in effect is the true risk. This is better explained with the aid of an example: assume a system would have a mean throughput capacity T_p = 400 if there were no future loss exposure. Suppose that the system has exposure to a future loss in T_p with a mean of 100 and a standard deviation of 14, where the loss is always at least 70. This implies a certain loss of 70 plus a loss that makes up the balance, with a mean of 30. This balance can, however, be as small as 0 (left side of the mean) and as large as 60 (right side of the mean), with a standard deviation of 14. The future loss thus consists of a certain loss of 70 and a probable loss with a mean of 30, and the standard deviation of the loss variations about this mean of 30 is equal to 14. This standard deviation about the mean of the probable loss is the SD-risk. The system has a certain loss of 70 and a probable loss with a mean of 30 and an SD-risk of 14.
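The split into a certain loss and a probable loss can be illustrated with a small sketch. The ten loss values below are hypothetical, chosen only so that they reproduce the figures of the example (mean 100, minimum 70, maximum 130, standard deviation 14); the function name is likewise an assumption.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def split_loss(losses: List[float]) -> Tuple[float, List[float]]:
    """Split each loss into a certain part (the smallest loss observed)
    and a probable part (the balance above that minimum)."""
    certain = min(losses)
    probable = [x - certain for x in losses]
    return certain, probable


# Hypothetical loss observations matching the example in the text.
losses = [70.0, 92.0, 96.0, 100.0, 100.0, 100.0, 100.0, 104.0, 108.0, 130.0]

certain_loss, probable_losses = split_loss(losses)
mean_probable = mean(probable_losses)   # mean of the balance
sd_total = pstdev(losses)               # spread of the total loss
sd_risk = pstdev(probable_losses)       # spread of the probable loss only
```

Subtracting the certain loss shifts every value by the same amount, so the standard deviation is unchanged: the SD-risk of the probable loss equals the standard deviation of the total loss, while the certain loss of 70 contributes nothing to the risk measure.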
To deal with the problems that arise in arbitrary systems, where variations in throughput depart significantly from the normal distribution and the distribution of losses is not normal, an additional risk measure becomes essential. This is the mean expected loss risk (MEL-risk). Suppose that for a system exposed to risk, there is at least one hazard-free time period in which, by chance, the hazard does not occur, and where the loss with respect to the mean throughput capacity T_p is L in this hazard-free time period, and where a loss exceeding L is not probable (but a loss less than L is probable). Thus, in the best-case scenario, the total hazard-free throughput capacity is T_p − L. Then all other throughput capacities, each in a time period where the hazard does occur in varying degrees of intensity, i.e. T_p − L_1, T_p − L_2, …, T_p − L_n, may be considered as exhibiting losses, or loss deviations, with respect to the value of T_p in the hazard-free time period. The mean of these loss deviations from T_p in a hazard-free time period may be used as a measure of the risk. This measure of expected loss in the future with respect to the throughput capacity for a hazard-free time period is the mean expected loss risk (MEL-risk).

Thus, a MEL-risk of loss L means that the average loss, with respect to the mean throughput capacity T_p in a time period where the hazard does not occur, is exactly L. In specifying a MEL-risk, the mean deviation of the variations in throughput must be specified, as well as the standard level of throughput. A MEL-risk of loss L is two standard deviations from the mean T_p. The definitions of the loss variance, standard deviation or SD-risk, and two standard deviations or MEL-risk of loss L from the mean T_p are considered by their formulation.
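A minimal sketch of these measures follows. It rests on several assumptions that are not in the text: the hazard-free throughput is taken to be the highest throughput observed, the mean loss is averaged over all observed periods, the function names are hypothetical, and the sample throughputs are invented for illustration.

```python
import math
from typing import List


def loss_variance(losses: List[float]) -> float:
    """Mean of the squared differences between the losses and their average."""
    n = len(losses)
    avg = sum(losses) / n
    return sum((x - avg) ** 2 for x in losses) / n


def sd_risk(losses: List[float]) -> float:
    """Root mean square deviation of the losses about their average."""
    return math.sqrt(loss_variance(losses))


def sd_risk_shortcut(losses: List[float]) -> float:
    """Same quantity, computed as the square root of
    (average of the squares) minus (square of the average)."""
    n = len(losses)
    mean_of_squares = sum(x * x for x in losses) / n
    square_of_mean = (sum(losses) / n) ** 2
    return math.sqrt(mean_of_squares - square_of_mean)


def mel_risk_explicit(throughputs: List[float]) -> float:
    """Mean loss with respect to the hazard-free throughput.

    Assumption: the hazard-free level is the maximum observed throughput.
    """
    hazard_free = max(throughputs)
    return sum(hazard_free - t for t in throughputs) / len(throughputs)


# Hypothetical throughputs; the first period is treated as hazard-free.
throughputs = [400.0, 380.0, 390.0, 370.0, 360.0]
losses = [max(throughputs) - t for t in throughputs]

mel = mel_risk_explicit(throughputs)
sd = sd_risk(losses)
two_sd = 2.0 * sd  # two-standard deviations measure
```

The two standard deviation functions agree to rounding error; the shortcut form needs only running sums of the losses and of their squares, which is convenient when losses are logged period by period.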
The variance (V) is the mean of the squared differences between the losses and their average

\[ V = \frac{1}{n} \sum (L_k - A_L)^2 \tag{5.20} \]

where:
L_k = the loss L_k (k = 1 to n) for n losses
A_L = the average (or mean), (1/n) ∑ L_k.

The standard deviation (SD) is the spread about the average (or mean)

\[ SD^2 = \frac{1}{n} \sum (L_k - A_L)^2, \qquad SD = \sqrt{\frac{1}{n} \sum (L_k - A_L)^2} \tag{5.21} \]

SD is the root mean square deviation between the losses and their average (SD^2 is the difference between the average of the squares and the square of the average), and can be computed as

\[ \text{MEL-risk} = \sqrt{\frac{1}{n} \sum L_k^2 - \left( \frac{1}{n} \sum L_k \right)^2} \tag{5.22} \]

A 1-standard deviation, SD_1

\[ SD_1 = \sqrt{\frac{1}{n} \sum L_k^2 - \left( \frac{1}{n} \sum L_k \right)^2}, \qquad SD_1 = \text{SD-risk} \tag{5.23} \]

A 2-standard deviation, SD_2

\[ SD_2 = \sqrt{\frac{1}{n} \sum L_k^2 - \left( \frac{1}{n} \sum L_k \right)^2}, \qquad SD_2 = \text{MEL-risk} \tag{5.24} \]

where:
L_k = the loss L_k (k = 1 to n) for n losses.

There are two extreme cases with regard to T_p for a hazard-free period of time (Bradley 2001):

(i) Explicit hazard-free case:
In the explicit case, the hazard-free throughput capacity T_p − L cannot be exceeded beyond the value of L. This throughput capacity remains in a time period when no hazard occurs. However, a hazard is certain to occur sometime. Thus, over a period of time, there will be a distribution of n losses about the mean and, in at least one of the n time periods, there will occur a loss deviation L with respect to the mean throughput capacity T_p. However, no loss deviation below L will ever occur. The concept of a hazard-free throughput capacity level T_p − L implies: (1) that no variation in throughput capacity can occur leading to a throughput capacity above the hazard-free level, and (2) that the only variations in throughput capacity that can occur must lead to a throughput capacity at or below the hazard-free level. This ensures that all probable losses are included in, and certain losses excluded from, the MEL-risk measure.
(ii) Implicit hazard-free case:
In the implicit case, the values in each time period fluctuate about the mean throughput capacity T_p, and the distribution of the deviations from the mean follows some reasonably bell-shaped distribution, where large but usually improbable loss deviations from the mean throughput capacity T_p occur, and where no explicit hazard-free throughput capacity can be determined. In such a case, a hazard-free throughput capacity T_p − L may be defined where the loss L is two standard deviations from the mean. For this case, the MEL-risk is defined as the mean expected loss with respect to T_p − L for the hazard-free period, with a value equivalent to two standard deviations of the mean throughput capacity T_p.

MEL-risk can therefore be viewed as the hazard-free deviation, either explicit or implicit, from the throughput capacity T_p, and is also equal to the average loss to be expected in a future hazard-free time period, with respect to throughput capacity T_p.

5.2.2.3 Hazard and Operability Studies for Risk Prediction

Safety issues have to be considered throughout an engineered installation’s life cycle, from design, manufacture, installation, assembly and construction, through to start-up and operation. The later the hazardous operating modes are detected in this development process, the more serious and expensive they become to avoid or mitigate in terms of the required plant modifications. Thus, an extensive and systematic examination of safety aspects has to be carried out carefully and at the earliest possible opportunity in the engineering design stage. To meet these essential demands, a thorough safety and hazards analysis is compulsory during the engineering design and development stages, for official approval to commence with construction.
The initial step of such an analysis is process hazard identification (PHI), which aims at identifying potential hazards that may be caused either by the nature of the process or the intended systems configuration. Further steps in this analysis are achieved by a variety of methods such as what-if analyses and safety checklists, usually incorporated in a more formal hazard and operability study (HazOp) conducted as early as possible in the conceptual and/or preliminary design phases. However, investigations in these early design phases identify faults only in the basic plant layout because no detailed specifications of process behaviour, or of the required controller equipment, may yet be available. Therefore, in the later detail engineering phase, further examination of the dynamic behaviour of systems is necessary to determine fail-safe control by programmable logic controllers (PLCs) or distributed control systems (DCSs).

The technique of HazOp has been used and developed over approximately four decades for identifying potential hazards and operability problems caused by deviations from the design intent of both new and existing process plants. Because of the high profile of process plant accidents, emphasis has often been placed upon the identification of hazards but, in so doing, potential operability problems have been neglected. Yet, it is in the latter area that benefits of a HazOp study are usually the greatest.

With respect to ‘design intent’, all industrial processes are designed for a purpose. Process systems are designed and constructed to achieve desired objectives. In order to do so, each item of equipment must consistently function according to specified criteria. These criteria can be classified as the ‘design intent’ for each particular item. As an example, in the cooling water system of Fig. 5.5, consider now the cooling water circuit piping in which the pumps are installed.
A simplified statement of the design intent of this small section of the reactor cooling system would be ‘to continuously circulate cooling water at an initial temperature of X °C and at a rate of Y l per hour’. It is usually at this low level of design intent that a HazOp study is directed. The use of the word ‘deviation’ now becomes easier to understand. In the case of the cooling water circuit, a deviation or departure from the design intent would be a cessation of circulation, or the water being at an excessively high initial temperature. It is important to note the difference between a deviation and its cause. In this case, failure of the pump would be a cause, not a deviation, and a bent shaft due to insufficient lubrication would be a possible root cause.

Essentially, the HazOp procedure involves taking a full description of a process system and systematically questioning every part of it to establish how deviations from the design intent can arise. Once identified, an assessment is made as to whether such deviations and their consequences can have a negative effect upon the safe and efficient operation of the system. If considered necessary, remedial action is then taken.

An essential feature in this process of questioning and systematic analysis is the use of keywords to focus attention on deviations and their possible causes. In Sect. 5.2.1.5, keywords consisted of guidewords, attributes and process parameters. In the early conceptual phase of engineering design, when many equipment attributes and process parameters have not yet been defined but it is considered expedient to conduct a preliminary HazOp study, these keywords are simplified by grouping into two subsets:
• Primary keywords, which focus attention upon a particular aspect of the design intent or an associated process condition or parameter (e.g. isolate, vent, open, clean, drain, purge, inspect, maintain, start-up and shutdown).
• Secondary keywords, which are combined with a primary keyword to suggest possible deviations (e.g. no, less, more, also, other, early, late, reverse, fluctuation).

The usefulness of a preliminary HazOp study thus revolves around the effective application of these two subsets of keywords, for example (pressure/maintain) and (pressure/less) as primary and secondary keyword combinations.

a) Primary and Secondary Keywords

Primary keywords reflect both the process design intent and operational aspects of the system being studied. Typical process-oriented words are very similar to the process parameters of Sect. 5.2.1.5, as the words employed will generally depend upon the process being studied, whether at systems level or at a more detailed component level. However, the technique is one of hazard and operability studies; thus, added to the above primary keywords might be relevant operational words such as those given in Table 5.9.

Table 5.9 Operational primary keywords
Isolate     Drain
Vent        Purge
Open        Inspect
Clean       Maintain
Start-up    Shutdown

This latter type of primary keyword is sometimes either overlooked or given secondary importance. Improper consideration of the word ‘isolate’, for example, can result in impromptu and sometimes hazardous means of taking a non-essential item of equipment offline for repairs because no secure means of isolation has been provided. Sufficient consideration of the words ‘start-up’ and ‘shutdown’ is particularly important, as most hazardous situations arise during these activities. For example, during commissioning it may be found that the plant cannot be brought on-stream because no provision for safe manual override of the safety system trips has been provided, or it may be discovered that it is necessary to shut down an entire system just to re-calibrate or replace a pressure gauge.

Secondary keywords are similar to the HazOp guidewords of Sect.
5.2.1.5 and, when applied in conjunction with a primary keyword, they suggest potential deviations or problems. Although they tend to be a standard set, the following list is taken from Table 5.5 with a review of their meanings in line with industrial processes (Table 5.10).

Table 5.10 Operational secondary keywords: standard HazOp guidewords
Secondary keywords (standard HazOp guidewords)
Word          Meaning
No            The design intent does not occur (e.g. flow/no) or the operational aspect is not achievable (isolate/no)
Less          A quantitative decrease in the design intent occurs (e.g. pressure/less)
More          A quantitative increase in the design intent occurs (e.g. temperature/more)
Reverse       The opposite of the design intent occurs (e.g. flow/reverse)
Also          The design intent is completely fulfilled but, in addition, some other related activity occurs (e.g. flow/also, indicating contamination in a product stream, or level/also meaning material in a tank or vessel that should not be there)
Other         The activity occurs but not in the way intended (e.g. flow/other could indicate a leak or product flowing where it should not, or composition/other might suggest unexpected proportions in a feedstock)
Fluctuation   The design intention is achieved only part of the time (e.g. an airlock in a pipeline might result in flow/fluctuation)
Early         Usually used when studying sequential operations; this would indicate that a step is started at the wrong time or done out of sequence
Late          Usually used when studying sequential operations; this would indicate that a step is started at the wrong time or done out of sequence

b) HazOp Study Methodology

In simple terms, the HazOp study process involves systematically applying all relevant keyword combinations to the system in question, in an effort to uncover potential problems. The results are recorded in columnar format under the following headings: node, attributes/parameters, deviations, causes, consequences, safeguards, action.
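The systematic application of keyword combinations can be sketched as a simple enumeration. This is only an illustration: the constant and function names are hypothetical, and the sketch lists every primary/secondary pairing, whereas in a real study the team judges which combinations are relevant to the node being examined.

```python
from itertools import product
from typing import List

# Operational primary keywords (Table 5.9).
OPERATIONAL_PRIMARY = [
    "isolate", "drain", "vent", "purge", "open",
    "inspect", "clean", "maintain", "start-up", "shutdown",
]

# Secondary keywords, i.e. the standard guidewords (Table 5.10).
SECONDARY = [
    "no", "less", "more", "reverse", "also",
    "other", "fluctuation", "early", "late",
]

# Column headings under which the study results are recorded.
HAZOP_COLUMNS = (
    "node", "attributes/parameters", "deviations",
    "causes", "consequences", "safeguards", "action",
)


def keyword_combinations(primary: List[str], secondary: List[str]) -> List[str]:
    """Pair every primary keyword with every secondary keyword,
    written in the primary/secondary form used in the text."""
    return [f"{p}/{s}" for p, s in product(primary, secondary)]


# Process-oriented primary keywords for a hypothetical pump node.
pump_combos = keyword_combinations(["flow", "pressure"], ["no", "less", "more"])
all_operational = keyword_combinations(OPERATIONAL_PRIMARY, SECONDARY)
```

Each generated combination, such as flow/no or pressure/less, is a candidate deviation to be questioned against the node; combinations with no credible meaning for that node are simply discarded by the study team.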
In considering the information to be recorded in each of these columns, an example of part of the cooling water system depicted in Fig. 3.18 of Sect. 3.2.2.6 dealing with fault-tree analysis is illustrated in the simple schematic below (Fig. 5.21).

Fig. 5.21 Example of part of a cooling water system

HazOp Study for Part of Cooling Water System

Process from–to nodes
X_1 → X_2.

Attributes
Pump P_1: flow, pressure
Dosing tank T_1: flow, level
Strainer S_1: flow
Cooling water tank T_2: flow, level.

Deviation
The keyword combination being applied (e.g. flow/no).

Cause
Potential causes that would result in the deviation occurring (e.g. ‘strainer S_1 blockage due to impurities in dosing tank T_1’ might be a cause of flow/no).

Consequence
The consequences that would arise, both from the effect of the deviation (e.g. ‘loss of dosing results in incomplete precipitation in T_2’) and, if appropriate, from the cause itself (e.g. ‘cavitation in pump P_1, with possible damage if prolonged’). The recording of consequences should be explicit. An important point to note, particularly for hazard and operability modelling (included later in this paragraph), is that when assessing the consequences, credit for protective systems or instruments that are already included in the design should not be considered.

Safeguards
Any existing protective devices that either prevent the cause or safeguard against the adverse consequences must be recorded. For example, the recording ‘local pressure gauge in discharge from pump might indicate problem was arising’ might be considered. Safeguards need not be restricted to hardware but, where appropriate, credit can be taken for procedural aspects such as the use of a standard work instruction (SWI) and job safety instructions (JSI).
Action
Where a credible cause results in a negative consequence, it must be decided whether some action should be taken. It is at this stage that consequences and associated safeguards are considered. If it is deemed that the protective measures are adequate, then no action need be taken, and words to that effect are recorded in the ‘action’ column. Actions fall into two groups:
• Actions that remove the cause.
• Actions that mitigate or eliminate the consequences.

Whereas the former is to be preferred, it is not always possible, especially when dealing with equipment malfunction. However, removing the cause should always take precedence, and the consequences should be mitigated only where necessary. For example, to return to the example cause ‘strainer S_1 blockage due to impurities etc.’, the problem might be approached in a number of specific remedial ways:
• Ensure that impurities cannot get into T_1, by fitting a strainer in the offloading line. Consider carefully whether a strainer is required in the suction to pump P_1. Particulate matter might pass through the pump without causing any damage, and it might be necessary to ensure that no such matter gets into T_2. If the strainer can be dispensed with altogether, the cause of the problem might be removed.
• Fit a differential pressure gauge across the strainer, with perhaps a high alarm to give clear indication that a total blockage is imminent.
• Fit a strainer, with a regular schedule of changeover and cleaning of the standby unit.

Having gone through the steps involved in recording a single deviation, the technique can now be inserted in the context of a qualitative hazard and operability computational model. Such a model is quite feasible, as the HazOp study method is an iterative process, applying in a structured and systematic way the relevant keyword (guideword-parameter) combinations in order to identify potential problems.
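As a minimal sketch of how a single recorded deviation might be held in such a computational model, the record below uses the column headings of the study and the strainer blockage example. The class, field and function names are hypothetical, and the judgement of whether safeguards are adequate is passed in as an input because it remains a decision of the study team.

```python
from dataclasses import dataclass, field
from typing import List

NO_ACTION = "No action required: existing safeguards considered adequate"


@dataclass
class HazOpRecord:
    """One row of the HazOp record, following the study's column headings."""
    node: str
    attribute: str
    deviation: str
    causes: List[str]
    consequences: List[str]
    safeguards: List[str] = field(default_factory=list)
    action: str = ""


def decide_action(record: HazOpRecord, safeguards_adequate: bool,
                  proposed_action: str) -> HazOpRecord:
    """Fill in the action column: note that no action is needed when the
    safeguards are judged adequate, otherwise record the proposed action."""
    record.action = NO_ACTION if safeguards_adequate else proposed_action
    return record


record = HazOpRecord(
    node="X1 -> X2",
    attribute="Strainer S1: flow",
    deviation="flow/no",
    causes=["Strainer S1 blockage due to impurities in dosing tank T1"],
    consequences=[
        "Loss of dosing results in incomplete precipitation in T2",
        "Cavitation in pump P1, with possible damage if prolonged",
    ],
    safeguards=["Local pressure gauge in discharge from pump"],
)

decide_action(
    record,
    safeguards_adequate=False,
    proposed_action="Fit a differential pressure gauge across the strainer, "
                    "with a high alarm",
)
```

Keeping causes, consequences and safeguards as separate lists mirrors the rule that consequences are recorded without taking credit for existing protection, which is weighed only when the action is decided.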
The example serves to highlight several points of caution when formulating actions. It is not always advisable to automatically opt for an engineered solution, adding additional instrumentation, alarms, trips, etc. Due regard must be taken of the reliability of such devices, and their potential for spurious operation causing unnecessary downtime. In addition, the increased operational cost in terms of maintenance, regular calibration, etc. should also be considered. It is not unknown for