Handbook of Reliability, Availability, Maintainability and Safety in Engineering Design - Part 68

5 Safety and Risk in Engineering Design
5.2 Theoretical Overview of Safety and Risk in Engineering Design

The actual degree of safety (incidents): This is evaluated according to the contribution of the actual physical condition of the equipment to its safety, the actual downtime frequency, as well as the actual reportable incident frequency, arising from the functional failure history of the equipment resulting in an asset loss consequence of failure.

Besides the safety, operational and physical consequences of failure, the other consequences (economic, environmental, systems and maintenance) are typically measured as the cost of losses plus the cost of repair to the failed item and to any consequential damage (although, in reality, all safety consequences are eventually also measured as a cost risk). These cost risks of failure are also defined as the result of multiplying the consequence of failure (i.e. the cost of losses plus the cost of repair) by the probability of its occurrence.

Reliability analysis in engineering design tends, however, to simplify these risks to the point of impracticality where, for example, consideration is given only to single modes of failure, or only to random failure occurrences, or to maintenance that results in complete renewal and 'as new' conditions. In reality, the situation is much more complicated, with interacting multiple failure modes, variable failure rates, as well as maintenance-induced failures that influence the rates of deterioration and subsequent failure (Woodhouse 1999).

It is somewhat unrealistic to assume a specific failure rate of equipment within a complex integration of systems with complex failure processes. At best, the intrinsic failure characteristics of components of equipment are determined from quantitative probability distributions of failure data obtained in a somewhat clinical environment under certain operating conditions.
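The cost-risk definition given above (consequence of failure multiplied by its probability of occurrence) can be illustrated with a minimal sketch. The function names and the monetary figures in the usage note are hypothetical and chosen only for illustration; they do not come from the handbook. Summing over failure modes assumes the modes are independent, which the text itself warns is a simplification.

```python
from typing import Iterable, Tuple


def cost_risk(cost_of_losses: float, cost_of_repair: float, probability: float) -> float:
    """Cost risk of one failure mode: (losses + repair) x probability of occurrence."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return (cost_of_losses + cost_of_repair) * probability


def total_cost_risk(modes: Iterable[Tuple[float, float, float]]) -> float:
    """Sum of cost risks over several failure modes.

    Each entry is (cost_of_losses, cost_of_repair, probability).
    Assumption: modes are treated as independent, so their risks simply add.
    """
    return sum(cost_risk(losses, repair, p) for losses, repair, p in modes)
```

For instance, a hypothetical mode with 40,000 in losses, 10,000 in repair cost and a probability of 0.25 carries a cost risk of 12,500.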
The true failure process, however, is subject to many other factors, including premature or delayed preventive maintenance activities conducted during shutdowns of process plant. It is generally accepted that shutdowns affect the failure characteristics of equipment as a whole, although it is debatable whether the end result is positive or negative from a residual life point of view, where residual life is defined as the remaining life expectancy of a component, given its survival to a specific age. This is a concept of obvious interest, and one of the most important notions in process reliability and equipment aging studies for safety criticality analysis.

Safety criticality analysis is thus always faced with combinations of interacting failure modes and variable failure rates, where the cumulative effects are much more important than estimates of specific probabilities of failure. Qualitative estimates of how long equipment might last in certain engineering processes, based on operating conditions and failure characteristics, are much more easily made than quantitative estimates of the chances of failure of individual equipment. These cumulative effects are represented in equipment survival curves where a best-fit curve is matched to specific survival data, and a pattern of risks calculated that would be necessary for these effects to be realised. In analysing survival data, there is often the need to determine not only the survival time distribution but also the residual survival time (or residual life) distribution. A typical equipment survival curve and hazard curve are illustrated in Fig. 5.41a and 5.41b (Smith et al. 2000).

Fig. 5.41 a Kaplan–Meier survival curve for rotating equipment, b estimated hazard curve for rotating equipment

Typical impact, risk exposure, lost performance, and direct cost patterns based on shutdown maintenance intervals for rotating equipment, as well as risk-based maintenance patterns based on shutdown maintenance intervals for rotating equipment, are illustrated in Fig. 5.42a and 5.42b (APT Maintenance 1999).

Fig. 5.42 a Risk exposure pattern for rotating equipment, b risk-based maintenance patterns for rotating equipment

b) Risk-Based Maintenance

Risk-based maintenance is fundamentally an evaluation of maintenance tasks, particularly scheduled preventive maintenance activities in shutdown programs. It considers the impact of bringing forward, or delaying, activities that are directed at preventing cost risks to coincide with essential activities that address safety risks. If the extent of these risks were known, and what they cost, the optimum amount of risk to take, and planned costs to incur, could be calculated. Similarly, better decisions could be made if the value of the benefits of improved performance, longer life and greater reliability was known. These risks and benefits are, however, difficult to quantify, and many of the factors are indeterminable.

Cost/risk optimisation in this context can thus be defined as the minimal total impact, and represents a trade-off between the conflicting interests of the need to reduce costs at the same time as the need to reduce the risks of failure. Both are measured in terms of cost, the former being the planned downtime cost plus the cost of preventive maintenance in an attempt to increase performance and reliability, and the latter being the cost of losses due to forced shutdowns plus the cost of repair and consequential damage. The total impact is the sum of the planned costs and failure costs. When this sum is at a minimum, an optimal combination of the costs incurred and the failure risks is reached, as illustrated in Fig. 5.43.
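The trade-off described above, where total impact is the sum of planned costs and failure costs and the optimum lies at its minimum, can be sketched numerically. The handbook does not state a failure model here, so this sketch assumes one: the expected number of failures in an interval of length T is taken as (T / eta) ** beta, a power-law (Weibull-type) form with hypothetical parameters `eta` and `beta`. All names and figures are illustrative assumptions, not values from the source.

```python
from typing import Iterable


def planned_cost_rate(interval: float, planned_cost: float) -> float:
    """Planned downtime plus preventive maintenance cost, spread over the interval."""
    return planned_cost / interval


def failure_cost_rate(interval: float, failure_cost: float, eta: float, beta: float) -> float:
    """Expected failure cost per unit time.

    Assumption (not from the source): expected failures in the interval
    follow the power law (interval / eta) ** beta.
    """
    expected_failures = (interval / eta) ** beta
    return failure_cost * expected_failures / interval


def total_impact_rate(interval: float, planned_cost: float, failure_cost: float,
                      eta: float, beta: float) -> float:
    """Total impact per unit time: planned cost rate plus failure cost rate."""
    return (planned_cost_rate(interval, planned_cost)
            + failure_cost_rate(interval, failure_cost, eta, beta))


def optimal_interval(candidates: Iterable[float], planned_cost: float,
                     failure_cost: float, eta: float, beta: float) -> float:
    """Candidate interval with the lowest total impact per unit time."""
    return min(candidates,
               key=lambda t: total_impact_rate(t, planned_cost, failure_cost, eta, beta))


if __name__ == "__main__":
    # Hypothetical inputs: costs in dollars, intervals in months.
    best = optimal_interval(range(1, 37), 10_000.0, 40_000.0, eta=24.0, beta=2.0)
    print(f"Lowest total impact at a {best}-month interval")
```

Short intervals are dominated by the planned cost term and long intervals by the failure cost term, which gives the bathtub-shaped total typical of a cost optimisation curve. A grid search is used rather than calculus so that any other cost model can be substituted without changing the search.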
Fig. 5.43 Typical cost optimisation curve

Cost/risk trade-off decisions determine optimal preventive maintenance intervals for plant shutdown strategies that consider component renewal or replacement criteria, spares requirements planning, etc. Planned downtime costs plus the costs of preventive maintenance are traded off against the risk consequences of premature or deferred component renewals or replacements, measured as the cost of losses plus the cost of repair. In each of these areas, cost/risk evaluation techniques are applied to assist in the application of a safety-critical maintenance approach.

Component renewal/replacement criteria are directly determined by failure modes and effects criticality analysis (FMECA), whereby appropriate maintenance tasks are matched to failure modes. In applying FMECA, the criticality analysis establishes a priority rating of components according to the consequences and measures of their various failure modes, which helps to prioritise the preventive maintenance activities for scheduled shutdowns. An example of an FMECA for process criticality of a control valve, based on failure consequences (downtime) and failure rate (1/MTBF), is given in Table 5.16.

Reliability, availability, maintainability and safety (RAMS) studies establish the most effective combination of the different types of maintenance (i.e. a maintenance strategy) for operational systems and equipment. The deliverable results are operations and maintenance procedures and work instructions in which the different types of maintenance are effectively combined for specific equipment. Failure modes and effects criticality analysis (FMECA), as given in Table 5.16, is one of the most commonly used techniques for prioritising failures in equipment.
The analysis at systems level involves identifying potential equipment failure modes and assessing the consequences of these for the system's performance. Table 5.17 shows the designation of maintenance activities, the appropriate maintenance trade, and the recommended maintenance frequency for each failure mode, based on MTBF. It is evident that some activities need to be delayed to coincide with others.

Table 5.16 Typical FMECA for process criticality
Component | Failure description | Failure mode | Failure consequences | Failure causes | D/T (h) (plus damage) | MTTR (h) (repair time) and damage | MTBF (months) | Process criticality rating
Control valve | Fails to open | TLF | Production | Solenoid valve fails, failed cylinder actuator or air receiver failure | 9 | 8 | 12 | Medium critical
Control valve | Fails to open | TLF | Production | No PLC output due to modules electronic fault or cabling | 4 | 2 | 6 | Medium critical
Control valve | Fails to seal/close | TLF | Production | Valve disk damaged due to corrosion wear (same causes as 'fails to open') | 5 | 4 | 6 | Medium critical
Control valve | Fails to seal/close | TLF | Production | Valve stem cylinders seized due to chemical deposition or corrosion | 5 | 4 | 4 | Medium critical
Instrument loop (press. 1) | Fails to provide accurate pressure indication | TLF | Maint. | Restricted sensing port due to blockage of chemical or physical accumulation | 0 | 1 | 3 | Low critical
Instrument loop (press. 2) | Fails to detect low pressure condition | TLF | Maint. | Low pressure switch fails due to corrosion or mechanical damage | 0 | 2 | 3 | Low critical
Instrument loop (press. 2) | Fails to detect low pressure condition | TLF | Maint. | Pressure switch relay or cabling failure | 0 | 8 | 4 | Low critical
Instrument loop (press. 2) | Fails to provide output signal for alarm | TLF | Maint. | PLC alarm function or indicator fails | 0 | 8 | 4 | Low critical

Table 5.17 FMECA with preventive maintenance activities
Component | Failure description | Failure causes | D/T (h) (plus damage) | MTTR (h) (repair time) and damage | MTBF (months) | Maintenance activity | Maintenance trade | Maintenance frequency
Control valve | Fails to open | Solenoid valve fails, failed cylinder actuator or air receiver failure | 9 | 8 | 12 | Service control valve. Replace components and test PLC interface | Instr. tech. | 12 monthly
Control valve | Fails to open | No PLC output due to modules electronic fault or cabling | 4 | 2 | 6 | Covered by control valve service as above | Instr. tech. | 12 monthly
Control valve | Fails to seal/close | Valve disk damaged due to corrosion wear (same causes as 'fails to open') | 5 | 4 | 6 | Remove control valve and check valve stem, seat and disk or diaphragm for deterioration or corrosion and replace with overhauled valve if required | Fitter | 6 monthly
Control valve | Fails to seal/close | Valve stem cylinders seized due to chemical deposition or corrosion | 5 | 4 | 4 | Covered by control valve condition assessment and replace components | Instr. tech. | 6 monthly
Instrument loop (press. 1) | Fails to provide accurate pressure indication | Restricted sensing port due to blockage of chemical or physical accumulation | 0 | 1 | 3 | Remove pressure gauge and check for blocked sensing lines and gauge deterioration. Replace with new gauge if required | Instr. tech. | 3 monthly
Instrument loop (press. 2) | Fails to detect low pressure condition | Low pressure switch fails due to corrosion or mechanical damage | 0 | 2 | 3 | Verify correct operation of pressure switch and wiring. Test alarm's operation | Instr. tech. | 3 monthly
Instrument loop (press. 2) | Fails to detect low pressure condition | Pressure switch relay or cabling failure | 0 | 8 | 4 | Covered by switch operation verification | Instr. tech. | 3 monthly
Instrument loop (press. 2) | Fails to provide output signal for alarm | PLC alarm function or indicator fails | 0 | 8 | 4 | Covered by switch operation verification | Instr. tech. | 3 monthly

Different types and levels of maintenance effort are applied, depending upon the process or functional criticality (Woodhouse 1999):
• Quantitative risk and performance analysis (such as RAM and FMECA) is warranted for about 5–10% of the most critical failure modes. This is where cost/risk optimisation is applicable for significant costs or risks that are sensitive to high-impact strategies.
• Rule-based analysis methods (such as RCM and RBI) are more appropriate for about 40–60% of the critical failure modes, particularly if supplemented with economic analysis of the resulting impact strategies. This is where cost/risk optimisation is applicable for the costs or risks for setting preventive maintenance intervals.
• Review of existing maintenance (excluding simple FMEA studies) provides a simple check at the lower levels of criticality to verify that there is a valid reason for the maintenance activity, and that the cost is reasonable compared to the consequences.

c) Safety Criticality Analysis and Risk-Based Maintenance

Safety criticality analysis was previously considered as the assessment of failure risks. In this context, safety criticality analysis is applied to determine the essential maintenance intervals, and the impact of premature or delayed preventive maintenance activities where failure risks are considered to be safety critical. A safety/risk scale is applied, based on a specific cost benchmark (usually computed as the cost of output per time interval) related to the cost of losses and the likelihood of failure.
A safety criticality model to determine the optimal maintenance interval, and the impact of premature or delayed preventive maintenance activities, considers the following:
• A quantified description of the degradation process, using estimates wherever data are not available, as well as identification of failure modes and related causes.
• Cost calculations for material and maintenance labour costs for each failure mode, including possible consequential damage.
• Cost/risk calculations for alternative preventive maintenance intervals based on a specific cost benchmark related to the cost of losses and the likelihood of failure.
• Cost criticality rating of failure modes, and sensitivity testing to the limits of the likelihood of failure under uncertainty of unavailable or censored data.
• Identification of key decision drivers (which assumptions have the greatest effect upon the optimal decision), for review of the preventive maintenance program.

In many cases, there are several interacting failure modes, causes and effects, all in the same evaluation. The preventive maintenance program or, in the case of continuous processes, the shutdown strategy thus becomes a compromise of scheduled times and costs. Some activities will be performed ahead of their ideal timing, whilst others will be delayed to share the downtime opportunity determined by safety-critical shuts.

The risks and performance impact of delayed activities, and the additional costs of deliberate over-maintenance in others, both contribute to the costs for a particular shutdown program. The degree of advantage, on the other hand, is controlled by the costs involved.
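The sensitivity-testing and key-decision-driver steps listed above can be sketched as a one-at-a-time sweep: each assumption is moved up and down by a fixed fraction, and the resulting swing in the optimal interval is recorded. The handbook does not specify a degradation model, so this sketch assumes a power-law failure model in which expected failures in an interval T are (T / eta) ** beta; under that assumption the interval minimising cost per unit time has the closed form used below. Parameter names, the base values and the 20% range are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical base-case assumptions (costs in dollars, eta in months).
BASE: Dict[str, float] = {
    "planned_cost": 10_000.0,   # planned downtime + preventive maintenance
    "failure_cost": 40_000.0,   # losses + repair + consequential damage
    "eta": 24.0,                # assumed characteristic life
    "beta": 2.0,                # assumed wear-out shape (> 1)
}


def optimal_interval(planned_cost: float, failure_cost: float,
                     eta: float, beta: float) -> float:
    """Interval minimising (planned_cost + failure_cost * (T/eta)**beta) / T.

    Setting the derivative to zero gives
    T* = eta * (planned_cost / (failure_cost * (beta - 1))) ** (1 / beta).
    """
    if beta <= 1.0:
        raise ValueError("no finite optimum unless beta > 1 (wear-out behaviour)")
    return eta * (planned_cost / (failure_cost * (beta - 1.0))) ** (1.0 / beta)


def decision_drivers(base: Dict[str, float],
                     rel_change: float = 0.2) -> List[Tuple[str, float]]:
    """Rank assumptions by the swing they cause in the optimal interval."""
    swings = {}
    for name, value in base.items():
        low = dict(base, **{name: value * (1.0 - rel_change)})
        high = dict(base, **{name: value * (1.0 + rel_change)})
        swings[name] = abs(optimal_interval(**high) - optimal_interval(**low))
    return sorted(swings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(f"Base optimum: {optimal_interval(**BASE):.1f} months")
    for name, swing in decision_drivers(BASE):
        print(f"{name:14s} swing of {swing:.2f} months")
```

An assumption at the top of the ranking is a candidate key decision driver: it is where better data would most change the decision, and where censored or missing data matters most.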
The downtime impact (the cost of losses due to forced shutdowns as a result of failure, plus the cost of repair to the failed item and to any consequential damage) often dominates the direct cost advantage (planned shutdown lost opportunity costs, use of facilities, materials and labour costs, etc.) of shutting down and starting up again. Such a cost criticality analysis also reveals the scope for de-bottlenecking improperly evaluated reliability constraints by eliminating frequent interim shutdowns and extending operational run lengths. The analysis process is also able to calculate the net payback for such de-bottlenecking.

The grouping and re-grouping of activities as well as re-programming the preventive maintenance program (i.e. combining activities in different bundles and moving the bundles to shorter or longer intervals) are fundamentally a scheduling problem, requiring the application of formalised risk analysis and decision criteria based on assessment scales, and the use of computer-automated computation.

Table 5.18 shows the application of cost criticality analysis to the FMECA for process criticality of the control valve given in Table 5.17. It indicates the cost criticality rating of each failure mode related to the cost of losses and the cost risk based on estimates of the likelihood of failure. Table 5.19 shows a comparison between the process criticality rating and the cost criticality rating of each failure mode of the control valve. In this case, the ratings correspond closely with one another.

The maintenance frequencies of the preventive maintenance activities that were typically based on the mean time between failures (MTBF) are, however, not relative to either the process criticality rating or the cost criticality rating. The maintenance frequencies thus require review to determine the optimal maintenance intervals whereby the impact of premature or delayed preventive maintenance activities is considered.
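As a rough illustration of how the cost-per-failure figures relate to downtime and MTBF, the sketch below takes the downtime, MTBF and material-and-labour values listed for the first five failure modes in Tables 5.16 and 5.18. Two points are assumptions rather than statements from the handbook: the production-loss rate of $7,650 per hour is inferred from the tabulated values (68,850 / 9 h, 30,600 / 4 h and 38,250 / 5 h all give this figure), and the annualised cost risk uses a constant failure rate of 12 / MTBF failures per year. The handbook's own 'Risk' scale is not reproduced here.

```python
from dataclasses import dataclass
from typing import List

# Inferred from the tabulated production losses and downtimes, not stated in the text.
PRODUCTION_LOSS_PER_HOUR = 7_650.0


@dataclass(frozen=True)
class FailureMode:
    cause: str
    downtime_h: float        # D/T (h)
    material_labour: float   # material and labour cost per failure ($)
    mtbf_months: float       # MTBF (months)

    def cost_per_failure(self) -> float:
        """Material and labour plus production loss over the downtime."""
        return self.material_labour + self.downtime_h * PRODUCTION_LOSS_PER_HOUR

    def annual_cost_risk(self) -> float:
        """Assumes a constant failure rate of 12 / MTBF failures per year."""
        return self.cost_per_failure() * 12.0 / self.mtbf_months


MODES: List[FailureMode] = [
    FailureMode("Solenoid valve, cylinder actuator or air receiver failure", 9, 5_000, 12),
    FailureMode("No PLC output", 4, 2_000, 6),
    FailureMode("Valve disk damaged", 5, 5_000, 6),
    FailureMode("Valve stem cylinders seized", 5, 5_000, 4),
    FailureMode("Restricted sensing port (press. 1)", 0, 500, 3),
]


def ranked_by_annual_risk(modes: List[FailureMode]) -> List[FailureMode]:
    """Failure modes ordered from highest to lowest annualised cost risk."""
    return sorted(modes, key=lambda m: m.annual_cost_risk(), reverse=True)


if __name__ == "__main__":
    for mode in ranked_by_annual_risk(MODES):
        print(f"{mode.cause:60s} {mode.cost_per_failure():>10,.0f} "
              f"{mode.annual_cost_risk():>12,.0f}")
```

Under these assumptions the seized valve stem, with the shortest MTBF among the control valve modes, carries the highest annualised cost risk even though its cost per failure is not the largest. This is one way of seeing why MTBF-based frequencies and criticality ratings need not line up.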
This example of a relatively important item of equipment, such as a process control valve, is typical of many such items of equipment in process plant where RAM, FMECA or RCM analysis do not provide sufficient information for decisive decision-making, as the equipment's failure modes are not significantly high risk but rather medium risk. Where the criticality ratings are not significant (i.e. there is no evidence of high criticality), as in this case of the control valve, maintenance optimisation becomes difficult, necessitating a review of the risk analysis and decision criteria according to qualitative estimates.

Table 5.18 FMECA for cost criticality
Component | Failure description | Failure mode | Failure causes | Defect. MATL & LAB ($)/failure (incl. damage) | Econ. $/failure (prod. loss) | Total $/failure (prod. and repair) | Risk | Cost criticality rating
Control valve | Fails to open | TLF | Solenoid valve fails, failed cylinder actuator or air receiver failure | $5,000 | $68,850 | $73,850 | 6.00 | Medium cost
Control valve | Fails to open | TLF | No PLC output due to modules electronic fault or cabling | $2,000 | $30,600 | $32,600 | 6.00 | Medium cost
Control valve | Fails to seal/close | TLF | Valve disk damaged due to corrosion wear (same causes as 'fails to open') | $5,000 | $38,250 | $43,250 | 6.00 | Medium cost
Control valve | Fails to seal/close | TLF | Valve stem cylinders seized due to chemical deposition or corrosion | $5,000 | $38,250 | $43,250 | 6.00 | Medium cost
Instrument loop (press. 1) | Fails to provide accurate pressure indication | TLF | Restricted sensing port due to blockage of chemical or physical accumulation | $500 | $0 | $500 | 2.00 | Low cost

d) Risk Analysis and Decision Criteria

In typical process plant shutdown programs, decisions concerning the extent and timing of component renewal/replacement activities are generally determined by the dominant failure modes that, in effect, relate to less than a third of the program's total preventive maintenance activities. Criticality ranking or prioritising of equipment according to the consequences of failure modes is essential for a risk-based maintenance approach, though comparative studies have shown that qualitative risk ranking is, in many cases, just as effective in identifying the key shutdown drivers, often at a fraction of the cost. Typically, these risks can be ranked by designating
