Handbook of Reliability, Availability, Maintainability and Safety in Engineering Design - Part 57 ppt

5 Safety and Risk in Engineering Design

[Fig. 5.3 Cause-consequence diagram. Labels: initiating event, fault tree, time delay, condition (yes/no), consequence]

[...]man et al. 1994), and programmable user modelling applications (Blandford et al. 1999) have emerged to reconcile deficiencies in the tree-based analysis techniques. Furthermore, although these techniques are adequate for designing for safety in process engineering designs, their use in designing for systems control is complicated by the large number of ways in which computational control can address, or even contribute to, hazardous system states. This problem is addressed by a relatively new forward analysis technique called deviation analysis (Leveson 1995).

Deviation analysis (DA) is based on the underlying assumption that many accidents or incidents are the result of deviations in system variables, where a deviation is the difference between the actual value and the correct value appropriate for system control. The method originates from the forward analysis technique of software deviation analysis (SDA), in which hazardous behaviour in system control software is analysed. DA is an extension of the technique to system control hardware. Deviation analysis determines whether hazardous system behaviour can result from a class of input deviations covering the broad range of process characteristics such as capacity, input, throughput, output and quality. It is a means of determining the robustness of a system component (or, in safety terminology, its survivability), that is, how it will behave in an imperfect environment.

Hazard and operability studies (HAZOP) were first introduced by engineers from ICI Chemicals in the UK in the 1970s.
The method entails the investigation of deviations from the design intent of a process engineering installation by a design team with expertise in different areas such as engineering, operations, maintenance, safety and chemistry. The team is guided through a structured process, using a set of guidewords to examine deviations from normal process conditions at various key points (nodes) throughout the process. The guidewords are applied to the relevant process parameters (for example flow, temperature, pressure, composition, etc.) in order to identify the causes and consequences of deviations.

5.2 Theoretical Overview of Safety and Risk in Engineering Design

Typical terms used in a HAZOP are the following (Kletz 1999):
• Node: a specific location in the process at which (the deviations of) the process intention are evaluated.
• Intention: a description of how the process is expected to behave at the node; this is described qualitatively as an activity (e.g. feed, reaction, sedimentation) and/or quantitatively in terms of the process parameters, such as temperature, flow rate, pressure, composition, etc.
• Deviation: a way in which the process conditions may depart from their intention.
• Parameter: the relevant parameter for the condition(s) of the process, e.g. pressure, temperature, composition, etc.
• Guideword: a short word that describes a deviation from the intention. The most commonly used guidewords are NO, MORE, LESS, AS WELL AS, PART OF, OTHER THAN and REVERSE. In addition, guidewords such as TOO EARLY, TOO LATE, INSTEAD OF, etc. are used, the latter mainly for batch-like processes. The guidewords are applied, in turn, to all parameters in order to identify unexpected and yet credible deviations from the intention.
• Cause: the reason(s) why the deviation could occur. Many causes could be identified for one deviation.
• Consequence: the results of the deviation, should it occur. Consequences may comprise both process hazards and operability problems, such as plant shutdown or a decrease in product quality. Many consequences can follow from one cause and, in turn, one consequence can have several causes.
• Safeguard: facilities that help to reduce the occurrence frequency of the deviation or to mitigate its consequences. There are five types of safeguards:
  a) Facilities that identify the deviation. These comprise, among others, alarm instrumentation and human operator detection.
  b) Facilities that compensate for the deviation, e.g. an automatic control system that reduces the feed to a vessel in case of overfilling (increase of level). These are usually an integrated part of the process control.
  c) Facilities that prevent the deviation from occurring.
  d) Facilities that prevent the deviation from escalating (e.g. trips). These facilities are often interlocked with several units in the process, and controlled by logical computers.
  e) Facilities that relieve the process from the hazardous deviation. These comprise, for instance, pressure safety valves (PSV) and vent systems.
• Recommendation: activities identified during a HAZOP study for follow-up. These may comprise technical improvements in the design, modifications in the status of drawings and process descriptions, procedural measures to be developed or further in-depth studies to be carried out.

5.2.1.1 Fault-Tree Analysis for Safety in Engineering Design

The concept of fault-tree analysis (FTA) was originated by Bell Telephone Laboratories in the 1960s as a technique to perform a safety evaluation of the Minuteman Intercontinental Ballistic Missile Launch Control System. A fault tree is a logical diagram that shows the relation between system failure, i.e. a specific undesirable event in the system, and failures of the components of the system. It is a technique based on deductive logic.
An undesirable event is first defined and causal relationships of the failures leading to that event are then identified. Fault trees can be used in qualitative or quantitative risk analysis. The difference between the two is that the qualitative fault tree is linguistic in structure and does not require use of the same rigorous logic as does the formal quantitative fault tree (cf. Fig. 5.4).

FTA is a deductive technique that focuses on a particular accident or failure, and provides a method for determining the causes of that event. Fault-tree diagrams use logical operators, principally the OR and AND gates. The terminology is derived from electrical circuits, the term ‘gate’ referring to the control of a signal or electrical current. The term OR denotes a choice between two or more signals, any of which can ‘open’ the gate. The term AND refers to the requirement that all signals are necessary before there is an output from the gate. Figure 5.4 shows the logic and event symbols used in FTA.

Fig. 5.4 Logic and event symbols used in FTA
• OR gate: the output event occurs if any of the input events occur.
• AND gate: the output event occurs only when all input events occur.
• Intermediate event: a fault that results from the interactions of other fault events.
• Basic event: a component failure that requires no further development.
• Undeveloped event: a fault that is not examined further because information is unavailable.
• Transfer IN/OUT symbols: IN indicates that the tree is developed further at a corresponding OUT symbol.

Fault-tree analysis for safety in engineering design is conducted in several steps, from defining the problem to constructing the fault tree, analysing the fault tree, and documenting the results, specifically:

Step 1. Defining the Problem

The engineering design team selects:
• the top event,
• the boundary conditions,
• system physical bounds,
• the level of systems resolution,
• initial conditions,
• events that are not allowed,
• existing conditions,
• conditional assumptions.

Defining the top event is one of the most important aspects of the first step. The top event is the accident (or undesired event) that is the subject of the FTA. The top event is often identified through other hazard analysis studies (such as HAZID). Top events should be precisely defined for the system or plant being evaluated, because analysing broadly scoped or poorly defined top events can often lead to an inefficient analysis. For example, a top event of ‘gas leaks in the plant’ is too general. Instead, an appropriate top event would be ‘gas leak in the HC piping of the acid separation plant precipitation tank B’.

The physical system boundaries encompass the system’s equipment, the equipment’s interfaces with other processes, and the utility/support systems that are to be included in the FTA. The design team should also specify the level of systems resolution for the fault-tree events. For example, a motor-operated valve can be included as a single item of equipment (i.e. component) or it can be described as several hardware items (i.e. parts, e.g. the valve body, valve internals, and motor operator). The systems resolution of the fault tree should be limited to the detail needed to satisfy the analysis objective, and should parallel the resolution of the available information. The initial equipment configuration or initial operating conditions describe the system in its normal, unfailed state.

Events that are not allowed are, for the purposes of the FTA, events that are considered to be unlikely or that are deliberately excluded from the analysis for some specific reason.
For example, wiring failures may be excluded from the analysis of an instrument system, or cabling may be excluded from the analysis of power generating units. Existing conditions within which the system functions are estimates (and assumptions) of the possible operational conditions that may arise within the system and its equipment, either as a result of the system’s inherent complexity, or as a result of the complex integration of various systems.

Step 2. Constructing the Fault Tree

The FTA begins at the top event and proceeds, level by level, until all fault events have been traced to their basic contributing causes (i.e. basic events). At each level, the immediate, necessary and sufficient causes are defined that would result in the intermediate or top event under consideration. The analysis continues at each level, until basic causes or the analysis boundary conditions are reached.

Returning to the simple fault tree of a cooling water system depicted in Fig. 3.19 of Sect. 3.2.2.6, which deals with fault-tree analysis in reliability assessment, assume that the systems design included provision for a back-up surge tank with an appropriate control alarm in the event that the tank overflowed, indicating problems with the cooling water feed. These problems would typically be:
• Excess inflow.
• Low surge outflow.
• Control alarm failure.
• Operator error.

Figure 5.5 shows an example of the cooling water surge tank fault tree with two levels below the top event.

[Fig. 5.5 Safety control of cooling water system]

Step 3. Analysing the Fault Tree

The analysis ‘solves’ the fault tree by identifying combinations of failures that can lead to accidents. These are called minimal cut sets (MCS). The minimal cut sets for the example shown in Fig. 5.5 would be:
• ‘No surge control’ and ‘No alarm control’
• ‘Excess inflow’ and ‘Alarm failure’
• ‘Excess inflow’ and ‘Operator error’
• ‘Low surge outflow’ and ‘Alarm failure’
• ‘Low surge outflow’ and ‘Operator error’.

If the states of each of the control valves (CV1 and CV2) are in failure mode (i.e. failed closed and failed open), then further low-level cut sets can be defined, and the fault tree needs to be modified (additional rectangular boxes above each CV circular box) to include the failed states:
• ‘CV1 fails open’ and ‘Alarm failure’
• ‘CV1 fails closed’ and ‘Alarm failure’
• ‘CV2 fails open’ and ‘Alarm failure’
• ‘CV2 fails closed’ and ‘Alarm failure’.

Failure probabilities can now be assigned. The probabilities that are allocated to the events can be combined to estimate the probability of the top event. The probability of two statistically independent events, one with probability $p_1$ and the other with probability $p_2$, occurring together is:

$$P(\mathrm{AND}) = p_1 \times p_2 \tag{5.1}$$

Let $q_1$ and $q_2$ be the complements of $p_1$ and $p_2$ respectively:

$$q_1 = 1 - p_1, \qquad q_2 = 1 - p_2$$

so that $q_1$ is ‘NOT $p_1$’ and $q_2$ is ‘NOT $p_2$’. The probability of event 1 not occurring is thus $q_1$ and the probability of event 2 not occurring is $q_2$. The probability that event 1 OR event 2 occurs, that is, that at least one of the two occurs, is therefore the complement of the probability that neither occurs:

$$P(\mathrm{OR}) = 1 - (q_1 \times q_2) \tag{5.2}$$

The following example clarifies this expression. In Fig. 5.5, the probabilities of the equipment failures in the circles are derived from expert judgement, and those of the events in the rectangular boxes are calculated from the probabilities further down the tree.
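Before the numbers are worked through, the gate rules of Eqs. (5.1) and (5.2) and the cut-set logic of Step 3 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: input events are taken to be statistically independent, the nested-tuple tree encoding and the function names (`p_and`, `p_or`, `cut_sets`, `minimal`) are illustrative choices rather than anything prescribed by the handbook, and the tree structure is the two-level surge tank example as described in the text.

```python
from itertools import product
from math import prod


def p_and(*probs):
    """Eq. (5.1): probability that all independent input events occur."""
    return prod(probs)


def p_or(*probs):
    """Eq. (5.2): probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical encoding: a node is either a basic-event name (str)
# or a tuple ("AND" | "OR", child, child, ...).
TANK_OVERFLOW = (
    "AND",
    ("OR", "Excess inflow", "Low surge outflow"),  # 'No surge control'
    ("OR", "Alarm failure", "Operator error"),     # 'No alarm control'
)


def cut_sets(node):
    """Expand a tree into cut sets (each a frozenset of basic events)."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, *children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child is a cut set of the OR gate.
        return [cs for sets in child_sets for cs in sets]
    # AND gate: combine one cut set from every child.
    return [frozenset().union(*combo) for combo in product(*child_sets)]


def minimal(sets):
    """Drop duplicates and any cut set that strictly contains another one."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


if __name__ == "__main__":
    for cs in sorted(minimal(cut_sets(TANK_OVERFLOW)), key=sorted):
        print(" AND ".join(sorted(cs)))
```

Run on the surge tank tree, the expansion yields the four basic-event pairs listed under Step 3; the pair ‘No surge control’ and ‘No alarm control’ is the same combination expressed at the intermediate level. For larger trees, dedicated FTA tools use more efficient algorithms than this brute-force expansion.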
The probability for no surge control is calculated as:

$$P(\mathrm{OR})_{\mathrm{valves}} = 1 - [(1 - 0.025) \times (1 - 0.025)] = 0.049375 \approx 0.050$$

The probability for no alarm control is calculated as:

$$P(\mathrm{OR})_{\mathrm{alarm}} = 1 - [(1 - 0.025) \times (1 - 0.052)] = 0.0757$$

which is taken as 0.075 in the next step. The probability for the top event shown in the figure (tank overflow) is:

$$P(\mathrm{AND})_{\mathrm{tank}} = 0.050 \times 0.075 = 0.00375$$

Although the example is hypothetical, it closely resembles a real-world scenario, and it is interesting to note that the reliability of the safety alarm control system is lower than that of the surge system it is meant to control. This is due to operator error, where operator judgement is jeopardised by failure in the operator control panel (OCP), which is often the case in many processes.

The failure of an item of equipment will result in its replacement, which reduces the failure frequency and in turn changes the risk probabilities all the way up the tree. Computer models are needed to keep the fault-tree analysis up to date. It is common in large process plants, however, for the maintenance group not to communicate these improvements to the reliability engineers, who continue to use outdated high-risk numbers. Similarly, experiences of ineffective operation will usually initiate improved training, so that operator errors are less frequent and the reliability of the whole system is improved.

Step 4. Documenting the Results

The analysis should provide a description of the system, a discussion of the problem definition, a list of assumptions, the fault-tree model(s) that were developed, lists of minimal cut sets, an evaluation of the significance of the MCSs, and any recommendations that arise from the FTA.

Probability evaluation of fault trees is considered in most technical papers and books about safety and hazard analysis.
However, some approximation discrepancies are evident, especially in the basic theory of assigning probabilities to the fault-tree gates, specifically the OR gate. The probability expression for statistically independent input events to the OR gate has been given as (Dhillon 1983):

$$P(\mathrm{OR}) = P(a + b + c + \ldots) = P(a) + P(b) + P(c) + \ldots \tag{5.3}$$

where a, b, c, etc. are the input events. In the example of Fig. 5.5, this is equivalent to:

$$P(\mathrm{OR}) = p_3 + p_4 = 0.050 \quad \text{or} \quad p_5 + p_6 = 0.077$$

Considering the complements of $p_3$ to $p_6$, namely $q_3$ to $q_6$, results in:

$$P(\mathrm{OR}) = 1 - (q_3 \times q_4) = 0.049375 \quad \text{or} \quad 1 - (q_5 \times q_6) = 0.0757$$

The simple sum slightly overstates the result because it counts the joint occurrence of the input events twice; the difference is negligible only when the input probabilities are small.

5.2.1.2 Root Cause Analysis for Safety in Engineering Design

Root cause analysis is predominantly a technique for determining the origin of causes of failure in engineered installations after completion of their design. However, the approach can also be used to identify potential root causes of failure, particularly failures with critical safety consequences, during the engineering design process before systems manufacture, installation and/or construction. It is imperative for achieving integrity in engineering design that design engineers consider how their designs operate in the field and, more importantly, how they fail. This will ultimately result in engineering designs that satisfy both functional and integrity requirements, using sound engineering judgement rather than ‘crystal ball’ prediction techniques.

Although there is a wealth of knowledge and data concerning systems performance of existing engineered installations, in general this is not utilised to the extent that information may be obtained for use in new designs, especially in complex integrations of designs. To this end, more formal and systematic methods should be introduced during the engineering design process.
Although specific methods and tools are available to facilitate designing for reliability, for example, their use is often limited to reliability engineers, with the design engineers of other disciplines frequently adopting an intuitive approach to considering reliability in design. As the design process becomes increasingly sophisticated, with higher-level design tasks involving complex integrations of similarly complex systems, it has become essential that design engineers formally investigate the integrity of these designs, particularly at each interface of the integrated systems.

Examining and understanding the root cause of failure of a design’s functional operation can aid in designing for safety and designing-out unreliability. In selecting equipment from an existing design to meet a new requirement within a different systems integration, it is important that design engineers look beyond the standard reliability metric of the existing design, and review in particular the root causes of failure and the significant factors affecting the equipment’s reliability and safety. In the past, there has been an over-reliance on the use of prediction methods. For example, the original reliability prediction handbook of the US Department of Defense (DoD), MIL-HDBK-217, contained failure rate models for the various part types used in electronic systems, and concentrated mainly on prediction methods that did not provide engineers with any knowledge of what might fail in service (MIL-HDBK-217F 1998).

A methodology aimed at integrating reliability enhancement practices into the engineering design process has been developed as part of a UK government and aerospace industry initiative. The Reliability Enhancement Methodology and Modelling (REMM) project was funded in part by the UK’s Department of Trade and Industry through the Civil Aviation Research and Development program and by the industrial partners involved (Marshall et al. 1998).
The main objectives of the project are to develop a methodology that supports reliability enhancement in engineering design and to develop a model that facilitates reliability assessment throughout a system’s life cycle. REMM is primarily used within the aerospace environment, but the methodology and model developed are equally applicable to other high-reliability system designs, such as in process, chemical and mechanical engineering design projects. A number of simple practical analyses for use by design engineers, during the early stages of systems realisation, have been developed as part of the REMM methodology. These analyses are aimed at improving high-level decision-making using simple graphical representations of reliability data, such as analyses of root causes, trends, and manufacturing data. These graphical representation analyses include:
• Root cause analysis and classification of events into high-level failure categories, providing the means to determine those factors that have most effect on the system’s service reliability and, hence, which elements should be tackled as a priority.
• Root cause and trend data across specific criteria such as equipment type, periods of time (e.g. particular manufacturing time-line points), application or use, providing further understanding of the nature of the failure that may be characteristic of the environment in which it is operating.
• Manufacturing data analysis, providing valuable insight into the factors that affect service reliability. Correlation between manufacturing methods and service requirements can often illuminate small changes in design and manufacturing process that result in significant effects on service reliability.

Root cause analysis also utilises the deductive logic tree approach, similar to fault-tree analysis (FTA), in establishing the root causes of functional failure or of a system state.
Such an approach to problem solving is particularly useful for determining safety in engineering designs. The approach of establishing the root causes of functional failure in systems design is intended to achieve the following:
• Organise and control design integrity problem identification.
• Provide a visual checklist to ensure all pertinent areas are covered.
• Allow for a standardised approach to safety problem identification.
• Serve as a documented guide for design integrity problem reviews.

The most common root cause analysis methods cover topics from events and causal factor analysis to change analysis, barrier analysis, management oversight and risk tree analysis, human performance evaluation, standard problem solving and basic decision-making. These methods are considered in the common root cause analysis approach developed by the Office of Nuclear Energy, US Department of Energy, in its guideline ‘Root cause analysis: guidance document’ (DOE-NE-STD-1004-92 1992).

Common Root Cause Analysis Methods

• Events and causal factor analysis identifies the time sequence of a series of tasks and/or actions and the surrounding conditions that can lead to a failure occurrence. The results are displayed in an events and causal factor chart that gives a picture of the relationships of the events and causal factors.
• Change analysis is used when the problem is obscure. It is a systematic process that is generally used for a single failure occurrence and focuses on elements that change.
• Barrier analysis is a systematic process that can be used to identify physical and procedural barriers or controls that should prevent the occurrence of failure.
• Management oversight and risk tree (MORT) analysis is used to identify inadequacies in barriers/controls, specific barrier and support functions, as well as management functions. It identifies specific factors relating to a possible failure occurrence, as well as the factors that permit these to exist.
• Human performance evaluation identifies those factors that influence task performance. The focus of this analysis method is on operability, work environment, and management factors, as well as man-machine interface studies to improve performance.
• Problem solving and decision-making provides a systematic framework for gathering, organising and evaluating information, and applies to all phases of a possible failure occurrence investigation (Kepner et al. 1981).

By organising problem analysis results in an orderly manner as the design progresses, the time spent to find the root causes of possible problems is minimised. The method consists of using factor trees to guide the course of the analysis. Factor trees diagrammatically present the major areas to be considered in the various stages of an engineering design project, such as:
• Systems and equipment design.
• Manufacturing and installation.
• Process start-up and ramp-up.
• Operations and maintenance.

To conduct a root cause analysis specifically in the systems and equipment design stage, a series of charts can be developed representing those functional areas to be investigated, and the various factors to be considered when investigating the functional areas for causes of potential failure problems. These root cause factors for the systems and equipment design area include the following:
• Origin of design criteria.
• Utility inputs prior to design.
• Equipment specifications.
• Constraints on the design.
• Actual design solution and test.
