Handbook of Reliability, Availability, Maintainability and Safety in Engineering Design - Part 42 doc

4.2 Theoretical Overview of Availability and Maintainability in Engineering Design

off-system repair, and logistics capability, to maintain the engineered installation. However, because of the peculiar nature of these parameters, none are considered in this research, although it is essential to have a complete set of on-system and off-system indices to adequately assess system maintainability and the total maintenance burden.

b) Diagnostic Systems and Built-In Testing

One aspect of maintainability that has received significant attention in recent system designs is the use of automatic diagnostic systems. These systems include both internal or integrated diagnostic systems, referred to as built-in-test (BIT) or built-in-test-equipment (BITE), and external diagnostic systems, referred to as automatic test equipment (ATE) or offline test equipment. The following concepts focus on BIT but apply equally to other diagnostic systems.

Need for automatic diagnostic systems: BIT

As advances in technology continue to increase the capability and complexity of modern engineering processes, particularly in space and military systems, more reliance is being placed on the use of automatic diagnostics as a means of attaining the required level of failure detection capability. The need for BIT is driven by operational availability requirements, which cannot allow for the lengthy MTTRs associated with detecting and isolating failure modes in engineering designs, especially in microcircuit technology equipment. Because BIT is applied within a system's function, and at the same functioning speed, it affords the capability to detect and isolate failures that conventional test equipment and techniques cannot provide.
A well-designed BIT system can substantially reduce the need for trained field-level maintenance personnel by permitting less skilled personnel to locate failures and channel suspect equipment to centralised workshop repair facilities that are equipped to repair defective equipment. However, BIT is not a comprehensive solution to all system maintenance requirements but, rather, a necessary tool for maintaining complex integrated systems.

Specifying BIT performance

One of the more difficult tasks inherent in the design and development of process engineering systems is the development of realistic and meaningful operational requirements and their subsequent conversion into understandable and achievable contractual specifications. This is equally applicable to BIT, particularly with respect to typical performance measures or figures-of-merit that are used to specify BIT performance.

Typical BIT performance measures, or figures-of-merit

• Percent detection: the percent of all faults or failures that the BIT system must detect.
• Percent isolation: the percent of detected faults or failures that the system must isolate to a specified assembly level.
• Automatic fault isolation capability (AFIC): the percent detection multiplied by the percent isolation.
• Percent of false alarms: the maximum tolerable percent of indicated faults where, in fact, no failure is found to exist.

For each of the above parameters, there is a considerable span of interpretation. For example, does the percent detection refer to failure modes or to the percentage of all failures that could potentially occur? Furthermore, does the detection capability apply across the failure spectrum, i.e. mechanical systems, instrumentation, connections and software, or is its diagnostic capability applicable only to certain hardware such as electronic systems? Also, to what systems hierarchy level will the BIT system isolate failures?
Early BIT systems were designed to isolate faults at component level. This resulted in BIT systems being as complex as, and frequently less reliable than, the basic system. The current trend is to isolate faults to the sub-system or assembly level based on the BIT system's ability to detect abnormal output signal patterns. Large industry workshop maintenance facilities frequently apply external diagnostic equipment to isolate to the component or part level.

A major engineering design issue (as well as contractual issue) relates to the definition of failure. Should BIT performance be viewed in terms of only BIT-addressable failures, which normally exclude system interface components such as exchangers, crossover ducts, pipelines, connectors, cables, etc., even though these are usually the failure-critical components in complex integrated systems? An important consideration thus relates to exactly what failures BIT can detect. Often, BIT systems operate ineffectively if 80% of detectable failures occur infrequently while the remaining 20% occur with predictable regularity. It therefore becomes important to specify BIT performance measures in relation to overall system availability requirements.

The percent of false alarms is a difficult parameter to specify or to measure accurately because initial fault detection followed by analysis indicating that no fault exists can signify different possible occurrences, such as:

• The BIT system erroneously detected a fault.
• An intermittent out-of-tolerance condition exists.
• A failure exists but cannot be readily reproduced in a maintenance environment.

From a logistic viewpoint, false alarms can often lead to false removals, creating unnecessary demands on supply and maintenance systems. A potentially greater concern is that false alarms and removals may create a lack of confidence in the BIT system, to the point where maintenance or operations personnel may ignore certain fault detection indications.
Under these conditions, the BIT system in particular and the maintenance concept in general can neither mature nor provide the support required to meet design requirements. The specification of BIT performance must therefore be tailored to the type of system being designed, as well as to the system design criteria. Designing for maintainability must include a comprehensive definition of BIT capability based upon the figures-of-merit presented above.

Characteristics external to BIT

There are two important considerations, external to BIT, which must be addressed in the concept of BIT and diagnostics in designing for maintainability. First, the reliable performance of the designed system determines, to a large extent, the criticality of BIT performance. If the basic system is designed to be very reliable (a reliability in the region of 0.995 to 0.999), a shortfall in BIT performance may have limited impact on the system's operational utility. Second, all system faults that can be corrected through maintenance action must first be detected and isolated. Therefore, design for maintainability requirements such as the maintenance methods, tools, manuals, test equipment and personnel required to detect and isolate faults that BIT cannot detect can be a major consideration in the detail design phase of engineered installations. BIT is inherently an aspect of design for maintainability.

The following example illustrates the impact of BIT on the overall maintenance effort. It further attempts to illustrate the effect of external factors on BIT performance (DoD 3235.1-H 1982).
Description: a radar installation is composed of five line replaceable units (LRUs) with the following BIT and system performance characteristics:

System: five (5) LRUs
MTTR (w/BIT): 2 h (includes failures that have been both detected and isolated)
MTTR (no/BIT): 5 h (includes failures that have been detected but not isolated)
MTBF: 50 operational hours
Period of interest: 2,500 operational hours
BIT specified: percent detection = 90%; percent isolation = 90% (to the LRU level); false alarm rate = 5% (of all BIT indications)

In this example of a sophisticated military engineered installation, a relatively high-capability BIT system has been specified, whereas industrial installations with BIT would typically be less rigorously specified. Upon cursory examination, this extensive BIT coverage would appear to require minimal additional maintenance.

The problem is to determine what total corrective maintenance time would be required for 2,500 operating hours. Thus:

• How many total failures could be expected?
2,500 total hours at 50 h MTBF = 50 failures
• How many of these failures (on average) will BIT detect?
50 failures × 90% = 45 BIT detected failures
• How many detected failures on average will be isolated to an LRU?
45 detected failures × 90% isolation = 40.5 ≈ 40 failures
• What is the automatic fault isolation capability (AFIC)?
% detection × % isolation (LRU) = AFIC
0.9 × 0.9 = 0.81 = 81%
• How many false alarm indications are expected to occur during the 2,500 operational hours?
Total BIT indications (I_BIT) = true failure detections + false alarms
I_BIT = (BIT detection rate) × (total failures) + (false alarm rate) × (total BIT indications)
I_BIT = (0.90) × (50) + (0.05) × (I_BIT)
(1 − 0.05) × I_BIT = 45
I_BIT = 47.37
and:
False alarms = total BIT indications − true indications
False alarms = 47.37 − 45 = 2.37 ≈ 2

With this information, the total corrective maintenance time can now be calculated (DoD 3235.1-H 1982):

• What is the total corrective maintenance time (on average) required to repair the detected/isolated failures?
TC (w/BIT) = 40 failures × 2 h (MTTR w/BIT) = 80 h
• What is the total corrective maintenance time (on average) required to repair the remaining failures that BIT did not detect and isolate (50 total failures − 40 detected/isolated failures = 10 failures)?
TC (no/BIT) = 10 failures × 5 h (MTTR no/BIT) = 50 h
• If it is assumed that no/BIT maintenance time is required to sort out false alarm indications, what total no/BIT corrective maintenance time is required for the 2,500 operational hour period?
TC (no/BIT) = no/BIT repair time + false alarm maintenance time = (10) × (5) + (2) × (5) = 60 h
• What is the total corrective maintenance time TC (total) anticipated during the 2,500 hours?
TC (total) = BIT maintenance + no/BIT maintenance = 80 + 60 = 140 h

Thus, even with a relatively high AFIC of 81%, the no/BIT-oriented corrective maintenance represents 43% of the total anticipated corrective maintenance hours. Furthermore, the impact of scheduled/preventive maintenance has not been considered. This additional maintenance is generally not associated with BIT.

The information presented in this example is greatly simplified in that it is assumed that the BIT AFIC (% detection × % isolation) will be 81%. If the AFIC is 81%, then 57% of the maintenance effort will be oriented towards BIT detected/isolated failures.
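The arithmetic of the radar example can be retraced with a short script. This is only a sketch under stated assumptions: the function name, parameter names and dictionary keys are hypothetical, and fractional failure counts are rounded down because the worked example treats 40.5 isolated failures as 40 and 2.37 false alarms as 2.

```python
import math


def bit_maintenance_hours(period_h, mtbf_h, pct_detect, pct_isolate,
                          false_alarm_rate, mttr_bit_h, mttr_no_bit_h):
    """Split expected corrective maintenance hours into BIT and no/BIT parts.

    Hypothetical helper that follows the steps of the worked example.
    """
    total_failures = period_h / mtbf_h
    detected = total_failures * pct_detect
    # Assumption: round down, as the example does (40.5 -> 40).
    isolated = math.floor(detected * pct_isolate)
    # Every failure not both detected and isolated needs no/BIT repair.
    not_isolated = total_failures - isolated
    # I_BIT = detected + rate * I_BIT  =>  I_BIT = detected / (1 - rate)
    indications = detected / (1.0 - false_alarm_rate)
    # Assumption: round down, as the example does (2.37 -> 2).
    false_alarms = math.floor(indications - detected)
    tc_bit = isolated * mttr_bit_h
    # False alarms are assumed to cost one no/BIT repair time each.
    tc_no_bit = (not_isolated + false_alarms) * mttr_no_bit_h
    tc_total = tc_bit + tc_no_bit
    return {
        "afic": pct_detect * pct_isolate,
        "bit_indications": indications,
        "tc_bit_h": tc_bit,
        "tc_no_bit_h": tc_no_bit,
        "tc_total_h": tc_total,
        "no_bit_share": tc_no_bit / tc_total,
    }


# Radar installation from the example: 2,500 h period, 50 h MTBF,
# 90% detection, 90% isolation, 5% false alarms, 2 h and 5 h MTTR.
radar = bit_maintenance_hours(2500, 50, 0.90, 0.90, 0.05, 2, 5)
for key, value in radar.items():
    print(f"{key}: {value:.2f}")
```

Keeping the inputs as parameters makes it easy to see how quickly the no/BIT share grows when the detection or isolation percentage is lowered.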
If the true AFIC is found to be lower, it will be necessary to re-evaluate the overall effectiveness of the maintenance strategy and logistics program, as well as total system effectiveness (DoD 3235.1-H 1982).

c) Basic System and BIT Concurrent Design and Evaluation Considerations

In designing for maintainability, the difficulty involved in the design and evaluation of BIT that must perform in accordance with specific basic system specifications and design criteria is a problem of concurrent design. The development and evaluation of BIT and fault diagnostics has traditionally followed basic system engineering design. The argument usually presented is that the basic system has to be designed and evaluated before determining what the BIT is intended to test. This argument has some basis in fact, but there are significant drawbacks associated with lengthy design schedule differentials between the system's design and BIT design and testing. For example, design considerations relating to a systems breakdown structuring (SBS), such as partitioning and sub-system/assembly/component configuration, determine to a large extent the required BIT design. BIT design is also driven by the essential prediction of the various system failure modes in an FMEA, which BIT is expected to address. Consequently, the two design efforts cannot be conducted in isolation from one another, and must therefore be concurrent.

Determination of basic system failure modes and frequency of occurrence

The design of BIT is based upon two assumptions regarding the integrity of the basic engineering design: first, accurate identification of failure modes and effects (FMEA) and, second, correct estimation of the frequency of occurrence of the failure modes.
If either of these assumptions is proven incorrect by test or operational experience, the resultant BIT performance is likely to be inadequate or, at least, less effective than anticipated. The following two situations, based on the previous example, illustrate the impact of the FMEA and of the frequency of occurrence of the failure modes on a maintenance strategy (i.e. preventive versus corrective maintenance):

Situation 1: An unforeseen failure mode is observed in the radar installation every 250 operational hours. What impact does this have on the no/BIT maintenance?

New failures = 2,500 h × 1 failure per 250 h = 10 failures (new)
TC (no/BIT) new = 10 × 5 hours/failure = 50 h

Thus, total maintenance hours will be:
TC (total) = 80 + 60 + 50 = 190 h

Total no/BIT maintenance will be:
TC (no/BIT) total = 60 + 50 = 110 h, which represents 58% of total maintenance.

For the BIT detected/isolated maintenance:
TC (w/BIT) = 80 h, which represents 42% of total maintenance (190 h).

It is evident that the discovery of one unforeseen failure mode that BIT cannot detect has a relatively significant impact on the relative magnitude of the two maintenance percentages.

Previous estimate: TC (w/BIT) = 57%; TC (no/BIT) = 43%
Current estimate: TC (w/BIT) = 42% (a 26% relative decrease); TC (no/BIT) = 58% (a 35% relative increase)

Situation 2: One of the original BIT detectable failures is predicted to have a very low frequency of occurrence. BIT detection for this failure was considered unnecessary, and was therefore not included in the original BIT design to detect 90% of the failures. It is now found that the failure occurs five times as often as expected. This is a realistic situation, and one that directly impacts upon the no/BIT maintenance hours.

d) Evaluation of BIT Systems

The test and evaluation of BIT systems and the prediction of BIT performance present some controversy.
BIT systems are hardware and software logic networks designed to detect the presence of an unwanted signal, or the absence of a desired signal, each representing a failure mode. Each failure mode is detected by a specific logic network. While the same network may be designed to detect a specific failure in several components, there is no assurance that the logic is correct until verified by testing. It is possible to validate BIT performance using statistical techniques, assuming a sufficiently large, representative sample of failures is available. Unlike typical reliability evaluation, though, which has been established over the past five decades, BIT testing and BIT system design represent less established technologies and are only recently beginning to receive increased attention. Because of this limited attention, an adequate representative database to support accurate and defendable estimates of BIT performance has not been gathered, and a certain lack of confidence in BIT performance evaluation has resulted.

Since it is not economically feasible to wait for an engineering system to experience random failures, failures are induced through synthetic fault insertion. These faults are generally selected from a list of possible faults, all of which are presumed to be detectable. The faults are synthetically inserted, and BIT detects and isolates, for example, 93% of these. This does not mean that the BIT system is a 93% AFIC BIT system, because the data are not a representative random sample of the entire failure population and, therefore, cannot be used to make statistically valid predictions of future performance.
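To make the point about sample size concrete, the following sketch attaches a confidence interval to an observed detection/isolation fraction. The Wilson score interval is one common choice of statistical technique and is an assumption here, not a method named in the text; the sample of 100 inserted faults is likewise hypothetical. The interval is only meaningful if the faults are a representative random sample of the failure population, which, as noted above, synthetic fault insertion does not guarantee.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% two-sided confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p + z2 / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)
    )
    return centre - half_width, centre + half_width


# Hypothetical trial: 93 of 100 inserted faults detected and isolated.
low, high = wilson_interval(93, 100)
print(f"observed 0.93, interval {low:.3f} to {high:.3f}")

# The same observed fraction from a larger sample gives a tighter interval.
low_big, high_big = wilson_interval(930, 1000)
print(f"observed 0.93, interval {low_big:.3f} to {high_big:.3f}")
```

The width of the interval for the smaller sample shows why a single fault-insertion campaign supports only a rough estimate, even before the question of representativeness is considered.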
While synthetic fault insertion has certain recognised limitations in predicting operational BIT performance, it is a valuable methodology in designing for maintainability during the preliminary and detail engineering design phases. Also, fault insertion can be used to simulate random failures that may occur but cannot be detected. These include the effects of poor operation or maintenance. Because BIT technologies are not yet adequately established and rely on fault insertion, there are normally insufficient data available to support accurate estimations of BIT performance. It generally requires several years of operational exposure to develop an adequate database to support a BIT performance analysis.

Current trends support early reliability testing during design and development, to facilitate identification of failure modes and timely incorporation of design improvements. These pilot tests provide a database to support preliminary estimates of system reliability. What is most frequently overlooked is that these data, after minimal screening, could also be used to monitor, verify and upgrade BIT performance, to support preliminary estimates of system maintainability, assuming that the BIT system is functional at the appropriate stage in the basic system's design and development. This action requires a disciplined approach towards the use of BIT in failure detection early in the system's life cycle that has not been prevalent in previous engineering design projects (DoD 3235.1-H 1982).

In summary, there is an essential requirement to evaluate BIT performance during the system design and development stages, inclusive of initial operational test and evaluation (IOT&E). This includes combining random failure detection data with data from pilot plant tests and fault insertion trials. Early emphasis on BIT design will generally result in accelerated BIT system establishment and more accurate early projections of BIT performance.
BIT evaluation should be actively pursued throughout the ramp-up/operational stages, to assure that the necessary software and hardware changes are incorporated.

4.2.3.4 Specific Application Modelling of Availability and Maintainability

Some systems are not confined to one of the two standard states of operability, i.e. an up-state (the system is capable of full operational performance) or a down-state (the system is totally inoperable and under repair), but may also perform their function at one or more levels of reduced efficiency. For such systems, the conventional concepts of system integrity are found to be unsuitable and inadequate. The integrity of the system either remains unresolved (there exist situations when the system is neither fully operable nor fully inoperable, so that reliability and availability cannot be discretely determined), or it gets a value that contradicts empirical observation. If operation with reduced efficiency is regarded as normal, too high a value for system integrity (reliability and availability) is obtained, whereas if a reduction in efficiency is regarded as not achieving total operability, too low a value for system integrity is obtained.

a) Equivalent Availability (EA)

The concept of equivalent availability affords a means of determining system integrity when the system is operating with reduced efficiency and is neither fully operable nor fully inoperable. From the definition of operational availability given previously, the general measure of availability of a system as a ratio is a comparison of the system's usable time or operational time to a total given period or cycle time

$$\text{Availability} = \frac{\text{Operational time}}{\text{Time period}} \tag{4.124}$$

To be able to relate system operation with reduced efficiency to an integrity measure such as system availability (specifically to the concept of equivalent availability), it is necessary to first review the relationships of the various process functional characteristics with one another, such as maximum capacity, rated capacity, efficiency, utilisation and availability. Thus, referring back to Eq. (4.28), the efficiency measurement of an engineering process is a comparison of the process output quantity to its process throughput

$$\text{Process efficiency } (X_p) = \frac{\text{Process output}}{\text{Process throughput}} \tag{4.125}$$

According to Eq. (4.30), process utilisation is the ratio of process output to the constrained ability to receive and/or hold the result or product inherent to the process (i.e. rated capacity)

$$\text{Process utilisation } (U_p) = \frac{\text{Process output}}{\text{Rated capacity}} \tag{4.126}$$

The maximum ability to receive and/or hold the result of the process, or product inherent to the process, is expressed as maximum process capacity or design capacity. According to Eq. (4.20), this is defined in terms of the average output rate and the average utilisation rate expressed as a percentage

$$\text{Max. capacity } (C_{\max}) = \frac{\text{Average output rate}}{\text{Average utilisation}/100} \tag{4.127}$$

Furthermore, rated capacity (C_r) is maximum throughput. It is the throughput actually achieved from operational constraints placed upon the ability of a series of operations to receive and/or hold the result or product inherent to the process. Referring back to Eq. (4.23), we have

$$\text{Rated capacity } (C_r) = \frac{\text{Material in process}}{\text{Processing time}} = \text{Process throughput}\,\left(T\,C_{\text{proc}}\right)_{\max} \tag{4.128}$$

Maximum dependable capacity is achieved when a process system is operating at 100% utilisation or at maximum efficiency for a given operational time. A system's maximum dependable capacity is equivalent to process output at 100% utilisation.
Thus

$$\text{Output (100\% utilisation)} = \text{Max. dependable capacity} \tag{4.129}$$

The operational time during which a system is achieving a process output that is equivalent to its maximum dependable capacity is termed the equivalent operational time. Equivalent operational time is defined as "that operational time during which a system achieves process output which is equivalent to its maximum dependable capacity"

$$\text{Equiv. operational time} = \text{Process operational time} \times \frac{\text{Process output}}{\text{Max. dependable capacity}} \tag{4.130}$$

If process output (100% utilisation) = max. dependable capacity, then equiv. operational time = process operational time.

From Eq. (4.124), the general measure of availability of a system (or equipment) as a ratio is a comparison of the system's operational time to a total given period. Similarly, the quantifiable measure of equivalent availability of a system is a comparison of the system's equivalent operational time to a total given period. The system's process operational time is equal to the equivalent operational time when its process output (at 100% utilisation) is equal to the maximum dependable capacity or, alternatively, when its process output is equal to the rated capacity (and rated capacity = maximum dependable capacity).

From Eqs. (4.14) and (4.15), the difference between process utilisation, U_p, and process efficiency, X_p, is the difference between a system's rated capacity and process throughput respectively. From Eqs. (4.126) and (4.128), rated capacity C_r is equivalent to maximum throughput. Thus, at 100% process utilisation, a system's rated capacity is equal to maximum process throughput, and 100% process utilisation is equivalent to maximum efficiency.
Equivalent availability can be defined as "the comparison of the equipment's equivalent operational time to a total given period, during which a system achieves process output that is equivalent to its maximum dependable capacity" (Nelson 1981). Thus

$$\text{Equivalent availability} = \frac{\text{Equivalent operational time}}{\text{Time period}}, \qquad EA = \frac{\sum (ET_o)}{T} \tag{4.131}$$

$$\text{Equivalent availability} = \frac{\text{Operational time}}{\text{Time period}} \times \frac{\text{Process output}}{\text{MDC}}, \qquad EA = \frac{\sum \left[(T_o) \cdot n(\text{MDC})\right]}{T \cdot \text{MDC}} \tag{4.132}$$

where:
MDC = maximum dependable capacity
n = fraction of process output.

Thus, at 100% utilisation or maximum efficiency

$$\text{Equivalent availability } (EA) = \frac{\text{Operational time}}{\text{Time period}}$$

The measure of equivalent availability can be graphically illustrated in the following example. A power generator is estimated to be in operation for 480 h at maximum dependable capacity. Thereafter, its output is estimated to diminish (derate) with an efficiency reduction of 50% for 120 h, after which the generator will be in full outage for 120 h. What is the expected availability of the generating plant over the 30-day cycle? What is the generator's expected equivalent availability during this cycle?

[Figure: Measure of equivalent availability of a power generator. Full power at 100% X_p (MDC) for 480 hours, half power at 50% X_p (MDC/2) for 120 hours, full outage for 120 hours; time period = 720 hours.]

$$\text{Expected availability } (A) = \frac{\text{Operational time}}{\text{Time period}} = \frac{\sum T_o}{T} = \frac{480 + 120}{720} = 0.83 \text{ or } 83\%$$
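Applying Eq. (4.132) to the same generator cycle gives the equivalent availability alongside the conventional availability. The sketch below is illustrative only: the function names and the representation of the cycle as (hours, fraction of MDC) pairs are assumptions made for the example.

```python
def availability(segments, period_h):
    """Conventional availability: hours with any output over the period."""
    operating_h = sum(hours for hours, fraction in segments if fraction > 0)
    return operating_h / period_h


def equivalent_availability(segments, period_h):
    """Equivalent availability per Eq. (4.132).

    Each segment is (hours, n), with n the fraction of maximum
    dependable capacity (MDC) delivered during those hours. The MDC
    itself cancels out of the ratio, so only n is needed.
    """
    equivalent_h = sum(hours * fraction for hours, fraction in segments)
    return equivalent_h / period_h


# Generator cycle from the example: 480 h at full MDC, 120 h derated
# to 50%, then 120 h of full outage, over a 720 h (30-day) period.
cycle = [(480, 1.0), (120, 0.5), (120, 0.0)]
period = 720

a = availability(cycle, period)
ea = equivalent_availability(cycle, period)
print(f"A  = {a:.2f}")   # (480 + 120) / 720
print(f"EA = {ea:.2f}")  # (480 + 0.5 * 120) / 720
```

The derated 120 hours count in full towards conventional availability but only as 60 equivalent hours towards EA, which is why EA (0.75) falls below A (0.83) for this cycle.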