4.2 Theoretical Overview of Availability and Maintainability in Engineering Design

The following assumptions are associated with this multi-state model:
• System failures are statistically independent.
• A partially or fully failed system is restored to a ‘good as new’ state.
• System failure rates are constant.
• System component failure times are random.
• The partially failed system repair rate is constant.
• Failed system repair times are arbitrarily distributed.

As with the two-state Markov model, the mathematical expressions for the multi-state Markov model, including supplementary variables indicating partial operation or a reduced efficiency of the system, are given in the following Markov multi-state model equations, according to Fig. 4.8:

\[ P_0(t+\Delta t) = P_0(t)(1-\lambda_1\Delta t)(1-\lambda_2\Delta t) + P_1(t)\,\mu_p\,\Delta t + \left[\int_0^\infty P_2(x,t)\,\mu_f(x)\,dx\right]\Delta t \tag{4.70} \]

\[ P_1(t+\Delta t) = P_1(t)(1-\lambda_3\Delta t)(1-\mu_p\Delta t) + P_0(t)\,\lambda_1\,\Delta t \tag{4.71} \]

\[ P_2(x+\Delta t;\, t+\Delta t) = P_2(x,t)\,[1-\mu_f(x)\Delta t] \tag{4.72} \]

where:
$\lambda_j$ is the jth constant failure rate of the system, with j = 1 (normal to partial), j = 2 (normal to failed) and j = 3 (partial to failed),
$\mu_p$ is the constant system repair rate from the partial operating state 1 to the normal operating state 0,
$\mu_f(x)$ is the repair rate when the system is in the failed state and has an elapsed repair time of x,
$P_0(t+\Delta t)$ is the probability that the system is in operating state 0 at time $t+\Delta t$,
$P_1(t+\Delta t)$ is the probability that the system is in partially failed state 1 at time $t+\Delta t$,
$P_2(x+\Delta t;\, t+\Delta t)$ is the probability that at time $t+\Delta t$ the system is in failed state 2 and the elapsed repair time lies in the interval $(x+\Delta t,\; x+\Delta t+\Delta x)$,
$P_0(t)$ is the probability that the system is in operating state 0 at time t,
$P_1(t)$ is the probability that the system is in partially failed state 1 at time t,
$P_2(x,t)$ is the probability that the system is in failed state 2 at time t after an elapsed repair time of x,
$(1-\lambda_j\Delta t)$ is the probability that failure transition j does not occur in the time interval $\Delta t$,
$(1-\mu_p\Delta t)$ is the probability of no repair in the time interval $\Delta t$ when the system is in state 1,
$(1-\mu_f(x)\Delta t)$ is the probability of no repair in the time interval $\Delta t$ when the system is in state 2.

The respective boundary and initial conditions are:

\[ P_2(0,t) = \lambda_2 P_0(t) + \lambda_3 P_1(t) \]

and at t = 0

\[ P_0(0) = 1, \qquad P_1(0) = 0, \qquad P_2(x,0) = 0 \]

The differential-difference equations with variable coefficients are

\[ \frac{dP_0(t)}{dt} + (\lambda_1+\lambda_2)P_0(t) - P_1(t)\,\mu_p = \int_0^\infty P_2(x,t)\,\mu_f(x)\,dx \tag{4.73} \]

\[ \frac{dP_1(t)}{dt} + (\lambda_3+\mu_p)P_1(t) - P_0(t)\,\lambda_1 = 0 \tag{4.74} \]

\[ \frac{\partial P_2(x,t)}{\partial x} + \frac{\partial P_2(x,t)}{\partial t} + \mu_f(x)P_2(x,t) = 0 \tag{4.75} \]

So far, the supplementary variable technique has been used to obtain the model’s partial differential-difference equations, or state equations, which describe the behaviour of the system. With the help of Laplace transforms, both transient and steady-state solutions for these state equations may now be found.

The Laplace transform of a function f(t) is given by the expression

\[ F(s) = \int_0^\infty e^{-st} f(t)\,dt \tag{4.76} \]

Using Laplace transforms and the initial condition $P_0(0) = 1$, the differential Eqs. (4.73) to (4.75) are transformed into the following equations in the transform variable s, from which the steady-state solutions are obtained, with the boundary condition:

\[ P_2(0,s) = \lambda_2 P_0(s) + \lambda_3 P_1(s) \]

Then

\[ sP_0(s) - 1 + (\lambda_1+\lambda_2)P_0(s) - P_1(s)\,\mu_p = \int_0^\infty P_2(x,s)\,\mu_f(x)\,dx \tag{4.77} \]

and

\[ sP_1(s) + (\lambda_3+\mu_p)P_1(s) - P_0(s)\,\lambda_1 = 0 \tag{4.78} \]

and

\[ \frac{\partial P_2(x,s)}{\partial x} + [s+\mu_f(x)]\,P_2(x,s) = 0 \tag{4.79} \]

The steady-state values for $P_0(s)$, $P_1(s)$ and $P_2(s)$ can now be found through integrating. The steady-state solutions are independent of the type of waiting time and repair time distributions, and only the expected values of these distributions become apparent.
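The state equations (4.73) to (4.75) can be explored numerically with a short Python sketch. It is an illustration only and rests on stated assumptions: the function names and the numerical rates are hypothetical, and the transient part assumes a constant repair rate $\mu_f$ (exponentially distributed repair times), which reduces the supplementary variable model to three ordinary differential equations. The steady-state function uses only the mean repair time, in line with the statement that only the expected values of the distributions become apparent.

```python
def steady_state(lam1, lam2, lam3, mu_p, mean_repair_time):
    """Steady-state probabilities (P0, P1, P2) of the three-state model.

    Eq. (4.74) with dP1/dt = 0 gives P1 = lam1 * P0 / (lam3 + mu_p).
    The failed-state probability is the entry rate lam2*P0 + lam3*P1
    multiplied by the mean repair time.
    """
    a = lam1 / (lam3 + mu_p)                   # ratio P1 / P0
    b = (lam2 + lam3 * a) * mean_repair_time   # ratio P2 / P0
    p0 = 1.0 / (1.0 + a + b)                   # normalise so P0 + P1 + P2 = 1
    return p0, a * p0, b * p0


def transient_euler(lam1, lam2, lam3, mu_p, mu_f, t_end, dt=0.01):
    """Forward-Euler integration of the state equations.

    Assumption (not from the text): mu_f is constant, so the integral
    term in Eq. (4.73) becomes mu_f * P2(t).
    """
    p0, p1, p2 = 1.0, 0.0, 0.0                 # initial conditions at t = 0
    for _ in range(int(round(t_end / dt))):
        d0 = -(lam1 + lam2) * p0 + mu_p * p1 + mu_f * p2
        d1 = lam1 * p0 - (lam3 + mu_p) * p1
        d2 = lam2 * p0 + lam3 * p1 - mu_f * p2
        p0, p1, p2 = p0 + d0 * dt, p1 + d1 * dt, p2 + d2 * dt
    return p0, p1, p2


if __name__ == "__main__":
    # Illustrative rates per hour (hypothetical values).
    rates = dict(lam1=0.02, lam2=0.01, lam3=0.05, mu_p=0.5)
    print(steady_state(mean_repair_time=4.0, **rates))
    print(transient_euler(mu_f=0.25, t_end=200.0, **rates))
```

With a constant repair rate of 0.25 per hour (mean repair time of 4 hours), the integrated probabilities settle to the same values as the closed-form steady state, which is a useful check on the signs of the terms in Eqs. (4.73) and (4.74).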
Furthermore, steady state is achieved under general conditions, and the solutions for steady state can be found without any exact knowledge about the distributions of the system (Virtanen 1975).

4.2.2.2 Achieved Availability Modelling Subject to Maintenance

Achieved availability ($A'_s$) is frequently used during development testing and initial production testing when a system or its equipment is not operating in its intended support environment. Excluded are operator before-and-after maintenance checks and standby periods. Achieved availability is much more of a system hardware-oriented measure than is operational availability, which considers operating environment factors. It is, however, dependent on a preventive maintenance policy, which can be greatly influenced by non-hardware considerations.

The mathematical model for achieved availability, according to the US Department of Defense, is given by the following expression (Conlon et al. 1982):

\[ A'_s = \frac{\text{OT}}{\text{OT} + \text{TCM} + \text{TPM}} \tag{4.80} \]

where:
OT = operating time
TCM = total corrective maintenance
TPM = total preventive maintenance.

An alternative approach to modelling achieved availability is to consider the probability that a system or its equipment, when used under designed conditions in an ideal support environment, will perform according to the specifications formulated during the preliminary design phase. The most significant characteristic of achieved availability for both alternatives is that it includes maintenance time (corrective and preventive), and excludes logistic delay times. The mathematical model for achieved availability in this context is given as (Dhillon 1999b):

\[ A'_s = \frac{\text{MTBM}}{\text{MTBM} + \text{TCM} + \text{TPM}} \tag{4.81} \]

where:
MTBM is the mean time between maintenance.

This differs from inherent availability, $A_i$, only in its inclusion of the consideration for total preventive maintenance. The measurement base for MTBM must be consistent when calculating achieved availability $A'_s$. MTBM is represented by the following expression, written here in reciprocal form so that both sides have units of time:

\[ \text{MTBM} = \frac{1}{\dfrac{1}{\text{MTBF}} + \dfrac{1}{\text{MTBM-LD}} + \dfrac{1}{\text{MTBPM}}} \tag{4.82} \]

where:
MTBF is the mean time between failures
MTBM-LD is the mean time between maintenance less logistic delays
MTBPM is the mean time between preventive maintenance.

The measurement base for MTBF, MTBM-LD and MTBPM must be consistent when calculating the MTBM parameter. Consider further the values TCM and TPM

\[ \text{MDT} = \text{TCM} + \text{TPM} \tag{4.83} \]

where:
MDT = mean active maintenance downtime
TCM = total corrective maintenance
TPM = total preventive maintenance

and

\[ \text{MDT} = \frac{\sum_{i=1}^{m} CM_i\,CF_i}{\sum_{i=1}^{m} CF_i} + \frac{\sum_{j=1}^{n} PM_j\,PF_j}{\sum_{j=1}^{n} PF_j} \tag{4.84} \]

where:
m = total corrective tasks performed
n = total preventive tasks performed
$CM_i$ = elapsed time for corrective task i
$PM_j$ = elapsed time for preventive task j
$CF_i$ = estimated frequency for corrective task i
$PF_j$ = estimated frequency for preventive task j.

4.2.2.3 Maintainability Assessment with Maintenance Modelling

Maintainability and maintenance are closely interrelated, yet they are not the same. Maintainability refers to the measures taken during the design, development and installation of a system or its equipment that will reduce the required maintenance effort, logistics and costs and, thus, also the operational downtime. Maintenance refers to the measures taken to restore and keep the system or its equipment in an operable condition. Maintenance is, in effect, the care of the physical and operational condition of the system or its equipment.

Many mathematical models have been developed for both maintainability and maintenance. However, maintenance models have mainly been developed to better define and predict certain aspects of maintenance, such as scheduled downtime, scheduled replacement, and optimal warranty periods, for installed systems and equipment.
These models are usually based on certain probability distributions, predominantly the exponential distribution for representing corrective maintenance times, and the lognormal distribution for representing minimum operating times.

a) Impact of Maintenance Assessment on Systems Design

A widely used probability distribution in predicting the impact of designing for maintainability on systems design, based upon defining constraints on the minimum operating time below which no maintenance activity will result in downtime, is the lognormal distribution.

The lognormal distribution probability density function is defined by the following relationship

\[ f_r(t) = \frac{1}{(t-\theta)\,\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{\ln(t-\theta)-\beta}{\sigma}\right]^2\right\} \tag{4.85} \]

where:
t = maintenance time
θ = minimum operating time
β = mean time for maintenance (the mean of the natural logarithms of the maintenance times, Eq. 4.86)
σ = standard deviation of the maintenance times (taken over their natural logarithms, Eq. 4.87).

An estimate of the mean time for maintenance, β, is based on an estimate of the number of shutdowns (i.e. planned downtimes that have an impact on production) that are required over a specific period, such as one year. This is best approached from a calculation of the average of the natural logarithms of the individually estimated downtimes, where m is the number of shutdowns over a specific period.

The relationship for the mean time for maintenance, β, considering the estimated downtimes and the number of shutdowns, is defined as

\[ \beta = \frac{\ln t_1 + \ln t_2 + \ln t_3 + \dots + \ln t_m}{m} \tag{4.86} \]

The standard deviation, σ, of the estimated mean time for maintenance, β, is given by

\[ \sigma = \left[\frac{\sum_{i=1}^{m} (\ln t_i - \beta)^2}{m-1}\right]^{1/2} \tag{4.87} \]

For the lognormal distribution, the equation for the maintainability function M(t) is given as the following expression

\[ M(t) = \int_0^\infty t\,f_r(t)\,dt \tag{4.88} \]

which, for θ = 0, becomes

\[ M(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^\infty \exp\left\{-\frac{1}{2}\left[\frac{\ln t-\beta}{\sigma}\right]^2\right\} dt \]

This integral is the mean of the lognormal distribution and evaluates to $e^{\beta+\sigma^2/2}$.

This maintainability function serves primarily as a design parameter in designing for maintainability, whereby it defines the expected downtime over a specified period.

The measures used in maintainability analysis, besides the widely used mean time to repair (MTTR), include concepts related mainly to maintenance, such as the expected mean preventive maintenance downtime, the median corrective maintenance downtime, the expected maximum corrective maintenance downtime and the expected mean maintenance downtime.

b) Maintainability Measures and Maintenance Assessment

The expected mean preventive maintenance downtime, $T_{pm}$, is a useful parameter in design for maintainability, in that it gives an indication of the expected scheduled downtime of a system over its life cycle. The objective of defining the expected mean preventive maintenance downtime is to estimate the impact of a preventive maintenance program on the system, whereby the system and its equipment (assemblies and components) are to be kept at a specified design performance level. Such a preventive maintenance program is to affect the point in time at which the equipment wears out or fails, resulting in system downtime. A carefully planned preventive maintenance program can help to reduce system downtime and improve its performance. On the other hand, a poorly established preventive maintenance program can have a negative impact on system operations.
The expected mean preventive maintenance downtime, $T_{pm}$, is expressed by the mathematical model (Dhillon 1999b):

\[ T_{pm} = \frac{\sum_{i=1}^{k} (T_{pti})(F_{pti})}{\sum_{i=1}^{k} (F_{pti})} \tag{4.89} \]

where:
$T_{pti}$ = the estimated elapsed time for preventive maintenance task i, for i = 1, 2, 3, ..., k
$F_{pti}$ = the estimated frequency of preventive maintenance task i, for i = 1, 2, 3, ..., k
k = number of preventive maintenance tasks.

The median corrective maintenance downtime, $T_{cm}$, is a measure of the time within which 50% of all corrective maintenance can be completed. Calculation of the median corrective maintenance downtime depends upon the distribution of the times for corrective maintenance. For a lognormal distribution of repair time, the median corrective maintenance downtime, $T_{cm}$, is expressed as

\[ T_{cm} = \frac{\text{MTTR}}{e^{\sigma^2/2}} \tag{4.90} \]

where:
$\sigma^2$ = the variance around the mean value of the natural logarithm of repair times.

For an exponential distribution of corrective maintenance repair times, the median corrective maintenance downtime, $T_{cm}$, is expressed as

\[ T_{cm} = \frac{0.69}{\mu} \tag{4.91} \]

where:
μ = the repair rate, which is the reciprocal of MTTR.

The expected maximum corrective maintenance downtime, $T_{cm}$, is a measure of the time required to complete corrective maintenance repairs at the 90th or 95th percentiles. This implies that, for example, in the case of the 95th percentile, the expected maximum corrective maintenance downtime is the time within which 95% of all corrective maintenance can be completed. It indicates an estimation level of significance where no more than 5% of the expected corrective maintenance will take longer than the expected maximum corrective maintenance downtime. Calculation of the expected maximum corrective maintenance downtime also depends upon the distribution of the times for corrective maintenance.
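As a brief numerical illustration of Eqs. (4.89) to (4.91), the following Python sketch computes the frequency-weighted preventive maintenance downtime and the two median expressions. The function names and the task data are hypothetical and chosen only for illustration; the exponential median uses ln 2, which the text rounds to 0.69.

```python
import math


def mean_preventive_downtime(task_times, task_freqs):
    """Eq. (4.89): frequency-weighted mean of preventive task times."""
    weighted = sum(t * f for t, f in zip(task_times, task_freqs))
    return weighted / sum(task_freqs)


def median_downtime_lognormal(mttr, sigma):
    """Eq. (4.90): median = MTTR / exp(sigma^2 / 2).

    sigma is the standard deviation of the natural logarithms
    of the repair times.
    """
    return mttr / math.exp(sigma ** 2 / 2.0)


def median_downtime_exponential(mttr):
    """Eq. (4.91) with mu = 1 / MTTR; ln 2 is rounded to 0.69 in the text."""
    return math.log(2.0) * mttr


if __name__ == "__main__":
    # Hypothetical data: task times in hours, frequencies per year.
    times = [2.0, 4.0, 8.0]
    freqs = [12, 4, 1]
    print(mean_preventive_downtime(times, freqs))
    print(median_downtime_lognormal(mttr=5.0, sigma=0.6))
    print(median_downtime_exponential(mttr=5.0))
```

For the same MTTR, both medians lie below the mean, since both distributions are skewed towards long repair times.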
The expected maximum corrective maintenance downtime with a lognormal distribution of corrective maintenance times is expressed as

\[ T_{cm} = \operatorname{antilog}(t_m + k\sigma) \tag{4.92} \]

where:
$t_m$ = the mean of the logarithms of repair times
k = the value 1.28 or 1.65 for the 90th or 95th percentiles respectively
σ = the standard deviation of the logarithms of repair times.

The expected maximum corrective maintenance downtime with an exponential distribution of corrective maintenance times is expressed as

\[ T_{cm} = 3 \times \text{MTTR} \tag{4.93} \]

where:
MTTR = the mean time to repair, given by the following formula.

\[ \text{MTTR} = \frac{\sum_{i=1}^{m} \lambda_i T_i}{\sum_{i=1}^{m} \lambda_i} \tag{4.94} \]

where:
$\lambda_i$ = the constant failure rate of item i, for i = 1, 2, 3, ..., m
$T_i$ = the corrective maintenance or repair time needed to restore item i, for i = 1, 2, 3, ..., m.

The expected mean maintenance downtime, MDT, is the total time needed to restore the system or its equipment to a specified level of performance, and to maintain it at that level of performance. It includes preventive and corrective maintenance times but not administrative and logistic delay times. In this regard, it is consistent with achieved availability, which includes maintenance time (corrective and preventive) but excludes administrative and logistic delays. After determining $T_{pm}$ and $T_{cm}$, the expected mean maintenance downtime, MDT, is given by the following relationship

\[ \text{MDT} = T_{pm} + T_{cm} \tag{4.95} \]

Substituting Eqs. (4.89) and (4.93) with (4.94) into Eq. (4.95) gives

\[ \text{MDT} = \frac{\sum_{i=1}^{k} (T_{pti})(F_{pti})}{\sum_{i=1}^{k} (F_{pti})} + 3\,\frac{\sum_{i=1}^{m} \lambda_i T_i}{\sum_{i=1}^{m} \lambda_i} \tag{4.96} \]

where:
$T_{pti}$ = the estimated elapsed time for preventive maintenance task i, for i = 1, 2, 3, ..., k
$F_{pti}$ = the estimated frequency of preventive maintenance task i, for i = 1, 2, 3, ..., k.

To determine the expected mean total downtime, DT, estimates of delays (administration and logistic) need to be added to MDT. These delays are usually estimated as fractions of MDT.
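The chain from Eq. (4.92) to the total downtime can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and all input values are hypothetical, the logarithms in Eq. (4.92) are taken to be natural logarithms, the standard deviation uses the (m − 1) divisor of Eq. (4.87), and the delays are modelled as a single fraction of MDT.

```python
import math


def mttr(failure_rates, repair_times):
    """Eq. (4.94): failure-rate-weighted mean time to repair."""
    weighted = sum(l * t for l, t in zip(failure_rates, repair_times))
    return weighted / sum(failure_rates)


def max_downtime_lognormal(repair_times, k=1.65):
    """Eq. (4.92), assuming natural logarithms; k = 1.28 (90th) or 1.65 (95th)."""
    logs = [math.log(t) for t in repair_times]
    t_m = sum(logs) / len(logs)
    # Sample standard deviation of the log repair times, as in Eq. (4.87).
    sigma = math.sqrt(sum((x - t_m) ** 2 for x in logs) / (len(logs) - 1))
    return math.exp(t_m + k * sigma)


def max_downtime_exponential(mttr_value):
    """Eq. (4.93): 3 x MTTR; the exact 95th percentile is ln(20) x MTTR, about 3.0."""
    return 3.0 * mttr_value


def mean_maintenance_downtime(t_pm, t_cm):
    """Eq. (4.95): MDT = T_pm + T_cm."""
    return t_pm + t_cm


def total_downtime(mdt, delay_fraction):
    """DT = MDT plus delays, with delays taken as a fraction of MDT (assumption)."""
    return mdt * (1.0 + delay_fraction)


if __name__ == "__main__":
    # Hypothetical items: failure rates per hour and repair times in hours.
    m = mttr([0.001, 0.003], [4.0, 8.0])
    mdt = mean_maintenance_downtime(t_pm=2.8, t_cm=max_downtime_exponential(m))
    print(m, mdt, total_downtime(mdt, delay_fraction=0.2))
    print(max_downtime_lognormal([3.0, 5.0, 4.0, 9.0, 6.0]))
```

Keeping each equation in its own function makes it straightforward to swap the exponential expression for the lognormal one when repair-time data are available.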
4.2.2.4 Maintenance Strategies and Cost Optimisation Modelling

So far, the interrelationships of maintainability and maintenance have been considered with respect to measures used in maintainability analysis that include maintenance concepts, such as preventive maintenance, corrective maintenance and downtime. In designing for maintainability, it is important to understand the concepts of maintenance strategies.

In designing for maintainability, the up-front establishment of cost-effective maintenance strategies has a significant impact on the final outcome of the engineering design, particularly in considering built-in-testing (BIT), online fault diagnostics, and the application of condition monitoring. A proper understanding of the basic principles of maintenance thus becomes extremely important (in fact, it becomes essential) in the engineering design process, and includes not only maintenance and production people but design engineers as well. Once the basic principles of maintenance are fully understood, then the more sophisticated and complex aspects essential to cost-effective maintenance strategies can be considered. These aspects include an understanding of condition monitoring, condition measurement, fault diagnostics and predictive maintenance, and how and when they should be carried out in order to effectively care for the physical and operational condition of the system or its equipment.

Designing for maintainability is not only a consideration of the measures taken during the design, development and installation of a system that will reduce the required maintenance effort and, thus, also the operational downtime, as well as logistics and costs, but it is also a provision of the required maintenance strategies that complement these measures in order to ensure the as-designed system performance and related warranty.
All these aspects thus need to be carefully considered and placed in their correct perspective for establishing cost-effective maintenance strategies in designing for maintainability.

a) The Basic Principles of Maintenance

Maintenance can be defined as “the continuous action of caring for the condition of equipment”. By definition, the concept of condition has been brought into the understanding of maintenance. Equipment condition is the operational and physical state of equipment on which the functions of the equipment depend.

In order to understand equipment condition, it thus becomes necessary to understand the concept of equipment function. The function of equipment is the work and properties that the equipment is designed to perform and to have. There are two basic types of equipment functions:
• Operational function
• Physical function.

The operational functions can be grouped into primary and secondary functions. The primary operational function of equipment is described by defining what work the equipment primarily does. The secondary operational functions of equipment are the other activities that the equipment also does. As an example, the primary operational function of a heat exchanger would be to transmit heat through conduction from a hot fluid to a cooler fluid, thereby decreasing the temperature of the hot fluid, and increasing the temperature of the cooler fluid. A secondary function of a heat exchanger is to reduce the occurrence of flash vapour in the liquid line (sometimes called flash gas, arising from a sudden change of the fluid to a vapour).

The physical functions of equipment are described by defining the design configuration and physical properties of the equipment.
Referring to the previous example, the most significant physical function of a heat exchanger is the ability to provide efficient heat transfer at high temperature through a heat transfer surface that is large enough to transfer the heat sufficiently, and that is also able to resist expansion stresses that may cause cracks and dangerous leakages.

Thus, the condition of equipment as described in the definition of maintenance can now be reviewed. It can be seen that the condition of equipment is directly related to the equipment’s functions. There are two types of equipment conditions, related to the functions of the equipment and called the functional states of condition. The two types of equipment conditions are:
• Operational condition
• Physical condition.

The operational condition of equipment relates to its operational functions, and the physical condition of equipment relates to its physical functions. Maintenance can now be redefined as “the continuous action of caring for the operational and physical conditions of equipment”.

The next concept to consider in this definition of maintenance is the “continuous action”. There are predominantly two actions in maintenance:
• Corrective action
• Preventive action.

Corrective action, by definition, is “that action necessary to rectify or set right defects according to a standard”. Corrective action is thus that maintenance work that fixes or repairs equipment after it has failed. Preventive action, by definition, is “that action serving to hinder or stop defects”. Preventive action is thus that maintenance work that prevents or stops defects from occurring in equipment before it has failed.

By progressive definition, the concept of failure has been brought into the understanding of maintenance action. Thus, in order to fully understand maintenance, it is essential to understand the concept of failure.
Equipment failure has already been defined as “the inability of the equipment to function within its specified limits of performance”. There are thus two descriptions of failure:
• Functional failure
• Potential failure.

Functional failure in equipment is “the inability of the equipment to carry out the work that it was designed to perform within specified limits of performance”. This inability has qualitative gradation, depending upon the severity of functional failure. There are two degrees of severity in functional failure:
• A complete or total loss of function, where the equipment cannot carry out any work that it was designed to perform.
• A partial loss of function, where the item is unable to function within specified limits of performance.

Potential failure in equipment is “the identifiable condition of the equipment, indicating that functional failure can be expected”. Potential failure is a condition or state of condition of the equipment. Functional failure is an occurrence or incident.

The definition of preventive action in maintenance can now be reviewed. From the point of view of the two descriptions of failure, preventive action in maintenance is “that action serving to hinder or to stop functional or potential failures”. Thus, preventive action in maintenance is that action serving to hinder or stop the occurrences of defects in the function of equipment through the detection of an identifiable condition arising in the equipment, indicating that it is unable to carry out the work that it was designed to perform within specified limits of performance.

Maintenance can thus be comprehensively defined as “the continuous corrective and preventive action of caring for the operational and physical conditions of equipment”.

The different types of maintenance

In order to convert the definition of maintenance into practice, it is necessary to define how corrective and preventive action in maintenance is implemented.
These actions in maintenance are practically implemented through different types of maintenance.

There are three basic types of maintenance:
• Defect maintenance.
• Routine maintenance.
• Preventive maintenance.