4.2 Theoretical Overview of Availability and Maintainability in Engineering Design

e) Sizing Maximum or Design Capacity

The effective capacities of multiple system operations or processes within the same engineering design installation are usually different. A bottleneck is the process that has the lowest effective capacity of any process in the designed installation and, thus, limits total output. Expansion of maximum or design capacity occurs only when bottleneck capacity is increased. However, flexible flow processes may have floating bottlenecks due to widely varying workloads on different processes at different times.

The theory of constraints (TOC) in designing for availability focuses on design alternatives that impede maximum capacity (i.e. bottlenecks), with the objective of maximising total product or materials process flow (Goldratt 1990). The focus on bottlenecks is also the means to increasing throughput and, consequently, the mass-flow rate of product and materials. The performance of the overall process design is a function of minimum bottleneck operations or processes. TOC provides the ability to descriptively characterise the functional relationships responsible for a typical complex process environment. Basically, through the application of system dynamics (SD) models, which are developed from TOC logic diagrams, insights into the dynamics of design alternatives that impede maximum capacity are obtained.

The application of TOC in designing for availability involves the following steps:
• Identification of system bottleneck(s).
• Exploitation of the bottleneck(s) (i.e. maximising throughput).
• Elevating the bottleneck(s) (i.e. considering increasing capacity at the bottleneck(s)).
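The identification and elevation steps above can be illustrated with a minimal sketch. It assumes processes arranged in series, so that total output equals the lowest effective capacity; the process names, the capacity values and the helper functions `find_bottleneck` and `elevate` are hypothetical and serve only as an illustration.

```python
def find_bottleneck(capacities):
    """Return (name, capacity) of the process with the lowest effective capacity."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


def elevate(capacities, name, extra):
    """Return a copy of the capacities with `extra` added at one process."""
    updated = dict(capacities)
    updated[name] += extra
    return updated


# Hypothetical effective capacities in tonnes per hour (illustrative values only).
plant = {"crushing": 120.0, "milling": 95.0, "flotation": 110.0}

# Step 1: identify the bottleneck; it sets the output of a series process.
bottleneck, output = find_bottleneck(plant)

# Step 3: elevate the bottleneck, then look for the next limiting process.
expanded = elevate(plant, bottleneck, 30.0)
new_bottleneck, new_output = find_bottleneck(expanded)

print(bottleneck, output)          # milling 95.0
print(new_bottleneck, new_output)  # flotation 110.0
```

Adding 30 t/h at the bottleneck raises output by only 15 t/h in this example, because the constraint moves to another process. This is the reason the identification step is repeated after each elevation.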
Criteria for sizing design capacity

Besides increasing the capacity of system bottlenecks in order to expand design capacity, further criteria for sizing design capacity are concerned with predicted process utilisation rates that are close to 100%, indicating the need to increase capacity because of the probability of declining productivity over time (i.e. diminished output against constant input). Process utilisation tends to be higher in capital-intensive processes, where prediction of utilisation between 90 and 100% is not uncommon. In such cases, occurrences of bottlenecks in the total process are inevitable, which makes the application of TOC essential in designing for availability.

A further consideration is economy of scale. In designing for availability, this implies not only increasing a design’s size or capacity but at the same time attempting to decrease the average unit cost through various options, such as:
• Spreading fixed costs: As the system utilisation rate increases, the average unit cost is reduced.
• Reducing manufacturing/construction costs: Doubling facility size usually does not double costs.
• Reducing material costs: Higher volumes allow for bulk acquisition and handling.
• Exploiting process advantages: High volume may justify investment in more efficient technology.
• Increasing inherent availabilities: Determining initial system operational characteristics.

In contrast to consideration of economy of scale is the need also to consider diseconomies of scale, whereby excessive size can bring about complexity and inefficiencies that, in turn, can raise the average unit cost, and result in a non-linear growth of overhead.

4.2.1.3 Inherent Availability ($A_i$) Modelling with Uncertainty

Under initial conditions of uncertainty, it is feasible to define system availability only in terms of operable time and corrective maintenance.
Availability defined in this manner is termed inherent availability ($A_i$). Under such idealised conditions, standby and delay times associated with scheduled or preventive maintenance, as well as administrative and logistics downtime, are ignored. Inherent availability is thus useful in determining initial system operational characteristics under specified conditions, such as testing in a contractor’s facility, or any other controlled test environment. Likewise, inherent availability becomes a useful term to describe combined reliability and maintainability characteristics or to define the one in terms of the other during the early conceptual phase of the engineering design process when, generally, these terms cannot be defined individually and are rather related to system performance.

Since such a definition of availability is easily measured, it is frequently used as a contract-specified requirement. Inherent availability is primarily the concern of the design engineer during the establishment of functional interface with the contractor and manufacturers in the early phases of the engineering design process. Inherent availability looks at availability from a design perspective; thus, reliability and maintainability are considered complementary measures in the inherent availability equation. Inherent availability is in effect a model of reliability and maintainability measures. The inherent availability equation is given as (Eq. 4.46), (DoD 3235.1-H 1982):

$$A_i = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$ (4.46)

where:
MTBF is the mean time between failures,
MTTR is the mean time to repair.

$A_i$ is the largest availability value that can be achieved because only the times related to operational disruptions due to breakdowns and their repair are considered, whereas downtime associated with planned maintenance as well as administrative and logistics downtime are ignored.
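As a minimal sketch of how Eq. (4.46) might be evaluated, the following computes $A_i$ from MTBF and MTTR, and rearranges the same equation to give the largest MTTR that still meets a specified availability. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def inherent_availability(mtbf, mttr):
    """Inherent availability A_i = MTBF / (MTBF + MTTR), Eq. (4.46)."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf / (mtbf + mttr)


def max_mttr_for_target(mtbf, target):
    """Largest MTTR meeting a target A_i, from Eq. (4.46) solved for MTTR."""
    if mtbf <= 0 or not 0 < target <= 1:
        raise ValueError("MTBF must be positive and 0 < target <= 1")
    return mtbf * (1.0 - target) / target


# Hypothetical values in hours (illustrative only).
a_i = inherent_availability(mtbf=500.0, mttr=10.0)
mttr_limit = max_mttr_for_target(mtbf=500.0, target=0.98)

print(round(a_i, 4))         # 0.9804
print(round(mttr_limit, 2))  # 10.2
```

Both functions use the same time unit for MTBF and MTTR; the unit itself cancels in the ratio.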
If the expected design reliability measure of mean time between failures (or, more particularly, mean time to breakdowns) is very large compared to the related mean time to repair (or mean time to replace), then the inherent availability is high. Similarly, if the design maintainability measure of mean time to repair (MTTR) is a minimum, the inherent availability $A_i$ will be a maximum.

It is obvious from the inherent availability equation that if design reliability decreases (i.e. MTBF becomes smaller), then better design maintainability (i.e. shorter MTTR) is needed to achieve the same inherent availability. Conversely, as engineering design reliability increases, design maintainability is not so important in being able to achieve the same inherent availability. An important design integrity principle is thus obtained:

Trade-offs can be made between reliability and maintainability to achieve the same availability in the engineering design process.

a) The Exponential Function for Inherent Availability

If λ is designated the failure rate (1/MTBF) and μ is designated the repair rate (1/MTTR), and both the times to failure and the times to repair are exponentially distributed, then the probability density function (p.d.f.) of a failure at time x is

$$f(x) = \lambda e^{-\lambda x} .$$ (4.47)

The probability density function that a subsequent repair will be completed at time t, the end of the availability cycle, t > x, is

$$f(t - x) = \mu e^{-\mu (t - x)} .$$ (4.48)

The availability cycle can be construed to have two consecutive periods; the first period is when operation is terminated by a failure, and the second period is when downtime ends with a completed repair. Inherent availability is the ratio of the average time for the first period, to the average time for the cycle, which includes operation and downtime. The probability density function of a failure before t, followed by a repair completed at t, is the convolution (accumulated product) of Eqs.
(4.47) and (4.48)

$$f(t) = \int_0^t f(x)\, f(t - x)\, dx$$ (4.49)

$$f(t) = \frac{\lambda\mu}{\mu - \lambda}\left(e^{-\lambda t} - e^{-\mu t}\right) \quad \text{with } \mu > \lambda .$$

The average period of an availability cycle E(t) is

$$E(t) = \int_0^\infty t\, f(t)\, dt$$ (4.50)

$$E(t) = \frac{\lambda + \mu}{\lambda\mu} .$$

The average period of an availability cycle E(t) is expressed in terms of mean time between failure (MTBF) and mean time to repair (MTTR):

$$E(t) = \frac{1}{\lambda} + \frac{1}{\mu}$$

$$E(t) = \mathrm{MTBF} + \mathrm{MTTR}$$

Thus, inherent availability $A_i$ is the fraction of the availability cycle

$$A_i = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} = \frac{1/\lambda}{1/\lambda + 1/\mu} = \frac{\mu}{\lambda + \mu}$$ (4.51)

b) Confidence Determination of Inherent Availability Predictions

Equation (4.51) indicates that if both the MTBF and MTTR distributions are exponential, then the inherent availability $A_i$ is a function of the failure rate λ and the repair rate μ. Since both λ and μ can readily be used for Bayesian prior and posterior analysis, random values can be generated in repeated trials in order to simulate a value for $A_i$. The percentage values of the resulting distribution on $A_i$ are the confidence limits of the inherent availability prediction.

In predicting the value of $A_i$, the ratio of the mean operating period (MTBF) to that of the availability cycle (MTBF + MTTR) can be established by known or estimated distributions for these values. However, establishing confidence levels on different values of $A_i$ (i.e. quantitative assessment of $A_i$) can be done only by using known failure and repair data to establish distributions on MTBF and MTTR parameters. For example, if both the time between failures and time to repair are exponential, then the values for MTBF and MTTR can be determined from Bayesian prior distributions, which are functions of the prior data. Beyond such relatively simple analysis, establishing confidence levels on different values of $A_i$ is very difficult.
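The repeated-trial simulation described above might be sketched as follows. The sketch assumes, beyond what the text states, that λ and μ each have a gamma posterior distribution (the conjugate form for an exponential rate) with shape equal to the number of observed events and rate equal to the total observed time, which corresponds to a vague prior. The function names and all data values are hypothetical.

```python
import random


def simulate_inherent_availability(n_failures, operating_time,
                                   n_repairs, repair_time,
                                   trials=10000, seed=1):
    """Return sorted simulated values of A_i = mu / (lambda + mu), Eq. (4.51)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Assumed gamma posteriors; gammavariate takes (shape, scale),
        # so the scale is the reciprocal of the total observed time.
        lam = rng.gammavariate(n_failures, 1.0 / operating_time)
        mu = rng.gammavariate(n_repairs, 1.0 / repair_time)
        samples.append(mu / (lam + mu))
    samples.sort()
    return samples


def percentile(sorted_samples, fraction):
    """Return the sample value at the given fraction of the sorted list."""
    index = int(fraction * (len(sorted_samples) - 1))
    return sorted_samples[index]


# Hypothetical data: 8 failures in 4000 operating hours, 8 repairs in 80 hours.
samples = simulate_inherent_availability(8, 4000.0, 8, 80.0)
lower = percentile(samples, 0.05)
median = percentile(samples, 0.50)
upper = percentile(samples, 0.95)
print(lower, median, upper)
```

The 5% and 95% values of the simulated distribution then serve as approximate limits on the predicted $A_i$. With so few observed events the interval is wide, which reflects the uncertainty that early design estimates carry.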
Thus, predictions of $A_i$ are feasible under initial conditions of uncertainty, as with conceptual design, if it is possible to define system availability with respect to estimates of operable time and downtime due to corrective maintenance. Standby and delay times associated with scheduled or preventive maintenance, as well as administrative and logistics downtime, are ignored. A major problem arises, though, when these estimates cannot be based on obtained data, and predicting the value of $A_i$ cannot be quantitative. However, as indicated in Sect. 3.3.3.3 on reliability evaluation, a statistically acceptable qualitative methodology to determine the integrity of engineering design in the situation where data are not available or not meaningful is included in the concept of information integration technology (IIT). The concept of IIT includes a combination of methods and tools for collecting, organising and analysing diverse information, and for utilising that information to guide optimal decision-making, based on Bayesian prior and posterior analysis (Booker et al. 2000).

4.2.1.4 Preliminary Maintainability Modelling

Probability theory and statistics have an important role in designing for maintainability, as much as they have in engineering design integrity methodology as a whole. Various probability distributions may be used to quantify repair time data, and even uncertainty of repair times. Where repair time data are not available, including any data representing failure rates or expected time to failure, qualitative methods involving possibility theory need to be used, similar to the prediction of reliability considered in the previous section. However, in the case of data being available, even censored data, repair time distributions may be identified and the corresponding maintainability function may be obtained.
The maintainability function is used to predict the probability that a repair, beginning at time t = 0, will be accomplished in a time t. The maintainability function M(t), for any distribution, is expressed by the following relationship (Dhillon 1999b):

$$M(t) = \int_0^t f_r(t)\, dt$$ (4.52)

where $f_r(t)$ is the probability density function of the maintenance (repair) time.

This maintainability function may be represented by various distribution functions, depending upon the statistical characteristics of the data and the function parameters. The exponential distribution is particularly useful in presenting maintenance times that are random in duration.

The exponential distribution probability density function is defined by the following relationship

$$f_r(t) = \frac{1}{\mathrm{MTTR}}\, e^{-t/\mathrm{MTTR}}$$ (4.53)

where:
t is the variable repair time, and
MTTR is the mean time to repair.

By substituting Eq. (4.53) into Eq. (4.52), the following relationship is obtained

$$M(t) = \int_0^t \frac{1}{\mathrm{MTTR}}\, e^{-t/\mathrm{MTTR}}\, dt$$ (4.54)

$$M(t) = 1 - e^{-t/\mathrm{MTTR}}$$

$$M(t) = 1 - e^{-\mu t} .$$

The fundamental parameter is the repair rate, μ, the reciprocal of MTTR, rather than the failure rate, λ, the reciprocal of MTBF. The treatment of ‘time to an event’ is also reversed, in that the objective should be to make μ as high as possible, so that repairs are completed quickly, and to make λ as low as possible, so that the time between failures is as long as possible.

In the maintainability relationship given in Eq. (4.54), let t denote a specified or required ‘standard’ time to repair. Since t is specified, it is necessary only to evaluate μ. Furthermore, suppose that available data consist of estimates of repair times $t_1, t_2, \ldots, t_r$. The total estimated time, T, on repair status is then

$$T = \sum_{i=1}^{r} t_i .$$
(4.55)

Because the repair events are all independent, the joint probability, or likelihood L, of the first r repair times, $t_1, t_2, \ldots, t_r$, is the product of their respective probabilities

$$L = \prod_{i=1}^{r} f_r(t_i) .$$ (4.56)

From Eq. (4.53), with μ = 1/MTTR, we get

$$L = \mu^{r} \exp\left(-\mu \sum_{i=1}^{r} t_i\right) .$$ (4.57)

The maximum-likelihood estimate of μ is the value that maximises the natural logarithm of L, denoted E:

$$E = \ln L$$ (4.58)

$$E = r \ln \mu - \mu T$$

$$\frac{\partial E}{\partial \mu} = \frac{r}{\mu} - T .$$

Setting the derivative to zero, the maximum-likelihood estimate of μ is

$$\hat{\mu} = \frac{r}{T} .$$ (4.59)

The best estimate $\hat{m}(t)$ of the maintainability function, M(t), with standard maintenance time t, is then obtained from Eq. (4.60), where $\hat{m}(t) = \hat{M}$, in the case of $0 \le \hat{M} < 1$, may be viewed as having a Bayesian prior or posterior distribution with parameters that are valid statistics for r repair actions and T total repair time. If these estimates cannot be based on obtained data, the methodology of information integration technology (IIT) is applicable, in which Bayesian prior and posterior analysis is utilised.

$$M(t) = 1 - e^{-\mu t}$$ (4.60)

$$\hat{m}(t) = 1 - e^{-\hat{\mu} t} = 1 - e^{-r t / T} = \hat{M} .$$

4.2.2 Theoretical Overview of Availability and Maintainability Assessment in Preliminary Design

Availability and maintainability assessment attempts to estimate the expected usage of equipment over a period of operational time subject to both planned and unplanned maintenance downtime or, alternatively, the expected utilisation over a specified period of each individual item of equipment at the upper systems levels of the systems breakdown structure. System availability is an important measure of repairable systems, since it considers both reliability and maintainability, whereas availability and maintainability modelling takes into account both the failure and repair states of a system.
More specifically, availability and maintainability assessment takes into account not only the failure and repair states of a system but downtime due to preventive maintenance as well. Availability and maintainability assessment in this context is considered during the preliminary or schematic design phase of the engineering design process. The most applicable methodology for availability and maintainability assessment in the preliminary design phase includes basic concepts of mathematical modelling such as:
i. Markov modelling for design availability and maintainability
ii. Achieved availability modelling subject to maintenance
iii. Maintainability assessment with maintenance modelling
iv. Maintenance strategy and cost optimisation modelling.

4.2.2.1 Markov Modelling for Design Availability and Maintainability

Markov modelling is a powerful engineering design analysis tool, and it can be used in most cases of designing for reliability and designing for maintainability. The method is useful in modelling systems, especially large complex systems, with dependent failure and repair modes. Markov models are particularly useful to model repairable systems with random failure occurrences (i.e. constant or time-independent failure rates) and random repair times (i.e. constant or time-independent repair rates). The method becomes unreliable for systems with time-dependent failure and repair rates.

a) The Two-State Markov Model

[Fig. 4.7 Markov model state space diagram: up state 0 (system operating) and down state 1 (system failed), with transition rate λ from state 0 to state 1 and transition rate μ from state 1 to state 0]

Several initial assumptions are important when applying Markov modelling to engineering design analysis (Dhillon 1999b):
• All events are independent of each other.
• The probability of transition from the system operating state to the system failed state (state 0 to state 1) is given by λΔt, where Δt is a finite time interval, and λ is the constant failure rate, or the transition rate.
• The probability of transition from the system failed state to the system operating state (state 1 to state 0) is given by μΔt, where Δt is a finite time interval, and μ is the constant repair rate, or the transition rate.
• The probability of more than one transition from one state to another in Δt is very small.

The transition states are represented in the state space diagram of Fig. 4.7. From Fig. 4.7, the following mathematical model can be derived (Dhillon 1999b):

$$P_0(t + \Delta t) = P_0(t)\,(1 - \lambda_f \Delta t) + P_1(t)\, \mu_r \Delta t$$ (4.61)

and

$$P_1(t + \Delta t) = P_1(t)\,(1 - \mu_r \Delta t) + P_0(t)\, \lambda_f \Delta t .$$ (4.62)

Status variables and probabilities

The various status variables and probabilities of these two equations need to be evaluated:
$\lambda_f$ is the system constant failure rate,
$\mu_r$ is the system constant repair rate,
$P_0(t + \Delta t)$ is the probability that the system is in an operating state 0 at the time t + Δt,
$P_1(t + \Delta t)$ is the probability that the system is in a failed state 1 at the time t + Δt,
$P_0(t)$ is the probability that the system is in an operating state 0 at time t,
$P_1(t)$ is the probability that the system is in failed state 1 at time t,
$(1 - \lambda_f \Delta t)$ is the probability of no failure in time interval Δt when the system is in state 0,
$(1 - \mu_r \Delta t)$ is the probability of no repair in time interval Δt when the system is in state 1,
$\lambda_f \Delta t$ is the probability of system failure in time interval Δt,
$\mu_r \Delta t$ is the probability of accomplishing system repair in time interval Δt.

In the limiting case, Eqs.
(4.61) and (4.62) are represented by

$$\lim_{\Delta t \to 0} \frac{P_0(t + \Delta t) - P_0(t)}{\Delta t} = \frac{dP_0(t)}{dt} = P_1(t)\, \mu_r - P_0(t)\, \lambda_f$$ (4.63)

$$\lim_{\Delta t \to 0} \frac{P_1(t + \Delta t) - P_1(t)}{\Delta t} = \frac{dP_1(t)}{dt} = P_0(t)\, \lambda_f - P_1(t)\, \mu_r$$ (4.64)

To solve Eqs. (4.63) and (4.64), the system is taken to be operating at time t = 0, so that the initial probabilities are $P_0(0) = 1$ and $P_1(0) = 0$. Then

$$P_0(t) = \frac{\mu_r}{\lambda_f + \mu_r} + \frac{\lambda_f}{\lambda_f + \mu_r}\, e^{-(\lambda_f + \mu_r) t}$$ (4.65)

and

$$P_1(t) = \frac{\lambda_f}{\lambda_f + \mu_r} - \frac{\lambda_f}{\lambda_f + \mu_r}\, e^{-(\lambda_f + \mu_r) t} .$$ (4.66)

Thus, at any point in time t, the system’s availability may be represented by the following

$$A(t) = P_0(t)$$ (4.67)

and

$$P_0(t) = \frac{\mu_r}{\lambda_f + \mu_r} + \frac{\lambda_f}{\lambda_f + \mu_r}\, e^{-(\lambda_f + \mu_r) t}$$

where:
A(t) is the system’s availability at a specified time t.

For engineering design availability assessment, the estimate of availability for the system would be a steady-state availability, $A_s$, where t → ∞. Thus

$$A_s = \lim_{t \to \infty} A(t)$$ (4.68)

and $A_s$ is A(steady state).

Substituting Eq. (4.67) into Eq. (4.68) gives the steady-state availability for the system. Thus, $A_s$ = A(steady state) is given by

$$A_s = \lim_{t \to \infty} \left[ \frac{\mu_r}{\lambda_f + \mu_r} + \frac{\lambda_f}{\lambda_f + \mu_r}\, e^{-(\lambda_f + \mu_r) t} \right]$$ (4.69)

$$A_s = \frac{\mu_r}{\lambda_f + \mu_r} .$$

b) Multi-State Markov Models: Method of Supplementary Variables

The components of most systems are assumed to fail with constant failure rates (i.e. failure times are governed by exponential distributions). Repair times of components, however, are often not exponentially distributed; they usually have general distributions (i.e. repair rates of the components are arbitrary functions of time). Multi-component repairable systems with general failure and/or repair time distributions are difficult to analyse mathematically.
These systems are known as non-Markovian systems, as the stochastic process is non-Markovian. However, with the inclusion of the method of supplementary variables, the Markov process approach provides a sufficient level of analysis that can be used to model complex systems with constant failure rates and non-exponential repair times. Inclusion of sufficient supplementary variables in the specification of the state of the system can make a process Markovian (Dhillon 1983).

To enable the system to be characterised as a Markov system, a mathematical model is constructed with concise definitions of the various states for the system, together with a set of supplementary variables that include the concept of efficiency (or, rather, reduced efficiency) in the state definition of the system. Because the state at time t is an exact description of the circumstances prevailing in the system at that time, the behaviour of the system over the passage of time Δt may be found by determining the state probabilities of the system. A complex system can thus be characterised as a Markov system by employing a set of supplementary variables with which a part of the system’s history is included in the state definition of the system.

With the inclusion of supplementary variables, the Markov model represents a multi-state stochastic system with modes of normal operation and total failure, as well as operation at several different levels of performance (i.e. with reduced efficiency). The system thus has three operation modes: ‘normal operation’, ‘operation with reduced efficiency’ and ‘non-operation’. The supplementary variable technique enables a dynamic model of the behaviour of the system to be set up in the form of a set of differential-difference equations with variable coefficients, and respective boundary and initial conditions (Virtanen 1977).

As an illustration of the method of supplementary variables, consider the system transition diagram in Fig. 4.8 (Dhillon 1983).
The diagram represents a complex system that operates partially when some of the system’s components fail and, if a catastrophic failure occurs, the system in its entirety fails. When the system is operating partially, a repair process is expected to be initiated to restore the system to its fully operational state. However, the system may have a catastrophic failure from the partially operating state. Once the system fails completely, it is expected to be restored to its normal operating state.

[Fig. 4.8 Multi-state system transition: states ‘system operating normally’, ‘system operating partially’ and ‘system failed’; transition labels λ, $\lambda_1$, $\lambda_2$, $\mu_p$ and $\mu_f(x)$]