364 IEEE TRANSACTIONS ON RELIABILITY, VOL 40, NO 3, 1991 AUGUST

Effect of Artificial-Intelligence Planning-Procedures on System Reliability

Ing-Ray Chen, Member IEEE
University of Mississippi, University

Farokh B. Bastani, Member IEEE
University of Houston, Houston

Reader Aids
Purpose: Analyze a reliability model
Special math needed for explanations: Probability
Special math needed to use results: None
Results useful to: Computer designers and reliability analysts

Abstract - For an embedded real-time process-control system incorporating artificial-intelligence programs, the system reliability is determined by both the software-driven response-computation time and the hardware-driven response-execution time. This paper proposes a general model for estimating the software/hardware reliability of such a system. The model is based on the probability that the system can accomplish its mission under a time constraint without incurring failure. The factors that influence the proposed reliability measure are identified. The paper also illustrates how mission time, heuristics, and real-time constraints affect the reliability of a system that uses artificial-intelligence planning procedures. An optimal search procedure might not always yield a higher reliability than a non-optimal search procedure. Hence, the paper identifies the design parameters and conditions under which one search procedure is preferred over another, in terms of improved software/hardware reliability.

1. INTRODUCTION

In embedded computer systems, the computer is a part of a larger system such as an automated manufacturing system, a robot, or a defense system. The computer usually provides control functions and must operate in real time to cope with deadlines. Typically, it executes an infinite loop. In each pass, it first reads the sensor values, then spends time $T_p$ to compute or plan a response, and then spends time $T_e$ to execute the response. The reliability of an embedded computer system during time $t$ can be determined by viewing it as a
series connection of statistically independent hardware and software components, so that:

$$R_{\mathrm{system}}(t) = R_{\mathrm{hardware}}(t)\, R_{\mathrm{software}}(t) \tag{1}$$

However, a more appropriate reliability measure is the probability that the system accomplishes its mission:

$$\Pr\{\text{mission accomplished}\} = \int_0^{\infty}\!\!\int_0^{\infty} \Pr\{\text{mission accomplished} \mid T_p = t_p,\, T_e = t_e\}\, dF(t_p, t_e) \tag{2}$$

Key Words - Real-time process-control system, Artificial intelligence, System reliability, Search heuristics

Notation
$F$  joint Cdf of the time to compute and execute the response; it depends on both the hardware and software parts of the system

The integrand in (2) can be expressed in terms of the hardware and software failure sources described below. These sources are combined in (5).

Hardware failures have many causes, such as wear and tear on components. Software failures are due to:
• residual faults in the program,
• use of suboptimal algorithms, such as heuristics,
• failure to meet real-time constraints.

We make the following general assumptions concerning the reliability of an embedded computer system.

• Mission time. The mission time is $T_p + T_e$. The longer it is, the more likely it is that the software or the hardware will fail. $T_p$ & $T_e$ are inversely correlated. Generally, the more time spent in planning (computing a response), the more likely it is that the strategy is optimal and hence quick to execute, and vice versa.

• Hardware-component reliability. Hardware components have a variety of failure modes & mechanisms. At least some of these can be affected by the software, eg, when it imposes excessive "stress" on some hardware components. When software affects the reliability of hardware components, then software & hardware failures are only conditionally statistically independent.

• Hardware-system reliability. The software can affect the form of $R_{\mathrm{hardware}}(t)$. For example, consider a 2-component hardware system. One plan might require that both components be active in order to react to the sensor inputs, while another plan might require only one component to be active. When software affects the reliability of hardware in this way, then software & hardware failures are only conditionally
statistically independent.

• Residual software faults. If the software is not modified, its failure rate due to faults that remain undetected is constant.

• Intrinsic software faults. Such faults, if any, are due to fundamental limitations of the algorithm used in the software. For example, the use of heuristics can result in occasional failures even though the algorithm is devoid of any residual faults. This is modeled by $P_c(t_p)$ = Pr{algorithm works correctly during planning time}.

• Failures due to real-time constraints. In embedded computer systems, the environment can change dynamically. Hence, if the planning time is too long, the environment might have changed too much by the time the response is executed for the response to have any effect. These failures are often characterized by the inability of the system to meet real-time constraints. This is modeled by $P_v(t_p, t_e)$ = Pr{plan is correct for sensor readings during the planning & execution times | plan is based on observations at time 0}.

Notation
$t_p$, $t_e$  planning time, execution time
$f_e(t)$, $F_e(t)$  pdf, Cdf of $t_e$
$R_i(t_e)$  reliability of component $i$ during time $t_e$
$\lambda_s$  constant software-failure rate due to residual faults
$\lambda_h$  constant hardware-failure rate (used only in examples)
$R_{h,\mathrm{gen}}(\text{times}; \text{parameters})$  general hardware-reliability function
$R_h(t; \alpha)$  hardware-reliability function with parameter $\alpha$, when hardware & software failures are statistically independent
$P_c(t)$  Pr{heuristics used in the planning method work during time $t$}
$P_v(t_1, t_2)$  Pr{plan is valid during time $t_1 + t_2$ | plan is based on observations at time 0}
$\mathrm{gauf}(\cdot)$  standard normal (Gaussian) Cdf

Other, standard notation is given in "Information for Readers & Authors" at the rear of each issue.

In this paper we focus on the application of some artificial-intelligence search strategies to fault-tolerant process-control systems. We address the design tradeoff between the optimality of a search strategy and the satisfaction of real-time constraints.

Detailed Assumptions
1. Hardware-system failure is statistically independent of software. This allows us to replace $R_{h,\mathrm{gen}}$ by $R_h$, the effective hardware reliability.
2. In all the examples, $R_h = \exp(-\lambda_h t)$, for simplicity and tractability.
3. The software has been debugged completely, ie, $\lambda_s = 0$.

Under these assumptions, all elements can be combined to yield:

$$\Pr\{\text{mission succeeds}\} = \int_0^{\infty}\!\!\int_0^{\infty} R_h(t_p + t_e; \alpha)\, P_c(t_p)\, P_v(t_p, t_e)\, dF(t_p, t_e) \tag{5}$$

Section 2 discusses the system reliability for two search techniques: generate & test, and heuristic pruning [8]. Example 1 illustrates the effect of mission time on the system reliability, while example 2 considers failures due to the use of heuristics. Section 3 illustrates the tradeoff between the response computation time and the response execution time for some search procedures in real-time situations. It also identifies conditions under which a non-optimal search strategy can provide a better system reliability than an optimal strategy.

0018-9529/91/0800-0364\$01.00 © 1991 IEEE

2. SEARCH HEURISTICS

2.1 Effect of Mission Time

Assumptions
1. $P_c(t_p) = 1$, ie, there are no software errors due to the use of heuristics or other approximate algorithms.
2. $P_v(t_p, t_e) = 1$, ie, there are no real-time constraints.
3. $f_e(t_e) = \delta(t_e)$, ie, the response execution time is zero. Here $\delta(t)$ is the standard impulse function: it has unit area concentrated in the immediate vicinity of $t = 0$.

Then (5) becomes:

$$\Pr\{\text{mission succeeds}\} = \int_0^{\infty} R_h(t_p; \alpha)\, f_p(t_p)\, dt_p \tag{6}$$

Example 1

Consider a Generate & Test search procedure. Each step takes $T$ time units to process and has a probability $p$ of passing the test. This implies that:

$$\Pr\{T_p = iT\} = p\,(1-p)^{i-1}, \qquad i = 1, 2, \ldots \tag{7}$$

Then (6) becomes:

$$\Pr\{\text{mission succeeds}\} = \Pr\{\text{search procedure terminates and the system is alive}\} = \sum_{i=1}^{\infty} p\,(1-p)^{i-1} R_h(iT; \alpha) \tag{8}$$

Example 1'

Same as example 1, except that we make the additional simplistic assumption of a constant hardware-failure rate, in order to obtain a closed-form solution. Then (8) becomes:

$$\Pr\{\text{search procedure terminates and the system is alive}\} = \sum_{i=1}^{\infty} p\,(1-p)^{i-1} e^{-\lambda_h iT} = \frac{p\, e^{-\lambda_h T}}{1 - (1-p)\, e^{-\lambda_h T}} \tag{9}$$

Thus, for this simplistic case, the reliability of the Generate & Test search procedure is good when the hardware failure rate is small and poor when the hardware failure rate is large. This is a reasonable result.

2.2 Effect of Heuristics

Assumptions
1. $P_v(t_p, t_e) = 1$, ie, there are no real-time constraints.
2. $f_e(t_e) = \delta(t_e)$, ie, the response execution time is zero.

Then (5) becomes:

$$\Pr\{\text{mission succeeds}\} = \int_0^{\infty} R_h(t_p; \alpha)\, P_c(t_p)\, f_p(t_p)\, dt_p \tag{10}$$

Example 2

Consider the Branch & Bound method [8], which uses heuristics to limit the search for an optimal solution. The Generate & Test technique discussed in section 2.1 is not suitable where optimal or reasonably optimal solutions are required, since all solutions must be inspected.

Notation
$b$  branching factor of the resulting tree
$D$  maximum depth of the tree
$r$  Pr{heuristic does result in a correct answer}, viz, the reliability of the heuristic
$d$  depth of the search tree
$h(i)$  Pr{$i$ nodes are examined}
$T$  time duration to analyze each node
$U(a,b)$  uniform Cdf over $(a, b)$

Assumption
$d/D$ = Pr{correct answer is found | $d$}.

Now, if the depth of the search tree is limited to $d$, then between $2b^{d/2}$ and $b^d$ nodes have to be examined [8]. The results are:

$$T_p \sim U(2b^{d/2}T,\; b^d T), \qquad P_c(iT) = r^d\, \frac{d}{D}$$

Example 2'

Same as example 2, except that we make the additional simplistic assumption of a constant hardware-failure rate, in order to obtain a closed-form solution. Then

$$\Pr\{\text{system completes the search successfully}\} = \sum_{i} h(i)\, r^d\, \frac{d}{D}\, e^{-\lambda_h iT} \tag{11}$$

When $h$ is the uniform distribution over the integers from $2b^{d/2}$ to $b^d$, (11) becomes:

$$R(r,d) = \frac{r^d\, d}{D} \cdot \frac{\exp(-2\lambda_h b^{d/2} T) - \exp[-\lambda_h (b^d + 1) T]}{(b^d - 2b^{d/2} + 1)\,[1 - \exp(-\lambda_h T)]} \tag{12}$$

The optimal value of $d$ can be obtained by solving:

$$\frac{\partial R(r,d)}{\partial d} = 0 \tag{13}$$

Example 2''

Same as example 2, except that each node up to depth $d$ is examined without using any heuristics. Thus, $r = 1$, and the system reliability is:

$$R(d) = \frac{d}{D}\, \exp(-b^d \lambda_h T) \tag{14}$$

The optimal value of $d$ is obtained by solving:

$$\frac{dR(d)}{dd} = \frac{\exp(-b^d \lambda_h T)}{D}\,\bigl(1 - d\,\lambda_h T\, b^d \ln b\bigr) = 0, \quad \text{ie,} \quad d\,\lambda_h T\, b^d \ln b = 1 \tag{15}$$

The optimal value of $d$ for this example is independent of the maximum depth of the tree, $D$, since $D$ enters (14) only as a constant factor. Compare this with the heuristic search of example 2': there is a value of $r$ below which a complete search results in a more reliable system.

3. REAL-TIME CONSTRAINTS

In section 2 we assumed that the system state does not change while the response is being computed. This is sometimes true, as in playing chess, but it is not true for real-time systems. Real-time process-control systems normally have a stringent real-time constraint that must be satisfied. When a real-time situation arises, there is a response computation period in which an optimal or near-optimal strategy must be formulated. This period is followed by a response execution time, which activates the underlying hardware mechanisms and carries out the strategy. Very frequently, the system spends too much of the response computation period formulating an optimal strategy. There is then a high risk that the response is not completed within the real-time constraint, since there is not enough time left for response execution. On the other hand, a non-optimal strategy might be selected in order to meet the real-time constraint. The resulting reliability might then not be acceptable, due to the poor hardware reliability associated with that strategy. Thus, there is a tradeoff between response computation-time and response execution-time under a real-time situation.
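As a rough illustration of this tradeoff, the sketch below estimates (5) by Monte Carlo simulation. It makes several stand-in assumptions: $P_c = 1$, $R_h(t) = \exp(-\lambda_h t)$, and a deadline model in which a response counts only if planning plus execution finish within a deadline $t_R$. The two strategies (`deliberate`, `quick`), their time distributions, and all numeric values are hypothetical. None of them is taken from the paper.

```python
import math
import random
from typing import Callable, Tuple

# A sampler draws one (t_p, t_e) pair: planning time and execution time.
Sampler = Callable[[random.Random], Tuple[float, float]]


def mission_success(sample: Sampler, lam_h: float, t_r: float,
                    n: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo estimate of (5), assuming P_c = 1,
    R_h(t) = exp(-lam_h * t), and P_v = 1 only if t_p + t_e <= t_r."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t_p, t_e = sample(rng)
        if t_p + t_e <= t_r:  # response completed before the deadline
            total += math.exp(-lam_h * (t_p + t_e))
    return total / n


# Hypothetical strategies (arbitrary time units).
def deliberate(rng: random.Random) -> Tuple[float, float]:
    # Long, variable planning; short execution of a near-optimal plan.
    return rng.uniform(1.0, 9.0), 1.0


def quick(rng: random.Random) -> Tuple[float, float]:
    # Almost no planning; long execution of a non-optimal plan.
    return 0.5, 6.5


if __name__ == "__main__":
    for t_r in (math.inf, 7.0):
        d = mission_success(deliberate, lam_h=0.1, t_r=t_r)
        q = mission_success(quick, lam_h=0.1, t_r=t_r)
        print(f"t_R={t_r}: deliberate={d:.3f} quick={q:.3f}")
```

The results below use $\lambda_h = 0.1$.
- Without a deadline, the deliberate strategy does better: about 0.56 against about 0.50, because its mean total time (6) is shorter than the quick plan's (7).
- With $t_R = 7$, the deliberate strategy drops to about 0.40, while the quick strategy still gives about 0.50. The quick plan always finishes by $t = 7$, whereas the deliberate one misses the deadline whenever planning exceeds 6 time units.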
This section illustrates this tradeoff by investigating some artificial-intelligence search procedures. Specifically, we compare A*, which is known to be optimal [3], with some other search heuristics. We show that these can provide a better system reliability (hardware & software) under certain conditions. Our intention is not to explore heuristics that would lead to better real-time performance of search procedures [4]. Rather, we are interested in identifying the conditions under which one search strategy provides a better system reliability than the others. We restrict our analysis to a very simple problem-space so as to obtain tractable results.

Assumptions
1. $P_c(t_p) = 1$, ie, there are no planning errors.
2. The real-time constraint $t_R$ enters through

$$P_v(t_p, t_e) = \begin{cases} 1, & \text{if } t_p + t_e \le t_R \\ 0, & \text{otherwise} \end{cases} \tag{16}$$

Then

$$\Pr\{\text{mission succeeds}\} = \int_0^{t_R} \Pr\{\text{mission succeeds} \mid T_p = t_p, \text{planning method}\}\, f_p(t_p \mid \text{planning method})\, dt_p \tag{17}$$

$$\Pr\{\text{mission succeeds} \mid T_p = t_p, \text{planning method}\} = \int_0^{t_R - t_p} R_h(t_p + t_e; \alpha)\, f_e(t_e \mid T_p = t_p, \text{planning method})\, dt_e \tag{18}$$

Example 3

Let

$$F_e(t_e \mid T_p = t_p, \text{planning method}) = \mathrm{gauf}\!\left[\frac{t_e - \mu}{\sigma}\right] \tag{19}$$

This models a system in which the time to execute a path has a normal (Gaussian) distribution with mean $\mu$ and standard deviation $\sigma$. Hence, (18) becomes:

$$\Pr\{\text{mission succeeds} \mid T_p = t_p, \text{planning method}\} = e^{-\lambda_h t_p}\, e^{(\lambda_h^2\sigma^2 - 2\mu\lambda_h)/2} \left[\mathrm{gauf}\!\left(\frac{t_R - t_p + \lambda_h\sigma^2 - \mu}{\sigma}\right) - \mathrm{gauf}\!\left(\frac{\lambda_h\sigma^2 - \mu}{\sigma}\right)\right] \tag{20}$$

Example 3'

Same as example 3, except: $(\mu, \sigma) = (K_c C, 0)$. That is, the execution time is deterministic and equal to $K_c C$.

Notation
$K_c$  a constant
$C$  cost of the solution path obtained using the given planning method

3.1 Unique Solution Path

Notation
$h^*(n)$  minimum path cost from a node $n$ to a goal node
$h(n)$  an estimate of $h^*(n)$
$\varepsilon$  heuristic error, a non-negative number (usually small & positive)
$T$  time duration to analyze each node

It is well known that, in the context of a graph problem, A* (or its corresponding algorithms such as IDA* [2]) is optimal when its admissibility condition is satisfied. That is, A* is guaranteed to find a minimal-cost (optimal) path to a goal whenever $h(n) \le h^*(n)$ for all nodes $n$. However, in real situations it is not practical to rely on the statement "a heuristic always satisfies the inequality condition", other than for trivial cases such as $h(n) = 0$. Pohl [6] has analyzed the worst case for A* when

$$h^*(n) - \varepsilon \le h(n) \le h^*(n) + \varepsilon$$

Pohl considered the problem space of infinite binary trees with unit cost on all the arcs of the search graph. He concluded that, in the worst case, $k\,2^{\varepsilon}$ nodes have to be visited before the unique goal node, which resides at level $k$ of the tree, can be located. Other search strategies (eg, pure Branch & Bound) require $2^{k+1} - 1$ node visits in the worst case. Comparing the two results allows a worst-case tradeoff analysis for the two search strategies, as follows.

In general, $f_p(t_p \mid \text{planning method})$ can be any probability function obtained from a large representative sample of problem instances pertaining to a system. In the infinite binary-tree problem-space, when A* is used to search for the solution path, the worst case is $T_p = T(k\,2^{\varepsilon} + 1)$. Under example 3', the worst-case system reliability for A* is:

$$\Pr\{\text{mission succeeds}\} = \begin{cases} \exp[-\lambda_h (k\,2^{\varepsilon} T + T + K_c C)], & \text{if } t_R > T(k\,2^{\varepsilon} + 1) + K_c C \\ 0, & \text{otherwise} \end{cases} \tag{21}$$

The interpretation of (21) is intuitively clear. In order for A* to satisfy the real-time constraint, $t_R$ must exceed $T(k\,2^{\varepsilon} + 1) + K_c C$, ie, the heuristic error must satisfy:

$$\varepsilon < \log_2\!\left[\frac{t_R - K_c C - T}{kT}\right] \tag{22}$$

On the other hand, if the pure Branch & Bound heuristic is used to guide the search, then in the worst case the real-time constraint is met when:

$$k < \log_2\!\left[\frac{t_R - K_c C + T}{T}\right] - 1 \tag{23}$$

When (23) holds, the system reliability is $\exp[-\lambda_h (2^{k+1}T - T + K_c C)]$.

From example 3' we see that strategies that do not use heuristic functions are not necessarily worse than those that use heuristics, especially if certain information is given a priori. For example, if the depth of the unique goal state is known, then (23) can be used as a criterion for using the pure Branch & Bound heuristic to enhance the reliability of the system. This is particularly important when the heuristic error, $\varepsilon$, is unknown.
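The worst-case comparison in (21) and (23) can be checked numerically. This is a minimal sketch, and it rests on the following assumptions.
- Execution is deterministic, with `exec_time` standing for $K_c C$ as in example 3'.
- $R_h(t) = \exp(-\lambda_h t)$.
- The node counts are those quoted above: $k\,2^{\varepsilon} + 1$ for A* and $2^{k+1} - 1$ for pure Branch & Bound.

The function names and the numeric values in the usage note are hypothetical.

```python
import math


def astar_worst_case(k: int, eps: float, T: float, lam_h: float,
                     exec_time: float, t_r: float) -> float:
    """Worst-case reliability of A* as in (21): T_p = T * (k * 2**eps + 1);
    exec_time stands for K_c * C (deterministic execution, example 3')."""
    t_p = T * (k * 2.0 ** eps + 1.0)
    if t_r > t_p + exec_time:
        return math.exp(-lam_h * (t_p + exec_time))
    return 0.0


def bnb_worst_case(k: int, T: float, lam_h: float,
                   exec_time: float, t_r: float) -> float:
    """Worst-case reliability of pure Branch & Bound (2**(k+1) - 1 nodes).
    The deadline test t_r > t_p + exec_time is algebraically equivalent to (23)."""
    t_p = T * (2 ** (k + 1) - 1)
    if t_r > t_p + exec_time:
        return math.exp(-lam_h * (t_p + exec_time))
    return 0.0


def bnb_meets_deadline_23(k: int, T: float, exec_time: float, t_r: float) -> bool:
    """Condition (23) written directly with the base-2 logarithm."""
    return k < math.log2((t_r - exec_time + T) / T) - 1


def max_heuristic_error(k: int, T: float, exec_time: float, t_r: float) -> float:
    """Largest heuristic error (exclusive) for which A* still meets t_r in the
    worst case, from solving the condition in (21) for eps."""
    return math.log2((t_r - exec_time - T) / (k * T))


if __name__ == "__main__":
    for eps in (2, 8):
        a = astar_worst_case(10, eps, 1.0, 1e-4, 10.0, 3000.0)
        b = bnb_worst_case(10, 1.0, 1e-4, 10.0, 3000.0)
        print(f"eps={eps}: A*={a:.3f}  Branch&Bound={b:.3f}")
```

The usage example takes $k = 10$, $T = 1$, $\lambda_h = 10^{-4}$, $K_c C = 10$ and $t_R = 3000$.
- With $\varepsilon = 2$, A* gives about 0.995, against about 0.814 for Branch & Bound.
- With $\varepsilon = 8$, A* drops to about 0.773 and falls below Branch & Bound.
- Tightening $t_R$ to 2100 makes A* with $\varepsilon = 8$ miss the deadline entirely (reliability 0), while Branch & Bound still succeeds.

This is the situation in which (23) serves as a selection criterion.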
3.2 Multiple Solution Paths

In section 3.1, we assumed that there is a unique solution path. Here, we consider the possibility of multiple paths, all of which could lead to the same goal state. Of these multiple solution paths, some are optimal whereas others are not, so the quality of the solutions differs. Generally, the more time one invests in finding a solution, the more likely it is that the solution is close to optimal. However, investing more time might not be permitted in real time. A near-optimal solution might then be desired as a compromise between the real-time constraint and the maximum system-reliability. This section illustrates this compromise by comparing two search strategies that use two heuristics: A* and hill-climbing [8]. A* is well known in the domain of artificial-intelligence search. Hill-climbing represents an extreme case: it uses search efficiency, rather than solution quality as in A*, to guide the search for a solution path.

Notation
$C_{hc}$  cost of the solution path associated with hill climbing
$C_{A^*}$  cost of the solution path associated with A*
$\varepsilon_{hc}$  non-negative number indicating the degradation of the solution quality
Pr{$j$}  Pr{$j$ nodes are expanded}

Assumptions for Hill Climbing
1. The search space is an infinite binary tree (as in the analysis in section 3.1).
2. Arcs no longer have the same weight.
3. Unlike A*, which uses arc weight as the search heuristic, hill climbing uses the remaining number of nodes to guide the search for a solution path.
4. To obtain maximum search efficiency at the expense of solution quality, the search is streamlined from level to level. It does not check whether other nodes at the same level may lead to a more optimal solution path.
5. To ensure that there is at least one solution:
 a. All leaf nodes are solution nodes.
 b. The search tree has the monotonic and admissible property [4]:

$$h(n) \le h^*(n) \tag{24}$$

$$h(n) \le h(n') + C(n, n') \tag{25}$$

Notation
$n'$  any successor node of $n$
$C(n, n')$  actual distance between $n$ and $n'$
$k$  depth of an optimal solution-path in the tree

Assumptions for A*
Same as Hill Climbing assumptions 1, 2 and 5. Thus, $C(n, n') = 0$ for hill climbing and $C(n, n') > 0$ for A*.

Number of Nodes to Be Visited
A. Hill Climbing: $k + 1$ in the worst case, 2 in the best case, and $(k + 3)/2$ on average (assuming a uniform distribution over $[2, k + 1]$).
B. A*: between $k + 1$ and $2^{k+1} - 1$, and $2^k + k/2$ on average (assuming the same uniform distribution as in A). The worst-case upper bound occurs when the probability that the relative error exceeds some fixed positive quantity is greater than 1/2 [1].

The gain in search efficiency in hill climbing comes at the cost of a decline in the quality of the solution:

$$C_{hc} = (1 + \varepsilon_{hc})\, C_{A^*} \tag{26}$$

The mean system-reliability can be calculated by conditioning on $T_p = jT$:

$$r = \sum_j \Pr\{\text{mission succeeds} \mid T_p = jT, \text{planning method}\}\, \Pr\{j\} = \sum_j \left[\int_0^{t_R - jT} R_h(jT + t_e; \alpha)\, f_e(t_e \mid \text{planning method})\, dt_e\right] \Pr\{j\} \tag{27}$$

The last equality in (27) follows from (18).

Example 4

Assumptions
1. Pr{$j$} is the uniform distribution over $[k + 1,\, 2^{k+1} - 1]$ for A* and over $[2,\, k + 1]$ for hill climbing.
2. The hardware failure rate is a constant, $\lambda_h$.
3. $F_e(t_e)$ is given by (19).

Then

$$r(A^*) = \frac{e^{(\lambda_h^2\sigma_{A^*}^2 - 2\mu_{A^*}\lambda_h)/2}}{2^{k+1} - k - 1} \sum_{j=k+1}^{2^{k+1}-1} e^{-\lambda_h jT} \left[\mathrm{gauf}\!\left(\frac{t_R - jT + \lambda_h\sigma_{A^*}^2 - \mu_{A^*}}{\sigma_{A^*}}\right) - \mathrm{gauf}\!\left(\frac{\lambda_h\sigma_{A^*}^2 - \mu_{A^*}}{\sigma_{A^*}}\right)\right] \tag{28}$$

$$r(hc) = \frac{e^{(\lambda_h^2\sigma_{hc}^2 - 2\mu_{hc}\lambda_h)/2}}{k} \sum_{j=2}^{k+1} e^{-\lambda_h jT} \left[\mathrm{gauf}\!\left(\frac{t_R - jT + \lambda_h\sigma_{hc}^2 - \mu_{hc}}{\sigma_{hc}}\right) - \mathrm{gauf}\!\left(\frac{\lambda_h\sigma_{hc}^2 - \mu_{hc}}{\sigma_{hc}}\right)\right] \tag{29}$$

Example 4'

Same as example 4, except: $(\mu_y, \sigma_y) = (K_c C_y, 0)$, where $y$ = A*, hc, similar to example 3'. Then (28) & (29) simplify to:

$$r(A^*) = \frac{e^{-\lambda_h K_c C_{A^*}}}{2(2^{k+1} - k - 1)} \sum_{j=k+1}^{2^{k+1}-1} e^{-\lambda_h jT}\, \bigl(S_j(A^*) + 1\bigr) \tag{30}$$

$$S_j(A^*) = \begin{cases} 1, & \text{if } t_R > jT + K_c C_{A^*} \\ -1, & \text{otherwise} \end{cases} \tag{31}$$

$$r(hc) = \frac{e^{-\lambda_h K_c (1 + \varepsilon_{hc}) C_{A^*}}}{2k} \sum_{j=2}^{k+1} e^{-\lambda_h jT}\, \bigl(S_j(hc) + 1\bigr) \tag{32}$$

$$S_j(hc) = \begin{cases} 1, & \text{if } t_R > jT + K_c (1 + \varepsilon_{hc}) C_{A^*} \\ -1, & \text{otherwise} \end{cases} \tag{33}$$

Eq (30)-(33) imply the following. Suppose the real-time constraint can be satisfied for both A* and hill climbing, ie, $S_j(A^*) = S_j(hc) = 1$ for all $j$. Then, for the same planning time $t_p$, A* has a better system reliability than hill climbing, ie,

$$\exp[-\lambda_h (t_p + K_c C_{A^*})] \ge \exp[-\lambda_h (t_p + K_c (1 + \varepsilon_{hc}) C_{A^*})]$$

This is so because $\varepsilon_{hc} \ge 0$.

The advantage of the hill-climbing strategy is that it has a higher probability of satisfying the real-time constraint, since it expands at most $k + 1$ nodes during the response computation period. However, this advantage disappears as $\varepsilon_{hc}$ becomes larger, ie, when:

$$\varepsilon_{hc} > \frac{t_R - (k + 1)T}{K_c C_{A^*}} - 1 \tag{34}$$

These results show a tradeoff between the system reliability and the satisfaction of the real-time constraint, especially when the constraint, $t_R$, is stringent.

ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation under grant CCR-9110816.

REFERENCES

[1] N. Huyn, R. Dechter, J. Pearl, "Probabilistic analysis of the complexity of A*", Artificial Intelligence, vol 15, 1980, pp 241-254.
[2] R. E. Korf, "Depth-first iterative-deepening: an optimal admissible tree search", Artificial Intelligence, vol 27, 1985, pp 97-109.
[3] J. Pearl, "Some recent results in heuristic search theory", IEEE Trans. Pattern Analysis and Machine Intelligence, vol PAMI-6, 1984 Jan, pp 1-12.
[4] J. Pearl, Heuristics, 1984; Addison-Wesley.
[5] J. Pearl, J. H. Kim, "Studies in semi-admissible heuristics", IEEE Trans. Pattern Analysis and Machine Intelligence, vol PAMI-4, 1982 Jul, pp 392-399.
[6] I. Pohl, "First results on the effect of error in heuristic search", Machine Intelligence, vol 5, 1970, pp 219-236.
[7] N. Viswanadham, V. V. S. Sarma, M. G. Singh, Reliability of Computer and Control Systems, 1987; North-Holland.
[8] P. H. Winston, Artificial Intelligence, 2nd edition, 1984; Addison-Wesley.

AUTHORS

Dr. Ing-Ray Chen; Department of Computer and Information Science; Weir 302; University of Mississippi; University, Mississippi 38677 USA.

Ing-Ray Chen (S'86, M'90) received the BS from the National Taiwan University in 1978, and the MS & PhD in Computer Science from the University of Houston, University Park, in 1985 & 1988. He is an Assistant Professor of Computer and Information Science at the University of Mississippi. His research interests include distributed systems, fault-tolerant systems,
performance & reliability evaluation, and application of artificial intelligence to industrial process-control.

Dr. Farokh B. Bastani; Department of Computer Science; University of Houston; Houston, Texas 77004 USA.

Farokh B. Bastani (M'82): For biography, see IEEE Trans. Reliability, vol 39, 1990 Jun.

Manuscript TR89-101 received 1989 July 10; revised 1990 April 2; revised 1991 February. IEEE Log Number 00174.