close to reality) and associative (i.e. include typical profiles) but not descriptive. Examining the artificial neural network itself only shows meaningless numeric values. The ANN model is fundamentally a black box. On the other hand, being continuous and differentiable, ANN models can be explored beyond simple statistical interrogation to determine typical profiles and explicative variables (network inputs), and example data can be applied to determine their associated probabilities. Artificial neural networks have the ability to account for any functional dependency by discovering (i.e. learning and then modelling) the nature of the dependency without needing to be prompted. The process goes straight from the data to the model without intermediary interpretation or problem simplification. There are no inherent conditions placed on the predicted variable, which can be a yes/no output, a continuous value, or one or more classes among n, etc. However, artificial neural networks are insensitive to unreliability in the data.

Artificial neural networks have been applied in engineering design in predictive modelling of system behaviour using simulation augmented with ANN model interpolation (Chryssolouris et al. 1989), as well as in interpolation of Taguchi robust design points so that a full factorial design can be simulated to search for optimal design parameter settings (Schmerr et al. 1991).

An artificial neural network is a set of elements (i.e. neurodes or, more commonly, neurons) linked to one another, which transmit information to each other through connected links. Example data (a to i) are given as the inputs to the ANN model. The values of the data are then transmitted through the connections, being modified in the process until, on arrival at the bottom of the network, they have become the predicted values, for example the pair of risk probabilities P1 and P2 indicated in Fig. 5.53.

Fig. 5.53 Schematic layout of a complex artificial neural network (Valluru 1995)

a) The Building Blocks of Artificial Neural Networks

Artificial neural networks are highly distributed interconnections of adaptive non-linear processing elements (PEs), as illustrated in Fig. 5.54. The connection strengths, also called the network weights, can be adapted so that the network's output matches a desired response. A more detailed view of a PE is shown in Fig. 5.55.

Fig. 5.54 The building blocks of artificial neural networks, where σ is the non-linearity, x_i the output of unit i, x_j the input to unit j, and w_ij the weights that connect unit i to unit j
Fig. 5.55 Detailed view of a processing element (PE)

An artificial neural network is no more than an interconnection of PEs. The form of the interconnection provides one of the key variables for dividing neural networks into families. The most general case is the fully connected neural network. By definition, any PE can feed or receive activations of any other, including itself. Therefore, when the weights are represented in matrix form (the weight matrix), the matrix is fully populated. A fully connected network of six PEs, together with its 6×6 weight matrix, is presented in Fig. 5.56.
This network is called a recurrent network. In recurrent networks, some of the connections may be absent but there are still feedback connections. An input presented to a recurrent network at time t will affect the network's output for time steps greater than t. Therefore, recurrent networks need to be operated over time.

Fig. 5.56 A fully connected ANN, and its weight matrix
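To make the idea of operating a recurrent network over time concrete, the following is a minimal sketch, not taken from the source text, of a fully connected network of six PEs represented by a fully populated 6×6 weight matrix. The random weights, the logistic non-linearity and the number of time steps are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pe = 6                                        # six processing elements, as in Fig. 5.56
W = rng.normal(scale=0.5, size=(n_pe, n_pe))    # fully populated weight matrix (illustrative values)

def sigma(z):
    """Non-linearity of each PE (a logistic function is assumed here)."""
    return 1.0 / (1.0 + np.exp(-z))

# An input presented at time t affects the output at all later time steps,
# so the recurrent network has to be stepped through time.
x = rng.random(n_pe)                            # activations at time t (example input)
for t in range(5):                              # five illustrative time steps
    x = sigma(W @ x)                            # every PE feeds every PE, including itself
    print(f"t={t + 1}, activations={np.round(x, 3)}")
```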
If the interconnection matrix is restricted to feed-forward activations (no feedback nor self connections), the ANN is defined as a feed-forward network. Feed-forward networks are instantaneous mappers, i.e. the output is valid immediately after the presentation of an input. A special class of feed-forward networks is the layered class, also termed a multi-layer perceptron (MLP). This describes a network that consists of a single layer of non-linear PEs without feedback connections. Multi-layer perceptrons have PEs arranged in layers whereby the layers that receive the input are called the input layers, the layers whose output is in contact with the outside world are called the output layers, and the layers without direct access to the outside world, i.e. connected only to the input or output layers, are called the hidden layers (Valluru 1995).

Fig. 5.57 Multi-layer perceptron structure

The weight matrix of a multi-layer perceptron can be developed as follows (Figs. 5.57 and 5.58): from the example MLP in Fig. 5.57, the input layer contains PEs 1, 2 and 3, the hidden layer contains PEs 4 and 5, and the output layer contains PE 6. Figure 5.58 shows the MLP's weight matrix. Most entries in the weight matrix of an MLP are zero. In particular, any feed-forward network has at least the main diagonal, and the elements below it, populated with zeros. Feed-forward neural networks are therefore a special case of recurrent networks. Implementing partially connected topologies with the fully connected system and then zeroing weights is inefficient but is sometimes done, depending on the requirements for the artificial neural network. A case in point is the weight matrix of the MLP shown in Fig. 5.58.

Fig. 5.58 Weight matrix structure for the multi-layer perceptron

b) Structure of Artificial Neural Networks

A basic artificial neural network (ANN) structure thus consists of three layers: the input layer, the hidden layer, and the output layer, as indicated in Fig. 5.59 (Haykin 1999). This MLP works in the following manner: for a given input vector

$\vec{x}_0 = \{a_0, \ldots, a_i\}$   (5.104)

the following output vector is computed

$\vec{o}_0 = \{c_0, \ldots, c_i\}$   (5.105)

The ANN implements the function f, where

$f(\vec{x}_0) = \vec{o}_0$   (5.106)

Fig. 5.59 Basic structure of an artificial neural network

The basic processing element (PE) group of the MLP is termed the artificial perceptron (AP). The AP has a set of input connections from PEs of another layer, as indicated in Fig. 5.60 (Haykin 1999).

Fig. 5.60 Input connections of the artificial perceptron (a_n, b_1)

An AP computes its output in the following fashion: the output is usually a real number, and is a function of the activation z_i, where

$b_i = \sigma(z_i)$   (5.107)

The activation is computed as

$z_i = \sum_j w_j a_j$   (5.108)

where σ is the activation function.

There are many different activation functions (σ) in use. ANNs that work with binary vectors usually use the step function

$\sigma(z) = 1$ if $z \in [\theta, \infty)$, else 0   (usually θ = 0)

Processing elements with this activation function (σ) are called threshold logic units (TLUs), as indicated in the binary step function illustrated in Fig. 5.61.
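As an illustration of Eqs. 5.107 and 5.108, the following is a minimal sketch, not part of the source text, of a single artificial perceptron with a binary step activation. The weights, inputs and threshold value are arbitrary assumptions.

```python
import numpy as np

def step(z, theta=0.0):
    """Binary step activation (threshold logic unit): 1 if z >= theta, else 0."""
    return 1.0 if z >= theta else 0.0

def artificial_perceptron(a, w, theta=0.0):
    """Compute b = sigma(z) with activation z = sum_j w_j * a_j (Eqs. 5.107 and 5.108)."""
    z = float(np.dot(w, a))      # weighted sum of the inputs from the previous layer
    return step(z, theta), z

# Illustrative inputs and weights (assumed values, not from the text)
a = np.array([0.2, 0.7, 1.0])
w = np.array([0.5, -0.3, 0.8])
b, z = artificial_perceptron(a, w, theta=0.0)
print(f"activation z = {z:.2f}, output b = {b}")
```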
Fig. 5.61 The binary step-function threshold logic unit (TLU)
Fig. 5.62 The non-binary sigmoid-function threshold logic unit (TLU)

Figures 5.61 and 5.62 give graphic examples of threshold logic units (TLUs) (Fausett 1994). Non-binary ANNs often use the sigmoid function as activation function, where the parameter ρ determines the shape of the sigmoid, as indicated in Fig. 5.62 and in Eq. 5.109

$\sigma(z) = \dfrac{1}{1 + e^{-z/\rho}}$   (5.109)

The most significant advantage of an MLP is that the artificial neural network is highly parallel. The MLP is also robust in the presence of noise (i.e. deviations in the input), where a small amount of noise will not drastically affect the output. Furthermore, it can deal with unseen input, through generalisation from the learned input-output combinations. The threshold function ensures that the activation value will not go beyond certain values (generally between 0 and 1) and prevents catastrophic evolutions (a loop effect where values become higher and higher).

c) Learning in Artificial Neural Networks

The basic operation of each AP is to multiply its input values by a weight (one per input), add these together, pass the result through a threshold function, and then send the result to the neurodes downstream in the following layer. The learning mechanism of artificial neural networks is as follows: each set of example data is input to the ANN, and these values are propagated towards the output through the basic operation of each AP.
The prediction obtained at the ANN's output(s) is most probably erroneous, especially at the beginning. The error value is then computed as the difference between the expected value and the actual output value. This error value is back-propagated by going upwards in the network and modifying the weights proportionally to each AP's contribution to the total error value. This mechanism is repeated for each set of example data in the learning set, while performance on the test set improves.
This learning mechanism is called error back-propagation. The method is not unique to artificial neural networks, and is a general method (i.e. a gradient method) applicable to other evolutionary computation objects.

Fig. 5.63 Boolean-function input connections of the artificial perceptron (a_n, o_0)

Table 5.26 Boolean-function input values of the artificial perceptron (a_n, o_0)

a_0   a_1   z     o_0
0     0     0     0
0     1     1     0
1     0     1     0
1     1     2     1

For example, consider the input connections of the AP of an artificial neural network implementing the Boolean AND function (θ = 2), as illustrated in Fig. 5.63 (Haykin 1999). Consider all the possible values of a_0, a_1, z and o_0 for the ANN implementing the Boolean AND function (θ = 2). The two-dimensional pattern space of the AP can now be developed according to the values given in Table 5.26. This is illustrated in Fig. 5.64.
The TLU groups its input vectors into two classes, one for which the output is 0, the other for which the output is 1. The pattern space for an n-input unit will be n-dimensional (Fausett 1994). If the TLU uses threshold θ, then for the input vector $\vec{x}_0$ the output will be 1 when $\sum_{\forall i} w_i a_i \geq \theta$, and 0 otherwise. The equation of the decision plane is $\sum_{\forall i} w_i a_i = \theta$, which is a diagonal line, as illustrated in Fig. 5.64. Thus, in the case of the previous example

$w_0 a_0 + w_1 a_1 = \theta \;\Leftrightarrow\; a_1 = -(w_0/w_1)\,a_0 + (\theta/w_1)$

Fig. 5.64 Boolean-function pattern space and TLU of the artificial perceptron (a_n, o_0)
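The values in Table 5.26 can be reproduced with a short sketch of the AND perceptron. The unit weights w_0 = w_1 = 1 are an assumption consistent with the activations in the table (z = a_0 + a_1); the threshold θ = 2 is as stated in the text.

```python
from itertools import product

w = (1, 1)       # assumed unit weights, consistent with z = a0 + a1 in Table 5.26
theta = 2        # threshold for the Boolean AND function, as given in the text

print(" a0  a1   z  o0")
for a0, a1 in product((0, 1), repeat=2):
    z = w[0] * a0 + w[1] * a1          # activation (Eq. 5.108)
    o0 = 1 if z >= theta else 0        # threshold logic unit output
    print(f"{a0:3d} {a1:3d} {z:3d} {o0:3d}")
```

With these assumed weights the decision line of Fig. 5.64 is a_1 = -a_0 + 2, and only the input (1, 1) lies on or above it, which is why it is the only input mapped to output 1.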
Learning rules

Several learning rules are used to train threshold logic units (TLUs), such as the gradient descent technique and the delta learning rule.

Fig. 5.65 The gradient descent technique

Suppose y is a function of x (y = f(x)), f(x) is continuous, and the derivative dy/dx can be found at any point. However, if no information is available on the shape of f(x), local or global minima cannot be found using classical methods of calculus. The slope of the tangent to the curve at x_0 is [dy/dx]_{x_0}. For small values of Δx, Δy can be approximated using the expression

$y_1 - y_0 = \left[\mathrm{d}y/\mathrm{d}x\right]_{x_0}(x_1 - x_0)$   (5.110)

where Δy = y_1 − y_0 and Δx = x_1 − x_0. Let

$\Delta x = -\alpha\,(\mathrm{d}y/\mathrm{d}x) \;\Rightarrow\; \Delta y = -\alpha\,(\mathrm{d}y/\mathrm{d}x)^2$

where α is a small parameter chosen so as not to overshoot any minima or maxima. Starting from a given point x_0 in Fig. 5.65, the local minimum of the function f(x) can be found by moving down the curve in steps Δx = −α(dy/dx), until Δy becomes positive (at that point, the step has already moved past the local minimum). This technique is termed gradient descent. The gradient descent technique is used to train TLUs.

d) Back Propagation in Artificial Neural Networks

Consider the ANN of Fig. 5.66, and assume the neurodes are TLUs with threshold θ = 0 for all nodes (Haykin 1999).
The back-propagation (BP) algorithm accounts for errors in the output layer using all the weights of the ANN. Thus, if a TLU in the output layer is off, the algorithm will change weights not only between the hidden and output layer but also between the input and hidden layer. The BP algorithm uses the delta learning rule, expressed as $\Delta w_i = \alpha\,(t_j - z_j)\,a_{ji}$ (cf. the gradient descent step Δx = −α dy/dx).

Fig. 5.66 Basic structure of an artificial neural network: back propagation

If the training set consists of the following pairs for the TLU, $(\vec{x}_j, t_j)$, $j = 0, \ldots, n$, where $\vec{x}_j = (a_{j0}, \ldots, a_{jm})$, then the error for each pair is defined as

$E_j = \tfrac{1}{2}\,(t_j - o_j)^2, \quad \forall j$   (5.111)

The total error for the training set is

$E = \sum_{\forall j} E_j$   (5.112)

where each E_j is a function of the weights connected to the TLU. Thus, for all possible weight vectors, there exists an error measure (E) for a given training set. However, since the activation function is a step function, the error measure would not be a continuous function. The value o_j must be changed to z_j in the definition of the error E_j, which means that the activation level, rather than the produced output, is used to compute the error. This yields a continuous function

$E_j = \tfrac{1}{2}\,(t_j - z_j)^2, \quad \forall j$   (5.113)

It can be shown that the slope of E_j with respect to the ith weight is $-(t_j - z_j)\,a_{ji}$; the delta learning rule is thus expressed as

$\Delta w_i = \alpha\,(t_j - z_j)\,a_{ji}$   (5.114)

when working with the jth training pair.
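As a concrete illustration of the delta learning rule in Eq. 5.114, the following is a minimal sketch, not from the source text, that trains a single TLU on the Boolean AND data of Table 5.26. The learning rate, the initial weights, the number of passes, the use of a bias input in place of an explicit threshold, and the final 0.5 decision threshold are all illustrative assumptions.

```python
import numpy as np

# Training pairs (x_j, t_j) for the Boolean AND function of Table 5.26.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 0, 0, 1], dtype=float)

# A bias input fixed at 1 is assumed here so that the threshold is learned
# together with the weights; this is a common convenience, not from the text.
Xb = np.hstack([X, np.ones((4, 1))])

alpha = 0.1                              # assumed small learning rate
w = np.zeros(3)                          # assumed initial weights

for epoch in range(200):                 # repeated passes over the training set
    for x_j, t_j in zip(Xb, T):
        z_j = w @ x_j                    # activation (Eq. 5.108)
        w += alpha * (t_j - z_j) * x_j   # delta learning rule (Eq. 5.114)

# The trained activations approach the targets; a step at an assumed
# threshold of 0.5 then reproduces the AND outputs.
for x_j, t_j in zip(Xb, T):
    z_j = w @ x_j
    print(f"inputs={x_j[:2]}, activation={z_j:.2f}, output={int(z_j >= 0.5)}, target={int(t_j)}")
```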
Thus, for a training set defined as $(\vec{x}_j, \vec{t}_j)$, $j = 0, \ldots, m$, with $\vec{x}_j = (x_{j0}, \ldots, x_{jn})$ and $\vec{t}_j = (t_{j0}, \ldots, t_{jn})$, the back-propagation procedure for each training pair is as follows:

i) Compute the output of the hidden layer using $\vec{x}_j$.
ii) Compute the output of the output layer using the output of the hidden layer $(b_0, \ldots, b_n)$.
iii) Calculate the error for each output node. For the kth output node: $\delta_k = (t_{jk} - z_k)$, where $z_k$ is the activation of the kth output node.
iv) Train the output nodes using the delta rule; for the mth hidden node and the kth output node: $\Delta w_{b_m c_k} = \alpha\,\delta_k\,b_m$.
v) Calculate the error for each hidden node. For the mth hidden node: $\delta_m = \sum_{k=1}^{n} \delta_k\,w_{b_m c_k}$, where $\delta_k$ is the computed error for the kth output node.
vi) Train the hidden nodes using the delta rule; for the hth input node and the mth hidden node: $\Delta w_{a_h b_m} = \alpha\,\delta_m\,x_{jh}$.

These steps are repeated for each training vector, until the ANN produces acceptable outputs for the input vectors.

e) Fuzzy Neural Rule-Based Systems

The basic advantage of neural networks is that the designer does not have to program the system. Take, for example, a complex ANN whose input is an n×n bitmap, which it recognises as the process equipment model (PEM) on the AIB blackboard (assuming the ANN is capable of distinguishing between a vessel, a tank and a container, the input layer has n² nodes, and the output layer has three nodes, one for each PEM). In the ideal case, the designer does not have to write any specific code, and simply chooses an appropriate ANN model and trains it. The logic of each PEM is encoded in the weights and the activation functions.
However, artificial neural networks also have their drawbacks. They are fundamentally black boxes, whereby the designer does not know what part of a large designed network is responsible for a particular part of the computed output. Thus, the network cannot be modified to improve it. ANN models are good at reaching decisions based on incomplete information (i.e. if the input vector does not match any of the training vectors, the network still computes a reasonable output, in the sense that the output will probably be close to the output vector of a training vector that, in turn, is close to the input).
Fuzzy rule-based systems are good at dealing with imprecise information. However, determining their membership functions is usually difficult. The fuzzy rule-based neural network basically makes up a membership function based on training vectors. Consider, for example, the fuzzy rules (Valluru 1995):

R_1: IF x is F_1 THEN z is H_1
R_2: IF x is F_2 THEN z is H_2
...
R_n: IF x is F_n THEN z is H_n

To teach this rule base to an ANN, the training pairs are ((F_1, H_1), ..., (F_n, H_n)).
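To show how the rule pairs (F_i, H_i) might be turned into ANN training data, the following is a minimal sketch under assumptions the text does not make: each fuzzy set is represented by a triangular membership function sampled over a discretised universe of discourse, and the resulting membership-degree vectors become the input and target vectors of the training pairs.

```python
import numpy as np

def triangular(u, a, b, c):
    """Triangular membership function with feet a, c and peak b (assumed shape)."""
    return np.maximum(np.minimum((u - a) / (b - a), (c - u) / (c - b)), 0.0)

# Discretised universes of discourse for x and z (assumed ranges and resolution).
u_x = np.linspace(0.0, 10.0, 21)
u_z = np.linspace(0.0, 1.0, 21)

# Illustrative fuzzy sets for two rules, R1: IF x is F1 THEN z is H1, etc.
F = [triangular(u_x, 0.0, 2.0, 4.0), triangular(u_x, 4.0, 6.0, 8.0)]
H = [triangular(u_z, 0.0, 0.2, 0.4), triangular(u_z, 0.5, 0.7, 0.9)]

# Each training pair (F_i, H_i) is a pair of membership-degree vectors that
# an ANN (e.g. an MLP) can be trained on with back propagation.
training_pairs = list(zip(F, H))
for i, (f_i, h_i) in enumerate(training_pairs, start=1):
    print(f"rule R{i}: input vector length {len(f_i)}, target vector length {len(h_i)}")
```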