Stable Adaptive Control and Estimation for Nonlinear Systems: Neural and Fuzzy Approximator Techniques
Jeffrey T. Spooner, Manfredi Maggiore, Raúl Ordóñez, Kevin M. Passino
Copyright 2002 John Wiley & Sons, Inc.
ISBNs: 0-471-41546-4 (Hardback); 0-471-22113-9 (Electronic)

Chapter 3

Neural Networks and Fuzzy Systems

3.1 Overview

Few technologies have been used for as wide a variety of applications as neural networks and fuzzy systems. They have proven to be truly interdisciplinary tools, appearing in economics, business, science, psychology, biology, and engineering, to name a few fields. Based upon the structure of a biological nervous system, artificial neural networks use a number of interconnected simple processing elements (“neurons”) to accomplish complicated classification and function approximation tasks. The ability to adjust the network parameters (weights and biases) makes it possible to “learn” information about a process from data, whether the data describe stock trends or the relation between an actuator input and some sensor data. Neural networks typically have the desirable feature that little knowledge about a process is required to apply a network successfully to the problem at hand (although if some domain-specific knowledge is available, it can be beneficial to use it). In other words, they are typically regarded as a “black box” technique. This approach often leads to engineering solutions in a relatively short amount of time, since the expensive system models required by many conventional approaches are not needed. Of course, sufficient data are typically needed for effective solutions.

Fuzzy systems are intended to model higher-level cognitive functions in a human. They are normally broken into (1) a rule-base that holds a human’s knowledge about a specific application domain, (2) an inference mechanism that specifies how to reason over the rule-base, (3) fuzzification, which transforms incoming information into a form that can be used by
the fuzzy system, and (4) defuzzification, which puts the conclusions from the inference mechanism into an appropriate form for the application at hand. Often, fuzzy systems are constructed to model how a human performs a task. They are constructed either manually (i.e., using heuristic domain-specific knowledge in a manner similar to how an expert system is constructed) or in a manner similar to how a neural network is constructed, via training with data. While in the past fuzzy systems were constructed exclusively with heuristic approaches, we take the view here that they are simply an alternative approximator structure with tunable parameters (e.g., input and output membership function parameters), and hence they can be viewed as a “black box approach” in the same way as neural networks can. Fuzzy systems do, however, sometimes offer the additional benefit of a way to incorporate heuristic information: you simply specify some rules via heuristics and tune the others using data (or even fine-tune the heuristically specified ones using the data). In other words, it is sometimes easier to specify a good initial guess for a fuzzy system.

In this chapter we define some basic fuzzy systems and neural networks, namely, the ones most commonly used in practice. We do not spend time discussing the heuristic construction of fuzzy systems, since this is treated in detail elsewhere. In the next chapter we will provide a variety of optimization methods that may be used to help specify the parameters that define neural networks and fuzzy systems.

3.2 Neural Networks

The brain is made up of a huge number of different types of neurons interconnected through a complex network. A typical neuron is composed of an input region, which contains numerous small branches called dendrites. The neuron body contains the nucleus and other cell components, while the axon is used to transmit impulses to other cells (see Figure 3.1). When an impulse is
received at the voltage-sensitive dendrite, the cell membrane becomes depolarized. If this potential reaches the cell threshold potential, via pulses received at possibly many dendrites, an action potential lasting only a millisecond or two is triggered, and an impulse is sent out to other neurons via the axon. The magnitude of the impulse that is sent does not depend upon the magnitude of the voltage potential that triggered the action.

[Figure 3.1: Simple representation of a neuron.]

In our model of an artificial neuron, as is typical, we will preserve the underlying structure but make convenient simplifications in the functional description of its action. We will, for example, typically assume that the magnitude of the output depends upon the magnitude of the inputs. Also, we will not assume that the inputs and outputs are impulses; instead, a smoothly varying input will cause a smoothly varying output.

The brain consists of a network of various neurons through which impulses are transmitted from the axon of one neuron to the dendrites of another. The impulses may be fed back to previous cells within the network. Artificial neural networks in which information may be fed back to previous neurons are called recurrent neural networks, whereas networks in which information is allowed to proceed in a single direction are called feedforward neural networks. Examples of recurrent and feedforward neural networks are shown in Figure 3.2, where each circle represents a neuron and the lines represent the transmission of signals along axons. To simplify analysis, we will focus on feedforward neural networks for the estimation and control schemes in the chapters to follow.

[Figure 3.2: Examples of recurrent and feedforward neural networks (circle = neuron, line = axon).]

The input vector to the neuron is $x = [x_1, \ldots, x_n]^T$, where $x_i$, $i = 1, \ldots, n$, is the $i$th input to the neuron. Though in a biological system $x_i$ represents a voltage caused by an
electrochemical reaction, in this artificial framework $x_i$ may be a variable representing, for example, the value of a temperature or pressure sensor.

[Figure 3.3: Schematic of a neuron and its artificial mathematical representation.]

Figure 3.3 shows a biological neuron along with our mathematical representation. We will consider two distinct mathematical operations within our model. First, the neuron inputs undergo an input mapping in which the dendrite inputs are combined. After the input mapping, the signal is passed through an activation function to produce the output.

3.2.1 Neuron Input Mappings

The neuron input mapping takes a vector of inputs, $x \in \mathbb{R}^n$, and transforms it into a scalar, denoted by $s$. The input mapping depends upon a vector of weights $w = [w_1, \ldots, w_n]^T$, which are selected according to some past knowledge and may be allowed to change over time based upon new neural stimuli. Using both the weights and the inputs, a mapping is performed to quantify the relation between $w$ and $x$. This relationship may, for example, describe the “colinearity” of $w$ and $x$ (discussed below). We will denote the input mapping by

$$s = w \otimes x.$$

A number of input mappings have been used in the neural network literature, the most popular being the inner product and Euclidean input mappings. Another useful, but less widely used, input mapping is the weighted average.

Inner Product: The inner product input mapping (also commonly referred to as the dot product) may be considered a measure of the similarity between the orientations of the vectors $w$ and $x$, and is defined by

$$s = w \otimes x = w^T x. \qquad (3.1)$$

This may also be expressed as $s = |w|\,|x| \cos\theta$, where $\theta$ is the angle between $w$ and $x$. Thus, if $w$ and $x$ are orthogonal, then $s = 0$. Similarly, the inner product increases as $x$ and $w$ become colinear (the angle $\theta$ decreases, and at $\theta = 0$, $x$ and $w$ are colinear). We will find that for each of the techniques in this book, it is required that we know how
a change in the weights will affect the output of the neural network. To determine this, the gradient of each neuron output with respect to the weights will be calculated. For the standard inner product input mapping, we find that

$$\frac{\partial s}{\partial w} = x^T. \qquad (3.2)$$

Notice that the gradient of the inner product input mapping with respect to the weights is simply the value of the inputs themselves.

Weighted Average: An input mapping that is closely related to the inner product is the weighted average, defined by

$$s = w \otimes x = \frac{w^T x}{\sum_{j=1}^{n} x_j}. \qquad (3.3)$$

Geometrically speaking, the weighted average again determines to what degree the two vectors are colinear. To ensure that the input mapping is well defined for all $x$ (even $x = 0$), we may choose to rewrite (3.3) as

$$s = w \otimes x = \frac{w^T x}{\gamma + \sum_{j=1}^{n} x_j}, \qquad (3.4)$$

where $\gamma > 0$. For this weighted average, the partial derivative of the neuron output with respect to the weights is

$$\frac{\partial s}{\partial w} = \frac{x^T}{\gamma + \sum_{j=1}^{n} x_j}. \qquad (3.5)$$

This is again similar to the inner product, with the addition of the normalizing term in the denominator.

Euclidean: The Euclidean input mapping is defined by

$$s = w \otimes x = |w - x| = \sqrt{(w - x)^T (w - x)}. \qquad (3.6)$$

This mapping always yields $s \ge 0$, since it is the Euclidean norm of the vector $w - x$ (recall that norms are non-negative). The Euclidean input mapping has the following gradient with respect to the weights (for $w \ne x$):

$$\frac{\partial s}{\partial w} = \frac{(w - x)^T}{|w - x|}. \qquad (3.7)$$

[Figure 3.4: Graphical representation of the inner product and Euclidean input mappings.]

A geometric interpretation of the inner product and Euclidean input mappings is shown in Figure 3.4. Notice that as $w$ and $x$ become orthogonal, the inner product vanishes while the Euclidean norm increases.

Adding a Bias Term: Along with the input vector $x$, an artificial neuron is often also provided a constant bias as an input, so that now $x' = [1, x_1, \ldots, x_n]^T$ with $x \in \mathbb{R}^n$. We have chosen the bias to be 1 since the input weight associated with the bias may be chosen to scale this value arbitrarily. That is, the weight vector is now changed to $w = [w_0, w_1, \ldots, w_n]^T$, with $w_0$ corresponding to the bias input. The same neuron input mappings described above apply to the case where a bias term is used.

3.2.2 Neuron Activation Functions

According to Figure 3.3, after the input mapping, the neuron produces an output using an activation function. This activation function transforms the value produced by the input mapping into a value that is suitable for another neuron, or possibly a value that may be understood by an external system (e.g., as an input to an actuator).

Definition 3.1: A function $\psi: \mathbb{R} \to \mathbb{R}$ that maps an input mapping to $\mathbb{R}$ is said to be an activation function if it is piecewise continuous.

Definition 3.2: A function $\psi: \mathbb{R} \to \mathbb{R}$ is said to be a squashing function if it is an activation function, $\lim_{s \to \infty} \psi(s) = 1$, and $\lim_{s \to -\infty} \psi(s) = 0$.

Activation functions for artificial neurons may be either bounded or unbounded. A bounded activation function is one for which $|\psi(s)| \le k$, where $k \in \mathbb{R}$ is a finite constant; thus it has the property that even if $s \to \infty$, one is still ensured that $\psi(s) \in \mathcal{L}_\infty$. For an unbounded activation function, on the other hand, no finite $k$ satisfies $|\psi(s)| \le k$ for all $s \in \mathbb{R}$.

Definition 3.3: A function $\psi: D \to \mathbb{R}$ is said to be unipolar on $D$ if $\psi(x) \ge 0$ for all $x \in D$ (positive unipolar) or $\psi(x) \le 0$ for all $x \in D$ (negative unipolar).

With $\psi(\cdot)$ continuously differentiable, the change in the neuron output with respect to the neuron weights may be obtained from the chain rule as

$$\frac{\partial \psi}{\partial w} = \frac{\partial \psi}{\partial s} \frac{\partial s}{\partial w}. \qquad (3.8)$$

This formula will become useful when determining how to adjust the neuron weights so that the output of the squashing function changes in some desired manner. A few of the more commonly used activation functions will now be defined.

Threshold: The threshold function (or Heaviside function) is one of the original activation
functions studied by McCulloch and Pitts. It is defined by

$$\psi(s) = \begin{cases} 1 & s > 0, \\ 0 & \text{otherwise.} \end{cases} \qquad (3.9)$$

Since $\psi(s)$ is not continuous for the threshold function, $d\psi/ds$ is not well defined for all $s \in \mathbb{R}$. Though we will not use the threshold activation function in our adaptive schemes (because its derivative is not well defined), it will prove valuable in the function approximation proofs in a later chapter.

Linear: The simplest of the activation functions is a linear mapping from input to output, defined by

$$\psi(s) = s. \qquad (3.10)$$

This is a monotonic, unbounded activation function. The gradient of the linear activation function is simply $d\psi/ds = 1$. We will see that linear activation functions are often used to generate the outputs of multilayer neural networks.

Saturated Linear: A variant of the linear activation function is the saturated linear activation function, defined as

$$\psi(s) = \mathrm{sat}(s), \qquad (3.11)$$

where

$$\mathrm{sat}(x) = \begin{cases} 1 & \text{if } x > 1, \\ x & \text{if } -1 \le x \le 1, \\ -1 & \text{if } x < -1. \end{cases}$$

[…]

Sigmoid: A frequently used monotonic, unipolar squashing function is the sigmoid. In general, a sigmoid is any such “s-shaped” function; it is often specified as

$$\psi(s) = \frac{1}{1 + \exp(-2s)}. \qquad (3.14)$$

The gradient of this function is $\partial\psi(s)/\partial s = 2\psi(s)\,(1 - \psi(s))$. Note that the gradient is defined in terms of $\psi(\cdot)$ itself.

Radial Basis: One of the most popular non-monotonic activation functions is the radial basis function, commonly defined by

$$\psi(s) = \exp(-s^2/\gamma^2), \qquad (3.15)$$

where $\gamma \in \mathbb{R}$. Neural networks composed entirely of radial basis functions have been used in a number of applications owing to their mathematical properties. Their function approximation properties will be discussed in a later chapter.

Others: There are numerous other activation functions that may be used; they tend to be limited only by one’s imagination and by practical implementation. Even though we can define and use rather sophisticated activation functions, the ones presented above tend to be sufficient. We will discuss this point in more detail in a later chapter, during a discussion of the universal approximation capabilities of neural networks. To give you an idea of other
possibilities for activation functions, consider

$$\psi(s) = \frac{s}{1 + |s|} \qquad (3.16)$$

and

$$\psi(s) = \frac{s^n}{1 + |s|^n}, \qquad (3.17)$$

where $n > 0$. Additional activation functions will be considered in the homework exercises.

Example 3.1: Consider a neuron defined by $\psi(s) = \tanh(s)$, with an inner product input mapping and with $x = [1, 2]^T$. What set of weights, $w = [w_1, w_2]^T$, will cause $\psi(s) = 0$? This is equivalent to finding $w_1$ and $w_2$ such that $s = w_1 + 2w_2 = 0$. Any point along the line $w_1 = -2w_2$ will satisfy this. Notice that with two adjustable weights, there are infinitely many choices of $w_1$ and $w_2$ that cause $\psi(s) = 0$. △

3.2.3 The Multilayer Perceptron

The most common neural network found in applications today is the feedforward multilayer perceptron (MLP), also known simply as the multilayer feedforward neural network. It is a collection of artificial neurons in which the output of one neuron is passed to the input of another. The neurons (or nodes) are typically arranged in collections called layers, such that the outputs of all the nodes in a given layer are passed to the inputs of the nodes in the next layer. The input layer is the input vector $x$, while the output layer is the connection between the neural network and the rest of the world. A hidden layer is a collection of nodes that lie between the input and output layers, as shown in Figure 3.5. Without a hidden layer, the MLP reduces to a collection of neurons operating in parallel.

[Figure 3.5: Schematic of a multilayer perceptron (input layer $x_1, \ldots, x_n$; hidden layer; output layer).]

In a fully connected MLP, each neuron output is fed to each neuron input within the next layer. If we let $\theta$ be a vector of all the adjustable parameters in the network (weights and biases, and sometimes the
parameters of the activation functions), then we denote the input-output mapping of the MLP by $\mathcal{F}(x, \theta)$. To better understand the functional form of this mapping, consider the following example.

Example 3.2: Consider the MLP with one hidden layer consisting of $q$ nodes, with activation function $\psi_j(s)$ for the $j$th hidden node, and a single output node defined using a linear activation function. The input-output mapping for this MLP is defined by

$$\mathcal{F}(x, \theta) = d + \sum_{j=1}^{q} c_j \, \psi_j\!\left(b_j + \sum_{i=1}^{n} w_{ij} x_i\right), \qquad (3.18)$$

where $\theta = [d, c_1, \ldots, c_q, b_1, \ldots, b_q, w_{11}, \ldots, w_{nq}]^T$ is a vector of adjustable parameters. Notice that $d$ and each $b_j$ are biases included with each node. There are $q$ neurons in the hidden layer, $n$ inputs, and one output in this neural network. △

Within a multilayer perceptron, if there are many layers and many nodes in each layer, there will be a large number of adjustable parameters (e.g., weights and biases). MLPs with several hundred or even thousands of adjustable weights are common in complex real-world applications.

3.2.4 Radial Basis Neural Network

A radial basis neural network (RBNN) typically consists of a layer of radial basis activation functions with an associated Euclidean input mapping (but there are many ways to define this class of neural networks). The output is then taken as a linear activation function with an inner product or weighted average input mapping. An RBNN with two inputs is shown in Figure 3.6. The input-output relationship in an RBNN with $x = [x_1, \ldots, x_n]^T$ as an input is given by

$$\mathcal{F}(x, \theta) = \sum_{i=1}^{m} w_i \exp\!\left(-\frac{|x - c_i|^2}{\gamma^2}\right), \qquad (3.19)$$

where $\theta = [w_1, \ldots, w_m]^T$ when an inner product mapping is used within the output node. Typically, the vectors $c_i$, $i = 1, \ldots, m$, and the scalar $\gamma$ are held fixed, while the values of $\theta$ are adjusted so that the mapping produced by the RBNN matches some desired mapping. Because the adjustable weights appear linearly, we may express (3.19) as

$$\mathcal{F}(x, \theta) = \theta^T \zeta(x), \qquad (3.20)$$

where $\zeta_i = \exp(-|x - c_i|^2/\gamma^2)$.

[Figure 3.6: Radial basis neural network.]

When a weighted average mapping is used in the
output node, the RBNN becomes

$$\mathcal{F}(x, \theta) = \frac{\sum_{i=1}^{m} w_i \exp\!\left(-|x - c_i|^2/\gamma^2\right)}{\sum_{i=1}^{m} \exp\!\left(-|x - c_i|^2/\gamma^2\right)}, \qquad (3.21)$$

which may again be expressed as $\mathcal{F}(x, \theta) = \theta^T \zeta(x)$, now with $\theta = [w_1, \ldots, w_m]^T$ and

$$\zeta_i(x) = \frac{\exp\!\left(-|x - c_i|^2/\gamma^2\right)}{\sum_{j=1}^{m} \exp\!\left(-|x - c_j|^2/\gamma^2\right)}. \qquad (3.22)$$

The Gaussian form of the activation functions lets one view the RBNN as a weighted average when using (3.21), where the value of $w_i$ is weighted more heavily when $x$ is close to $c_i$. Thus the input space is broken into overlapping regions with centers corresponding to the $c_i$’s, as shown in Figure 3.6.

[…]

Linguistic variables change over time and hence take on specific linguistic values (typically adjectives). For instance, “speed” is “small” or “speed” is “large.” The “linguistic rules” listed above are those gathered from a human expert.

Example 3.3: As an example of a set of linguistic rules, for a cruise control example suppose that $p = 3$, $n = 1$, and $x_1$ is the error between the desired speed $v_d(t)$ and the actual (sensed) speed $v(t)$ (i.e., $x_1(t) = v_d(t) - v(t)$). In this case, a word description of the fuzzy controller input variable $x_1(t)$ could be “speed-error,” so that the linguistic variable is $\tilde{x}_1$ = “speed-error.” Suppose that there are three linguistic values for the speed-error linguistic variable, and that these are “positive,” “zero,” and “negative.” Suppose that the output of the fuzzy controller is the change in throttle angle, which we denote as $y(t)$, and use the linguistic variable $\tilde{y}$ = “change-in-throttle.” Suppose it has the linguistic values “increase,” “stay-the-same,” and “decrease.” The three rules in the rule-base, of the form listed above, would be

$R_1$: If ($\tilde{x}_1$ is $\tilde{F}_1^1$) Then ($\tilde{y}$ is $\tilde{G}^1$)
$R_2$: If ($\tilde{x}_1$ is $\tilde{F}_1^2$) Then ($\tilde{y}$ is $\tilde{G}^2$)
$R_3$: If ($\tilde{x}_1$ is $\tilde{F}_1^3$) Then ($\tilde{y}$ is $\tilde{G}^3$)

or, more specifically,

$R_1$: If ($\tilde{x}_1$ is “positive”) Then ($\tilde{y}$ is “increase”)
$R_2$: If ($\tilde{x}_1$ is “negative”) Then ($\tilde{y}$ is “decrease”)
$R_3$: If ($\tilde{x}_1$ is “zero”) Then ($\tilde{y}$ is “stay-the-same”)

Rule $R_1$ says that if the vehicle is traveling at a speed less than the desired speed,
then increase the amount of throttle (i.e., make $y(t)$ positive). Rule $R_2$ says that if the vehicle is traveling at a speed greater than the desired speed, then decrease the amount of throttle (i.e., make $y(t)$ negative). Rule $R_3$ says that if the actual speed is close to the desired speed, then do not move the throttle angle (i.e., let $y(t) = 0$). △

To apply the knowledge represented in the linguistic rules, we further quantify the meaning of the rules using fuzzy sets and fuzzy logic. In particular, using fuzzy set theory, the rule-base is expressed as a set of fuzzy implications

$R_1$: If ($F_1^1$ and $\cdots$ and $F_n^1$) Then $G^1$
$\vdots$
$R_p$: If ($F_1^p$ and $\cdots$ and $F_n^p$) Then $G^p$

where $F_b^a$ and $G^a$ are fuzzy sets defined by

$$F_b^a = \{(x_b, \mu_{F_b^a}(x_b)) : x_b \in \mathbb{R}\}, \qquad (3.23)$$

$$G^a = \{(y, \mu_{G^a}(y)) : y \in \mathbb{R}\}. \qquad (3.24)$$

The membership functions $\mu_{F_b^a}$ and $\mu_{G^a}$ take values in $[0, 1]$ and describe how certain one is of a particular linguistic statement. For example, $\mu_{F_b^a}$ quantifies how well the linguistic variable $\tilde{x}_b$, which represents $x_b$, is described by the linguistic value $\tilde{F}_b^a$. There are many ways to define membership functions [170]. For instance, Table 3.1 specifies triangular membership functions with center $c$ and width $w$, and Gaussian membership functions with center $c$ and a width parameter. It is good practice to sketch these six functions, labeling all aspects of the plots.

Example 3.4: Continuing with the above cruise control example, we could quantify the meaning of each of the rule premises and consequents […]

[Table 3.1: Triangular and Gaussian membership functions (Left, Centers, Right).]

[…] What are the implications if we define the neural network such that $|\partial \mathcal{F}/\partial \theta| > \ldots$ for all $x$ (rather than $|\partial \mathcal{F}/\partial \theta| > 0$)?

Exercise 3.5 (Fuzzy Cruise Control): Consider the cruise control problem where the vehicle speed is governed by

$$m\dot{v} = -A v^2 + u, \qquad (3.39)$$

where $m = 1200$ kg is the mass of the vehicle, $A = 0.4$ N·s²/m² is the aerodynamic drag coefficient, $v$ is the vehicle speed in m/s, and $u$ is the input force. If $v$ is measurable and $r$ is the desired vehicle speed, then define a rule-base which
could be used to regulate the vehicle speed so that $v = r$. Simulate the closed-loop system. Try to adjust the output membership functions to improve the system performance.

Exercise 3.6 (Fuzzy Control for an Inverted Pendulum): Consider the simple problem of balancing an inverted pendulum on a cart. Let $y$ denote the angle that the pendulum makes with the vertical (in radians), $l$ be the half-pendulum length (in meters), and $u$ be the force input that moves the cart (in Newtons). Use $r$ to denote the desired angular position of the pendulum. The goal is to balance the pendulum in the upright position (i.e., $r = 0$) when it initially starts with some nonzero angle off the vertical (i.e., $y \ne 0$). One model for the inverted pendulum shown is given by

$$\ddot{y} = \frac{9.8 \sin(y) + \cos(y)\left[\dfrac{-\bar{u} - 0.25\,\dot{y}^2 \sin(y)}{1.5}\right]}{0.5\left[\dfrac{4}{3} - \dfrac{1}{3}\cos^2(y)\right]}, \qquad \dot{\bar{u}} = -100\,\bar{u} + 100\,u. \qquad (3.40)$$

The first-order filter on $u$ that produces $\bar{u}$ represents an actuator. In your simulations, let the initial conditions be $y(0) = 0.1$ radians, $\dot{y}(0) = 0$, and $\bar{u}(0) = 0$.

(a) Develop a fuzzy controller with $e = r - y$ and $\dot{e}$ as inputs, the minimum operator to represent both the “and” in the premise and the implication, and COG defuzzification. Simulate the closed-loop system and plot the output $y$ and the input $u$ to demonstrate that your fuzzy controller can balance the pendulum.

(b) Repeat (a) for the case where you use product to represent the premise and implication, and center-average defuzzification.
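The following is a minimal Python sketch, not taken from the book, of the three input mappings of Section 3.2.1, (3.1), (3.3)–(3.4), and (3.6), together with their weight gradients (3.2), (3.5), and (3.7) and the bias augmentation $x' = [1, x^T]^T$. A central-difference check compares each analytic gradient with a numerical one. All function names are hypothetical; the weighted average assumes the regularized form $w^T x/(\gamma + \sum_j x_j)$ with $\gamma > 0$, whose denominator can still vanish if negative inputs are allowed.

```python
import math


def inner_product(w, x):
    """Inner product input mapping (3.1): s = w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def inner_product_grad(w, x):
    """Gradient (3.2): ds/dw = x^T, independent of w."""
    return list(x)


def weighted_average(w, x, gamma=1.0):
    """Weighted average mapping, assumed form: s = w^T x / (gamma + sum_j x_j)."""
    return inner_product(w, x) / (gamma + sum(x))


def weighted_average_grad(w, x, gamma=1.0):
    """Gradient (3.5): ds/dw = x^T / (gamma + sum_j x_j)."""
    denom = gamma + sum(x)
    return [xi / denom for xi in x]


def euclidean(w, x):
    """Euclidean input mapping (3.6): s = |w - x|."""
    return math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(w, x)))


def euclidean_grad(w, x):
    """Gradient (3.7): ds/dw = (w - x)^T / |w - x|, defined only for w != x."""
    s = euclidean(w, x)
    return [(wi - xi) / s for wi, xi in zip(w, x)]


def with_bias(x):
    """Bias augmentation: x' = [1, x_1, ..., x_n]^T."""
    return [1.0] + list(x)


def numerical_grad(f, w, x, h=1e-6):
    """Central-difference estimate of ds/dw, perturbing one weight at a time."""
    grad = []
    for k in range(len(w)):
        wp = list(w)
        wp[k] += h
        wm = list(w)
        wm[k] -= h
        grad.append((f(wp, x) - f(wm, x)) / (2 * h))
    return grad


if __name__ == "__main__":
    w, x = [0.5, -1.0, 2.0], [1.0, 3.0, 0.5]
    for f, g in [(inner_product, inner_product_grad),
                 (weighted_average, weighted_average_grad),
                 (euclidean, euclidean_grad)]:
        print(f.__name__, f(w, x), g(w, x), numerical_grad(f, w, x))
```

Note that the inner product gradient does not depend on $w$ at all, whereas the Euclidean gradient is undefined at $w = x$, so code that tunes weights through (3.7) should guard that case.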
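The activation functions (3.9), (3.10), (3.11), (3.14), and (3.15) and the chain rule (3.8) fit in a few lines of Python. This is an illustrative sketch with hypothetical names; for the chain-rule part it assumes a sigmoid neuron with an inner product input mapping, so that (3.8) combined with (3.2) gives $\partial\psi/\partial w = 2\psi(s)(1-\psi(s))\,x^T$.

```python
import math


def threshold(s):
    """Threshold (Heaviside) activation (3.9): 1 if s > 0, else 0."""
    return 1.0 if s > 0 else 0.0


def linear(s):
    """Linear activation (3.10): psi(s) = s, so d psi/ds = 1."""
    return s


def sat(s):
    """Saturated linear activation (3.11): clip s to the interval [-1, 1]."""
    return max(-1.0, min(1.0, s))


def sigmoid(s):
    """Sigmoid (3.14): psi(s) = 1 / (1 + exp(-2 s))."""
    return 1.0 / (1.0 + math.exp(-2.0 * s))


def sigmoid_prime(s):
    """d psi/ds = 2 psi(s) (1 - psi(s)), computed from psi itself."""
    p = sigmoid(s)
    return 2.0 * p * (1.0 - p)


def radial_basis(s, gamma=1.0):
    """Radial basis activation (3.15): psi(s) = exp(-s^2 / gamma^2)."""
    return math.exp(-(s ** 2) / gamma ** 2)


def neuron_weight_grad(w, x):
    """Chain rule (3.8) for a sigmoid neuron with s = w^T x:
    d psi/dw = (d psi/ds)(ds/dw) = psi'(s) * x^T."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return [sigmoid_prime(s) * xi for xi in x]
```

Expressing the sigmoid gradient through $\psi(s)$ means a forward pass that has already computed $\psi(s)$ can reuse it when the weight update is formed.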
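Example 3.1 can be checked numerically. In this short Python sketch (names hypothetical), every weight vector on the line $w_1 = -2w_2$ gives $s = 0$ and hence $\tanh(s) = 0$, while a weight vector off that line does not.

```python
import math


def neuron_output(w, x):
    """Neuron of Example 3.1: psi(s) = tanh(s), with s = w^T x."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return math.tanh(s)


x = [1.0, 2.0]

# Any w on the line w1 = -2 * w2 makes s = w1 + 2 * w2 = 0.
for w2 in (-1.5, 0.0, 0.25, 3.0):
    w = [-2.0 * w2, w2]
    print(w, neuron_output(w, x))

# A point off the line, e.g. w = [1, 1], gives s = 3 and a nonzero output.
print([1.0, 1.0], neuron_output([1.0, 1.0], x))
```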
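A forward pass through the one-hidden-layer MLP of Example 3.2 can be sketched as follows, assuming the output $\mathcal{F}(x,\theta) = d + \sum_{j=1}^{q} c_j \psi_j(b_j + \sum_{i=1}^{n} w_{ij} x_i)$, which is the form implied by the listed parameter vector $\theta$. The sketch uses the same activation for every hidden node (tanh by default); the function names and the row-major ordering of the $w_{ij}$ inside $\theta$ are assumptions.

```python
import math
from typing import Callable, List, Sequence


def mlp_output(x: Sequence[float], d: float, c: Sequence[float],
               b: Sequence[float], W: Sequence[Sequence[float]],
               psi: Callable[[float], float] = math.tanh) -> float:
    """F(x, theta) = d + sum_j c_j * psi(b_j + sum_i W[i][j] * x_i).

    W has n rows (inputs) and q columns (hidden nodes); W[i][j] is the
    weight from input i to hidden node j. The output node is linear.
    """
    n, q = len(x), len(c)
    total = d                                    # output-node bias
    for j in range(q):
        s_j = b[j] + sum(W[i][j] * x[i] for i in range(n))
        total += c[j] * psi(s_j)                 # hidden node j, scaled by c_j
    return total


def pack_theta(d: float, c: Sequence[float], b: Sequence[float],
               W: Sequence[Sequence[float]]) -> List[float]:
    """theta = [d, c_1..c_q, b_1..b_q, w_11, w_12, ..., w_nq] (row-major W)."""
    return [d, *c, *b, *(w_ij for row in W for w_ij in row)]
```

With $n$ inputs and $q$ hidden nodes, $\theta$ has $1 + 2q + nq$ entries, which shows how quickly the parameter count grows as the hidden layer widens.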
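The linear-in-the-parameters structure of the RBNN in (3.19)–(3.22) can be illustrated with a short Python sketch (names hypothetical). It builds the regressor $\zeta(x)$ for both the inner product output node and the normalized (weighted average) output node, then evaluates $\theta^T\zeta(x)$. The centers, $\gamma$, and weights are arbitrary illustration values.

```python
import math


def rbf_regressors(x, centers, gamma):
    """zeta_i(x) = exp(-|x - c_i|^2 / gamma^2) for each center c_i, as in (3.20)."""
    return [math.exp(-sum((xk - ck) ** 2 for xk, ck in zip(x, c)) / gamma ** 2)
            for c in centers]


def normalized_regressors(x, centers, gamma):
    """zeta_i(x) from (3.22): each Gaussian divided by the sum of all Gaussians.
    Far from every center the sum can underflow to 0.0, so a practical
    implementation would need a guard there."""
    z = rbf_regressors(x, centers, gamma)
    total = sum(z)
    return [zi / total for zi in z]


def rbnn(theta, zeta):
    """F(x, theta) = theta^T zeta(x): linear in the adjustable weights theta."""
    return sum(t * z for t, z in zip(theta, zeta))


centers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
theta = [1.0, 2.0, 3.0]
x = [0.2, 0.1]
print(rbnn(theta, rbf_regressors(x, centers, 1.0)))         # form (3.19)
print(rbnn(theta, normalized_regressors(x, centers, 1.0)))  # form (3.21)
```

Because $\theta$ enters only through the inner product with $\zeta(x)$, the gradient $\partial\mathcal{F}/\partial\theta$ is simply $\zeta(x)^T$. With the normalized regressor the entries of $\zeta(x)$ are non-negative and sum to one, so the output always lies between the smallest and largest $w_i$.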
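The rule-base of Example 3.3 becomes a working input-output map once membership functions and a defuzzification method are chosen. The Python sketch below assumes triangular membership functions of the form $\max\{0, 1 - |x - c|/(0.5w)\}$ with saturating left and right shoulders, plus a Gaussian $\exp(-\tfrac{1}{2}((x-c)/\sigma)^2)$, in the spirit of Table 3.1. It also assumes center-average defuzzification, $y = \sum_a b_a \mu_a / \sum_a \mu_a$, where $b_a$ is the center of the output fuzzy set $G^a$. The centers, widths, and output centers ($\pm 1$ and $0$) are hypothetical tuning choices, not values from the text.

```python
import math

# Membership function shapes in the style of Table 3.1; the exact
# parameterization used in the table is an assumption here.


def tri_center(x, c, w):
    """Triangle with center c and base width w (peak value 1 at x = c)."""
    return max(0.0, 1.0 - abs(x - c) / (0.5 * w))


def tri_left(x, c, w):
    """Left shoulder: 1 for x <= c, falling linearly to 0 at x = c + w/2."""
    return 1.0 if x <= c else max(0.0, 1.0 - (x - c) / (0.5 * w))


def tri_right(x, c, w):
    """Right shoulder: 1 for x >= c, falling linearly to 0 at x = c - w/2."""
    return 1.0 if x >= c else max(0.0, 1.0 - (c - x) / (0.5 * w))


def gaussian(x, c, sigma):
    """Gaussian membership function with center c and width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


# Hypothetical tuning of the three cruise-control rules of Example 3.3.
# Each rule: (premise membership on the speed error, center of the output set).
RULES = [
    (lambda e: tri_right(e, 5.0, 10.0), 1.0),    # R1: positive -> increase
    (lambda e: tri_left(e, -5.0, 10.0), -1.0),   # R2: negative -> decrease
    (lambda e: tri_center(e, 0.0, 10.0), 0.0),   # R3: zero -> stay-the-same
]


def throttle_change(speed_error):
    """Center-average defuzzification: sum(b_a * mu_a) / sum(mu_a).
    With one input, each premise certainty is just its membership value."""
    mus = [(mu(speed_error), center) for mu, center in RULES]
    total = sum(m for m, _ in mus)
    return sum(m * center for m, center in mus) / total
```

With these particular choices the map reduces to $y = \mathrm{sat}(x_1/5)$, a saturated proportional law. Moving the output centers or changing the triangle widths changes the slope and saturation level, which is the kind of tuning Exercise 3.5 asks for.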
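For Exercise 3.5, the plant (3.39) can be simulated with forward Euler integration. A fuzzy controller, such as one built from the rules of Example 3.3, would then be supplied as the `controller` callback, with any internal state (for example an accumulated force) kept inside it. This Python sketch is one possible starting point, not a solution. The function name, step size, and callback interface are assumptions, and the demonstration uses a constant force $u = A r^2$, which makes $v = r$ an equilibrium of (3.39).

```python
def simulate_cruise(controller, v0, t_final, dt=0.1, m=1200.0, A=0.4):
    """Forward-Euler simulation of m * dv/dt = -A * v**2 + u, model (3.39).

    controller(t, v) must return the input force u in newtons.
    Returns the list of speeds [v(0), v(dt), ..., v(t_final)] in m/s.
    """
    t, v = 0.0, v0
    speeds = [v]
    for _ in range(int(round(t_final / dt))):
        u = controller(t, v)
        v += dt * (-A * v * v + u) / m
        t += dt
        speeds.append(v)
    return speeds


if __name__ == "__main__":
    r = 30.0  # desired speed, m/s

    def hold(t, v):
        """Open-loop force that exactly balances drag at v = r."""
        return 0.4 * r ** 2

    speeds = simulate_cruise(hold, v0=0.0, t_final=500.0)
    print(f"speed after 500 s: {speeds[-1]:.3f} m/s")
```

For comparison, with this constant force the exact solution from rest is $v(t) = 30\tanh(t/100)$ m/s, so the simulated speed should end close to 30 m/s. The open-loop force only works because $A$ is known exactly, which is what a feedback controller avoids.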
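For Exercise 3.6, a simulation needs the right-hand side of model (3.40). The Python sketch below (hypothetical names) writes it as a first-order system in the state $(y, \dot{y}, \bar{u})$ and adds a forward-Euler step. The fuzzy controller of parts (a) and (b) is left to the reader. Forward Euler is an assumption chosen for brevity: because of the actuator pole at $-100$, the step must satisfy $dt < 0.02$ s for the filter state to stay stable, and a much smaller step is advisable.

```python
import math


def pendulum_rhs(state, u):
    """Right-hand side of model (3.40).

    state = (y, y_dot, u_bar): angle from vertical [rad], angular rate
    [rad/s], and actuator output u_bar; u is the commanded force [N].
    Returns (dy/dt, d2y/dt2, du_bar/dt).
    """
    y, y_dot, u_bar = state
    num = 9.8 * math.sin(y) + math.cos(y) * (
        (-u_bar - 0.25 * y_dot ** 2 * math.sin(y)) / 1.5)
    den = 0.5 * (4.0 / 3.0 - (1.0 / 3.0) * math.cos(y) ** 2)
    y_ddot = num / den
    u_bar_dot = -100.0 * u_bar + 100.0 * u   # first-order actuator filter
    return (y_dot, y_ddot, u_bar_dot)


def euler_step(state, u, dt):
    """One forward-Euler step of (3.40); keep dt well below 0.02 s."""
    deriv = pendulum_rhs(state, u)
    return tuple(s + dt * d for s, d in zip(state, deriv))
```

With $\bar{u} = 0$ and the initial angle $y(0) = 0.1$ rad, the computed $\ddot{y}$ is positive, so the uncontrolled pendulum falls further from the vertical; this is what the controller in part (a) must counteract.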