where $u$ is the neuron output, $u \in \mathbb{R}$; $f$ is the nonlinear function (transfer function); $W$ is the weighting matrix, $W = [w_{11}, w_{12}, \ldots, w_{1,k-1}, w_{1k}]$, $W \in \mathbb{R}^{1 \times k}$; $v$ is the input vector (performance variables), $v \in \mathbb{R}^{k}$; and $B_1$ is the bias variable. It should be emphasized that $W$ and $B_1$ are adjusted through the training (learning) mechanism.

For a single-layer neural network of $z$ neurons, one has

$$u = f(Wv + B),$$

where the weighting matrix and bias vector are $W \in \mathbb{R}^{z \times k}$ and $B \in \mathbb{R}^{z}$. For a multi-layer neural network, one finds the following expression for the layer outputs:

$$u_{i+1} = f_{i+1}(W_{i+1} u_i + B_{i+1}), \quad i = 0, 1, \ldots, M-2, M-1,$$

where $M$ is the number of layers in the neural network. For example, for a three-layer network, we have

$$u_3 = f_3(W_3 u_2 + B_3), \quad i = 2,$$
$$u_2 = f_2(W_2 u_1 + B_2), \quad i = 1,$$
$$u_1 = f_1(W_1 v + B_1), \quad i = 0.$$

Hence, one obtains

$$u_3 = f_3(W_3 u_2 + B_3) = f_3\bigl(W_3 f_2(W_2 f_1(W_1 v + B_1) + B_2) + B_3\bigr),$$

where the subscripts 1, 2, and 3 denote the layer variables (a numerical sketch of this layered evaluation is given at the end of the section).

To approximate the unknown functions, the weighting matrices $W$ and bias vectors $B$ must be determined, and the procedure for selecting $W$ and $B$ is called network training. Many methods are available for training, and backpropagation, which is based upon gradient descent optimization, is commonly used. Applying the gradient descent optimization procedure, one minimizes a mean square error performance index using the end-to-end neural network behavior. That is, using $p$ pairs of input vectors $v_j$ and target output vectors $c_j$, $c_j \in \mathbb{R}^{k}$, the quadratic performance functional is given as

$$J = \sum_{j=1}^{p} (c_j - u_j)^T Q (c_j - u_j) = \sum_{j=1}^{p} e_j^T Q e_j,$$

where $e_j = c_j - u_j$ is the error vector and $Q \in \mathbb{R}^{k \times k}$ is the diagonal weighting matrix. The steepest descent algorithm is applied to minimize the approximate mean square error, and the learning rate and sensitivity have been widely studied for quadratic performance indexes.
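To make the layered composition concrete, the following is a minimal sketch of the forward evaluation $u_{i+1} = f_{i+1}(W_{i+1} u_i + B_{i+1})$ for a three-layer network. The layer sizes, the tanh transfer functions, and the random initial $W_i$ and $B_i$ are illustrative assumptions, not values given in the text.

```python
import numpy as np

# Minimal sketch of the layered forward pass; layer sizes, tanh transfer
# functions f_i, and random W_i, B_i are illustrative assumptions.
rng = np.random.default_rng(0)
k = 4                               # number of inputs
sizes = [6, 5, 3]                   # neurons per layer (three-layer network)

dims = [k] + sizes
W = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(3)]
B = [rng.standard_normal(s) for s in sizes]
f = [np.tanh] * 3                   # transfer functions f_1, f_2, f_3

def forward(v):
    """Evaluate u_{i+1} = f_{i+1}(W_{i+1} u_i + B_{i+1}) with u_0 = v."""
    u = v
    for Wi, Bi, fi in zip(W, B, f):
        u = fi(Wi @ u + Bi)
    return u                        # u_3 = f3(W3 f2(W2 f1(W1 v + B1) + B2) + B3)

v = rng.standard_normal(k)          # input vector (performance variables)
u3 = forward(v)
```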
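The training step can be sketched in the same style. The loop below performs steepest descent on the quadratic performance functional $J = \sum_j e_j^T Q e_j$; the two-layer architecture, the learning rate $\gamma$, the synthetic training pairs $(v_j, c_j)$, and the identity choice for $Q$ are all assumptions made for illustration rather than prescriptions from the text.

```python
import numpy as np

# Steepest descent on J = sum_j e_j^T Q e_j for a two-layer network.
# Hidden size z, learning rate gamma, synthetic pairs (v_j, c_j), and a
# scalar-output, identity-Q setup are illustrative assumptions.
rng = np.random.default_rng(1)
k, z, p, gamma = 3, 8, 50, 0.05

V = rng.standard_normal((p, k))                  # input vectors v_j
C = np.sin(V.sum(axis=1, keepdims=True))         # target outputs c_j

W1, B1 = 0.5 * rng.standard_normal((z, k)), np.zeros(z)
W2, B2 = 0.5 * rng.standard_normal((1, z)), np.zeros(1)
Q = np.diag([1.0])                               # diagonal weighting matrix

for epoch in range(500):
    for v, c in zip(V, C):
        u1 = np.tanh(W1 @ v + B1)                # u_1 = f_1(W_1 v + B_1)
        u2 = W2 @ u1 + B2                        # linear output layer
        e = c - u2                               # e_j = c_j - u_j
        d2 = -2.0 * (Q @ e)                      # dJ/du_2 for e^T Q e
        d1 = (W2.T @ d2) * (1.0 - u1**2)         # backpropagate through tanh
        W2 -= gamma * np.outer(d2, u1); B2 -= gamma * d2
        W1 -= gamma * np.outer(d1, v);  B1 -= gamma * d1

# Performance index J = sum_j e_j^T Q e_j after training.
E = C - (np.tanh(V @ W1.T + B1) @ W2.T + B2)
J = float(np.sum((E @ Q) * E))
print(f"J = {J:.4f}")
```

The two update lines per layer are the steepest descent rule $W \leftarrow W - \gamma\,\partial J/\partial W$; the choice of $\gamma$ governs the learning rate trade-off noted at the end of the section.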