Neural Network Based Equalization

In this chapter we will give an overview of neural network based equalization. Channel equalization can be viewed as a classification problem, and the optimal solution to this classification problem is inherently nonlinear. Hence we will discuss how the nonlinear structure of the artificial neural network can enhance the performance of conventional channel equalizers, and we will examine various neural network designs amenable to channel equalization, such as the so-called multilayer perceptron network [236-240], the polynomial perceptron network [241-244] and the radial basis function network [85,245-247]. We will examine a neural network structure referred to as the Radial Basis Function (RBF) network in detail in the context of equalization. As further reading, the contribution by Mulgrew [248] provides an insightful briefing on applying the RBF network to both channel equalization and interference rejection problems. Originally, RBF networks were developed for the generic problem of data interpolation in a multi-dimensional space [249,250]. We will describe the RBF network in general and motivate its application. Before we proceed, the forthcoming section describes the discrete-time channel model inflicting intersymbol interference that will be used throughout this thesis.

8.1 Discrete Time Model for Channels Exhibiting Intersymbol Interference

A band-limited channel that results in intersymbol interference (ISI) can be represented by a discrete-time transversal filter having a transfer function of

F(z) = \sum_{n=0}^{L} f_n z^{-n},    (8.1)

where f_n is the nth impulse response tap of the channel and L + 1 is the length of the channel impulse response (CIR). In this context, the channel represents the convolution of the impulse responses of the transmitter filter, the transmission medium and the receiver filter.

Figure 8.1: Equivalent discrete-time model of a channel exhibiting intersymbol interference and experiencing additive white Gaussian noise.

In our discrete-time model, discrete symbols I_k are transmitted to the receiver at a rate of 1/T symbols per second and the output v_k at the receiver is also sampled at a rate of 1/T samples per second. Consequently, as depicted in Figure 8.1, the passage of the input sequence {I_k} through the channel results in the channel output sequence {v_k} that can be expressed as

v_k = \sum_{n=0}^{L} f_n I_{k-n} + \eta_k,    (8.2)

where {\eta_k} is a white Gaussian noise sequence with zero mean and variance \sigma_\eta^2. The number of interfering symbols contributing to the ISI is L. In general, the sequences {v_k}, {I_k}, {\eta_k} and {f_n} are complex-valued. Again, Figure 8.1 illustrates the model of the equivalent discrete-time system corrupted by Additive White Gaussian Noise (AWGN).

8.2 Equalization as a Classification Problem

In this section we will show that the characteristics of the transmitted sequence can be exploited by capitalising on the finite-state nature of the channel and by considering the equalization problem as a geometric classification problem. This approach was first expounded by Gibson, Siu and Cowan [237], who investigated utilizing the nonlinear structures offered by Neural Networks (NN) as channel equalisers. We assume that the transmitted sequence is binary with equal probability of logical ones and zeros in order to simplify the analysis.
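As a concrete illustration of the channel model of Equation 8.2, the following minimal sketch generates the noisy channel output sequence {v_k} from an equiprobable binary (BPSK) input sequence. It assumes real-valued symbols and the NumPy library; the tap values, noise variance and sequence length are illustrative choices, not prescriptions from the text.

```python
import numpy as np

# Minimal sketch of the discrete-time ISI channel of Equation 8.2:
# v_k = sum_n f_n * I_{k-n} + eta_k, with BPSK symbols I_k in {+1, -1}.

rng = np.random.default_rng(0)

f = np.array([1.0, 0.5])          # CIR taps f_0, f_1 (so L + 1 = 2)
num_symbols = 2000                # number of transmitted symbols
noise_variance = 0.05             # sigma_eta^2

I = rng.choice([+1.0, -1.0], size=num_symbols)               # equiprobable BPSK symbols
v_hat = np.convolve(I, f)[:num_symbols]                      # noise-free channel output
eta = rng.normal(0.0, np.sqrt(noise_variance), num_symbols)  # AWGN samples eta_k
v = v_hat + eta                                              # noisy channel output v_k

print(v[:5])
```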
Figure 8.2: Linear m-tap equalizer schematic.

Referring to Equation 8.2 and using the notation of Section 8.1, the symbol-spaced channel output is defined by

v_k = \sum_{n=0}^{L} f_n I_{k-n} + \eta_k = \hat{v}_k + \eta_k,    (8.3)

where {\eta_k} is the additive Gaussian noise sequence, {f_n}, n = 0, 1, ..., L, is the CIR, {I_k} is the channel input sequence and {\hat{v}_k} is the noise-free channel output. The mth order equaliser, as illustrated in Figure 8.2, has m taps as well as a delay of \tau, and it produces an estimate \hat{I}_{k-\tau} of the transmitted signal I_{k-\tau}. The delay \tau is due to the precursor section of the CIR, since it is necessary to facilitate the causal operation of the equalizer by supplying the past and future received samples when generating the delayed detected symbol \hat{I}_{k-\tau}. Hence the required length of the decision delay is typically the length of the CIR's precursor section, since outside this interval the CIR is zero and therefore the equaliser does not have to take into account any other received symbols.

The channel output observed by the linear mth order equaliser can be written in vectorial form as

\mathbf{v}_k = [\, v_k \;\; v_{k-1} \;\; \ldots \;\; v_{k-m+1} \,]^T,    (8.4)

and hence we can say that the equalizer has an m-dimensional channel output observation space. For a CIR of length L + 1, there are hence n_s = 2^{L+m} possible combinations of the binary channel input sequence

\mathbf{I}_k = [\, I_k \;\; I_{k-1} \;\; \ldots \;\; I_{k-m-L+1} \,]^T    (8.5)

that produce n_s = 2^{L+m} different possible noise-free channel output vectors

\hat{\mathbf{v}}_k = [\, \hat{v}_k \;\; \hat{v}_{k-1} \;\; \ldots \;\; \hat{v}_{k-m+1} \,]^T.    (8.6)

The possible noise-free channel output vectors \hat{\mathbf{v}}_k, or particular points in the observation space, will be referred to as the desired channel states. Expounding further, we denote each of the n_s = 2^{L+m} possible combinations of the channel input sequence \mathbf{I}_k of length L+m symbols as \mathbf{s}_i, 1 \le i \le n_s = 2^{L+m}, where the channel input state \mathbf{s}_i determines the desired channel output state \mathbf{r}_i, i = 1, 2, ..., n_s = 2^{L+m}. This is formulated as:

\hat{\mathbf{v}}_k = \mathbf{r}_i, \quad \text{if } \mathbf{I}_k = \mathbf{s}_i, \quad i = 1, 2, \ldots, n_s.    (8.7)

The desired channel output states can be partitioned into two classes according to the binary value of the transmitted symbol I_{k-\tau}, as seen below:

V_{m,\tau}^{+} = \{ \hat{\mathbf{v}}_k \,|\, I_{k-\tau} = +1 \} \quad \text{and} \quad V_{m,\tau}^{-} = \{ \hat{\mathbf{v}}_k \,|\, I_{k-\tau} = -1 \}.    (8.8)

We can denote the desired channel output states according to these two classes as follows:

V_{m,\tau}^{+} = \{ \mathbf{r}_i^{+},\; i = 1, \ldots, n_s^{+} \} \quad \text{and} \quad V_{m,\tau}^{-} = \{ \mathbf{r}_j^{-},\; j = 1, \ldots, n_s^{-} \},    (8.9)

where the quantities n_s^{+} and n_s^{-} represent the number of channel states \mathbf{r}_i^{+} and \mathbf{r}_j^{-} in the sets V_{m,\tau}^{+} and V_{m,\tau}^{-}, respectively. The relationship between the transmitted symbol \mathbf{I}_k and the channel output \mathbf{v}_k can also be written in a compact form as:

\mathbf{v}_k = \mathbf{F}\,\mathbf{I}_k + \boldsymbol{\eta}_k = \hat{\mathbf{v}}_k + \boldsymbol{\eta}_k,    (8.10)

where \boldsymbol{\eta}_k is an m-component vector that represents the AWGN sequence, \hat{\mathbf{v}}_k is the noise-free channel output vector and \mathbf{F} is an m \times (m + L) CIR-related matrix in the form of

\mathbf{F} = \begin{bmatrix} f_0 & f_1 & \cdots & f_L & 0 & \cdots & 0 \\ 0 & f_0 & f_1 & \cdots & f_L & \cdots & 0 \\ \vdots & & \ddots & & & \ddots & \vdots \\ 0 & \cdots & 0 & f_0 & f_1 & \cdots & f_L \end{bmatrix},    (8.11)

with f_j, j = 0, \ldots, L, being the CIR taps.

Below we demonstrate the concept of finite channel states in a two-dimensional output observation space (m = 2) using a simple two-coefficient channel (L = 1), assuming the CIR of:

F(z) = 1 + 0.5 z^{-1}.    (8.12)

Thus, \mathbf{F} = \begin{bmatrix} 1 & 0.5 & 0 \\ 0 & 1 & 0.5 \end{bmatrix}, \hat{\mathbf{v}}_k = [\, \hat{v}_k \;\; \hat{v}_{k-1} \,]^T and \mathbf{I}_k = [\, I_k \;\; I_{k-1} \;\; I_{k-2} \,]^T.
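To make the finite-state nature of the channel explicit, the following sketch enumerates all n_s = 2^{L+m} = 8 channel input states and the corresponding noise-free channel states \mathbf{r}_i = \mathbf{F}\mathbf{s}_i for this example channel. It is an illustrative fragment only (variable names are our own); the resulting values are those listed in Table 8.1 below.

```python
import itertools
import numpy as np

# Enumerate the n_s = 2^(L+m) = 8 channel input states s_i and the
# corresponding noise-free channel states r_i = F @ s_i for the example
# channel F(z) = 1 + 0.5 z^-1 observed by an m = 2 tap equalizer.

f = np.array([1.0, 0.5])                  # CIR taps f_0, f_1 (L = 1)
m = 2                                     # equalizer order
L = len(f) - 1

# CIR-related m x (m + L) matrix F of Equation 8.11
F = np.zeros((m, m + L))
for row in range(m):
    F[row, row:row + L + 1] = f

for s in itertools.product([+1.0, -1.0], repeat=m + L):
    I_k = np.array(s)                     # input state [I_k, I_{k-1}, I_{k-2}]
    r = F @ I_k                           # channel state [v_hat_k, v_hat_{k-1}]
    print(I_k, "->", r)
```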
All the possible combinations of the transmitted binary symbol I_k and the noiseless channel outputs \hat{v}_k, \hat{v}_{k-1} are listed in Table 8.1.

I_k    I_{k-1}    I_{k-2}    \hat{v}_k    \hat{v}_{k-1}
+1     +1         +1         +1.5         +1.5
+1     +1         -1         +1.5         +0.5
+1     -1         +1         +0.5         -0.5
+1     -1         -1         +0.5         -1.5
-1     +1         +1         -0.5         +1.5
-1     +1         -1         -0.5         +0.5
-1     -1         +1         -1.5         -0.5
-1     -1         -1         -1.5         -1.5

Table 8.1: Transmitted signal and noiseless channel states for the CIR of F(z) = 1 + 0.5z^{-1} and an equalizer order of m = 2.

Figure 8.3: The noiseless BPSK-related channel states \hat{\mathbf{v}}_k = \mathbf{r}_i and the noisy channel outputs \mathbf{v}_k of a Gaussian channel having a CIR of F(z) = 1 + 0.5z^{-1} in a two-dimensional observation space. The noise variance is \sigma_\eta^2 = 0.05, the number of noisy received \mathbf{v}_k samples output by the channel and input to the equalizer is 2000 and the decision delay is \tau = 0. The linear decision boundary separates the noisy received \mathbf{v}_k clusters that correspond to I_{k-\tau} = +1 from those that correspond to I_{k-\tau} = -1.

Figure 8.3 shows the 8 possible noiseless channel states \hat{\mathbf{v}}_k for a BPSK modem and the noisy channel output \mathbf{v}_k in the presence of zero-mean AWGN with variance \sigma_\eta^2 = 0.05. It is seen that the observation vectors \mathbf{v}_k form clusters and the centroids of these clusters are the noiseless channel states \mathbf{r}_i. The equalization problem hence involves identifying the regions within the observation space spanned by the noisy channel output \mathbf{v}_k that correspond to the transmitted symbol of either I_k = +1 or I_k = -1.

A linear equalizer performs this classification in conjunction with a decision device, which is often a simple sign function. The decision boundary, as seen in Figure 8.3, is constituted by the locus of all values of \mathbf{v}_k where the output of the linear equalizer is zero, as demonstrated below. For example, for a two-tap linear equalizer having tap coefficients c_1 and c_2, at the decision boundary we have:

c_1 v_k + c_2 v_{k-1} = 0,    (8.13)

and

v_{k-1} = -\frac{c_1}{c_2}\, v_k    (8.14)

gives a straight-line decision boundary, as shown in Figure 8.3, which divides the observation space into two regions corresponding to I_k = +1 and I_k = -1. In general, the linear equalizer can only implement a hyperplane decision boundary, which in our two-dimensional example was constituted by a line. This is clearly a non-optimum classification strategy, as our forthcoming geometric visualization will highlight. For example, we can see in Figure 8.3 that the point \hat{\mathbf{v}} = [\,0.5 \;\; -0.5\,]^T associated with the I_k = +1 decision is closer to the decision boundary than the point \hat{\mathbf{v}} = [\,-1.5 \;\; -0.5\,]^T associated with the I_k = -1 decision. Therefore, in the presence of noise, there is a higher probability of the channel output centred at point \hat{\mathbf{v}} = [\,0.5 \;\; -0.5\,]^T being wrongly detected as I_k = -1 than that of the channel output centred around \hat{\mathbf{v}} = [\,-1.5 \;\; -0.5\,]^T being incorrectly detected as I_k = +1. Gibson et al. [237] have shown examples of linearly non-separable channels, when the decision delay is zero and the channel is of non-minimum phase nature. The linear separability of the channel depends on the equalizer order m and on the delay \tau, and in situations where the channel characteristics are time-varying, it may not be possible to specify values of m and \tau which will guarantee linear separability.
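The following fragment illustrates this geometric argument. It classifies the eight noiseless channel states of Table 8.1 with a two-tap linear equalizer and a sign decision, and reports each state's distance from the resulting linear boundary; the states nearest the hyperplane dominate the error probability. The tap values c_1 = 1, c_2 = 0.25 are an arbitrary illustrative choice that happens to separate the two classes for this channel, not a design rule taken from the text.

```python
import numpy as np

# Two-tap linear equalizer with a sign decision: decide sign(c1*v_k + c2*v_{k-1}).
# The decision boundary is the line c1*v_k + c2*v_{k-1} = 0 (Equations 8.13-8.14).

c = np.array([1.0, 0.25])                                  # [c1, c2], illustrative

# The eight noiseless channel states r_i of Table 8.1 and the symbol I_k
# (decision delay tau = 0) that produced each of them.
states = np.array([[+1.5, +1.5], [+1.5, +0.5], [+0.5, -0.5], [+0.5, -1.5],
                   [-0.5, +1.5], [-0.5, +0.5], [-1.5, -0.5], [-1.5, -1.5]])
I_k = np.array([+1, +1, +1, +1, -1, -1, -1, -1])

decisions = np.sign(states @ c)                            # sign of equalizer output
distances = np.abs(states @ c) / np.linalg.norm(c)         # distance to the boundary

for r, dec, dist, true in zip(states, decisions, distances, I_k):
    print(f"state {r}, decided {int(dec):+d}, true {true:+d}, distance {dist:.2f}")
```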
According to Chen, Gibson and Cowan [241], the above shortcomings of the linear equalizer are circumvented by a Bayesian approach [251] to obtaining an optimal equalization solution. In this spirit, for an observed channel output vector \mathbf{v}_k, if the probability that it was caused by I_{k-\tau} = +1 exceeds the probability that it was caused by I_{k-\tau} = -1, then we should decide in favour of +1 and vice versa. Thus, the optimal Bayesian equalizer solution is defined as [241]:

\hat{I}_{k-\tau} = \mathrm{sgn}\left( f_{Bayes}(\mathbf{v}_k) \right),    (8.15)

where the optimal Bayesian decision function f_{Bayes}(\cdot), based on the difference of the associated conditional density functions, is given by [85]:

f_{Bayes}(\mathbf{v}_k) = \sum_{i=1}^{n_s^{+}} p_i^{+}\, p(\mathbf{v}_k \,|\, \mathbf{r}_i^{+}) - \sum_{j=1}^{n_s^{-}} p_j^{-}\, p(\mathbf{v}_k \,|\, \mathbf{r}_j^{-}),    (8.16)

where p_i^{+} and p_j^{-} are the a priori probabilities of appearance of each desired state \mathbf{r}_i^{+} \in V_{m,\tau}^{+} and \mathbf{r}_j^{-} \in V_{m,\tau}^{-}, respectively, and p(\cdot) denotes the associated probability density function. The quantities n_s^{+} and n_s^{-} represent the number of desired channel states in V_{m,\tau}^{+} and V_{m,\tau}^{-}, respectively, which are defined implicitly in Figure 8.3. If the noise distribution is Gaussian, Equation 8.16 can be rewritten as:

f_{Bayes}(\mathbf{v}_k) = \sum_{i=1}^{n_s^{+}} p_i^{+} (2\pi\sigma_\eta^2)^{-m/2} \exp\!\left( -\frac{\| \mathbf{v}_k - \mathbf{r}_i^{+} \|^2}{2\sigma_\eta^2} \right) - \sum_{j=1}^{n_s^{-}} p_j^{-} (2\pi\sigma_\eta^2)^{-m/2} \exp\!\left( -\frac{\| \mathbf{v}_k - \mathbf{r}_j^{-} \|^2}{2\sigma_\eta^2} \right).    (8.17)

Again, the optimal decision boundary is the locus of all values of \mathbf{v}_k where the probability of I_{k-\tau} = +1, given a value \mathbf{v}_k, is equal to the probability of I_{k-\tau} = -1 for the same \mathbf{v}_k. In general, the optimal Bayesian decision boundary is a hyper-surface, rather than just a hyper-plane, in the m-dimensional observation space, and the realization of this nonlinear boundary requires a nonlinear decision capability. Neural networks provide this capability and the following section will discuss the various neural network structures that have been investigated in the context of channel equalization, while also highlighting the learning algorithms used.

8.3 Introduction to Neural Networks

8.3.1 Biological and Artificial Neurons

The human brain consists of a dense interconnection of simple computational elements referred to as neurons. Figure 8.4(a) shows a network of biological neurons. As seen in the figure, the neuron consists of a cell body - which provides the information-processing functions - and of the so-called axon with its terminal fibres. The dendrites seen in the figure are the neuron's 'inputs', receiving signals from other neurons. These input signals may cause the neuron to fire, i.e. to produce a rapid, short-term change in the potential difference across the cell's membrane. Input signals to the cell may be excitatory, increasing the chances of neuron firing, or inhibitory, decreasing these chances. The axon is the neuron's transmission line that conducts the potential difference away from the cell body towards the terminal fibres. This process produces the so-called synapses, which form either excitatory or inhibitory connections to the dendrites of other neurons, thereby forming a neural network. Synapses mediate the interactions between neurons and enable the nervous system to adapt and react to its surrounding environment.

Figure 8.4: Comparison between biological and artificial neurons: (a) anatomy of a typical biological neuron, from Kandel [252]; (b) an artificial neuron (the jth neuron).

In Artificial Neural Networks (ANN), which mimic the operation of biological neural networks, the processing elements are artificial neurons and their signal processing properties are loosely based on those of biological neurons. Referring to Figure 8.4(b), the jth neuron has a set of I synapses or connection links. Each link is characterized by a synaptic weight w_{ij}, i = 1, 2, ..., I. The weight w_{ij} is positive if the associated synapse is excitatory, and it is negative if the synapse is inhibitory. Thus, signal x_i at the input of synapse i, connected to neuron j, is multiplied by the synaptic weight w_{ij}. These synaptic weights, which store 'knowledge' and provide connectivity, are adapted during the learning process. The weighted input signals of the neuron are summed up by an adder. If this summation exceeds a so-called firing threshold \theta_j, then the neuron fires and issues an output. Otherwise it remains inactive.
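A minimal sketch of this weighted-sum-and-threshold behaviour is given below; the weights, threshold and input values are arbitrary illustrative choices, and the hard on/off output simply mirrors the firing rule described above.

```python
import numpy as np

# Minimal sketch of the jth artificial neuron of Figure 8.4(b):
# the weighted inputs are summed and the neuron "fires" (output 1)
# if the sum exceeds the firing threshold theta_j.

def neuron_output(x, w, theta):
    """Return 1 if sum_i w_i * x_i exceeds the firing threshold, else 0."""
    activation = np.dot(w, x)          # weighted sum of the inputs
    return 1 if activation > theta else 0

w = np.array([0.7, -0.3, 0.5])         # synaptic weights w_1j, w_2j, w_3j (illustrative)
theta = 0.2                            # firing threshold theta_j
x = np.array([1.0, 0.5, 1.0])          # input signals x_1, x_2, x_3

print(neuron_output(x, w, theta))      # -> 1, since 0.7 - 0.15 + 0.5 = 1.05 > 0.2
```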
In Figure 8.4(b) the effect of the firing threshold \theta_j is represented by a bias, arising from an input which is always 'on', corresponding to x_0 = 1, and weighted by w_{0j} = -\theta_j = b_j. The importance of this is that the bias can be treated as just another weight. Hence, if we have a training algorithm for finding an appropriate set of weights for a network of neurons designed to perform a certain function, we do not need to consider the biases separately.

Figure 8.5: Various neural activation functions f(v): (a) threshold activation function; (b) piecewise-linear activation function; (c) sigmoid activation function.

The activation function f(\cdot) of Figure 8.5 limits the amplitude of the neuron's output to some permissible range and provides nonlinearities. Haykin [253] identifies three basic types of activation functions:

1. Threshold Function. For the threshold function shown in Figure 8.5(a), we have

f(v) = \begin{cases} 1, & v \ge 0 \\ 0, & v < 0, \end{cases}    (8.18)

Neurons using this activation function are referred to in the literature as the McCulloch-Pitts model [253]. In this model, the output of the neuron takes the value of 1 if the total internal activity level of that neuron is nonnegative and 0 otherwise.

2. Piecewise-Linear Function. This neural activation function, portrayed in Figure 8.5(b), is represented mathematically by:

f(v) = \begin{cases} 1, & v \ge 1 \\ v, & -1 < v < 1 \\ -1, & v \le -1, \end{cases}    (8.19)

where the amplification factor inside the linear region is assumed to be unity. This activation function approximates a nonlinear amplifier.

3. Sigmoid Function. A commonly used neural activation function in the construction of artificial neural networks is the sigmoid activation function. It is defined as a strictly increasing function that exhibits smoothness and asymptotic properties, as seen in Figure 8.5(c). An example of the sigmoid function is the hyperbolic tangent function, which is shown in Figure 8.5(c) and is defined by [253]:

f(v) = \tanh(v) = \frac{e^{v} - e^{-v}}{e^{v} + e^{-v}}.    (8.20)

This activation function is differentiable, which is an important feature in neural network theory [253].

The model of the jth artificial neuron, shown in Figure 8.4(b), can be described in mathematical terms by the following pair of equations:

y_j = f(v_j),    (8.21)

where:

v_j = \sum_{i=0}^{I} w_{ij} x_i.    (8.22)

Having introduced the basic elements of neural networks, we will focus next on the associated network structures or architectures. The different neural network structures yield different functionalities and capabilities. The basic structures will be described in the following section.

8.3.2 Neural Network Architectures

The network's architecture defines the neurons' arrangement in the network. Various neural network architectures have been investigated for different applications, including for example [...]

...The back-propagation algorithm has a slower convergence rate than the conventional adaptive equalizer using the Least Mean Square (LMS) algorithm described in Appendix A.2. This was illustrated, for example, by Siu et al. [240] using experimental results. The introduction of the so-called momentum term was suggested by Rumelhart et al. [256] for the adaptive algorithm in order to improve its convergence rate.
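A minimal sketch of such a momentum-assisted gradient update follows. It assumes the commonly cited form of the momentum update, \Delta w(n) = -\eta \nabla E(n) + \alpha \Delta w(n-1); the quadratic cost function, learning rate and momentum factor below are arbitrary placeholders chosen only to show the mechanics of the update.

```python
import numpy as np

# Illustrative gradient-descent update with a momentum term:
#   delta_w(n) = -eta * gradient + alpha * delta_w(n-1)

eta = 0.05        # learning rate
alpha = 0.9       # momentum factor

def gradient(w):
    """Gradient of the illustrative quadratic cost E(w) = 0.5 * ||w - w_opt||^2."""
    w_opt = np.array([1.0, -2.0])
    return w - w_opt

w = np.zeros(2)                # initial weights
delta_w = np.zeros_like(w)     # previous weight change

for _ in range(200):
    delta_w = -eta * gradient(w) + alpha * delta_w   # momentum-assisted step
    w = w + delta_w

print(w)   # converges towards [1.0, -2.0]
```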
The idea is based on...

...activation function introduces local minima in the error surface of the otherwise linear perceptron structure. Thus, the stochastic gradient algorithm [255,256], assisted by the previously mentioned momentum term [256], can be invoked in their scheme in order to adaptively train the equaliser. The decision feedback structure of Figure 8.10 can ... in order to further ...

...to each centre provides the width parameter, \sigma. Finally, we can calculate the RBF weights w using Equation 8.76 or adaptively using the LMS algorithm [253]. Note that apart from regularization, an alternative way of reducing the number of basis functions required, and thus of reducing the associated complexity, is to use the OLS learning procedure proposed by Chen, Cowan and Grant [274]. This method is based...

...The adaptive K-means clustering algorithm computes the new reference vector c_{i,k+1} as

c_{i,k+1} = c_{i,k} + M_i(\mathbf{x}_k)\,\rho\,(\mathbf{x}_k - c_{i,k}),    (8.78)

where \rho is the learning rate governing the speed and accuracy of the adaptation and M_i(\mathbf{x}_k) is the so-called membership indicator that specifies whether the input pattern \mathbf{x}_k belongs to cluster i. In the traditional adaptive...

...'feel guilty' and therefore pulls itself out of the competition. Thus, the average rates of 'winning' for each region are equalized and no reference vector can get 'entrenched' in that region. However, these two methods yield partitions that are not optimal with respect to the MSE cost function of Equation 8.77. The performance of the adaptive K-means algorithm depends on the learning rate \rho in Equation...

...The PP equalizer is attractive, since it has a simpler structure than that of the MLP. The PP equalizer also has a multi-modal error surface - exhibiting a number of local minima and a global minimum - and thus still retains some of the problems associated with its convergence performance, although these are not as grave as those of the MLP structure. Another drawback is that the number of terms in the polynomial of Equation 8.24 increases...

...namely X^{+} and X^{-}. This dichotomy or binary partition of the points with respect to a surface becomes successful if the surface separates the points belonging to the class X^{+} from those in the class X^{-}. Thus, to solve the pattern-classification problem, we need to provide this separating surface that gives the decision boundary, as shown in Figure 8.14. We will now non-linearly cast the problem of separating...

...constitutes the interpolation between the data points, where the interpolation is performed along the constrained surface generated by the fitting procedure, as the optimum approximation to the true surface. Thus, we are led to the theory of multivariable interpolation in high-dimensional spaces. Assuming a single-dimensional output space, the interpolation problem can be stated as follows: Given a set of N...

...coefficients w constitute a new set of weights. Using radial basis functions, we set

\varphi_i(\mathbf{x}) = G(\| \mathbf{x} - \mathbf{c}_i \|), \quad i = 1, 2, \ldots, M,    (8.68)

where \mathbf{c}_i, i = 1, \ldots, M, is the set of RBF centres to be determined. Thus, with the aid of Equation 8.67 and Equation 8.68 we have

F^{*}(\mathbf{x}) = \sum_{i=1}^{M} w_i\, G(\| \mathbf{x} - \mathbf{c}_i \|).    (8.69)
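As an illustration of the expansion in Equation 8.69, the sketch below evaluates an RBF network output for a given input vector. The Gaussian choice G(r) = exp(-r^2 / (2\sigma^2)) for the basis function, together with the centres, width and weights used here, are arbitrary assumptions made for the example rather than values taken from the text.

```python
import numpy as np

# Illustrative evaluation of the RBF expansion of Equation 8.69,
#   F*(x) = sum_i w_i * G(||x - c_i||),
# using a Gaussian basis function G(r) = exp(-r^2 / (2 sigma^2)).

centres = np.array([[1.5, 1.5], [0.5, -0.5], [-0.5, 0.5], [-1.5, -1.5]])  # c_i
weights = np.array([1.0, 1.0, -1.0, -1.0])                                # w_i
sigma = 0.5                                                               # width parameter

def rbf_output(x):
    """Evaluate F*(x) for a single m-dimensional input vector x."""
    distances = np.linalg.norm(centres - x, axis=1)          # ||x - c_i||
    basis = np.exp(-distances**2 / (2.0 * sigma**2))         # G(||x - c_i||)
    return np.dot(weights, basis)

print(rbf_output(np.array([0.4, -0.6])))   # response dominated by the nearest centre
```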
Now the problem we have to address is the determination of the new set of weights w_i, i = 1, 2, ..., which yields [253]:

(\mathbf{G}^T \mathbf{G} + \lambda \mathbf{G}_0)\,\mathbf{w} = \mathbf{G}^T \mathbf{d},    (8.71)

where...

...Here, the matrix \mathbf{G} is a non-symmetric N-by-M matrix and the matrix \mathbf{G}_0 is a symmetric M-by-M matrix. Thus, upon solving Equation 8.71 to obtain the weights \mathbf{w}, we get:

\mathbf{w} = (\mathbf{G}^T \mathbf{G} + \lambda \mathbf{G}_0)^{-1} \mathbf{G}^T \mathbf{d}.    (8.76)

Observe that the solution in Equation 8.76 is different from Tikhonov's solution in Equation 8.62. Specifically, ...
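A sketch of this weight computation is given below. It assumes, since the defining Equations (8.72)-(8.75) are not reproduced above, that \mathbf{G} holds the basis-function responses G(\| \mathbf{x}_j - \mathbf{c}_i \|) of the N training inputs and that \mathbf{G}_0 holds the responses G(\| \mathbf{c}_j - \mathbf{c}_i \|) between the centres; the training data, centres, Gaussian width and regularization parameter are arbitrary illustrative values.

```python
import numpy as np

# Sketch of the regularized RBF weight solution of Equation 8.76:
#   w = (G^T G + lambda * G_0)^{-1} G^T d.
# Assumption (not taken from the excerpt above): G is the N-by-M matrix of
# responses G(||x_j - c_i||), and G_0 the M-by-M matrix of responses G(||c_j - c_i||).

rng = np.random.default_rng(1)

def gaussian(r, sigma=0.5):
    return np.exp(-r**2 / (2.0 * sigma**2))

X = rng.normal(size=(50, 2))                  # N = 50 training input vectors
d = np.sign(X[:, 0])                          # illustrative target outputs
C = np.array([[1.0, 0.0], [-1.0, 0.0],
              [0.0, 1.0], [0.0, -1.0]])       # M = 4 RBF centres
lam = 0.1                                     # regularization parameter lambda

G = gaussian(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2))   # N x M matrix
G0 = gaussian(np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2))  # M x M matrix

w = np.linalg.solve(G.T @ G + lam * G0, G.T @ d)   # Equation 8.76
print(w)
```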