Fig. 42. The clean speech obtained at the output of our proposed ANC (Fig. 30) by reducing the time scale.

8. The ultra high speed LMS algorithm implemented on parallel architecture

Many problems require enormous computational capacity to solve, and the success of computational science in accurately describing and modelling the real world has helped fuel an ever-increasing demand for cheap computing power. Scientists are eager to test the limits of their theories, using high-performance computing to simulate more realistic systems in greater detail. Parallel computing offers a way to address these problems in a cost-effective manner: it deals with the development of programs in which multiple concurrent processes cooperate in the fulfilment of a common task. In this section we develop the theory of the parallel computation of the widely used least-mean-square (LMS) algorithm(1), so named by its originators Widrow and Hoff (1960) [2].

(1) M. Jaber, "Method and apparatus for enhancing processing speed for performing a least mean square operation by parallel processing", US patent No. 7,533,140, 2009.

8.1 The spatial radix-r factorization

This section is devoted to proving that a discrete signal can be decomposed into r partial signals whose statistical properties remain invariant. Given a discrete signal x_n of size N,

    x_n = [ x_0 \; x_1 \; x_2 \; \cdots \; x_{N-1} ],    (97)

and the identity matrix I_r of size r \times r,

    I_r = [I_{l,c}], \quad I_{l,c} = 1 \text{ for } l = c, \; 0 \text{ elsewhere},    (98)

for l, c = 0, 1, ..., r - 1. Based on what was proposed in [2]-[9], any discrete signal x(n) can be written as

    x_n = I_r \, [\, x_{rn} \; x_{rn+1} \; \cdots \; x_{rn+r-1} \,],    (99)

i.e. the product of the identity matrix of size r by r sets of vectors of size N/r (n = 0, 1, ..., N/r - 1), where the l-th element of the n-th product is stored at the memory address location given by

    l = rn + p    (100)

for p = 0, 1, ..., r - 1. The mean (or expected value) of x_n is given as

    E[x_n] = \frac{1}{N} \sum_{n=0}^{N-1} x_n,    (101)

which can be factorized as

    E[x_n] = \frac{1}{N} \sum_{p=0}^{r-1} \sum_{n=0}^{N/r-1} x_{rn+p}
           = \frac{1}{r} \sum_{p=0}^{r-1} \Big[ \frac{r}{N} \sum_{n=0}^{N/r-1} x_{rn+p} \Big]
           = \frac{1}{r} \sum_{p=0}^{r-1} E[x_{rn+p}];    (102)

therefore the mean of the signal x_n equals the sum of the means of its r partial signals divided by r, for p = 0, 1, ..., r - 1. Similarly, the variance of the signal x_n is obtained from the variances of its r partial signals (with deviations taken about the overall mean) according to

    \mathrm{Var}(x) = E[(x_n - E[x_n])^2]
                    = \frac{1}{N} \sum_{n=0}^{N-1} (x_n - E[x_n])^2
                    = \frac{1}{r} \sum_{p=0}^{r-1} \Big[ \frac{r}{N} \sum_{n=0}^{N/r-1} (x_{rn+p} - E[x_n])^2 \Big].    (103)
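As a quick numerical check of this factorization, the short Python/NumPy sketch below (an illustration under our own assumptions, not code from the chapter) splits a signal into its r interleaved partial signals and verifies the relations of Eqs. (102) and (103); note that the variance relation holds when the deviations of each partial signal are taken about the overall mean.

```python
import numpy as np

# Minimal sketch checking Eqs. (101)-(103) for a radix-r decomposition.
r, N = 4, 1024
rng = np.random.default_rng(0)
x = rng.normal(size=N)                      # a discrete signal x_n of size N

# Radix-r decomposition, Eqs. (99)-(100): partial signal p holds the samples x[r*n + p]
partials = [x[p::r] for p in range(r)]

# Eq. (102): the mean of x equals the average of the r partial means
assert np.isclose(x.mean(), np.mean([xp.mean() for xp in partials]))

# Eq. (103): with deviations taken about the overall mean, the variance of x
# equals the average of the partial second moments about that same mean
mu = x.mean()
var_full = np.mean((x - mu) ** 2)
var_partials = np.mean([np.mean((xp - mu) ** 2) for xp in partials])
assert np.isclose(var_full, var_partials)
```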
8.2 The parallel implementation of the least squares method

The method of least squares assumes that the best-fit curve of a given type is the curve that minimizes the sum of the squared deviations (least-square error) from a given set of data. Suppose that the N data points are (x_0, y_0), (x_1, y_1), ..., (x_{N-1}, y_{N-1}), where x is the independent variable and y is the dependent variable. The fitting curve d has a deviation (error) \sigma from each data point, i.e. \sigma_0 = d_0 - y_0, \sigma_1 = d_1 - y_1, ..., \sigma_{N-1} = d_{N-1} - y_{N-1}, which can be re-ordered as

    I_r \, [\, d_{rn} - y_{rn} \;\; d_{rn+1} - y_{rn+1} \;\; \cdots \;\; d_{rn+r-1} - y_{rn+r-1} \,]    (104)

for n = 0, 1, ..., (N/r) - 1. According to the method of least squares, the best-fitting curve has the property that

    J = \sigma_0^2 + \sigma_1^2 + \cdots + \sigma_{N-1}^2
      = \sum_{j_0=0}^{r-1} \sum_{n=0}^{N/r-1} (d_{rn+j_0} - y_{rn+j_0})^2 = \text{a minimum}.    (105)

The parallel implementation of least squares for the linear case can be expressed as

    e_n = d_n - (b + w x_n) = d_n - y_n
    \;\Rightarrow\; I_r \, [\, e_{rn+j_0} \,] = I_r \, [\, d_{rn+j_0} - y_{rn+j_0} \,]    (106)

for j_0 = 0, 1, ..., r - 1. In order to pick the line which best fits the data, we need a criterion to determine which linear estimator is "best". The sum of squared errors (also called the mean square error, MSE) is a widely used performance criterion:

    J = \frac{1}{2N} \sum_{n=0}^{N-1} e_n^2
      = \frac{1}{2N} \sum_{j_0=0}^{r-1} \sum_{n=0}^{N/r-1} e_{rn+j_0}^2,    (107)

which after simplification yields

    J = \frac{1}{r} \sum_{j_0=0}^{r-1} \Big[ \frac{r}{2N} \sum_{n=0}^{N/r-1} e_{rn+j_0}^2 \Big]
      = \frac{1}{r} \sum_{j_0=0}^{r-1} J_{j_0},    (108)

where J_{j_0} is the partial MSE applied to the subdivided data. Our goal is to minimize J analytically, which, according to Gauss, can be done by taking its partial derivatives with respect to the unknowns and equating the resulting equations to zero:

    \frac{\partial J}{\partial b} = 0, \qquad \frac{\partial J}{\partial w} = 0,    (109)

which yields

    \frac{\partial J}{\partial b} = \frac{1}{r} \sum_{j_0=0}^{r-1} \frac{\partial J_{j_0}}{\partial b_{j_0}} = 0, \qquad
    \frac{\partial J}{\partial w} = \frac{1}{r} \sum_{j_0=0}^{r-1} \frac{\partial J_{j_0}}{\partial w_{j_0}} = 0.    (110)

With the same reasoning as above, the MSE for multiple variables is obtained by

    J = \frac{1}{2N} \sum_{n} \Big( d_n - \sum_{k} w_k x_{n,k} \Big)^2
      = \frac{1}{r} \sum_{j_0=0}^{r-1} \Big[ \frac{r}{2N} \sum_{n=0}^{N/r-1} \Big( d_{rn+j_0} - \sum_{k} w_k x_{rn+j_0,k} \Big)^2 \Big]
      = \frac{1}{r} \sum_{j_0=0}^{r-1} J_{j_0}    (111)

for j_0 = 0, 1, ..., r - 1, where J_{j_0} is the partial MSE applied to the subdivided data. The solution at the extreme (minimum) of this equation is found exactly as before, by taking the derivatives of J_{j_0} with respect to the unknowns w_k and equating the result to zero. Instead of solving equations 110 and 111 analytically, a gradient adaptive system can be used, which estimates the derivative with the difference operator:

    \nabla J \approx \frac{\Delta J}{\Delta w},    (112)

where in this case the bias b is set to zero.
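The following sketch (again an illustration under our own assumptions, not the authors' code) verifies Eq. (108) numerically: the MSE of a linear estimator over the full data set equals the average of the r partial MSEs computed on the interleaved subsets.

```python
import numpy as np

r, N = 2, 1000
rng = np.random.default_rng(1)
x = rng.normal(size=N)
d = 0.7 * x + 0.2 + 0.05 * rng.normal(size=N)   # desired data, roughly linear in x

b, w = 0.1, 0.6                                  # an arbitrary candidate linear estimator
e = d - (b + w * x)                              # errors e_n of Eq. (106)

J = np.sum(e ** 2) / (2 * N)                     # overall MSE, Eq. (107)

# Partial MSEs on the interleaved subsets and their average, Eq. (108)
J_partial = [np.sum(e[j0::r] ** 2) / (2 * (N // r)) for j0 in range(r)]
assert np.isclose(J, np.mean(J_partial))
```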
8.3 Search of the performance surface with steepest descent

The method of steepest descent (also known as the gradient method) is the simplest example of a gradient-based method for minimizing a function of several variables [12]. In this section we elaborate the linear case. Since the performance surface for the linear case implemented in parallel consists of r paraboloids, each of which has a single minimum, an alternative procedure for finding the best values of the coefficients w_k^{(j_0)} is to search the performance surface in parallel instead of computing the best coefficients analytically by Eq. 110. The search for the minimum of a function can be done efficiently using a broad class of methods that use gradient information. The gradient has two main advantages for search:
- the gradient can be computed locally;
- the gradient always points in the direction of maximum change.
If the goal is to reach the minimum in each parallel segment, the search must proceed in the direction opposite to the gradient. The overall method of search can therefore be stated as follows. Start the search with an arbitrary initial weight w^{(j_0)}(0), where the iteration is denoted by the index in parentheses (Fig. 43). Then compute the gradient of the performance surface at w^{(j_0)}(0), and modify the initial weight proportionally to the negative of the gradient at w^{(j_0)}(0). This moves the operating point to w^{(j_0)}(1). Then compute the gradient at the new position w^{(j_0)}(1) and apply the same procedure again, i.e.

    w_{k+1}^{(j_0)} = w_k^{(j_0)} - \eta \, \nabla J_k^{(j_0)},    (113)

where \eta is a small constant and \nabla J_k^{(j_0)} denotes the gradient of the performance surface at the k-th iteration of the j_0-th parallel segment. The constant \eta maintains stability in the search by ensuring that the operating point does not move too far along the performance surface. This search procedure is called the steepest descent method (Fig. 43).

Fig. 43. The search using the gradient information [13].

If one traces the path of the weights from iteration to iteration, we see intuitively that if the constant \eta is small, the best value of the coefficient, w*, will eventually be found: whenever w > w* we decrease w, and whenever w < w* we increase w.

8.4 The radix-r parallel LMS algorithm

Based on what was proposed in [2], we use the instantaneous value of the gradient as the estimator of the true quantity; dropping the summation in equation 108 and then taking the derivative with respect to w yields

    \nabla J_k^{(j_0)} \approx \frac{\partial}{\partial w_k} \Big[ \frac{1}{2} e_{rn+j_0}^2 \Big]
                      = - \, e_{rn+j_0} \, x_{rn+j_0}.    (114)

What this equation tells us is that an instantaneous estimate of the gradient is simply the product of the input to the weight and the error at iteration k, so the gradient can be estimated with one multiplication per weight. This is the gradient estimate that leads to the famous least mean square (LMS) algorithm (Fig. 44). If the estimator of Eq. 114 is substituted into Eq. 113, the steepest descent equation becomes

    I_r \big[ w_{k+1}^{(j_0)} \big] = I_r \big[ w_k^{(j_0)} + \eta \, e_{rn+j_0} \, x_{rn+j_0} \big]    (115)

for j_0 = 0, 1, ..., r - 1. This equation is the r-parallel LMS algorithm; its use as a predictive filter is illustrated in Figure 45. The small constant \eta is called the step size or the learning rate.
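A minimal Python sketch of the update rule of Eq. (115) for an order-M one-step predictor is given below. It is a simplified illustration under our own assumptions, not the authors' implementation: each of the r partial signals is filtered by its own LMS predictor, and the recombination stage of Fig. 45 is reduced to writing each result back to its original interleaved position. With r = 2 this corresponds to the even/odd split used in the simulations of Section 8.5.

```python
import numpy as np

def parallel_lms_predictor(x, r=2, M=8, eta=0.01):
    """Sketch of the r-parallel LMS predictor of Eq. (115): r independent LMS
    filters run on the interleaved partial signals x[rn + j0], j0 = 0..r-1."""
    y = np.zeros_like(x, dtype=float)       # recombined predicted signal
    e = np.zeros_like(x, dtype=float)       # recombined prediction error
    for j0 in range(r):
        xs = x[j0::r]                       # j0-th partial signal
        w = np.zeros(M)                     # weights of the j0-th LMS filter
        for n in range(M, len(xs)):
            u = xs[n - M:n][::-1]           # the M most recent (delayed) samples
            y_hat = w @ u                   # one-step prediction of xs[n]
            err = xs[n] - y_hat
            w += eta * err * u              # LMS weight update, Eq. (115)
            y[r * n + j0] = y_hat           # write back to the original index rn + j0
            e[r * n + j0] = err
    return y, e
```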
Fig. 44. LMS filter.

Fig. 45. r-parallel LMS algorithm used in a predictive filter.

8.5 Simulation results

The notion of a mathematical model is fundamental to science and engineering. In the class of applications dealing with identification, an adaptive filter is used to provide a linear model that represents the best fit (in some sense) to an unknown signal. The widely used LMS algorithm is an extremely simple and elegant algorithm that is able to minimize the external cost function using local information available to the system parameters. Because of its computational burden, and in order to speed up the process, this chapter has presented an efficient way to compute the LMS algorithm in parallel, and it follows from the simulation results that the stability of our models relies on the stability of the r parallel adaptive filters. Figures 47 and 48 show that the stability of the r parallel LMS filters (in this case r = 2) has been achieved, and the convergence performance of the overall model is illustrated in Figure 49. The complexity of the proposed method is reduced by a factor of r in comparison with the direct method illustrated in Figure 46. Furthermore, the simulation result for channel equalization is illustrated in Figure 50, in which the blue curve represents our parallel implementation (two LMS filters implemented in parallel) and the red curve the conventional method.

Fig. 46. Simulation result of the original signal (original signal, predicted signal and error over 8000 samples).

Fig. 47. Simulation result of the first partial LMS algorithm (first portion of the original signal, predicted signal and error over 4000 samples).

Fig. 48. Simulation result of the second partial LMS algorithm (second portion of the original signal, predicted signal and error over 4000 samples).

Fig. 49. Simulation result of the overall system (reconstructed original, predicted and error signals).

Fig. 50. Simulation result of the channel equalization: convergence of the LMS mean square error (blue curve: two LMS filters implemented in parallel; red curve: one LMS filter).
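As an illustration of the kind of experiment behind Figs. 46-49, the hypothetical driver below (our own assumption, not the authors' simulation code, and reusing the parallel_lms_predictor sketch given after Eq. (115)) feeds a noisy sinusoid to two parallel LMS predictors and compares the steady-state error of each partial filter with that of the recombined output.

```python
import numpy as np

N = 8000                                             # same time scale as Figs. 46 and 49
t = np.arange(N)
rng = np.random.default_rng(2)
signal = np.sin(2 * np.pi * t / 50) + 0.1 * rng.normal(size=N)

y, e = parallel_lms_predictor(signal, r=2, M=8, eta=0.05)

# Steady-state error of each partial predictor (cf. Figs. 47-48) and overall (cf. Fig. 49)
for j0 in range(2):
    print(f"partial filter {j0}: MSE = {np.mean(e[j0::2][-1000:] ** 2):.4f}")
print(f"recombined output: MSE = {np.mean(e[-2000:] ** 2):.4f}")
```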
References

[1] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1991.
[2] B. Widrow and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, 1985.
[3] K. Mayyas and T. Aboulnasr, "A Robust Variable Step Size LMS-Type Algorithm: Analysis and Simulations", IEEE, 1995, pp. 1408-1411.
[4] T. Aboulnasr and K. Mayyas, "Selective Coefficient Update of Gradient-Based Adaptive Algorithms", IEEE, 1997, pp. 1929-1932.
[5] E. Bjarnason, "Analysis of the Filtered-X LMS Algorithm", IEEE, 1993, pp. III-511 to III-514.
[6] E.A. Wan, "Adjoint LMS: An Efficient Alternative to the Filtered-X LMS and Multiple Error LMS Algorithms", Oregon Graduate Institute of Science & Technology, Department of Electrical Engineering and Applied Physics, P.O. Box 91000, Portland, OR 97291.
[7] B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications, Wiley, 1999.
[8] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, New York: Wiley, 1949. ISBN 0-262-73005-7.
[9] M. Jaber, "Noise Suppression System with Dual Microphone Echo Cancellation", US patent No. US-6738482.
[10] M. Jaber, "Voice Activity Detection Algorithm for Voiced/Unvoiced Decision and Pitch Estimation in a Noisy Speech Feature Extraction", US patent application No. 60/771167, 2007.
[11] M. Jaber and D. Massicotte, "A Robust Dual Predictive Line Acoustic Noise Canceller", International Conference on Digital Signal Processing (DSP 2009), Santorini, Greece, 2009.
[12] M. Jaber and D. Massicotte, "A New FFT Concept for Efficient VLSI Implementation: Part I - Butterfly Processing Element", 16th International Conference on Digital Signal Processing (DSP'09), Santorini, Greece, 5-7 July 2009.
[13] J.C. Principe, W.C. Lefebvre and N.R. Euliano, Neural Systems: Fundamentals through Simulation, 1996.