Adaptive Filtering and Change Detection, Part 3


Adaptive Filtering and Change Detection
Fredrik Gustafsson
Copyright © 2000 John Wiley & Sons, Ltd
ISBNs: 0-471-49287-6 (Hardback); 0-470-84161-3 (Electronic)

Part II: Signal estimation

3 On-line approaches

3.1 Introduction
3.2 Filtering approaches
3.3 Summary of least squares approaches
  3.3.1 Recursive least squares
  3.3.2 Least squares over a sliding window
  3.3.3 Least mean square
  3.3.4 The Kalman filter
3.4 Stopping rules and the CUSUM test
  3.4.1 Distance measures
  3.4.2 One-sided tests
  3.4.3 Two-sided tests
  3.4.4 The CUSUM adaptive filter
3.5 Likelihood based change detection
  3.5.1 Likelihood theory
  3.5.2 ML estimation of nuisance parameters
  3.5.3 ML estimation of a single change time
  3.5.4 Likelihood ratio ideas
  3.5.5 Model validation based on sliding windows
3.6 Applications
  3.6.1 Fuel monitoring
  3.6.2 Paper refinery
3.A Derivations
  3.A.1 Marginalization of likelihoods
  3.A.2 Likelihood ratio approaches

3.1 Introduction

The basic assumption in this part on signal estimation is that the measurements $y_t$ consist of a deterministic component $\theta_t$, the signal, and additive white noise $e_t$:

$$y_t = \theta_t + e_t. \qquad (3.1)$$

For change detection, this will be labeled the change in the mean model. The task of determining $\theta_t$ from $y_t$ will be referred to as estimation, and change detection, or alarming, is the task of finding abrupt, or rapid, changes in $\theta_t$, which are assumed to start at a time $k$ referred to as the change time. Surveillance comprises all of these aspects, and a typical application is to monitor levels, flows and so on in industrial processes and to alarm for abnormal values.

The basic assumptions about model (3.1) in change detection are:

- The deterministic component $\theta_t$ undergoes an abrupt change at time $t = k$. Once this change is detected, the procedure starts all over again to detect the next change. The alternative is to consider $\theta_t$ as piecewise constant and to focus on a sequence of change times $k_1, k_2, \dots, k_n$, as shown in Chapter 4. This sequence is denoted $k^n$, where both the $k_i$ and $n$ are free parameters. The segmentation problem is to find both the number and the locations of the change times in $k^n$.
- In the statistical approaches, it will be assumed that the noise is white and Gaussian, $e_t \in N(0, R)$. However, the formulas can be generalized to other distributions, as will be pointed out in Section 3.A.
- The change magnitude for a change at time $k$ is defined as $\nu = \theta_{k+1} - \theta_k$.

Change detection approaches can be divided into hypothesis tests and estimation/information approaches. Algorithms belonging to the class of hypothesis tests can be split into the parts shown in Figure 3.1. For the change in the mean model, one or more of these blocks become trivial, but the picture is useful to keep in mind for the general model-based case. Estimation and information approaches do everything in one step, and do not suit the framework of Figure 3.1.

[Figure 3.1: The steps in change detection based on hypothesis tests. (a) Change detection based on statistics from one filter: data $y_t, u_t$ pass through a filter whose residuals $\varepsilon_t$ feed a distance measure $s_t$, and a stopping rule raises the alarm. (b) A stopping rule consists of averaging and thresholding. The stopping rule can be seen as an averaging filter and a thresholding decision device.]

The alternative to the non-parametric approach in this chapter is to model the deterministic component of $y_t$ as a parametric model; this issue will be dealt with in Part III. It must be noted that the signal model (3.1), and thus all methods in this part, are special cases of what will be covered in Part III.
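As a concrete illustration of model (3.1), the following minimal Python sketch simulates a change in the mean signal with a single abrupt change. The change time, the two mean levels, the noise variance and the seed are arbitrary illustration values, not taken from the book.

```python
import numpy as np

def simulate_change_in_mean(n=500, k=250, theta0=0.0, theta1=1.0, R=0.25, seed=0):
    """Simulate y_t = theta_t + e_t, e_t ~ N(0, R), with one abrupt
    change in the mean from theta0 to theta1 at time k."""
    rng = np.random.default_rng(seed)
    theta = np.where(np.arange(n) < k, theta0, theta1)  # piecewise constant signal
    e = rng.normal(0.0, np.sqrt(R), size=n)             # white Gaussian noise
    return theta + e, theta

y, theta = simulate_change_in_mean()
print(y[:5], theta[248:252])
```

This signal is reused in the sketches that follow to exercise the filters and detectors.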
This chapter presents a review of averaging strategies, stopping rules, and change detection ideas. Most of the ideas to follow in subsequent chapters are introduced here.

3.2 Filtering approaches

The standard approach in signal processing for separating the signal $\theta_t$ from the noise $e_t$ is (typically low-pass) filtering:

$$\hat\theta_t = H(q)\, y_t. \qquad (3.2)$$

The filter can be of Finite Impulse Response (FIR) or Infinite Impulse Response (IIR) type, and can be designed by any standard method (Butterworth, Chebyshev, etc.). An alternative interpretation of filters is data windowing,

$$\hat\theta_t = \sum_{k=0}^{\infty} w_k y_{t-k}, \qquad (3.3)$$

where the weights should satisfy $\sum_k w_k = 1$. This is equal to the filtering approach if the weights are interpreted as the impulse response of the (low-pass) filter $H(q)$, i.e. $w_k = h_k$. An important special case is the exponential forgetting window, or Geometric Moving Average (GMA),

$$w_k = (1 - \lambda)\lambda^k, \qquad 0 \le \lambda < 1.$$

...

Stopping rules will be used frequently when discussing change detection based on filter residual whiteness tests and model validation.

3.4.1 Distance measures

The input to a stopping rule is, as illustrated in Figure 3.3, a distance measure $s_t$. Several possibilities exist, listed below and implemented in the sketch that follows:

- A simple approach is to take the residuals,
  $$s_t = \varepsilon_t = y_t - \hat\theta_{t-1}, \qquad (3.20)$$
  where $\hat\theta_{t-1}$ (based on measurements up to time $t-1$) is any estimate from Sections 3.2 or 3.3. This is suitable for the change in the mean problem, which should be robust to variance changes.
- A good alternative is to normalize to unit variance. The variance of the residuals will be shown to equal $R + P_t$, so use instead
  $$s_t = \frac{\varepsilon_t}{\sqrt{R + P_t}}. \qquad (3.21)$$
  This scaling facilitates the design somewhat, in that approximately the same design parameters can be used for different applications.
- An alternative is to square the residuals,
  $$s_t = \varepsilon_t^2. \qquad (3.22)$$
  This is useful for detecting both variance and parameter changes. Again, normalization, now to unit expectation, facilitates the design:
  $$s_t = \frac{\varepsilon_t^2}{R + P_t}. \qquad (3.23)$$
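The GMA window and the residual-based distance measures translate directly into code. The sketch below implements the exponential forgetting filter and the residuals of (3.20) and (3.21); as a simplification, the estimation-error variance $P_t$ is ignored in the normalization (only the known noise variance $R$ is used), and the forgetting factor is an arbitrary illustration value.

```python
import numpy as np

def gma_filter(y, lam=0.95):
    """Exponential forgetting window (GMA): the recursion
    theta_hat_t = lam * theta_hat_{t-1} + (1 - lam) * y_t
    is filtering with impulse response w_k = (1 - lam) * lam**k."""
    theta_hat = np.empty(len(y), dtype=float)
    est = float(y[0])                 # initialize at the first sample
    for t, yt in enumerate(y):
        est = lam * est + (1.0 - lam) * yt
        theta_hat[t] = est
    return theta_hat

def residual_distance(y, theta_hat, R):
    """One-step residuals eps_t = y_t - theta_hat_{t-1}, cf. (3.20),
    scaled roughly to unit variance as in (3.21); the P_t term is
    dropped here as a simplification."""
    eps = np.asarray(y)[1:] - theta_hat[:-1]
    return eps / np.sqrt(R)

theta_hat = gma_filter(y)
s = residual_distance(y, theta_hat, R=0.25)
```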
Other options are based on likelihood ratios, to be defined below. For general filter formulations, certain correlation-based methods apply; see Sections 5.6 and 8.10.

[Figure 3.3: Structure of a stopping rule: averaging followed by thresholding.]

3.4.2 One-sided tests

As shown in Figure 3.3, the stopping rule first averages the inputs $s_t$ to get a test statistic $g_t$, which is then thresholded. An alarm is given when $g_t > h$, and the test statistic is reset, $g_t = 0$. Indeed, any of the averaging strategies discussed in Section 3.2 can be used; in particular, the sliding window and GMA filters. As an example of a well-known combination, we can take the squared residual as the distance measure from the no-change hypothesis, $s_t = \varepsilon_t^2$, and average over a sliding window of length $L$. We then get a $\chi^2$ test, where the distribution of $g_t$ is $\chi^2(L)$ if there is no change. There is more on this issue in subsequent chapters. Particular named algorithms are obtained for the exponential forgetting window and the finite moving average filter; in this way, stopping rules based on FMA or GMA are obtained.

The methods so far have been linear in the data or, for the $\chi^2$ test, quadratic in the data. We now turn our attention to a fundamental and historically very important class of non-linear stopping rules. First, the Sequential Probability Ratio Test (SPRT) is given.

Algorithm 3.1 SPRT

$$g_t = g_{t-1} + s_t - \nu \qquad (3.24)$$
$$g_t = 0 \text{ and } \hat k = t, \quad \text{if } g_t < a < 0 \qquad (3.25)$$
$$g_t = 0,\ t_a = t \text{ and alarm}, \quad \text{if } g_t > h > 0 \qquad (3.26)$$

Design parameters: drift $\nu$, threshold $h$ and reset level $a$.
Output: alarm time(s) $t_a$.

In words, the test statistic $g_t$ sums up its input $s_t$, with the idea of giving an alarm when the sum exceeds a threshold $h$. With a white noise input, the test statistic will drift away like a random walk. There are two mechanisms to prevent this natural fluctuation. To prevent positive drifts, eventually yielding a false alarm, a small drift term $\nu$ is subtracted at each time instant. To prevent a negative drift, which would increase the time to detection after a change, the test statistic is reset to $0$ each time it becomes less than a negative constant $a$. The level crossing parameter $a$ should be chosen to be small in magnitude, and it has been thoroughly explained why $a = 0$ is a good choice. This important special case yields the cumulative sum (CUSUM) algorithm.
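Algorithm 3.1 is only a few lines of code. The sketch below implements (3.24)-(3.26) with the reset level $a = 0$, i.e. the CUSUM special case; the drift and threshold in any call are design parameters to be tuned per application, and the change-time bookkeeping of (3.25) is omitted for brevity.

```python
def cusum(s, nu, h, a=0.0):
    """One-sided SPRT/CUSUM test, Algorithm 3.1:
    g_t = g_{t-1} + s_t - nu; reset g_t = 0 if g_t < a (a <= 0);
    alarm and reset if g_t > h. a = 0 gives the classical CUSUM."""
    alarms = []
    g = 0.0
    for t, st in enumerate(s):
        g += st - nu            # (3.24): accumulate drift-compensated input
        if g < a:               # (3.25): reset on negative drift
            g = 0.0
        if g > h:               # (3.26): alarm and restart
            alarms.append(t)
            g = 0.0
    return alarms

# Example with the normalized residuals from the earlier sketch;
# nu and h are illustrative design values, not recommendations:
# print(cusum(s, nu=0.5, h=5.0))
```

A two-sided test, for changes of unknown sign, is obtained by running a second CUSUM on $-s_t$ in parallel.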
...

[Figure 3.7: Upper row: the marginalized likelihoods for $\theta$ and $R$ using 40 observations ("Marginal distribution for $\theta$", "Marginal distribution for $R$"). Lower row: the ML estimates of $\theta$ and $R$ as a function of time.]

We can also study the time response of the maximum likelihood estimator, which is illustrated in Figure 3.7. The numerical evaluation is performed using the point-mass filter, Bergman (1999), where a grid of the parameter space is used.

3.5.3 ML estimation of a single change time

We will now extend ML to change time estimation. Let $p(y^t | k)$ denote the likelihood¹ of the measurements $y^t = y_1, y_2, \dots, y_t$, given the change time $k$. The change time is then estimated by the maximum likelihood principle:

$$\hat k = \arg\max_k p(y^t | k). \qquad (3.36)$$

This is basically an off-line formulation. The convention is that $\hat k = t$ should be interpreted as no change. If the test is repeated for each new measurement, an on-line version is obtained, where there could be efficient ways to compute the likelihood recursively.

If we assume that $\theta$ before and after the change are independent, the likelihood $p(y^t | k)$ can be divided into two parts by using the rule $p(A, B | C) = p(A | C)\, p(B | C)$, which holds if $A$ and $B$ are independent. That is,

$$p(y^t | k) = p(y^k | k)\, p(y_{k+1}^t | k) = p(y^k)\, p(y_{k+1}^t), \qquad (3.37)$$

where $y_{k+1}^t = y_{k+1}, \dots, y_t$. The conditioning on a change at time $k$ does not influence the likelihoods on the right-hand side and is omitted. That is, all that is needed is to compute the likelihood for all possible splits of the data into two parts. The number of such splits is $t$, so the complexity of the algorithm increases with time. A common remedy to this problem is to use the sliding window approach and only consider $k \in [t - L, t]$.

¹This notation is equivalent to that commonly used for conditional distributions.

Equation (3.37) shows that change detection based on likelihoods breaks down to computing the likelihoods for batches of data, and then combining these using (3.37). Section 3.A contains explicit formulas for how to compute these likelihoods, depending on what is known about the model. Below is a summary of the results for the different cases that need to be distinguished in applications. Here MML refers to Maximum Marginalized Likelihood and MGL refers to Maximum Generalized Likelihood; see Section 3.A or Chapters … and 10 for details of how these are defined.

- The parameter $\theta$ is unknown and the noise variance $R$ is known:
  $$-2 \log l_t^{MGL}(R) \approx t \log(2\pi R) + \frac{t \hat R_t}{R}, \qquad (3.38)$$
  $$-2 \log l_t^{MML}(R) \approx (t-1) \log(2\pi R) + \log(t) + \frac{t \hat R_t}{R}. \qquad (3.39)$$
  Note that $\hat R_t$ is a compact and convenient way of writing (3.35), and should not be confused with an estimate of what is assumed to be known.
- The parameter $\theta$ is unknown and the noise variance $R$ is unknown, but known to be constant. The derivation of this practically quite interesting case is postponed to Chapter ….
- The parameter $\theta$ is unknown and the noise variance $R$ is unknown, and might alter after the change time:
  $$-2 \log l_t^{MML} \approx t \log(2\pi) + (t-3) \log(t \hat R_t) - (t-5) \log(t-5) + (t-5) + \log(t), \qquad (3.40)$$
  $$-2 \log l_t^{MGL} \approx t \log(2\pi) + t \log(\hat R_t) + t. \qquad (3.41)$$
- The parameter $\theta$ is known (typically to be zero) and the noise variance $R$ is unknown and abruptly changing:
  $$-2 \log l_t^{MGL}(\theta) \approx t \log(2\pi) + t \log(\hat R_t) + t, \qquad (3.42)$$
  $$-2 \log l_t^{MML}(\theta) \approx t \log(2\pi) + (t-2) \log(t \hat R_t) - (t-2) \log(t-4) + (t-4). \qquad (3.43)$$

Note that the last case is for detection of variance changes. The likelihoods above can now be combined before and after a possible change, and we get the following algorithm.

Algorithm 3.5 Likelihood based signal change detection

Define the likelihood $l_{m:n} = p(y_{m:n})$, where $l_t = l_{1:t}$. The log likelihood for a change at time $k$ is given by

$$-\log l_t(k) = -\log l_{1:k} - \log l_{k+1:t},$$

where each log likelihood is computed by one of the six alternatives (3.38)-(3.43).

The algorithm is applied to the simplest possible example below.

Example 3.5 Likelihood estimation

Consider the signal

$$y_t = \begin{cases} \theta_0 + e_t, & 0 < t \le 250, \\ \theta_1 + e_t, & 250 < t \le 500, \end{cases}$$

with an abrupt change in the mean at $t = 250$. The different likelihood functions, as a function of the change time, are illustrated in Figure 3.8 for the cases:

- unknown $\theta$ and $R$,
- $R$ known and $\theta$ unknown,
- $\theta$ known both before and after the change, while $R$ is unknown.

Note that MGL has problems when the noise variance is unknown. The example clearly illustrates that marginalization is to be preferred over maximization of nuisance parameters in this example.
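Algorithm 3.5 can be sketched for the first case above ($\theta$ unknown, $R$ known), using the reconstruction of (3.39) for the batch likelihoods. The minimum segment length guard and the direct comparison with the no-change likelihood are implementation details assumed here, not prescriptions from the book.

```python
import numpy as np

def neg2_log_mml_known_R(y, R):
    """-2 log marginalized likelihood of one data batch, cf. (3.39):
    (t-1)*log(2*pi*R) + log(t) + t*Rhat/R, with Rhat the ML variance
    estimate around the sample mean."""
    t = len(y)
    Rhat = np.var(y)                     # (1/t) * sum (y_i - mean)^2
    return (t - 1) * np.log(2 * np.pi * R) + np.log(t) + t * Rhat / R

def ml_change_time(y, R, L_min=2):
    """Algorithm 3.5 sketch: split the data at every k, add the
    -2 log likelihoods of the two segments, cf. (3.37), and pick the
    minimizing split; k_hat = t encodes the no-change convention."""
    t = len(y)
    best_k, best_val = t, neg2_log_mml_known_R(y, R)
    for k in range(L_min, t - L_min):
        val = neg2_log_mml_known_R(y[:k], R) + neg2_log_mml_known_R(y[k:], R)
        if val < best_val:
            best_k, best_val = k, val
    return best_k

# y from the earlier simulation sketch; the estimate should land
# near the true change time:
# print(ml_change_time(y, R=0.25))
```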
3.5.4 Likelihood ratio ideas

In the context of hypothesis testing, the likelihood ratios, rather than the likelihoods, are used. The Likelihood Ratio (LR) test is a multiple hypothesis test, where the different jump hypotheses are compared pairwise to the no-jump hypothesis. In the LR test, the jump magnitude is assumed to be known. The hypotheses under consideration are

$$H_0: \text{no jump},$$
$$H_1(k, \nu): \text{a jump of magnitude } \nu \text{ at time } k.$$

... get a very basic diagnosis based on the ...

3.4.4 The CUSUM adaptive filter

To illustrate one of the main themes in this book, we here combine adaptive filters with the CUSUM test as a change detector, ...

3.3 Summary of least squares approaches

This section offers a summary of the adaptive filters presented in a more general form in Chapter …. A common framework ...
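The fragment on Section 3.4.4 above names one of the main themes: combining an adaptive filter with the CUSUM test as a change detector. A minimal sketch of such a combination follows, under the assumptions that a GMA-type mean estimator plays the adaptive filter, the test is run two-sided on normalized residuals, and the filter is restarted on every alarm; all tuning values are arbitrary illustration values.

```python
import numpy as np

def cusum_adaptive_filter(y, lam=0.95, nu=0.5, h=5.0, R=1.0):
    """Combine a forgetting-factor mean estimator with a two-sided CUSUM
    on the normalized residuals; on alarm, reset both the filter and the
    test statistics so that estimation restarts after each detected change."""
    est, g_pos, g_neg = float(y[0]), 0.0, 0.0
    alarms = []
    for t, yt in enumerate(y):
        eps = (yt - est) / np.sqrt(R)       # normalized residual, cf. (3.21)
        g_pos = max(0.0, g_pos + eps - nu)  # CUSUM for positive changes
        g_neg = max(0.0, g_neg - eps - nu)  # mirrored test for negative changes
        if g_pos > h or g_neg > h:
            alarms.append(t)
            est, g_pos, g_neg = yt, 0.0, 0.0     # restart after the change
        else:
            est = lam * est + (1.0 - lam) * yt   # adaptive (GMA) mean update
    return alarms

# Example on the signal simulated earlier:
# y, _ = simulate_change_in_mean()
# print(cusum_adaptive_filter(y, R=0.25))
```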
