18
KALMAN FILTER REVISITED
18.1 INTRODUCTION
In Section 2.6 we developed the Kalman filter as the minimization of a quadratic error function. In Chapter 9 we developed the Kalman filter from the minimum-variance estimate for the case where there is no driving noise present in the target dynamics model. In this chapter we develop the Kalman filter for the more general case [5, pp. 603–618]. The concept of the Kalman filter as a fading-memory filter is presented, together with its use for eliminating bias error buildup. Finally, the use of the Kalman filter driving noise to prevent instabilities in the filter is discussed.
18.2 KALMAN FILTER TARGET DYNAMIC MODEL
The target model considered by Kalman [19, 20] is given by [5, p. 604]

$$\frac{d}{dt}\,X(t) = A(t)\,X(t) + D(t)\,U(t) \tag{18.2-1}$$

where $A(t)$ is as defined for the time-varying target dynamic model given in (15.2-1), $D(t)$ is a time-varying matrix, and $U(t)$ is a vector of random variables to be defined shortly. The term $U(t)$ is known as the process noise or forcing function. Its inclusion has beneficial properties to be indicated later. The matrix $D(t)$ need not be square, and as a result $U(t)$ need not have the same dimension as $X(t)$. The solution to the above linear differential equation is
[5, p. 605]

$$X(t) = \Phi(t, t_{n-1})\,X(t_{n-1}) + \int_{t_{n-1}}^{t} \Phi(t,\tau)\,D(\tau)\,U(\tau)\,d\tau \tag{18.2-2}$$
where $\Phi$ is the transition matrix obtained from the homogeneous part of (18.2-1), that is, the differential equation without the driving-noise term $D(t)U(t)$, which is the random part of the target dynamic model. Consequently, $\Phi$ satisfies (15.3-1).
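As a concrete illustration (not from the text), when the dynamics matrix $A$ is time invariant the transition matrix reduces to the matrix exponential $\Phi(t, t_{n-1}) = e^{A(t - t_{n-1})}$. The following sketch evaluates $\Phi$ for an assumed constant-velocity model; the particular $A$, the sample period, and the use of SciPy are illustrative choices only.

```python
import numpy as np
from scipy.linalg import expm

# Constant-velocity model in one coordinate: state X = [position, velocity].
# For a time-invariant A, the transition matrix of the homogeneous part of
# (18.2-1) is the matrix exponential Phi(t, t_{n-1}) = exp(A * (t - t_{n-1})).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])    # d/dt [x, v] = [v, 0]
T = 0.5                       # sample period t - t_{n-1} (illustrative value)

Phi = expm(A * T)
# For this nilpotent A the series terminates: Phi = [[1, T], [0, 1]], as expected.
print(Phi)
```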
The time-discrete form of (18.2-1) is given by [5, p. 606]

$$X(t_n) = \Phi(t_n, t_{n-1})\,X(t_{n-1}) + V(t_n, t_{n-1}) \tag{18.2-3}$$
where

$$V(t, t_{n-1}) = \int_{t_{n-1}}^{t} \Phi(t,\tau)\,D(\tau)\,U(\tau)\,d\tau \tag{18.2-4}$$
The model process noise $U(t)$ is white noise, that is,

$$E[\,U(t)\,] = 0 \tag{18.2-5}$$

and

$$E[\,U(t)\,U(t')^T\,] = K(t)\,\delta(t - t') \tag{18.2-6}$$

where $K(t)$ is a nonnegative definite matrix dependent on time and $\delta(t)$ is the Dirac delta function given by

$$\delta(t - t') = 0 \qquad t' \neq t \tag{18.2-7}$$

with

$$\int_a^b \delta(t - t')\,dt = 1 \qquad a < t' < b \tag{18.2-8}$$
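To make (18.2-3) to (18.2-6) concrete, the covariance $Q$ of the discrete driving noise $V(t_n, t_{n-1})$ can be evaluated for an assumed special case: a constant-velocity state, $D = [0\;\;1]^T$, and scalar white noise of spectral density $q$. The sketch below compares a numerical evaluation of the integral with the familiar closed form; the model and the values of $q$ and $T$ are assumptions made for illustration, not values from the text.

```python
import numpy as np

# Covariance Q of the discrete driving noise V(t_n, t_{n-1}) in (18.2-3)/(18.2-4),
# for an assumed special case: constant-velocity state, A = [[0,1],[0,0]],
# D = [0, 1]^T, and E[U(t)U(t')] = q*delta(t - t') with scalar spectral density q.
q, T = 1.0, 0.5

def Phi(tau):
    # Transition matrix over an interval tau for the constant-velocity model.
    return np.array([[1.0, tau], [0.0, 1.0]])

D = np.array([[0.0], [1.0]])

# Q = integral over [0, T] of Phi(T - s) D q D^T Phi(T - s)^T ds (midpoint rule).
n_steps = 2000
ds = T / n_steps
mids = (np.arange(n_steps) + 0.5) * ds
Q_numeric = sum(Phi(T - s) @ D @ (q * D.T) @ Phi(T - s).T * ds for s in mids)

Q_closed = q * np.array([[T**3 / 3, T**2 / 2],
                         [T**2 / 2, T]])
print(np.allclose(Q_numeric, Q_closed))   # True
```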
18.3 KALMAN’S ORIGINAL RESULTS
By way of history, as mentioned previously, the least-squares and minimum-variance estimates developed in Sections 4.1 and 4.5 have their origins in the work done by Gauss in 1795. The least mean-square error estimate, which obtains the minimum of the ensemble expected value of the squared difference between the true and estimated values, was independently developed by Kolmogorov [125] and Wiener [126] in 1941 and 1942, respectively. Next came the Kalman filter [19, 20], which provides an estimate of a random variable that satisfies a linear differential equation driven by white noise [see (18.2-1)]. In this section the Kalman filter as developed in [19] is summarized together with other results obtained in that study. The least mean-square error criterion was used by Kalman, and when the driving noise is not present the results are consistent with those obtained using the least-squares and minimum-variance estimates given previously.
Kalman [19] defines the optimal estimate as that which (if it exists) minimizes the expected value of a loss function $L(\varepsilon)$, that is, it minimizes $E[L(\varepsilon)]$, the expected loss, where

$$\varepsilon = x^*_{n,n} - x_n \tag{18.3-1}$$

where $x^*_{n,n}$ is an estimate of $x_n$, the parameter to be estimated, based on the $n+1$ observations given by

$$Y^{(n)} = (y_0, y_1, y_2, \ldots, y_n)^T \tag{18.3-2}$$
It is assumed that the above random variables have a joint probability density function given by $p(x_n, Y^{(n)})$. A scalar function $L(\varepsilon)$ is a loss function if it satisfies

(i) $L(0) = 0$  (18.3-3a)
(ii) $L(\varepsilon') > L(\varepsilon'') > 0$ if $\varepsilon' > \varepsilon'' > 0$  (18.3-3b)
(iii) $L(\varepsilon) = L(-\varepsilon)$  (18.3-3c)

Example loss functions are $L(\varepsilon) = \varepsilon^2$ and $L(\varepsilon) = |\varepsilon|$. Kalman [19] gives the following very powerful optimal estimate theorem:
Theorem 1 [5, pp. 610–611] The optimal estimate $x^*_{n,n}$ of $x_n$ based on the observations $Y^{(n)}$ is given by

$$x^*_{n,n} = E[\,x_n \mid Y^{(n)}\,] \tag{18.3-4}$$

if the conditional density function for $x_n$ given $Y^{(n)}$, represented by $p(x_n \mid Y^{(n)})$, is (a) unimodal and (b) symmetric about its conditional expectation $E[\,x_n \mid Y^{(n)}\,]$.
The above theorem gives the amazing result that the optimum estimate (18.3-4) is independent of the loss function as long as (18.3-3a) to (18.3-3c) apply; it depends only on $p(x_n \mid Y^{(n)})$. An example of a conditional density function that satisfies conditions (a) and (b) is the Gaussian distribution.
In general, the conditional expectation $E[\,x_n \mid Y^{(n)}\,]$ is nonlinear and difficult to compute. If the loss function is assumed to be the quadratic loss function $L(\varepsilon) = \varepsilon^2$, then conditions (a) and (b) above can be relaxed; it is then only necessary for the conditional density function to have a finite second moment in order for (18.3-4) to be optimal.
Before proceeding to Kalman's second powerful theorem, the concept of orthogonal projection for random variables must be introduced. Let $\eta_i$ and $\eta_j$ be two random variables. In vector terms these two random variables are independent of each other if $\eta_i$ is not just a constant multiple of $\eta_j$. Furthermore, if [5, p. 611]

$$\zeta = a_i \eta_i + a_j \eta_j \tag{18.3-5}$$

is a linear combination of $\eta_i$ and $\eta_j$, then $\zeta$ is said to lie in the two-dimensional space defined by $\eta_i$ and $\eta_j$. A basis for this space can be formed using the Gram–Schmidt orthogonalization procedure. Specifically, let [5, p. 611]

$$e_i = \eta_i \tag{18.3-6}$$

and

$$e_j = \eta_j - \frac{E\{\eta_i \eta_j\}}{E\{\eta_i^2\}}\,\eta_i \tag{18.3-7}$$

It is seen that

$$E\{e_i e_j\} = 0 \qquad i \neq j \tag{18.3-8}$$
The above equation represents the orthogonality condition. (The idea of orthogonal projection for random variables follows by virtue of the one-for-one analogy with the theory of linear vector spaces. Note that whereas in linear algebra an inner product is used, here the expected value of the product of the random variables is used.) If we normalize $e_i$ and $e_j$ by dividing by their respective standard deviations, then we have ''unit length'' random variables that form an orthonormal basis for the space defined by $\eta_i$ and $\eta_j$. Let $e_i$ and $e_j$ now designate these orthonormal variables. Then

$$E\{e_i e_j\} = \delta_{ij} \tag{18.3-9}$$

where $\delta_{ij}$ is the Kronecker delta, which equals 1 when $i = j$ and equals 0 otherwise.
Let $\zeta$ be any random variable that is not necessarily a linear combination of $\eta_i$ and $\eta_j$. Then the orthogonal projection of $\zeta$ onto the $\eta_i, \eta_j$ space is defined by [5, p. 612]

$$\bar\zeta = e_i\,E\{\zeta e_i\} + e_j\,E\{\zeta e_j\} \tag{18.3-10}$$

Define

$$\tilde\zeta = \zeta - \bar\zeta \tag{18.3-11}$$

Then it is easy to see that [5, p. 612]

$$E\{\tilde\zeta e_i\} = 0 = E\{\tilde\zeta e_j\} \tag{18.3-12}$$

which indicates that $\tilde\zeta$ is orthogonal to the space $\eta_i, \eta_j$. Thus $\zeta$ has been broken up into two parts: the $\bar\zeta$ part in the space $\eta_i, \eta_j$, called the orthogonal projection of $\zeta$ onto the $\eta_i, \eta_j$ space, and the $\tilde\zeta$ part orthogonal to this space. The above concept of orthogonality for random variables can be generalized to an $n$-dimensional space. (A less confusing label than ''orthogonal projection'' would probably be just ''projection.'')
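A small numerical sketch may help fix the ideas in (18.3-6) to (18.3-12). Here sample averages stand in for the expectations, and the zero-mean random variables and their particular correlations are invented purely for illustration.

```python
import numpy as np

# Numerical illustration of (18.3-6)-(18.3-12): Gram-Schmidt on two zero-mean
# random variables eta_i, eta_j, then the projection of a third variable zeta.
rng = np.random.default_rng(0)
N = 200_000                        # samples; sample means stand in for E{.}

eta_i = rng.standard_normal(N)
eta_j = 0.6 * eta_i + 0.8 * rng.standard_normal(N)           # correlated with eta_i
zeta  = 1.5 * eta_i - 2.0 * eta_j + rng.standard_normal(N)   # not in the i,j plane

E = lambda x: x.mean()             # sample mean approximating the expectation

e_i = eta_i                                                   # (18.3-6)
e_j = eta_j - (E(eta_i * eta_j) / E(eta_i**2)) * eta_i        # (18.3-7)
e_i = e_i / np.sqrt(E(e_i**2))     # normalize to "unit length"
e_j = e_j / np.sqrt(E(e_j**2))

zeta_bar   = e_i * E(zeta * e_i) + e_j * E(zeta * e_j)        # projection (18.3-10)
zeta_tilde = zeta - zeta_bar                                  # (18.3-11)

# Orthogonality (18.3-12): both values are ~0 to numerical precision.
print(E(zeta_tilde * e_i), E(zeta_tilde * e_j))
```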
We are now ready to give Kalman’s important Theorem 2.
Theorem 2 [5, pp. 612–613] The optimum estimate $x^*_{n,n}$ of $x_n$ based on the measurements $Y^{(n)}$ is equal to the orthogonal projection of $x_n$ onto the space defined by $Y^{(n)}$ if

1. the random variables $x_n, y_0, y_1, \ldots, y_n$ all have zero mean, and either
2. (a) $x_n$ and $Y^{(n)}$ are Gaussian or (b) the estimate is restricted to being a linear function of the measurements $Y^{(n)}$ and $L(\varepsilon) = \varepsilon^2$.
The above optimum estimate is linear for the Gaussian case. This is because the projection of $x_n$ onto $Y^{(n)}$ is a linear combination of the elements of $Y^{(n)}$. But in the class of linear estimates the orthogonal projection always minimizes the expected quadratic loss given by $E[\varepsilon^2]$. Note that the more general estimate given by Kalman's Theorem 1 will not, in general, be linear.
Up till now the observations $y_i$ and the variable $x_n$ to be estimated were assumed to be scalars. Kalman actually gives his results for the case where they are vectors, and hence Kalman's Theorem 1 and Theorem 2 apply when these variables are vectors. We shall now apply Kalman's Theorem 2 to obtain the form of the Kalman filter given by him.

Let the target dynamics model be given by (18.2-1) and let the observation scheme be given by [5, p. 613]

$$Y(t) = M(t)\,X(t) \tag{18.3-13}$$

Note that Kalman, in giving (18.3-13), does not include any measurement noise term $N(t)$. Because of this, the Kalman filter form he gives is different from that given previously in this book (see Section 2.4). We shall later show that his form can be transformed to be identical to the forms given earlier in this book. The measurement $Y(t)$ given in (18.3-13) is assumed to be a vector. Let us assume that observations are made at times $i = 0, 1, \ldots, n$ and can be
represented by the measurement vector given by

$$Y^{(n)} = \begin{bmatrix} Y_n \\ Y_{n-1} \\ \vdots \\ Y_0 \end{bmatrix} \tag{18.3-14}$$
What is desired is the estimate $X^*_{n+1,n}$ of $X_{n+1}$ that minimizes $E[L(\varepsilon)]$. Applying Kalman's Theorem 2, we find that the optimum estimate is given by the projection of $X_{n+1}$ onto $Y^{(n)}$ of (18.3-14). In reference 19 Kalman shows that this solution is given by the recursive relationships [5, p. 614]

$$\Delta^*_n = \Phi(n+1, n)\,P^*_n M_n^T\,(M_n P^*_n M_n^T)^{-1} \tag{18.3-15a}$$

$$\Phi^*(n+1, n) = \Phi(n+1, n) - \Delta^*_n M_n \tag{18.3-15b}$$

$$X^*_{n+1,n} = \Phi^*(n+1, n)\,X^*_{n,n-1} + \Delta^*_n Y_n \tag{18.3-15c}$$

$$P^*_{n+1} = \Phi^*(n+1, n)\,P^*_n\,\Phi^*(n+1, n)^T + Q_{n+1,n} \tag{18.3-15d}$$
The above form of the Kalman filter has essentially the notation used by Kalman in reference 19; see also reference 5. Physically, $\Phi(n+1, n)$ is the transition matrix of the unforced system as specified by (18.2-3). As defined earlier, $M_n$ is the observation matrix, $Q_{n+1,n}$ is the covariance matrix of the vector $V(t_{n+1}, t_n)$, and the matrix $P^*_{n+1}$ is the covariance matrix of the estimate $X^*_{n+1,n}$.
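A minimal sketch of one cycle of the recursion (18.3-15a) to (18.3-15d) follows. The constant-velocity transition matrix, the position-only observation matrix, and the numerical values are assumptions made for illustration; note that, since Kalman's original formulation (18.3-13) has no measurement-noise term, $M_n P^*_n M_n^T$ must be nonsingular for (18.3-15a) to be computable.

```python
import numpy as np

def kalman_original_step(Phi, M, x_pred, P, Q, y):
    """One cycle of (18.3-15a)-(18.3-15d): propagate the one-step prediction
    X*_{n,n-1} -> X*_{n+1,n} and its covariance P*_n -> P*_{n+1}.
    No measurement-noise term appears, per Kalman's original model (18.3-13)."""
    S = M @ P @ M.T                               # must be nonsingular here
    Delta = Phi @ P @ M.T @ np.linalg.inv(S)      # (18.3-15a)
    Phi_star = Phi - Delta @ M                    # (18.3-15b)
    x_next = Phi_star @ x_pred + Delta @ y        # (18.3-15c)
    P_next = Phi_star @ P @ Phi_star.T + Q        # (18.3-15d)
    return x_next, P_next

# Illustrative constant-velocity example (assumed, not from the text):
T = 1.0
Phi = np.array([[1.0, T], [0.0, 1.0]])
M = np.array([[1.0, 0.0]])                        # position-only observation
Q = 0.01 * np.array([[T**3 / 3, T**2 / 2], [T**2 / 2, T]])
x_pred = np.array([0.0, 1.0])
P = np.eye(2)
y = np.array([0.2])

x_pred, P = kalman_original_step(Phi, M, x_pred, P, Q, y)
```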
We will now put the Kalman filter given by (18.3-15a) to (18.3-15d) in the form of (2.4-4a) to (2.4-4j) or, basically, (9.3-1) to (9.3-1d). The discrete version of the target dynamics model of (18.2-3) can be written as [5, p. 614]

$$X_{n+1} = \Phi(n+1, n)\,X_n + V_{n+1,n} \tag{18.3-16}$$

The observation equation with the measurement noise included can be written as

$$Y_n = M_n X_n + N_n \tag{18.3-17}$$
instead of (18.3-13), which does not include the measurement noise. Define an augmented state vector [5, p. 614]

$$X'_n = \begin{bmatrix} X_n \\ N_n \end{bmatrix} \tag{18.3-18}$$
and an augmented driving-noise vector [5, p. 615]

$$V'_{n+1,n} = \begin{bmatrix} V_{n+1,n} \\ N_{n+1} \end{bmatrix} \tag{18.3-19}$$
Define also the augmented transition matrix [5, p. 615]

$$\Phi'(n+1, n) = \begin{bmatrix} \Phi(n+1, n) & 0 \\ 0 & 0 \end{bmatrix} \tag{18.3-20}$$
and the augmented observation matrix

$$M'_n = (\,M_n \mid I\,) \tag{18.3-21}$$
It then follows that (18.3-16) can be written as [5, p. 615]

$$X'_{n+1} = \Phi'(n+1, n)\,X'_n + V'_{n+1,n} \tag{18.3-22}$$

and (18.3-17) as [5, p. 615]

$$Y_n = M'_n X'_n \tag{18.3-23}$$
which have identical forms to (18.2-3) and (18.3-13), respectively, to which Kalman's Theorem 2 was applied to obtain (18.3-15). Replacing the unprimed parameters of (18.3-15) with the above primed parameters yields [5, p. 616]
$$X^*_{n,n} = X^*_{n,n-1} + H_n\,(Y_n - M_n X^*_{n,n-1}) \tag{18.3-24a}$$

$$H_n = S^*_{n,n-1} M_n^T\,(R_n + M_n S^*_{n,n-1} M_n^T)^{-1} \tag{18.3-24b}$$

$$S^*_{n,n} = (I - H_n M_n)\,S^*_{n,n-1} \tag{18.3-24c}$$

$$S^*_{n,n-1} = \Phi(n, n-1)\,S^*_{n-1,n-1}\,\Phi(n, n-1)^T + Q_{n,n-1} \tag{18.3-24d}$$

$$X^*_{n,n-1} = \Phi(n, n-1)\,X^*_{n-1,n-1} \tag{18.3-24e}$$
where $Q_{n+1,n}$ is the covariance matrix of $V_{n+1,n}$ and $R_{n+1}$ is the covariance matrix of $N_{n+1}$. The above form of the Kalman filter given by (18.3-24a) to (18.3-24e) is essentially exactly that given by (2.4-4a) to (2.4-4j) and (9.3-1) to (9.3-1d) when the latter two are extended to the case of a time-varying dynamics model.
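A minimal sketch of one predict/update cycle of (18.3-24a) to (18.3-24e) is given below. The constant-velocity transition matrix, the position-only observation matrix, and the numerical values of $Q$ and $R$ are illustrative assumptions rather than material from the text.

```python
import numpy as np

def kalman_step(Phi, M, Q, R, x_filt, S_filt, y):
    """One predict/update cycle of (18.3-24a)-(18.3-24e).
    x_filt, S_filt are X*_{n-1,n-1} and S*_{n-1,n-1}; returns X*_{n,n}, S*_{n,n}."""
    x_pred = Phi @ x_filt                                     # (18.3-24e)
    S_pred = Phi @ S_filt @ Phi.T + Q                         # (18.3-24d)
    H = S_pred @ M.T @ np.linalg.inv(R + M @ S_pred @ M.T)    # (18.3-24b)
    x_filt = x_pred + H @ (y - M @ x_pred)                    # (18.3-24a)
    S_filt = (np.eye(len(x_filt)) - H @ M) @ S_pred           # (18.3-24c)
    return x_filt, S_filt

# Assumed constant-velocity example with a noisy position measurement:
T = 1.0
Phi = np.array([[1.0, T], [0.0, 1.0]])
M = np.array([[1.0, 0.0]])
Q = 0.01 * np.array([[T**3 / 3, T**2 / 2], [T**2 / 2, T]])
R = np.array([[0.25]])
x, S = np.array([0.0, 1.0]), np.eye(2)
for y in ([1.1], [2.0], [2.9]):                 # synthetic measurements
    x, S = kalman_step(Phi, M, Q, R, x, S, np.array(y))
```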
Comparing (9.3-1) to (9.3-1d), developed using the minimum-variance estimate, with (18.3-24a) to (18.3-24e), developed using the Kalman filter projection theorem for minimizing the loss function, we see that they differ by the presence of the $Q$ term, the covariance of the driving-noise vector. It is gratifying to see that the two radically different approaches led to essentially the same algorithms. Moreover, when the driving-noise vector $V$ goes to zero, (18.3-24a) to (18.3-24e) become essentially the same as (9.3-1) to (9.3-1d), the $Q$ term in (18.3-24d) dropping out. With $V$ present, $X_n$ is no longer determined completely by $X_{n-1}$. The larger the variance of $V$, the lower the dependence of $X_n$ on $X_{n-1}$, and as a result the less the Kalman filter estimate $X^*_{n,n}$ should and will depend on the past measurements. Put another way, the larger $V$ is, the smaller the Kalman filter memory. The Kalman filter in effect thus has a fading memory built into it. Viewed from another point of view, the larger $Q$ is in (18.3-24d), the larger $S^*_{n,n-1}$ becomes. The larger $S^*_{n,n-1}$ is, the less weight is given to $X^*_{n,n-1}$ in forming $X^*_{n,n}$, which means that the filter memory is fading faster.
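The fading-memory effect can be seen in a scalar sketch, assuming $\Phi = M = 1$: iterating (18.3-24b) to (18.3-24d) to steady state shows that a larger $Q$ yields a larger steady-state gain $H$, that is, more weight on the newest measurement and a shorter memory. The numbers below are illustrative.

```python
# Scalar illustration of the fading-memory effect: with Phi = M = 1,
# (18.3-24b)-(18.3-24d) reduce to S_pred = S + Q and H = S_pred / (R + S_pred).
# Larger Q -> larger steady-state H -> shorter filter memory.
R = 1.0
for Q in (0.0, 0.01, 0.1, 1.0):
    S = 1.0
    for _ in range(200):                 # iterate to (near) steady state
        S_pred = S + Q
        H = S_pred / (R + S_pred)
        S = (1.0 - H) * S_pred
    print(f"Q = {Q:5.2f}  steady-state gain H = {H:.3f}")
```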
The matrix $Q$ is often introduced for purely practical reasons, even if the presence of a process-noise term in the target dynamics model cannot be justified. It can be used to counter the buildup of a bias error; the shorter the filter memory, the lower the bias error will be. The filter fading rate can be controlled adaptively to prevent bias error buildup or to respond to a target maneuver. This is done by observing the filter residual given by either

$$r_n = (Y_n - M_n X^*_{n,n})^T\,(Y_n - M_n X^*_{n,n}) \tag{18.3-25}$$
or

$$r_n = (Y_n - M_n X^*_{n,n})^T\,(S^*_{n,n})^{-1}\,(Y_n - M_n X^*_{n,n}) \tag{18.3-26}$$
The quantity

$$s_n = Y_n - M_n X^*_{n,n} \tag{18.3-27}$$

in the above two equations is often called the innovation process, or just the innovation, in the literature [7, 127]. The innovation process is white noise when the optimum filter is being used.
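One simple way to use the residual for adaptive fading, offered here only as an illustrative sketch and not as the procedure of the text, is to inflate $Q$ whenever the squared residual (18.3-25) exceeds a threshold. The function name `adapt_q`, the threshold, and the inflation factor are assumed design choices.

```python
import numpy as np

def adapt_q(Q_nominal, r_n, threshold, boost=10.0):
    """Illustrative adaptive-fading rule (an assumption, not from the text):
    if the squared residual r_n of (18.3-25) exceeds a threshold, temporarily
    inflate Q so the filter memory fades faster after a maneuver or bias buildup."""
    return boost * Q_nominal if r_n > threshold else Q_nominal

# Example: residual check for a scalar position measurement.
Q_nominal = 0.01 * np.eye(2)
y, M, x_filt = np.array([5.0]), np.array([[1.0, 0.0]]), np.array([1.0, 1.0])
s_n = y - M @ x_filt                              # (18.3-27)
r_n = float(s_n @ s_n)                            # (18.3-25)
Q_used = adapt_q(Q_nominal, r_n, threshold=9.0)   # threshold is an assumed choice
```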
Another benefit of the presence of $Q$ in (18.3-24d) is that it prevents $S^*$ from staying singular once it becomes singular for any reason at any given time. (A matrix is singular when its determinant is equal to zero.) The matrix $S^*$ can become singular when the observations being made at one instant of time are perfect [5]. If this occurs, then the elements of $H$ in (18.3-24a) become 0, and $H$ becomes singular. When this occurs, the Kalman filter without process noise stops functioning; it no longer accepts new data, all new data being given zero weight by $H = 0$. This is prevented when $Q$ is present because if, for example, $S^*_{n-1,n-1}$ is singular at time $n-1$, the presence of $Q_{n,n-1}$ in (18.3-24d) will make $S^*_{n,n-1}$ nonsingular.
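The following sketch, using the same assumed constant-velocity model as above, shows this numerically: starting from a singular $S^*_{n-1,n-1} = 0$, the prediction (18.3-24d) remains singular and the gain stays zero when $Q$ is absent, while with $Q$ present it becomes nonsingular and the gain is restored.

```python
import numpy as np

# Sketch: after a hypothetical perfect measurement, S*_{n-1,n-1} = 0 (singular).
# Without Q the prediction (18.3-24d) stays singular and H = 0; with Q it
# becomes nonsingular and the filter keeps accepting new data.
T = 1.0
Phi = np.array([[1.0, T], [0.0, 1.0]])
M = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
Q = 0.01 * np.array([[T**3 / 3, T**2 / 2], [T**2 / 2, T]])

S_filt = np.zeros((2, 2))                          # singular filtered covariance

for Q_used in (np.zeros((2, 2)), Q):
    S_pred = Phi @ S_filt @ Phi.T + Q_used                           # (18.3-24d)
    H = S_pred @ M.T @ np.linalg.inv(R + M @ S_pred @ M.T)           # (18.3-24b)
    print("det(S_pred) =", np.linalg.det(S_pred), " H =", H.ravel())
```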