24
Other Applications
In this chapter, we consider some further applications of independent component
analysis (ICA), including analysis of financial time series and audio signal separation.
24.1 FINANCIAL APPLICATIONS
24.1.1 Finding hidden factors in financial data
It is tempting to try ICA on financial data. There are many situations in which parallel
financial time series are available, such as currency exchange rates or daily returns
of stocks, that may have some common underlying factors. ICA might reveal some
driving mechanisms that otherwise remain hidden.
In a study of a stock portfolio [22], it was found that ICA is a complementary tool
to principal component analysis (PCA), allowing the underlying structure of the data
to be more readily observed. If one could find the maximally independent mixtures
of the original stocks, i.e., portfolios, this might help in minimizing the risk in the
investment strategy.
In [245], we applied ICA to a different problem: the cashflow of several stores
belonging to the same retail chain, trying to find the fundamental factors common
to all stores that affect the cashflow. Thus, the effect of the factors specific to any
particular store, i.e., the effect of the managerial actions taken at the individual store
and in its local environment, could be analyzed.
In this case, the mixtures in the ICA model are parallel financial time series $x_i(t)$,
with $i$ indexing the individual time series, and $t$ denoting discrete time.
Independent Component Analysis. Aapo Hyvärinen, Juha Karhunen, Erkki Oja
Copyright 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
We assume the instantaneous ICA model
$$x_i(t) = \sum_j a_{ij} s_j(t) \qquad (24.1)$$
for each time series $x_i(t)$. Thus the effect of each time-varying underlying factor or
independent component $s_j(t)$ on the measured time series is approximately linear.
The assumption of having some underlying independent components in this specific
application may not be unrealistic. For example, factors like seasonal variations
due to holidays and annual variations, and factors having a sudden effect on the
purchasing power of the customers, like price changes of various commodities, can
be expected to have an effect on all the retail stores, and such factors can be assumed
to be roughly independent of each other. Yet, depending on the policy and skills of
the individual manager (e.g., advertising efforts), the effect of these factors on the cash
flow of a specific retail outlet is slightly different. With ICA, it is possible to isolate
both the underlying factors and the effect weights, thus also making it possible to
group the stores on the basis of their managerial policies using only the cash flow
time series data.
The data consisted of the weekly cash flow in 40 stores that belong to the same
retail chain, covering a time span of 140 weeks. Some examples of the original data
are shown in Fig. 24.1. The weeks of a year are shown on the horizontal axis,
starting from the first week in January. Thus for example the heightened Christmas
sales are visible in each time series before and during week 51 in both of the full
years shown.
The data were first prewhitened using PCA. The original 40-dimensional signal
vectors were projected to the subspace spanned by four principal components, and the
variances were normalized to 1. Thus the dimension of the signal space was strongly
decreased from 40. A problem in this kind of real world application is that there
is no prior knowledge on the number of independent components. Sometimes the
eigenvalue spectrum of the data covariance matrix can be used, as shown in Chapter
6, but in this case the eigenvalues decreased rather smoothly without indicating any
clear signal subspace dimension. The only option is then to try several dimensions.
If the independent components that are found using different dimensions for the
whitened data are the same or very similar, we can trust that they are not just artifacts
produced by the compression, but truly indicate some underlying factors in the data.
Using the FastICA algorithm, four independent components (ICs)
were estimated. As depicted in Fig. 24.2, the FastICA algorithm has found
several clearly different fundamental factors hidden in the original data.
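The preprocessing and estimation steps described above can be sketched as follows. This is a minimal illustration, not the code used in [245]: the synthetic data stand in for the cash flow series, the function names (`pca_whiten`, `fastica_symmetric`, `max_abs_corr`) are hypothetical, and the symmetric FastICA iteration with a tanh nonlinearity is just one common variant of the algorithm. The last lines mimic the check suggested in the text, namely comparing the components found with two different subspace dimensions.

```python
import numpy as np

def pca_whiten(X, k):
    """Project X (samples x channels) onto its k leading principal
    components and scale each projection to unit variance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:k]            # k largest eigenvalues
    V = eigvecs[:, order] / np.sqrt(eigvals[order])  # whitening matrix
    return Xc @ V, V

def sym_orth(W):
    """Symmetric orthogonalization, W <- (W W^T)^(-1/2) W, via the SVD."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def fastica_symmetric(Z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g = tanh on whitened data Z (samples x k).
    Returns the estimated components (samples x k) and the rotation W."""
    n, k = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_orth(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        # Row i: E{z g(w_i^T z)} - E{g'(w_i^T z)} w_i
        W_new = (G.T @ Z) / n - np.diag((1.0 - G**2).mean(axis=0)) @ W
        W_new = sym_orth(W_new)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T, W

def max_abs_corr(S_ref, S_est):
    """For each column of S_ref, the largest |correlation| with any
    column of S_est (sign and order of ICs are arbitrary)."""
    k = S_ref.shape[1]
    C = np.corrcoef(S_ref, S_est, rowvar=False)[:k, k:]
    return np.abs(C).max(axis=1)

# Synthetic stand-in data (an assumption): 3 nongaussian factors
# observed through 8 noisy linear mixtures.
rng = np.random.default_rng(1)
n = 2000
t = np.arange(n)
S = np.column_stack([rng.laplace(size=n),
                     rng.uniform(-1.0, 1.0, size=n),
                     np.sin(0.05 * t)])
A = rng.standard_normal((8, 3))
X = S @ A.T + 0.1 * rng.standard_normal((n, 8))

# Estimate ICs in a 3- and a 4-dimensional whitened subspace and compare.
Z3, _ = pca_whiten(X, 3)
S3, _ = fastica_symmetric(Z3)
Z4, _ = pca_whiten(X, 4)
S4, _ = fastica_symmetric(Z4)
recovery = max_abs_corr(S, S3)    # how well the true factors are matched
stability = max_abs_corr(S3, S4)  # do the same ICs reappear with 4 dims?
```

Values of `stability` close to 1 indicate that the components persist when the subspace dimension is changed, which is the informal criterion the text uses to rule out artifacts of the compression.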
The factors have different interpretations. The topmost factor follows the sudden
changes that are caused by holidays etc.; the most prominent example is Christmas
time. The factor in the bottom row, on the other hand, reflects the slower seasonal
variation, with the effect of the summer holidays clearly visible. The factor in the
third row could represent a still slower variation, something resembling a trend. The
last factor, in the second row, is different from the others; it might be that this factor
follows mostly the relative competitive position of the retail chain with respect to its
competitors, but other interpretations are also possible.
Fig. 24.1 Five samples of the 40 original cashflow time series (mean removed, normalized
to unit standard deviation). Horizontal axis: time in weeks over 140 weeks. (Adapted from
[245].)
If five ICs are estimated instead of four, then three of the found components stay
virtually the same, while the fourth one separates into two new components. Using
the found mixing coefficients $a_{ij}$, it is also possible to analyze the original time series
and cluster them in groups. More details on the experiments and their interpretation
can be found in [245].
24.1.2 Time series prediction by ICA
As noted in Chapter 18, the ICA transformation tends to produce component signals,
$s_i(t)$, that can be compressed with fewer bits than the original signals, $x_i(t)$. They
are thus more structured and regular. This gives motivation to try to predict the
signals $x_i(t)$ by first going to the ICA space, doing the prediction there, and then
transforming back to the original time series, as suggested by [362]. The prediction
can be done separately and with a different method for each component, depending
on its time structure. Hence, some interaction from the user may be needed in the
overall prediction procedure. Another possibility would be to formulate the ICA
contrast function in the first place so that it includes the prediction errors — some
work along these lines has been reported by [437].
Fig. 24.2 Four independent components or fundamental factors found from the cashflow
data. (Adapted from [245].)
In [289], we suggested the following basic procedure:
1. After subtracting the mean of each time series and prewhitening (after which
each time series has zero mean and unit variance), the independent components
$s_i(t)$, and the mixing matrix, $\mathbf{A}$, are estimated using the FastICA algorithm.
The number of ICs can be variable.
2. For each component $s_i(t)$, a suitable nonlinear filtering is applied to reduce the
effects of noise: smoothing for components that contain very low frequencies
(trend, slow cyclical variations), and high-pass filtering for components
containing high frequencies and/or sudden shocks. The nonlinear smoothing
is done by applying smoothing functions $f_i$ on the source signals $s_i(t)$,
$$s_i^s(t) = f_i[s_i(t+r), \ldots, s_i(t), \ldots, s_i(t-k)] \qquad (24.2)$$
3. Each smoothed independent component is predicted separately, for instance
using some method of autoregressive (AR) modeling [455]. The prediction is
done for a number of steps into the future. This is done by applying prediction
functions, $g_i$, on the smoothed source signals, $s_i^s(t)$:
$$s_i^p(t+1) = g_i[s_i^s(t), s_i^s(t-1), \ldots, s_i^s(t-q)] \qquad (24.3)$$
The next time steps are predicted by gliding the window of length $q$ over the
measured and predicted values of the smoothed signal.
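4. The predictions for each independent component are combined by weighting
them with the mixing coefficients, $a_{ij}$, thus obtaining the predictions, $x_i^p(t)$,
for the original time series, $x_i(t)$:
$$x_i^p(t+1) = \sum_j a_{ij} s_j^p(t+1) \qquad (24.4)$$
and similarly for the subsequent time steps.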
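Steps 2 to 4 of the procedure can be sketched in code as below. This is an illustrative sketch under stated assumptions, not the implementation of [289]: the components `S` and mixing matrix `A` are taken as already estimated (step 1), a running median stands in for the nonlinear smoothing functions, a least-squares AR fit with a window of `q` past values stands in for the prediction functions, and adding the removed mean back in the last step is an assumption made here so that the forecasts are on the original scale. All function names are hypothetical.

```python
import numpy as np

def median_smooth(s, half_width=2):
    """Step 2 stand-in: running median over a centered window
    (a simple nonlinear smoother; the ends are padded with edge values)."""
    padded = np.pad(s, half_width, mode="edge")
    width = 2 * half_width + 1
    return np.array([np.median(padded[i:i + width]) for i in range(len(s))])

def fit_ar(s, q):
    """Least-squares AR coefficients c with
    s[t] ~ c[0]*s[t-1] + ... + c[q-1]*s[t-q]."""
    rows = np.array([s[t - q:t][::-1] for t in range(q, len(s))])
    coeffs, *_ = np.linalg.lstsq(rows, s[q:], rcond=None)
    return coeffs

def predict_ar(s, coeffs, steps):
    """Step 3: glide a window of length q over the measured values and
    then over the predicted values to forecast several steps ahead."""
    q = len(coeffs)
    window = list(s[-q:])                 # chronological order
    preds = []
    for _ in range(steps):
        nxt = float(np.dot(coeffs, window[::-1]))
        preds.append(nxt)
        window = window[1:] + [nxt]
    return np.array(preds)

def predict_mixtures(S, A, mean, smoothers, q, steps):
    """S: (n, k) estimated ICs, A: (m, k) mixing matrix, mean: (m,)
    means removed before ICA, smoothers: one function per component."""
    S_pred = np.empty((steps, S.shape[1]))
    for j in range(S.shape[1]):
        s_smooth = smoothers[j](S[:, j])                     # step 2
        coeffs = fit_ar(s_smooth, q)                         # step 3
        S_pred[:, j] = predict_ar(s_smooth, coeffs, steps)
    return S_pred @ A.T + mean                               # step 4

# Toy demo with made-up components and mixing coefficients.
rng = np.random.default_rng(0)
t = np.arange(300)
S = np.column_stack([np.sin(0.05 * t) + 0.1 * rng.standard_normal(300),
                     np.sin(0.9 * t)])
A = np.array([[1.0, 0.5], [0.3, 1.0], [0.8, -0.4]])
mean = np.array([10.0, 20.0, 30.0])
X_pred = predict_mixtures(S, A, mean,
                          smoothers=[median_smooth, lambda s: s],
                          q=4, steps=10)
```

Passing one smoother per component reflects the remark in the text that the filtering (and, in a fuller version, the AR order) can be chosen separately for each independent component according to its time structure.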
Fig. 24.3 Prediction of real-world financial data: the upper figure represents the actual
future outcome of one of the original mixtures and the lower one the forecast obtained using
ICA prediction for an interval of 50 values.
To test the method, we applied our algorithm on a set of 10 foreign exchange
rate time series. Again, we suppose that there are some independent factors that
affect the time evolution of such time series. Economic indicators, interest rates, and
psychological factors can be the underlying factors of exchange rates, as they are
closely tied to the evolution of the currencies. Even without prediction, some of the
ICs may be useful in analyzing the impact of different external phenomena on the
foreign exchange rates [22].
The results were promising, as the ICA prediction performed better than direct
prediction. Figure 24.3 shows an example of prediction using our method. The
upper figure represents one of the original time series (mixtures) and the lower one
the forecast obtained using ICA prediction for a future interval of 50 time steps. The
algorithm seemed to predict the turning points especially well. In Table 24.1
there is a comparison of errors obtained by applying classic AR prediction to the
original time series directly, and our method outlined above. The right-most column
shows the magnitude of the errors when no smoothing is applied to the currencies.
While ICA and AR prediction are linear techniques, the smoothing was nonlinear.
With nonlinear smoothing optimized separately for each independent component time
series, the ICs are predicted more accurately, and the results also
differ from those of direct prediction of the original time series. The noise in the
time series is strongly reduced, allowing a better prediction of the underlying factors.
The model is flexible and allows various smoothing tolerances and different orders
in the classic AR prediction method for each independent component.
In reality, especially in real world time series analysis, the data are distorted by
delays, noise, and nonlinearities. Some of these could be handled by extensions of
the basic ICA algorithms, as reported in Part III of this book.
Table 24.1 The prediction errors (in units of 0.001) obtained with our method and the classic
AR method. Ten currency time series were considered and five independent components were
used. The amount of smoothing in classic AR prediction was varied.

Smoothing in AR prediction    2     0.5   0.1   0.08  0.06  0.05  0
Errors, ICA prediction        2.3   2.3   2.3   2.3   2.3   2.3   2.3
Errors, AR prediction         9.7   9.1   4.7   3.9   3.4   3.1   4.2
24.2 AUDIO SEPARATION
One of the original motivations for ICA research was the cocktail-party problem, as
reviewed in the beginning of Chapter 7. The idea is that there are sound sources
recorded by a number of microphones, and we want to separate just one of the sources.
In fact, often there is just one interesting signal, for example, a person speaking to
the microphone, and all the other sources can be considered as noise; in this case,
we have a problem of noise canceling. A typical example of a situation where we
want to separate noise (or interference) from a speech signal is a person talking to a
mobile phone in a noisy car.
If there is just one microphone, one can attempt to cancel the noise by ordinary
noise canceling methods: linear filtering, or perhaps more sophisticated techniques
like wavelet and sparse code shrinkage (Section 15.6). Such noise canceling can be
rather unsatisfactory, however. It works only if the noise has spectral characteristics
that are clearly different from those of the speech signal. One might wish to remove
the noise more effectively by collecting more data using several microphones. Since
in real-life situations the positions of the microphones with respect to the sources
can be rather arbitrary, the mixing process is not known, and it has to be estimated
blindly. This leads to the ICA model, and the problem becomes one of blind source
separation.
Blind separation of audio signals is, however, much more difficult than one might
expect. This is because the basic ICA model is a very crude approximation of the
real mixing process. In fact, here we encounter almost all the complications that we
have discussed in Part III:
The mixing is not instantaneous. Audio signals propagate rather slowly, and
thus they arrive at the microphones at different times. Moreover, there are
echoes, especially if the recording is made in a room. Thus the problem is more
adequately modeled by a convolutive version of the ICA model (Chapter 19).
The situation is thus much more complicated than with the separation of
magnetoencephalographic (MEG) signals, which propagate fast, or with feature
extraction, where no time delays are possible even in theory. In fact, even the
basic convolutive ICA model may not be enough because the time delays may
be fractional and may not be adequately modeled as integer multiples of the
time interval between two samples.
Typically, the recordings are made with two microphones only. However, the
number of source signals is probably much larger than 2 in most cases, since
the noise sources may not form just one well-defined source. Thus we have
the problem of overcomplete bases (Chapter 16).
The nonstationarity of the mixing is another important problem. The mixing
matrix may change rather quickly, due to changes in the constellation of the
speaker and the microphones. For example, one of these may be moving with
respect to the other, or the speaker may simply turn his head. This implies
that the mixing matrix must be reestimated quickly in a limited time frame,
which also means a limited number of data. Adaptive estimation methods may
alleviate this problem somewhat, but this is still a serious problem due to the
convolutive nature of the mixing. In the convolutive mixing, the number of
parameters can be very large: For example, the convolution may be modeled
by filters of the length of 1000 time points, which effectively multiplies the
number of parameters in the model by 1000. Since the number of data points
should grow with the number of parameters to obtain satisfactory estimates, it
may be next to impossible to estimate the model with the small number of data
points that one has time to collect before the mixing matrix has changed too
much.
Noise may be considerable. There may be strong sensor noise, which means
that we should use the noisy ICA model (Chapter 15). The noise complicates
the estimation of the ICA model quite considerably, even in the basic case where
noise is assumed gaussian. On the other hand, the effect of overcomplete bases
could be modeled as noise as well. This noise may not be very gaussian,
however, making the problem even more difficult.
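To make the parameter count mentioned under nonstationarity concrete, the convolutive mixing can be written as below. The notation ($m$ microphones, $n$ sources, filters of length $M$) is assumed here for illustration only; Chapter 19 gives the model in full.

```latex
% Convolutive mixing: each source reaches each microphone through a filter
x_i(t) = \sum_{j=1}^{n} \sum_{k=0}^{M-1} a_{ij}(k)\, s_j(t-k),
\qquad i = 1, \ldots, m .

% Number of free mixing parameters
\text{instantaneous model: } m\,n ,
\qquad
\text{convolutive model: } m\,n\,M .

% Example: two microphones, two sources, filters of 1000 time points
m = n = 2,\; M = 1000:
\qquad 2 \cdot 2 = 4
\quad\text{versus}\quad
2 \cdot 2 \cdot 1000 = 4000 .
```

Setting $M = 1$ recovers the instantaneous model, so the filter length is exactly the factor by which the number of parameters grows.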
Due to these complications, it may be that the prior information, namely the independence
and nongaussianity of the source signals, is not enough. Estimating the convolutive
ICA model with a large number of parameters and a rapidly changing mixing
matrix may require more information on the signals and the matrix. First, one
may need to combine the assumption of nongaussianity with the different time-
structure assumptions in Chapter 18. Speech signals have autocorrelations and
nonstationarities, so this information could be used [267, 216]. Second, one may need
to use some information on the mixing. For example, sparse priors (Section 20.1.3)
could be used.
It is also possible that real-life speech separation requires sophisticated modeling
of speech signals. Speech signals are highly structured, autocorrelations and
nonstationarity being just the very simplest aspects of their time structure. Such approaches
were proposed in [54, 15].
Because of these complications, audio separation is a largely unsolved problem.
For a recent review on the subject, see [429]. One of the main theoretical problems,
estimation of the convolutive ICA model, was described in Chapter 19.
24.3 FURTHER APPLICATIONS
Among further applications, let us mention:
Text document analysis [219, 229, 251]
Radiocommunications [110, 77]
Rotating machine monitoring [475]
Seismic monitoring [161]
Reflection canceling [127]
Nuclear magnetic resonance spectroscopy [321]
Selective transmission, which is a dual problem of blind source separation. A
set of independent source signals are adaptively premixed prior to a nondispersive
physical mixing process so that each source can be independently
monitored in the far field [117].
Further applications can be found in the proceedings of the ICA’99 and ICA2000
workshops [70, 348].