Independent Component Analysis
Aapo Hyvärinen, Juha Karhunen, Erkki Oja
Copyright © 2001 John Wiley & Sons, Inc. ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
Independent Component Analysis

Aapo Hyvärinen
Juha Karhunen
Erkki Oja
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Weinheim / Brisbane / Singapore / Toronto
Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
Copyright © 2001 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ@WILEY.COM.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.
ISBN 0-471-22131-7
This title is also available in print as ISBN 0-471-40540-X.
For more information about Wiley products, visit our web site at www.Wiley.com.
CONTENTS

1.2.1 Observing mixtures of unknown signals
1.2.2 Source separation based on independence
2.8.2 Stationarity, mean, and autocorrelation
3.2 Learning rules for unconstrained optimization
3.2.3 The natural gradient and relative gradient
3.2.5 Convergence of stochastic on-line algorithms *
3.3 Learning rules for constrained optimization
4.4.2 Nonlinear and generalized least squares *
4.6.3 Maximum a posteriori (MAP) estimator
5.2.2 Definition using Kullback-Leibler divergence
5.3.2 Maximality property of gaussian distribution
5.5.2 Using expansions for entropy approximation
5.6 Approximation of entropy by nonpolynomial functions
5.6.2 Choosing the nonpolynomial functions
6.1.3 Choosing the number of principal components
6.2.1 The stochastic gradient ascent algorithm
6.2.4 PCA and back-propagation learning *
6.2.5 Extensions of PCA to nonquadratic criteria *
8.2.1 Extrema give independent components
8.2.3 A fast fixed-point algorithm using kurtosis
8.5.1 Searching for interesting directions
9.2 Algorithms for maximum likelihood estimation
10.1.2 Mutual information as measure of dependence
10.4 Algorithms for minimization of mutual information
11.2 Tensor eigenvalues give independent components
11.4 Joint approximate diagonalization of eigenmatrices
12 ICA by Nonlinear Decorrelation and Nonlinear PCA
12.5 Equivariant adaptive separation via independence
12.8 Learning rules for the nonlinear PCA criterion
12.8.2 Convergence of the nonlinear subspace rule *
12.8.3 Nonlinear recursive least-squares rule
13.1.3 High-pass filtering and innovations
13.2.2 Reducing noise and preventing overlearning
13.3 How many components should be estimated?
14.2 Connections between ICA estimation principles
14.2.1 Similarities between estimation principles
14.2.2 Differences between estimation principles
14.3.1 Comparison of asymptotic variance *
14.4 Experimental comparison of ICA algorithms
14.4.1 Experimental set-up and algorithms
15.5 Estimation of the noise-free independent components
15.5.2 Special case of shrinkage estimation
16.1 Estimation of the independent components
16.1.2 The case of supergaussian components
16.2.2 Maximizing likelihood approximations
16.2.3 Approximate estimation by quasiorthogonality
17.1.1 The nonlinear ICA and BSS problems
17.1.2 Existence and uniqueness of nonlinear ICA
17.3 Nonlinear BSS using self-organizing maps
17.4 A generative topographic mapping approach *
18.2 Separation by nonstationarity of variances
18.3.1 Comparison of separation principles
18.3.2 Kolmogoroff complexity as unifying framework
19.2.5 Spatiotemporal decorrelation methods
19.2.6 Other methods for convolutive mixtures
Appendix: Discrete-time filters and the z-transform
Part IV APPLICATIONS OF ICA
21.4 Image denoising by sparse code shrinkage
22.1.1 Classes of brain imaging techniques
22.1.2 Measuring electric activity in the brain
22.2 Artifact identification from EEG and MEG
22.4 ICA applied to other measurement techniques
23.1 Multiuser detection and CDMA communications
23.4 Blind separation of convolved CDMA mixtures *
24.1.1 Finding hidden factors in financial data
Preface

Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear or nonlinear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed nongaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA.
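In the basic linear case, which is the classic model treated in Part II, the generative model can be written compactly as follows; the notation here is the conventional one for linear ICA, not a quotation from the text:

$$
\mathbf{x} = \mathbf{A}\mathbf{s},
$$

where $\mathbf{x} = (x_1, \ldots, x_n)^T$ is the vector of observed data variables, $\mathbf{s} = (s_1, \ldots, s_n)^T$ is the vector of independent components, and $\mathbf{A}$ is an unknown constant mixing matrix. Both $\mathbf{A}$ and $\mathbf{s}$ must be estimated from observations of $\mathbf{x}$ alone, which is possible precisely because the $s_i$ are assumed nongaussian and mutually independent.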
ICA can be seen as an extension of principal component analysis and factor analysis. ICA is a much more powerful technique, however, capable of finding the underlying factors or sources when these classic methods fail completely.
The data analyzed by ICA could originate from many different kinds of application fields, including digital images and document databases, as well as economic indicators and psychometric measurements. In many cases, the measurements are given as a set of parallel signals or time series; the term blind source separation is used to characterize this problem. Typical examples are mixtures of simultaneous speech signals that have been picked up by several microphones, brain waves recorded by multiple sensors, interfering radio signals arriving at a mobile phone, or parallel time series obtained from some industrial process.
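As a concrete illustration of blind source separation, the following minimal sketch mixes two synthetic sources and recovers them with the FastICA implementation from scikit-learn. The library choice, the signal shapes, and all parameter values are illustrative assumptions, not taken from the book:

```python
# Minimal blind source separation sketch, assuming NumPy and scikit-learn
# are installed. Sources, mixing matrix, and parameters are illustrative.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# Two nongaussian, mutually independent sources: a sinusoid and a square wave.
s1 = np.sin(2 * np.pi * t)
s2 = np.sign(np.sin(3 * np.pi * t))
S = np.c_[s1, s2]                      # true sources, shape (n_samples, 2)

# Unknown linear mixing: each "microphone" records a different combination.
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])
X = S @ A.T                            # observed mixtures only

# Recover the components blindly, i.e., from X alone.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)           # estimated independent components
A_est = ica.mixing_                    # estimated mixing matrix

# The estimates match the true sources only up to permutation, sign, and scale.
for est in S_est.T:
    corr = max(abs(np.corrcoef(est, s)[0, 1]) for s in S.T)
    print(f"best-match correlation: {corr:.3f}")
```

The printed correlations should be close to 1; the remaining permutation, sign, and scaling ambiguities are inherent to the problem and are discussed under identifiability in Chapter 7.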
The technique of ICA is a relatively new invention. It was first introduced in the early 1980s in the context of neural network modeling. In the mid-1990s, some highly successful new algorithms were introduced by several research groups, together with impressive demonstrations on problems like the cocktail-party effect, where the individual speech waveforms are found from their mixtures. ICA became one of the exciting new topics, both in the field of neural networks, especially unsupervised learning, and more generally in advanced statistics and signal processing. Reported real-world applications of ICA in biomedical signal processing, audio signal separation, telecommunications, fault diagnosis, feature extraction, financial time series analysis, and data mining began to appear.
Many articles on ICA have been published during the past 20 years in a large number of journals and conference proceedings in the fields of signal processing, artificial neural networks, statistics, information theory, and various application fields. Several special sessions and workshops on ICA have been arranged recently [70, 348], and some edited collections of articles [315, 173, 150] as well as some monographs on ICA, blind source separation, and related subjects [105, 267, 149] have appeared. However, while highly useful for their intended readership, these existing texts typically concentrate only on selected aspects of ICA methods. In brief scientific papers and book chapters, mathematical and statistical preliminaries are usually not included, which makes it very hard for a wider audience to gain a full understanding of this fairly technical topic.
A comprehensive and detailed textbook covering the mathematical background and principles, algorithmic solutions, and practical applications of the present state of the art of ICA has been missing. This book is intended to fill that gap, serving as a fundamental introduction to ICA.
It is expected that the readership will come from a variety of disciplines, such as statistics, signal processing, neural networks, applied mathematics, neural and cognitive sciences, information theory, artificial intelligence, and engineering. Researchers, students, and practitioners alike will be able to use the book. We have made every effort to make this book self-contained, so that a reader with a basic background in college calculus, matrix algebra, probability theory, and statistics will be able to read it. The book is also suitable for a graduate-level university course on ICA, which is facilitated by the exercise problems and computer assignments given in many chapters.
Scope and contents of this book
This book provides a comprehensive introduction to ICA as a statistical and computational technique. The emphasis is on the fundamental mathematical principles and basic algorithms. Much of the material is based on original research conducted in the authors' own research group, which is naturally reflected in the weighting of the different topics. We give especially wide coverage to those algorithms that are scalable to large problems, that is, that work even with a large number of observed variables and data points. These will be increasingly used in the near future as ICA is extensively applied to practical real-world problems instead of the toy problems or small pilot studies that have been predominant until recently. Respectively, [...] we may have overlooked.
For easier reading, the book is divided into four parts.
Part I gives the mathematical preliminaries. It introduces the general mathematical concepts needed in the rest of the book. We start with a crash course on probability theory in Chapter 2. The reader is assumed to be familiar with most of the basic material in this chapter, but some concepts more specific to ICA are also introduced, such as higher-order cumulants and multivariate probability theory. Next, Chapter 3 discusses essential concepts in optimization theory and gradient methods, which are needed when developing ICA algorithms. Estimation theory is reviewed in Chapter 4. A complementary theoretical framework for ICA is information theory, covered in Chapter 5. Part I is concluded by Chapter 6, which discusses methods related to principal component analysis, factor analysis, and decorrelation.
More confident readers may prefer to skip some or all of the introductory chapters in Part I and continue directly to the principles of ICA in Part II.
In Part II, the basic ICA model is covered and solved. This is the linear instantaneous noise-free mixing model that is classic in ICA and forms the core of ICA theory. The model is introduced and the question of identifiability of the mixing matrix is treated in Chapter 7. The following chapters treat different methods of estimating the model. A central principle is nongaussianity, whose relation to ICA is first discussed in Chapter 8. Next, the principles of maximum likelihood (Chapter 9) and minimum mutual information (Chapter 10) are reviewed, and connections between these three fundamental principles are shown. Material that is less suitable for an introductory course is covered in Chapter 11, which discusses the algebraic approach using higher-order cumulant tensors, and Chapter 12, which reviews the early work on ICA based on nonlinear decorrelations, as well as the nonlinear principal component approach. Practical algorithms for computing the independent components and the mixing matrix are discussed in connection with each principle. Next, some practical considerations, mainly related to preprocessing and dimension reduction of the data, are discussed in Chapter 13, including hints to practitioners on how to really apply ICA to their own problems. An overview and comparison of the various ICA methods is presented in Chapter 14, which thus summarizes Part II.
In Part III, different extensions of the basic ICA model are given. This part is by its nature more speculative than Part II, since most of the extensions have been introduced very recently, and many open problems remain. In an introductory course on ICA, only selected chapters from this part may be covered. First, in Chapter 15, we treat the problem of introducing explicit observational noise in the ICA model. Then the situation where there are more independent components than observed mixtures is treated in Chapter 16. In Chapter 17, the model is widely generalized to the case where the mixing process can be of a very general nonlinear form. Chapter 18 discusses methods that estimate a linear mixing model similar to that of ICA, but with quite different assumptions: the components are not nongaussian but have some time dependencies instead. Chapter 19 discusses the case where the mixing system includes convolutions. Further extensions, in particular models where the components are no longer required to be exactly independent, are given in Chapter 20.
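To make two of these extensions concrete, the noisy model of Chapter 15 and the convolutive model of Chapter 19 can be written, in conventional notation (again not quoted from the text), as

$$
\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{n}, \qquad
x_i(t) = \sum_j \sum_k a_{ijk}\, s_j(t - k),
$$

where $\mathbf{n}$ is an additive noise vector and the coefficients $a_{ijk}$ describe the discrete-time mixing filters that replace the constant mixing matrix.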
Part IV treats some applications of ICA methods. Feature extraction (Chapter 21) is relevant to both image processing and vision research. Brain imaging applications (Chapter 22) concentrate on measurements of the electrical and magnetic activity of the human brain. Telecommunications applications are treated in Chapter 23. Some econometric and audio signal processing applications, together with pointers to miscellaneous other applications, are treated in Chapter 24.
Throughout the book, we have marked with an asterisk some sections that are rather involved and can be skipped in an introductory course.
Several of the algorithms presented in this book are available as public domain software through the World Wide Web, both on our own Web pages and those of other ICA researchers. Also, databases of real-world data can be found there for testing the methods. We have made a special Web page for this book, which contains appropriate pointers. The address is
www.cis.hut.fi/projects/ica/book
The reader is advised to consult this page for further information.
This book was written in cooperation between the three authors. A. Hyvärinen was responsible for Chapters 5, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 20, 21, and 22; J. Karhunen was responsible for Chapters 2, 4, 17, 19, and 23; while E. Oja was responsible for Chapters 3, 6, and 12. Chapters 1 and 24 were written jointly by the authors.
Acknowledgments
We are grateful to the many ICA researchers whose original contributions form the foundations of ICA and who have made this book possible. In particular, we wish to express our gratitude to the Series Editor, Simon Haykin, whose articles and books on signal processing and neural networks have been an inspiration to us over the years.