Independent Component Analysis
Aapo Hyvärinen, Juha Karhunen, Erkki Oja
Copyright © 2001 John Wiley & Sons, Inc. ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
Independent Component Analysis

Aapo Hyvärinen
Juha Karhunen
Erkki Oja
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Weinheim / Brisbane / Singapore / Toronto
Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
Copyright © 2001 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ@WILEY.COM.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.
ISBN 0-471-22131-7
This title is also available in print as ISBN 0-471-40540-X.
For more information about Wiley products, visit our web site at www.Wiley.com.
CONTENTS

1.2.1 Observing mixtures of unknown signals
1.2.2 Source separation based on independence
2.8.2 Stationarity, mean, and autocorrelation
3.2 Learning rules for unconstrained optimization
3.2.3 The natural gradient and relative gradient
3.2.5 Convergence of stochastic on-line algorithms *
3.3 Learning rules for constrained optimization
4.4.2 Nonlinear and generalized least squares *
4.6.3 Maximum a posteriori (MAP) estimator
5.2.2 Definition using Kullback-Leibler divergence
5.3.2 Maximality property of gaussian distribution
5.5.2 Using expansions for entropy approximation
5.6 Approximation of entropy by nonpolynomial functions
5.6.2 Choosing the nonpolynomial functions
6.1.3 Choosing the number of principal components
6.2.1 The stochastic gradient ascent algorithm
6.2.4 PCA and back-propagation learning *
6.2.5 Extensions of PCA to nonquadratic criteria *
8.2.1 Extrema give independent components
8.2.3 A fast fixed-point algorithm using kurtosis
8.5.1 Searching for interesting directions
9.2 Algorithms for maximum likelihood estimation
10.1.2 Mutual information as measure of dependence
10.4 Algorithms for minimization of mutual information
11.2 Tensor eigenvalues give independent components
11.4 Joint approximate diagonalization of eigenmatrices
12 ICA by Nonlinear Decorrelation and Nonlinear PCA
12.5 Equivariant adaptive separation via independence
12.8 Learning rules for the nonlinear PCA criterion
12.8.2 Convergence of the nonlinear subspace rule *
12.8.3 Nonlinear recursive least-squares rule
13.1.3 High-pass filtering and innovations
13.2.2 Reducing noise and preventing overlearning
13.3 How many components should be estimated?
14.2 Connections between ICA estimation principles
14.2.1 Similarities between estimation principles
14.2.2 Differences between estimation principles
14.3.1 Comparison of asymptotic variance *
14.4 Experimental comparison of ICA algorithms
14.4.1 Experimental set-up and algorithms
15.5 Estimation of the noise-free independent components
15.5.2 Special case of shrinkage estimation
16.1 Estimation of the independent components
16.1.2 The case of supergaussian components
16.2.2 Maximizing likelihood approximations
16.2.3 Approximate estimation by quasiorthogonality
17.1.1 The nonlinear ICA and BSS problems
17.1.2 Existence and uniqueness of nonlinear ICA
17.3 Nonlinear BSS using self-organizing maps
17.4 A generative topographic mapping approach *
18.2 Separation by nonstationarity of variances
18.3.1 Comparison of separation principles
18.3.2 Kolmogoroff complexity as unifying framework
19.2.5 Spatiotemporal decorrelation methods
19.2.6 Other methods for convolutive mixtures
Appendix: Discrete-time filters and the z-transform
Part IV APPLICATIONS OF ICA
21.4 Image denoising by sparse code shrinkage
22.1.1 Classes of brain imaging techniques
22.1.2 Measuring electric activity in the brain
22.2 Artifact identification from EEG and MEG
22.4 ICA applied to other measurement techniques
23.1 Multiuser detection and CDMA communications
23.4 Blind separation of convolved CDMA mixtures *
24.1.1 Finding hidden factors in financial data
Preface

Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear or nonlinear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed nongaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA.
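In the basic linear case, which is the classic model treated in Part II, the generative model can be written compactly as follows; the notation here is the conventional one for linear ICA, not a quotation from the text:

$$
\mathbf{x} = \mathbf{A}\mathbf{s},
$$

where $\mathbf{x} = (x_1, \ldots, x_n)^T$ is the vector of observed data variables, $\mathbf{s} = (s_1, \ldots, s_n)^T$ is the vector of independent components, and $\mathbf{A}$ is an unknown constant mixing matrix. Both $\mathbf{A}$ and $\mathbf{s}$ must be estimated from observations of $\mathbf{x}$ alone, which is possible precisely because the $s_i$ are assumed nongaussian and mutually independent.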
ICA can be seen as an extension of principal component analysis and factor analysis. ICA is a much more powerful technique, however, capable of finding the underlying factors or sources when these classic methods fail completely.
The data analyzed by ICA could originate from many different kinds of application fields, including digital images and document databases, as well as economic indicators and psychometric measurements. In many cases, the measurements are given as a set of parallel signals or time series; the term blind source separation is used to characterize this problem. Typical examples are mixtures of simultaneous speech signals that have been picked up by several microphones, brain waves recorded by multiple sensors, interfering radio signals arriving at a mobile phone, or parallel time series obtained from some industrial process.
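As a concrete illustration of blind source separation, the following minimal sketch mixes two synthetic sources and recovers them with the FastICA implementation from scikit-learn. The library choice, the signal shapes, and all parameter values are illustrative assumptions, not taken from the book:

```python
# Minimal blind source separation sketch, assuming NumPy and scikit-learn
# are installed. Sources, mixing matrix, and parameters are illustrative.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# Two nongaussian, mutually independent sources: a sinusoid and a square wave.
s1 = np.sin(2 * np.pi * t)
s2 = np.sign(np.sin(3 * np.pi * t))
S = np.c_[s1, s2]                      # true sources, shape (n_samples, 2)

# Unknown linear mixing: each "microphone" records a different combination.
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])
X = S @ A.T                            # observed mixtures only

# Recover the components blindly, i.e., from X alone.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)           # estimated independent components
A_est = ica.mixing_                    # estimated mixing matrix

# The estimates match the true sources only up to permutation, sign, and scale.
for est in S_est.T:
    corr = max(abs(np.corrcoef(est, s)[0, 1]) for s in S.T)
    print(f"best-match correlation: {corr:.3f}")
```

The printed correlations should be close to 1; the remaining permutation, sign, and scaling ambiguities are inherent to the problem and are discussed under identifiability in Chapter 7.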
The technique of ICA is a relatively new invention. It was first introduced in the early 1980s in the context of neural network modeling. In the mid-1990s, some highly successful new algorithms were introduced by several research groups, together with impressive demonstrations on problems like the cocktail-party effect, where the individual speech waveforms are found from their mixtures. ICA became one of the exciting new topics, both in the field of neural networks, especially unsupervised learning, and more generally in advanced statistics and signal processing. Reported real-world applications of ICA in biomedical signal processing, audio signal separation, telecommunications, fault diagnosis, feature extraction, financial time series analysis, and data mining began to appear.
Many articles on ICA have been published during the past 20 years in a large number of journals and conference proceedings in the fields of signal processing, artificial neural networks, statistics, information theory, and various application fields. Several special sessions and workshops on ICA have been arranged recently [70, 348], and some edited collections of articles [315, 173, 150] as well as some monographs on ICA, blind source separation, and related subjects [105, 267, 149] have appeared. However, while highly useful for their intended readership, these existing texts typically concentrate only on selected aspects of ICA methods. In brief scientific papers and book chapters, mathematical and statistical preliminaries are usually not included, which makes it very hard for a wider audience to gain a full understanding of this fairly technical topic.
A comprehensive and detailed textbook covering the mathematical background and principles, algorithmic solutions, and practical applications of the present state of the art of ICA has been missing. This book is intended to fill that gap, serving as a fundamental introduction to ICA.
It is expected that the readership will come from a variety of disciplines, such as statistics, signal processing, neural networks, applied mathematics, neural and cognitive sciences, information theory, artificial intelligence, and engineering. Researchers, students, and practitioners alike will be able to use the book. We have made every effort to make this book self-contained, so that a reader with a basic background in college calculus, matrix algebra, probability theory, and statistics will be able to read it. The book is also suitable for a graduate-level university course on ICA, which is facilitated by the exercise problems and computer assignments given in many chapters.
Scope and contents of this book
This book provides a comprehensive introduction to ICA as a statistical and computational technique. The emphasis is on the fundamental mathematical principles and basic algorithms. Much of the material is based on original research conducted in the authors' own research group, which is naturally reflected in the weighting of the different topics. We give especially wide coverage to those algorithms that are scalable to large problems, that is, that work even with a large number of observed variables and data points. These will be increasingly used in the near future as ICA is extensively applied to practical real-world problems instead of the toy problems or small pilot studies that have been predominant until recently. Respectively, [...] we may have overlooked.
For easier reading, the book is divided into four parts.
Part I gives the mathematical preliminaries. It introduces the general mathematical concepts needed in the rest of the book. We start with a crash course on probability theory in Chapter 2. The reader is assumed to be familiar with most of the basic material in this chapter, but some concepts more specific to ICA are also introduced, such as higher-order cumulants and multivariate probability theory. Next, Chapter 3 discusses essential concepts in optimization theory and gradient methods, which are needed when developing ICA algorithms. Estimation theory is reviewed in Chapter 4. A complementary theoretical framework for ICA is information theory, covered in Chapter 5. Part I is concluded by Chapter 6, which discusses methods related to principal component analysis, factor analysis, and decorrelation.
More confident readers may prefer to skip some or all of the introductory chapters in Part I and continue directly to the principles of ICA in Part II.
In Part II, the basic ICA model is covered and solved. This is the linear instantaneous noise-free mixing model that is classic in ICA and forms the core of ICA theory. The model is introduced and the question of identifiability of the mixing matrix is treated in Chapter 7. The following chapters treat different methods of estimating the model. A central principle is nongaussianity, whose relation to ICA is first discussed in Chapter 8. Next, the principles of maximum likelihood (Chapter 9) and minimum mutual information (Chapter 10) are reviewed, and connections between these three fundamental principles are shown. Material that is less suitable for an introductory course is covered in Chapter 11, which discusses the algebraic approach using higher-order cumulant tensors, and Chapter 12, which reviews the early work on ICA based on nonlinear decorrelations, as well as the nonlinear principal component approach. Practical algorithms for computing the independent components and the mixing matrix are discussed in connection with each principle. Next, some practical considerations, mainly related to preprocessing and dimension reduction of the data, are discussed in Chapter 13, including hints to practitioners on how to really apply ICA to their own problems. An overview and comparison of the various ICA methods is presented in Chapter 14, which thus summarizes Part II.
In Part III, different extensions of the basic ICA model are given. This part is by its nature more speculative than Part II, since most of the extensions have been introduced very recently, and many open problems remain. In an introductory course on ICA, only selected chapters from this part may be covered. First, in Chapter 15, we treat the problem of introducing explicit observational noise in the ICA model. Then the situation where there are more independent components than observed mixtures is treated in Chapter 16. In Chapter 17, the model is widely generalized to the case where the mixing process can be of a very general nonlinear form. Chapter 18 discusses methods that estimate a linear mixing model similar to that of ICA, but with quite different assumptions: the components are not nongaussian but have some time dependencies instead. Chapter 19 discusses the case where the mixing system includes convolutions. Further extensions, in particular models where the components are no longer required to be exactly independent, are given in Chapter 20.
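To make two of these extensions concrete, the noisy model of Chapter 15 and the convolutive model of Chapter 19 can be written, in conventional notation (again not quoted from the text), as

$$
\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{n}, \qquad
x_i(t) = \sum_j \sum_k a_{ijk}\, s_j(t - k),
$$

where $\mathbf{n}$ is an additive noise vector and the coefficients $a_{ijk}$ describe the discrete-time mixing filters that replace the constant mixing matrix.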
Part IV treats some applications of ICA methods. Feature extraction (Chapter 21) is relevant to both image processing and vision research. Brain imaging applications (Chapter 22) concentrate on measurements of the electrical and magnetic activity of the human brain. Telecommunications applications are treated in Chapter 23. Some econometric and audio signal processing applications, together with pointers to miscellaneous other applications, are treated in Chapter 24.
Throughout the book, we have marked with an asterisk some sections that are rather involved and can be skipped in an introductory course.
Several of the algorithms presented in this book are available as public domain software through the World Wide Web, both on our own Web pages and those of other ICA researchers. Also, databases of real-world data can be found there for testing the methods. We have made a special Web page for this book, which contains appropriate pointers. The address is
www.cis.hut.fi/projects/ica/book
The reader is advised to consult this page for further information.
This book was written in cooperation between the three authors. A. Hyvärinen was responsible for Chapters 5, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 20, 21, and 22; J. Karhunen was responsible for Chapters 2, 4, 17, 19, and 23; while E. Oja was responsible for Chapters 3, 6, and 12. Chapters 1 and 24 were written jointly by the authors.
Acknowledgments
We are grateful to the many ICA researchers whose original contributions form the foundations of ICA and who have made this book possible. In particular, we wish to express our gratitude to the Series Editor, Simon Haykin, whose articles and books on signal processing and neural networks have been an inspiration to us over the years.