Springer Series on Signals and Communication Technology
The Variational Bayes Method in Signal Processing

Václav Šmídl
Anthony Quinn

With 65 Figures

Dr. Václav Šmídl
Institute of Information Theory and Automation
Academy of Sciences of the Czech Republic, Department of Adaptive Systems
PO Box 18, 18208 Praha 8, Czech Republic
E-mail: smidl@utia.cas.cz

Dr. Anthony Quinn
Department of Electronic and Electrical Engineering
University of Dublin, Trinity College
Dublin 2, Ireland
E-mail: aquinn@tcd.ie

ISBN-10 3-540-28819-8 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-28819-0 Springer Berlin Heidelberg New York
Library of Congress Control Number: 2005934475

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting and production: SPI Publisher Services
Cover design: design & production GmbH, Heidelberg
Printed on acid-free paper SPIN: 11370918 62/3100/SPI

Do mo Thuismitheoirí
A.Q.

Preface

Gaussian linear modelling cannot address current signal processing demands. In modern contexts, such as Independent Component Analysis (ICA), progress has been made specifically by imposing non-Gaussian and/or non-linear assumptions. Hence, standard Wiener and Kalman theories no longer enjoy their traditional hegemony in the field, revealing the standard computational engines for these problems. In their place, diverse principles have been explored, leading to a consequent diversity in the implied computational algorithms. The traditional on-line and data-intensive preoccupations of signal processing continue to demand that these algorithms be tractable.

Increasingly, full probability modelling (the so-called Bayesian approach), or partial probability modelling using the likelihood function, is the pathway for design of these algorithms. However, the results are often intractable, and so the area of distributional approximation is of increasing relevance in signal processing. The Expectation-Maximization (EM) algorithm and Laplace approximation, for example, are standard approaches to handling difficult models, but these approximations (certainty equivalence, and Gaussian, respectively) are often too drastic to handle the high-dimensional, multi-modal and/or strongly correlated problems that are encountered. Since the 1990s, stochastic simulation methods have come to dominate Bayesian signal processing. Markov Chain Monte Carlo (MCMC) sampling, and related methods, are appreciated for their ability to simulate possibly high-dimensional distributions to arbitrary levels of accuracy. More recently, the particle filtering approach has addressed
on-line stochastic simulation. Nevertheless, the wider acceptability of these methods (and, to some extent, of Bayesian signal processing itself) has been undermined by the large computational demands they typically make.

The Variational Bayes (VB) method of distributional approximation originates, as does the MCMC method, in statistical physics, in the area known as Mean Field Theory. Its method of approximation is easy to understand: conditional independence is enforced as a functional constraint in the approximating distribution, and the best such approximation is found by minimization of a Kullback-Leibler divergence (KLD). The exact (but intractable) multivariate distribution is therefore factorized into a product of tractable marginal distributions, the so-called VB-marginals. This straightforward proposal for approximating a distribution enjoys certain optimality properties. What is of more pragmatic concern to the signal processing community, however, is that the VB-approximation conveniently addresses the following key tasks:

(i) The inference is focused (or, more formally, marginalized) onto selected subsets of parameters of interest in the model: this one-shot (i.e. off-line) use of the VB method can replace numerically intensive marginalization strategies based, for example, on stochastic sampling.

(ii) Parameter inferences can be arranged to have an invariant functional form when updated in the light of incoming data: this leads to feasible on-line tracking algorithms involving the update of fixed- and finite-dimensional statistics. In the language of the Bayesian, conjugacy can be achieved under the VB-approximation. There is no reliance on propagating certainty equivalents, stochastically-generated particles, etc.

Unusually for a modern Bayesian approach, then, no stochastic sampling is required for the VB method. In its place, the shaping parameters of the VB-marginals are found by iterating a set of implicit equations to convergence. This Iterative Variational
Bayes (IVB) algorithm enjoys a decisive advantage over the EM algorithm whose computational flow is similar: by design, the VB method yields distributions in place of the point estimates emerging from the EM algorithm. Hence, in common with all Bayesian approaches, the VB method provides, for example, measures of uncertainty for any point estimates of interest, inferences of model order/rank, etc.

The machine learning community has led the way in exploiting the VB method in model-based inference, notably in inference for graphical models. It is timely, however, to examine the VB method in the context of signal processing where, to date, little work has been reported. In this book, at all times, we are concerned with the way in which the VB method can lead to the design of tractable computational schemes for tasks such as (i) dimensionality reduction, (ii) factor analysis for medical imagery, (iii) on-line filtering of outliers and other non-Gaussian noise processes, (iv) tracking of non-stationary processes, etc. Our aim in presenting these VB algorithms is not just to reveal new flows-of-control for these problems, but, perhaps more significantly, to understand the strengths and weaknesses of the VB-approximation in model-based signal processing. In this way, we hope to dismantle the current psychology of dependence in the Bayesian signal processing community on stochastic sampling methods. Without doubt, the ability to model complex problems to arbitrary levels of accuracy will ensure that stochastic sampling methods, such as MCMC, will remain the gold standard for distributional approximation. Notwithstanding this, our purpose here is to show that the VB method of approximation can yield highly effective Bayesian inference algorithms at low computational cost. In showing this, we hope that Bayesian methods might become accessible to a much broader constituency than has been achieved to date.

Praha, Dublin
October 2005
Václav Šmídl
Anthony Quinn

Contents

1 Introduction
1.1 How to be a Bayesian
1.2 The Variational Bayes (VB) Method
1.3 A First Example of the VB Method: Scalar Additive Decomposition
1.3.1 A First Choice of Prior
1.3.2 The Prior Choice Revisited
1.4 The VB Method in its Context
1.5 VB as a Distributional Approximation
1.6 Layout of the Work
1.7 Acknowledgement

2 Bayesian Theory
2.1 Bayesian Benefits
2.1.1 Off-line vs On-line Parametric Inference
2.2 Bayesian Parametric Inference: the Off-Line Case
2.2.1 The Subjective Philosophy
2.2.2 Posterior Inferences and Decisions
2.2.3 Prior Elicitation
2.2.3.1 Conjugate priors
2.3 Bayesian Parametric Inference: the On-line Case
2.3.1 Time-invariant Parameterization
2.3.2 Time-variant Parameterization
2.3.3 Prediction
2.4 Summary

3 Off-line Distributional Approximations and the Variational Bayes Method
3.1 Distributional Approximation
3.2 How to Choose a Distributional Approximation
3.2.1 Distributional Approximation as an Optimization Problem
3.2.2 The Bayesian Approach to Distributional Approximation
3.3 The Variational Bayes (VB) Method of Distributional Approximation
3.3.1 The VB Theorem
3.3.2 The VB Method of Approximation as an Operator
3.3.3 The VB Method
3.3.4 The VB Method for Scalar Additive Decomposition
3.4 VB-related Distributional Approximations
3.4.1 Optimization with Minimum-Risk KL Divergence
3.4.2 Fixed-form (FF) Approximation
3.4.3 Restricted VB (RVB) Approximation
3.4.3.1 Adaptation of the VB method for the RVB Approximation
3.4.3.2 The Quasi-Bayes (QB) Approximation
3.4.4 The Expectation-Maximization (EM) Algorithm
3.5 Other Deterministic Distributional Approximations
3.5.1 The Certainty Equivalence Approximation
3.5.2 The Laplace Approximation
3.5.3 The Maximum Entropy (MaxEnt) Approximation
3.6 Stochastic Distributional Approximations
3.6.1 Distributional Estimation
3.7 Example: Scalar Multiplicative Decomposition
3.7.1 Classical Modelling
3.7.2 The Bayesian Formulation
3.7.3 Full Bayesian
Solution
3.7.4 The Variational Bayes (VB) Approximation
3.7.5 Comparison with Other Techniques
3.8 Conclusion

4 Principal Component Analysis and Matrix Decompositions
4.1 Probabilistic Principal Component Analysis (PPCA)
4.1.1 Maximum Likelihood (ML) Estimation for the PPCA Model
4.1.2 Marginal Likelihood Inference of A
4.1.3 Exact Bayesian Analysis
4.1.4 The Laplace Approximation
4.2 The Variational Bayes (VB) Method for the PPCA Model
4.3 Orthogonal Variational PCA (OVPCA)
4.3.1 The Orthogonal PPCA Model
4.3.2 The VB Method for the Orthogonal PPCA Model
4.3.3 Inference of Rank
4.3.4 Moments of the Model Parameters
4.4 Simulation Studies
4.4.1 Convergence to Orthogonal Solutions: VPCA vs FVPCA
4.4.2 Local Minima in FVPCA and OVPCA
4.4.3 Comparison of Methods for Inference of Rank
4.5 Application: Inference of Rank in a Medical Image Sequence
4.6 Conclusion

A.6 Von Mises-Fisher Matrix Distribution

A.6.1 Definition

The von Mises-Fisher distribution of matrix random variable, $X \in \mathbb{R}^{p \times n}$, restricted to $X'X = I_n$, is given by

$$ f(X|F) = \mathcal{M}(F) = \frac{1}{\zeta_X(p, FF')} \exp\left(\mathrm{tr}\left(FX'\right)\right), \tag{A.30} $$

$$ \zeta_X(p, FF') = {}_0F_1\left(\tfrac{1}{2}p, \tfrac{1}{4}FF'\right) C(p, n), \tag{A.31} $$

where $F \in \mathbb{R}^{p \times n}$ is a matrix parameter of the same dimensions as $X$, and $p \geq n$. $\zeta_X(p, FF')$ is the normalizing constant. ${}_0F_1(\cdot)$ denotes a Hypergeometric function of matrix argument $FF'$ [159]. $C(p, n)$ denotes the area of the relevant Stiefel manifold, $S_{p,n}$ (4.64).

(A.30) is a Gaussian distribution with restriction $X'X = I_n$, renormalized on $S_{p,n}$. It is governed by a single matrix parameter $F$. Consider the (economic) SVD (Definition 4.1),

$$ F = U_F L_F V_F', $$

of the parameter $F$, where $U_F \in \mathbb{R}^{p \times n}$, $L_F \in \mathbb{R}^{n \times n}$, $V_F \in \mathbb{R}^{n \times n}$. Then the maximum of (A.30) is reached at

$$ \hat{X} = U_F V_F'. \tag{A.32} $$

The flatness of the distribution is controlled by $L_F$. When $l_F = \mathrm{diag}^{-1}(L_F) = 0_{n,1}$, the distribution is uniform on $S_{p,n}$ [160]. For $l_{i,F} \to \infty$, $\forall i = 1, \ldots, n$, the distribution is a Dirac $\delta$-function at $\hat{X}$ (A.32).

A.6.2 First Moment

Let $Y_X$ be the transformed variable,

$$ Y_X = U_F' X V_F. \tag{A.33} $$

It can be shown that $\zeta_X(p, FF') = \zeta_X(p, L_F^2)$. The distribution of $Y_X$ is then

$$ f(Y_X|F) = \frac{1}{\zeta_X(p, L_F^2)} \exp\left(\mathrm{tr}\left(L_F Y_X\right)\right) = \frac{1}{\zeta_X(p, L_F^2)} \exp\left(l_F' y_X\right), \tag{A.34} $$

where $y_X = \mathrm{diag}^{-1}(Y_X)$. Hence,

$$ f(Y_X|F) \propto f(y_X|l_F). \tag{A.35} $$

The first moment of (A.34) is given by [92]

$$ \mathsf{E}_{f(Y_X|F)}[Y_X] = \Psi, \tag{A.36} $$

where $\Psi = \mathrm{diag}(\psi)$ is a diagonal matrix with diagonal elements

$$ \psi_i = \frac{\partial}{\partial l_{F,i}} \ln {}_0F_1\left(\tfrac{1}{2}p, \tfrac{1}{4}L_F^2\right). \tag{A.37} $$

We will denote vector function (A.37) by

$$ \psi = G(p, l_F). \tag{A.38} $$

The mean value of the original random variable $X$ is then [161]

$$ \mathsf{E}_{f(X|F)}[X] = U_F \Psi V_F' = U_F G(p, L_F) V_F', \tag{A.39} $$

where $G(p, L_F) = \mathrm{diag}(G(p, l_F))$.

A.6.3 Second Moment and Uncertainty Bounds

The second central moment of the transformed variable, $y_X = \mathrm{diag}^{-1}(Y_X)$ (A.34), is given by

$$ \mathsf{E}_{f(Y_X|F)}\left[y_X y_X'\right] - \mathsf{E}_{f(Y_X|F)}[y_X]\, \mathsf{E}_{f(Y_X|F)}[y_X]' = \Phi, \tag{A.40} $$

with elements

$$ \phi_{i,j} = \frac{\partial^2}{\partial l_{i,F}\, \partial l_{j,F}} \ln {}_0F_1\left(\tfrac{1}{2}p, \tfrac{1}{4}L_F^2\right), \quad i, j = 1, \ldots, r. \tag{A.41} $$

Transformation (A.33) is one-to-one, with unit Jacobian. Hence, boundaries of confidence intervals on variables $X$ and $Y_X$ can be mutually mapped using (A.33). However, mapping $y_X = \mathrm{diag}^{-1}(Y_X)$ is many-to-one, and so $X \to y_X$ is surjective (but not injective). Conversion of second moments (and uncertainty bounds) of $y_X$ to $X$ (via (A.33) and (A.34)) is therefore available in implicit form only. For example, the lower bound subspace of $X$ is expressible as follows:

$$ \underline{X} = \left\{ X \,\middle|\, \mathrm{diag}^{-1}\left(U_F' X V_F\right) = \underline{y}_X \right\}, $$

where $\underline{y}_X$ is an appropriately chosen lower bound on $y_X$. The upper bound, $\overline{X}$, can be constructed similarly via a bound $\overline{y}_X$. However, due to the topology of the support of $X$, i.e. the Stiefel manifold (Fig. 4.2), $\overline{y}_X$ projects into the region with highest density of $X$. Therefore, we consider the HPD region (Definition 2.1) to be bounded by $\underline{X}$ only.

It remains, then, to choose an appropriate bound, $\underline{y}_X$, from (A.34). Exact confidence intervals for this multivariate distribution
are not known. Therefore, we use the first two moments, (A.36) and (A.40), to approximate (A.34) by a Gaussian. The Maximum Entropy (MaxEnt) principle [158] ensures that uncertainty bounds on the MaxEnt Gaussian approximation of (A.34) enclose the uncertainty bounds of all distributions with the same first two moments. Confidence intervals for the Gaussian distribution, with moments (A.37) and (A.41), are well known. For example,

$$ \Pr\left(-2\sqrt{\phi_i} < \left(y_{i,X} - \psi_i\right) < 2\sqrt{\phi_i}\right) \approx 0.95, \tag{A.42} $$

where $\psi_i$ is given by (A.37), and $\phi_i$ by (A.41). Therefore, we choose

$$ \underline{y}_{i,X} = \psi_i - 2\sqrt{\phi_i}. \tag{A.43} $$

The required vector bounds are then constructed as $\underline{y}_X = \left[\underline{y}_{1,X}, \ldots, \underline{y}_{r,X}\right]'$. The geometric relationship between variables $X$ and $y_X$ is illustrated graphically for p = and n = in Fig. 4.2.

A.7 Multinomial Distribution

The Multinomial distribution of the c-dimensional vector variable $l$, where $l_i \in \mathbb{N}$ and $\sum_{i=1}^{c} l_i = \gamma$, is as follows:

$$ f(l|\alpha) = \mathcal{M}u_l(\alpha) = \frac{1}{\zeta_l(\alpha)} \prod_{i=1}^{c} \alpha_i^{l_i}\, \chi_{\mathbb{N}^c}(l). \tag{A.44} $$

Its vector parameter is $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_c]'$, $\alpha_i > 0$, $\sum_{i=1}^{c} \alpha_i = 1$, and the normalizing constant is

$$ \zeta_l(\alpha) = \frac{\prod_{i=1}^{c} l_i!}{\gamma!}, \tag{A.45} $$

where '!' denotes factorial.

If the argument $l$ contains positive real numbers, i.e. $l_i \in (0, \infty)$, then we refer to (A.44) as the Multinomial distribution of continuous argument. The only change in (A.44) is that the support is now $(0, \infty)^c$, and the normalizing constant is

$$ \zeta_l(\alpha) = \frac{\prod_{i=1}^{c} \Gamma(l_i)}{\Gamma(\gamma)}, \tag{A.46} $$

where $\Gamma(\cdot)$ is the Gamma function [93]. For both variants, the first moment is given by

$$ \hat{l} = \alpha. \tag{A.47} $$

A.8 Dirichlet Distribution

The Dirichlet distribution of the c-dimensional vector variable, $\alpha \in \Delta_c$, is as follows:

$$ f(\alpha|\beta) = \mathcal{D}i_\alpha(\beta) = \frac{1}{\zeta_\alpha(\beta)} \prod_{i=1}^{c} \alpha_i^{\beta_i - 1}\, \chi_{\Delta_c}(\alpha), \tag{A.48} $$

where

$$ \Delta_c = \left\{ \alpha \,\middle|\, \alpha_i \geq 0,\ \sum_{i=1}^{c} \alpha_i = 1 \right\} $$

is the probability simplex in $\mathbb{R}^c$. The vector parameter in (A.48) is $\beta = [\beta_1, \beta_2, \ldots, \beta_c]'$, $\beta_i > 0$, $\sum_{i=1}^{c} \beta_i = \gamma$. The normalizing constant is

$$ \zeta_\alpha(\beta) = \frac{\prod_{i=1}^{c} \Gamma(\beta_i)}{\Gamma(\gamma)}, \tag{A.49} $$

where $\Gamma(\cdot)$ is the Gamma function [93]. The first moment is given by

$$ \hat{\alpha}_i = \mathsf{E}_{f(\alpha|\beta)}[\alpha_i] = \frac{\beta_i}{\gamma}, \quad i = 1, \ldots, c. \tag{A.50} $$

The expected value of the logarithm is

$$ \widehat{\ln \alpha_i} = \mathsf{E}_{f(\alpha|\beta)}[\ln \alpha_i] = \psi_\Gamma(\beta_i) - \psi_\Gamma(\gamma), \tag{A.51} $$

where $\psi_\Gamma(\cdot) = \frac{\partial}{\partial \beta} \ln \Gamma(\cdot)$ is the digamma (psi) function.

For notational simplicity, we define the matrix Dirichlet distribution of matrix variable $T \in \mathbb{R}^{p \times p}$ as follows:

$$ \mathcal{D}i_T(\Phi) \equiv \prod_{i=1}^{p} \mathcal{D}i_{t_i}(\phi_i), $$

with matrix parameter $\Phi \in \mathbb{R}^{p \times p} = [\phi_1, \ldots, \phi_p]$. Here, $t_i$ and $\phi_i$ are the $i$th columns of $T$ and $\Phi$ respectively.

A.9 Truncated Exponential Distribution

The truncated Exponential distribution is as follows:

$$ f(x|k, (a, b]) = \mathrm{tExp}_x(k, (a, b]) = \frac{k \exp(xk)}{\exp(kb) - \exp(ka)}\, \chi_{(a,b]}(x), \tag{A.52} $$

where $a < b$ are the boundaries of the support. Its first moment is

$$ \hat{x} = \frac{\exp(bk)(1 - bk) - \exp(ak)(1 - ak)}{k\left(\exp(ak) - \exp(bk)\right)}, \tag{A.53} $$

which is not defined for $k = 0$. The limit at this point is

$$ \lim_{k \to 0} \hat{x} = \frac{a + b}{2}, $$

which is consistent with the fact that the distribution is then uniform on the interval $(a, b]$.

References

1. R. T. Cox, “Probability, frequency and reasonable expectation,” Am. J. Phys., vol. 14, no. 1, 1946.
2. B. de Finetti,
Theory of Probability: A Critical Introductory Treatment. New York: J. Wiley, 1970.
3. A. P. Quinn, Bayesian Point Inference in Signal Processing. PhD thesis, Cambridge University Engineering Dept., 1992.
4. E. T. Jaynes, “Bayesian methods: General background,” in The Fourth Annual Workshop on Bayesian/Maximum Entropy Methods in Geophysical Inverse Problems, (Calgary), 1984.
5. S. M. Kay, Modern Spectral Estimation: Theory and Application. Prentice-Hall, 1988.
6. S. L. Marple Jr., Digital Spectral Analysis with Applications. Prentice-Hall, 1987.
7. H. Jeffreys, Theory of Probability. Oxford University Press, ed., 1961.
8. G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis. Addison-Wesley, 1973.
9. P. M. Lee, Bayesian Statistics, an Introduction. Chichester, New York, Brisbane, Toronto, Singapore: John Wiley & Sons, ed., 1997.
10. G. Parisi, Statistical Field Theory. Reading, Massachusetts: Addison Wesley, 1988.
11. M. Opper and O. Winther, “From naive mean field theory to the TAP equations,” in Advanced Mean Field Methods (M. Opper and D. Saad, eds.), The MIT Press, 2001.
12. M. Opper and D. Saad, Advanced Mean Field Methods: Theory and Practice. Cambridge, Massachusetts: The MIT Press, 2001.
13. R. P. Feynman, Statistical Mechanics. New York: Addison–Wesley, 1972.
14. G. E. Hinton and D. van Camp, “Keeping neural networks simple by minimizing the description length of the weights,” in Proceedings of 6th Annual Workshop on Computer Learning Theory, pp. 5–13, ACM Press, New York, NY, 1993.
15. L. K. Saul, T. S. Jaakkola, and M. I. Jordan, “Mean field theory for sigmoid belief networks,” Journal of Artificial Intelligence Research, vol. 4, pp. 61–76, 1996.
16. D. J. C. MacKay, “Free energy minimization algorithm for decoding and cryptanalysis,” Electronics Letters, vol. 31, no. 6, pp. 446–447, 1995.
17. D. J. C. MacKay, “Developments in probabilistic modelling with neural networks – ensemble learning,” in Neural Networks: Artificial Intelligence and Industrial Applications. Proceedings of the 3rd Annual Symposium on Neural Networks, Nijmegen, Netherlands, 14-15 September 1995, (Berlin), pp. 191–198, Springer, 1995.
18. M. I. Jordan, Learning in graphical models. MIT Press, 1999.
19. H. Attias, “A Variational Bayesian framework for graphical models,” in Advances in Neural Information Processing Systems (T. Leen, ed.), vol. 12, MIT Press, 2000.
20. Z. Ghahramani and M. Beal, “Graphical models and variational methods,” in Advanced Mean Field Methods (M. Opper and D. Saad, eds.), The MIT Press, 2001.
21. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of Royal Statistical Society, Series B, vol. 39, pp. 1–38, 1977.
22. R. M. Neal and G. E. Hinton, A New View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. NATO Science Series, Dordrecht: Kluwer Academic Publishers, 1998.
23. M. J. Beal and Z. Ghahramani, “The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures,” in Bayesian Statistics (J. M. Bernardo et al., ed.), Oxford University Press, 2003.
24. C. M. Bishop, “Variational principal components,” in Proceedings of the Ninth International Conference on Artificial Neural Networks, (ICANN), 1999.
25. Z. Ghahramani and M. J. Beal, “Variational inference for Bayesian mixtures of factor analyzers,” Neural Information Processing Systems, vol. 12, pp. 449–455, 2000.
26. M. Sato, “Online model selection based on the variational Bayes,” Neural Computation, vol. 13, pp. 1649–1681, 2001.
27. S. J. Roberts and W. D. Penny, “Variational Bayes for generalized autoregressive models,” IEEE Transactions on Signal Processing, vol. 50, no. 9, pp. 2245–2257, 2002.
28. P. Sykacek and S. J. Roberts, “Adaptive classification by variational Kalman filtering,” in Advances in Neural Information Processing Systems 15 (S. Thrun, S. Becker, and K. Obermayer, eds.), MIT Press, 2003.
29. J. W. Miskin, Ensemble Learning for Independent Component Analysis. PhD thesis, University of Cambridge, 2000.
30. J. Pratt, H. Raiffa, and R. Schlaifer, Introduction to Statistical Decision Theory. MIT Press, 1995.
31. R. E. Kass and A. E. Raftery, “Bayes factors,” Journal of American Statistical Association, vol. 90, pp. 773–795, 1995.
32. D. Titterington, A. Smith, and U. Makov, Statistical Analysis of Finite Mixtures. New York: John Wiley, 1985.
33. E. T. Jaynes, “Clearing up mysteries—the original goal,” in Maximum Entropy and Bayesian Methods (J. Skilling, ed.), pp. 1–27, Kluwer, 1989.
34. M. Tanner, Tools for statistical inference. New York: Springer-Verlag, 1993.
35. J. J. K. O’Ruanaidh and W. J. Fitzgerald, Numerical Bayesian Methods applied to Signal Processing. Springer, 1996.
36. B. de Finetti, Theory of Probability, vol. Wiley, 1975.
37. J. Bernardo and A. Smith, Bayesian Theory. Chichester, New York, Brisbane, Toronto, Singapore: John Wiley & Sons, ed., 1997.
38. G. L. Bretthorst, Bayesian Spectrum Analysis and Parameter Estimation. Springer-Verlag, 1989.
39. M. Kárný and R. Kulhavý, “Structure determination of regression-type models for adaptive prediction and control,” in Bayesian Analysis of Time Series and Dynamic Models (J. Spall, ed.), New York: Marcel Dekker, 1988. Chapter 12.
40. A. Quinn, “Regularized signal identification using Bayesian techniques,” in Signal Analysis and Prediction, Birkhäuser Boston Inc., 1998.
41. R. A. Fisher, “Theory of statistical estimation,” Proc. Camb. Phil. Soc., vol. 22(V), pp. 700–725, 1925. Reproduced in [162].
42. V. Peterka, “Bayesian approach to system identification,” in Trends and Progress in System identification (P. Eykhoff, ed.), pp. 239–304, Oxford: Pergamon Press, 1981.
43. A. W. F. Edwards, Likelihood. Cambridge Univ. Press, 1972.
44. J. D. Kalbfleisch and D. A. Sprott, “Application of likelihood methods to models involving large numbers of parameters,” J. Royal Statist. Soc., vol. B-32, no. 2, 1970.
45. R. L. Smith and J. C. Naylor, “A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution,” Appl. Statist., vol. 36, pp. 358–369, 1987.
46. R. D. Rosenkrantz, ed., E. T. Jaynes: Papers on Probability,
Statistics and Statistical Physics. D. Reidel, Dordrecht-Holland, 1983.
47. G. E. P. Box and G. C. Tiao, Bayesian Statistics. Oxford: Oxford, 1961.
48. J. Berger, Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag, 1985.
49. A. Wald, Statistical Decision Functions. New York, London: John Wiley, 1950.
50. M. DeGroot, Optimal Statistical Decisions. New York: McGraw-Hill, 1970.
51. C. P. Robert, The Bayesian Choice: A Decision Theoretic Motivation. Springer Texts in Statistics, Springer-Verlag, 1994.
52. M. Kárný, J. Böhm, T. Guy, L. Jirsa, I. Nagy, P. Nedoma, and L. Tesař, Optimized Bayesian Dynamic Advising: Theory and Algorithms. London: Springer, 2005.
53. J. Makhoul, “Linear prediction: A tutorial review,” Proceedings of the IEEE, vol. 63, no. 4, pp. 561–580, 1975.
54. S. M. Kay, Fundamentals of Statistical Signal Processing. Prentice-Hall, 1993.
55. A. P. Quinn, “The performance of Bayesian estimators in the superresolution of signal parameters,” in Proc. IEEE Int. Conf. on Acoust., Sp. and Sig. Proc. (ICASSP), (San Francisco), 1992.
56. A. Zellner, An Introduction to Bayesian Inference in Econometrics. New York: J. Wiley, 1976.
57. A. Quinn, “Novel parameter priors for Bayesian signal identification,” in Proc. IEEE Int. Conf. on Acoust., Sp. and Sig. Proc. (ICASSP), (Munich), 1997.
58. R. E. Kass and A. E. Raftery, “Bayes factors and model uncertainty,” tech. rep., University of Washington, 1994.
59. S. F. Gull, “Bayesian inductive inference and maximum entropy,” in Maximum Entropy and Bayesian Methods in Science and Engineering (G. J. Erickson and C. R. Smith, eds.), Kluwer, 1988.
60. J. Skilling, “The axioms of maximum entropy,” in Maximum Entropy and Bayesian Methods in Science and Engineering, Vol. (G. J. Erickson and C. R. Smith, eds.), Kluwer, 1988.
61. D. Bosq, Nonparametric Statistics for Stochastic Processes: estimation and prediction. Springer, 1998.
62. J. M. Bernardo, “Expected information as expected utility,” The Annals of Statistics, vol. 7, no. 3, pp. 686–690, 1979.
63. S. Kullback and R. Leibler, “On information and sufficiency,” Annals of Mathematical Statistics, vol. 22, pp. 79–87, 1951.
64. S. Amari, S. Ikeda, and H. Shimokawa, “Information geometry of α-projection in mean field approximation,” in Advanced Mean Field Methods (M. Opper and D. Saad, eds.), (Cambridge, Massachusetts), The MIT Press, 2001.
65. S. Amari, Differential-Geometrical Methods in Statistics. Springer, 1985.
66. C. F. J. Wu, “On the convergence properties of the EM algorithm,” The Annals of Statistics, vol. 11, pp. 95–103, 1983.
67. S. F. Gull and J. Skilling, “Maximum entropy method in image processing,” Proc. IEE, vol. F-131, October 1984.
68. S. F. Gull and J. Skilling, Quantified Maximum Entropy. MemSys5 Users’ Manual. Maximum Entropy Data Consultants Ltd., 1991.
69. A. Papoulis, “Maximum entropy and spectral estimation: a review,” IEEE Trans. on Acoust., Sp., and Sig. Proc., vol. ASSP-29, December 1981.
70. D. J. C. MacKay, Information Theory, Inference & Learning Algorithms. Cambridge University Press, 2004.
71. G. Demoment and J. Idier, “Problèmes inverses et déconvolution,” Journal de Physique IV, pp. 929–936, 1992.
72. M. Nikolova and A. Mohammad-Djafari, “Maximum entropy image reconstruction in eddy current tomography,” in Maximum Entropy and Bayesian Methods (A. Mohammad-Djafari and G. Demoment, eds.), Kluwer, 1993.
73. W. Gilks, S. Richardson, and D. Spiegelhalter, Markov Chain Monte Carlo in Practice. London: Chapman & Hall, 1997.
74. A. Doucet, N. de Freitas, and N. Gordon, eds., Sequential Monte Carlo Methods in Practice. Springer, 2001.
75. A. F. M. Smith and A. E. Gelfand, “Bayesian statistics without tears: a sampling-resampling perspective,” The American Statistician, vol. 46, pp. 84–88, 1992.
76. T. Ferguson, “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, vol. 1, pp. 209–230, 1973.
77. S. Walker, P. Damien, P. Laud, and A. Smith, “Bayesian nonparametric inference for random distributions and related functions,” J. R. Statist. Soc., vol. 61, pp. 485–527, 2004. With discussion.
78. S. J. Press and K. Shigemasu, “Bayesian inference in factor analysis,” in Contributions to Probability and Statistics (L. J. Glesser, ed.), ch. 15, Springer Verlag, New York, 1989.
79. D. B. Rowe and S. J. Press, “Gibbs sampling and hill climbing in Bayesian factor analysis,” tech. rep., University of California, Riverside, 1998.
80. I. Jolliffe, Principal Component Analysis. Springer-Verlag, 2nd ed., 2002.
81. S. M. Kay, Fundamentals Of Statistical Signal Processing: Estimation Theory. Prentice Hall, 1993.
82. M. E. Tipping and C. M. Bishop, “Mixtures of probabilistic principal component analyzers,” tech. rep., Aston University, 1998.
83. K. Pearson, “On lines and planes of closest fit to systems of points in space,” The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, vol. 2, pp. 559–572, 1901.
84. T. W. Anderson, An Introduction to Multivariate Statistical Analysis. John Wiley and Sons, 1971.
85. M. E. Tipping and C. M. Bishop, “Probabilistic principal component analysis,” Journal of the Royal Statistical Society, Series B, vol. 61, pp. 611–622, 1998.
86. G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore – London: The Johns Hopkins University Press, 1989.
87. H. Hotelling, “Analysis of a complex of statistical variables into principal components,” Journal of Educational Psychology, vol. 24, pp. 417–441, 1933.
88. D. B. Rowe, Multivariate Bayesian Statistics: Models for Source Separation and Signal Unmixing. Boca Raton, FL, USA: CRC Press, 2002.
89. T. P. Minka, “Automatic choice of dimensionality for PCA,” tech. rep., MIT, 2000.
90. V. Šmídl, The Variational Bayes Approach in Signal Processing. PhD thesis, Trinity College Dublin, 2004.
91. V. Šmídl and A. Quinn, “Fast variational PCA for functional analysis of dynamic image sequences,” in Proceedings of the 3rd International Conference on Image and Signal Processing, ISPA 03, (Rome, Italy), September 2003.
92. C. G. Khatri and K. V. Mardia, “The von Mises-Fisher distribution in orientation statistics,” Journal of Royal Statistical Society B, vol. 39, pp. 95–106, 1977.
93. M. Abramowitz and I.
Stegun, Handbook of Mathematical Functions New York: Dover Publications, 1972 94 I Buvat, H Benali, and R Di Paola, “Statistical distribution of factors and factor images in factor analysis of medical image sequences,” Physics in Medicine and Biology, vol 43, no 6, pp 1695–1711, 1998 95 H Benali, I Buvat, F Frouin, J P Bazin, and R Di Paola, “A statistical model for the determination of the optimal metric in factor analysis of medical image sequences (FAMIS),” Physics in Medicine and Biology, vol 38, no 8, pp 1065–1080, 1993 96 J Harbert, W Eckelman, and R Neumann, Nuclear Medicine Diagnosis and Therapy New York: Thieme, 1996 97 S Kotz and N Johnson, Encyclopedia of statistical sciences New York: John Wiley, 1985 98 T W Anderson, “Estimating linear statistical relationships,” Annals of Statististics, vol 12, pp 1–45, 1984 99 J Fine and A Pouse, “Asymptotic study of the multivariate functional model Application to the metric of choice in Principal Component Analysis,” Statistics, vol 23, pp 63–83, 1992 100 F Pedersen, M Bergstroem, E Bengtsson, and B Langstroem, “Principal component analysis of dynamic positron emission tomography studies,” Europian Journal of Nuclear Medicine, vol 21, pp 1285–1292, 1994 101 F Hermansen and A A Lammertsma, “Linear dimension reduction of sequences of medical images: I optimal inner products,” Physics in Medicine and Biology, vol 40, pp 1909–1920, 1995 102 M Šámal, M Kárný, H Surová, E Maˇríková, and Z Dienstbier, “Rotation to simple structure in factor analysis of dynamic radionuclide studies,” Physics in Medicine and Biology, vol 32, pp 371–382, 1987 103 M Kárný, M Šámal, and J Böhm, “Rotation to physiological factors revised,” Kybernetika, vol 34, no 2, pp 171–179, 1998 104 A Hyvärinen, “Survey on independent component analysis,” Neural Computing Surveys, vol 2, pp 94–128, 1999 105 V Šmídl, A Quinn, and Y Maniouloux, “Fully probabilistic model for functional analysis of medical image data,” in Proceedings of the Irish Signals and 
Systems Conference, (Belfast), pp 201–206, University of Belfast, June 2004
106 J R Magnus and H Neudecker, Matrix Differential Calculus. Wiley, 2001
107 M Šámal and H Bergmann, “Hybrid phantoms for testing the measurement of regional dynamics in dynamic renal scintigraphy,” Nuclear Medicine Communications, vol 19, pp 161–171, 1998
108 L Ljung and T Söderström, Theory and practice of recursive identification. Cambridge; London: MIT Press, 1983
109 E Mosca, Optimal, Predictive, and Adaptive Control. Prentice Hall, 1994
110 T Söderström and R Stoica, “Instrumental variable methods for system identification,” Lecture Notes in Control and Information Sciences, vol 57, 1983
111 D Clarke, Advances in Model-Based Predictive Control. Oxford: Oxford University Press, 1994
112 R Patton, P Frank, and R Clark, Fault Diagnosis in Dynamic Systems: Theory & Applications. Prentice Hall, 1989
113 R Kalman, “A new approach to linear filtering and prediction problem,” Trans ASME, Ser D, J Basic Eng., vol 82, pp 34–45, 1960
114 F Gustafsson, Adaptive Filtering and Change Detection. Chichester: Wiley, 2000
115 V Šmídl, A Quinn, M Kárný, and T V Guy, “Robust estimation of autoregressive processes using a mixture-based filter bank,” System & Control Letters, vol 54, pp 315–323, 2005
116 K Astrom and B Wittenmark, Adaptive Control. Reading, Massachusetts: Addison-Wesley, 1989
117 R Koopman, “On distributions admitting a sufficient statistic,” Transactions of American Mathematical Society, vol 39, p 399, 1936
118 L Ljung, System Identification-Theory for the User. Prentice-Hall Englewood Cliffs, N.J: D van Nostrand Company Inc., 1987
119 A V Oppenheim and R W Schafer, Discrete-Time Signal Processing. Prentice-Hall, 1989
120 V Šmídl and A Quinn, “Mixture-based extension of the AR model and its recursive Bayesian identification,” IEEE Transactions on Signal Processing, vol 53, no 9, pp 3530–3542, 2005
121 R Kulhavý, “Recursive nonlinear estimation: A geometric approach,” Automatica,
vol 26, no 3, pp 545–555, 1990
122 R Kulhavý, “Implementation of Bayesian parameter estimation in adaptive control and signal processing,” The Statistician, vol 42, pp 471–482, 1993
123 R Kulhavý, “Recursive Bayesian estimation under memory limitations,” Kybernetika, vol 26, pp 1–20, 1990
124 R Kulhavý, Recursive Nonlinear Estimation: A Geometric Approach, vol 216 of Lecture Notes in Control and Information Sciences. London: Springer-Verlag, 1996
125 R Kulhavý, “A Bayes-closed approximation of recursive non-linear estimation,” International Journal of Adaptive Control and Signal Processing, vol 4, pp 271–285, 1990
126 C S Wong and W K Li, “On a mixture autoregressive model,” Journal of the Royal Statistical Society: Series B, vol 62, pp 95–115, 2000
127 M Kárný, J Böhm, T Guy, and P Nedoma, “Mixture-based adaptive probabilistic control,” International Journal of Adaptive Control and Signal Processing, vol 17, no 2, pp 119–132, 2003
128 H Attias, J C Platt, A Acero, and L Deng, “Speech denoising and dereverberation using probabilistic models,” in Advances in Neural Information Processing Systems, pp 758–764, 2001
129 J Stutz and P Cheeseman, “AutoClass - a Bayesian approach to classification,” in Maximum Entropy and Bayesian Methods (J Skilling and S Sibisi, eds.), Dordrecht: Kluwer, 1995
130 M Funaro, M Marinaro, A Petrosino, and S Scarpetta, “Finding hidden events in astrophysical data using PCA and mixture of Gaussians clustering,” Pattern Analysis & Applications, vol 5, pp 15–22, 2002
131 S Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994
132 K Warwick and M Kárný, Computer-Intensive Methods in Control and Signal Processing: Curse of Dimensionality. Birkhauser, 1997
133 J Andrýsek, “Approximate recursive Bayesian estimation of dynamic probabilistic mixtures,” in Multiple Participant Decision Making (J Andrýsek, M Kárný, and J Kracík, eds.), pp 39–54, Adelaide: Advanced Knowledge International, 2004
134 S Roweis and Z
Ghahramani, “A unifying review of linear Gaussian models,” Neural Computation, vol 11, pp 305–345, 1999
135 A P Quinn, “Threshold-free Bayesian estimation using censored marginal inference,” in Signal Processing VI: Proc of the 6th European Sig Proc Conf (EUSIPCO-’92) Vol 2, (Brussels), 1992
136 A P Quinn, “A consistent, numerically efficient Bayesian framework for combining the selection, detection and estimation tasks in model-based signal processing,” in Proc IEEE Int Conf on Acoust., Sp and Sig Proc., (Minneapolis), 1993
137 “Project IST-1999-12058, decision support tool for complex industrial processes based on probabilistic data clustering (ProDaCTool),” tech rep., 1999–2002
138 P Nedoma, M Kárný, and I Nagy, “MixTools, MATLAB toolbox for mixtures: User’s guide,” tech rep., ÚTIA AV ČR, Praha, 2001
139 A Quinn, P Ettler, L Jirsa, I Nagy, and P Nedoma, “Probabilistic advisory systems for data-intensive applications,” International Journal of Adaptive Control and Signal Processing, vol 17, no 2, pp 133–148, 2003
140 Z Chen, “Bayesian filtering: From Kalman filters to particle filters, and beyond,” tech rep., Adaptive Syst Lab., McMaster University, Hamilton, ON, Canada, 2003
141 B Ristic, S Arulampalam, and N Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House Publishers, 2004
142 E Daum, “New exact nonlinear filters,” in Bayesian Analysis of Time Series and Dynamic Models (J Spall, ed.), New York: Marcel Dekker, 1988
143 P Vidoni, “Exponential family state space models based on a conjugate latent process,” J Roy Statist Soc., Ser B, vol 61, pp 213–221, 1999
144 A H Jazwinski, Stochastic Processes and Filtering Theory. New York: Academic Press, 1979
145 R Kulhavý and M B Zarrop, “On a general concept of forgetting,” International Journal of Control, vol 58, no 4, pp 905–924, 1993
146 R Kulhavý, “Restricted exponential forgetting in real-time identification,” Automatica, vol 23, no 5, pp 589–600, 1987
147 C F So, S C Ng, and
S H Leung, “Gradient based variable forgetting factor RLS algorithm,” Signal Processing, vol 83, pp 1163–1175, 2003
148 R H Middleton, G C Goodwin, D J Hill, and D Q Mayne, “Design issues in adaptive control,” IEEE Transactions on Automatic Control, vol 33, no 1, pp 50–58, 1988
149 R Elliot, L Assoun, and J Moore, Hidden Markov Models. New York: Springer-Verlag, 1995
150 V Šmídl and A Quinn, “Bayesian estimation of non-stationary AR model parameters via an unknown forgetting factor,” in Proceedings of the IEEE Workshop on Signal Processing, (New Mexico), pp 100–105, August 2004
151 M H Vellekoop and J M C Clark, “A nonlinear filtering approach to changepoint detection problems: Direct and differential-geometric methods,” SIAM Journal on Control and Optimization, vol 42, no 2, pp 469–494, 2003
152 G Bierman, Factorization Methods for Discrete Sequential Estimation. New York: Academic Press, 1977
153 G D Forney, “The Viterbi algorithm,” Proceedings of the IEEE, vol 61, no 3, pp 268–278, 1973
154 J Deller, J Proakis, and J Hansen, Discrete-Time Processing of Speech Signals. Macmillan, New York, 1993
155 L R Rabiner and R W Schafer, Digital Processing of Speech Signals. Prentice-Hall, 1978
156 J M Bernardo, “Approximations in statistics from a decision-theoretical viewpoint,” in Probability and Bayesian Statistics (R Viertl, ed.), pp 53–60, New York: Plenum, 1987
157 N D Le, L Sun, and J V Zidek, “Bayesian spatial interpolation and backcasting using Gaussian-generalized inverted Wishart model,” tech rep., University of British Columbia, 1999
158 E T Jaynes, Probability Theory: The Logic of Science. Cambridge University Press, 2003
159 A T James, “Distribution of matrix variates and latent roots derived from normal samples,” Annals of Mathematical Statistics, vol 35, pp 475–501, 1964
160 K Mardia and P E Jupp, Directional Statistics. Chichester, England: John Wiley and Sons, 2000
161 T D Downs, “Orientational statistics,” Biometrika, vol 59, pp 665–676, 1972
162 R A Fisher, Contributions to Mathematical Statistics. John Wiley and Sons, 1950

Index

activity curve, 91
additive Gaussian noise model, 17
advisory system, 140
augmented model, 124
Automatic Rank Determination (ARD) property, 68, 85, 86, 99, 103
AutoRegressive (AR) model, 111, 112, 114, 130, 173, 179, 193
AutoRegressive model with eXogenous variables (ARX), 113, 181
Bayesian filtering, 47, 146
Bayesian smoothing, 146
certainty equivalence approximation, 43, 45, 188
changepoints, 173
classical estimators, 18
combinatoric explosion, 130
components, 130
conjugate distribution, 19, 111, 120
conjugate parameter distribution to DEF family (CDEF), 113, 146, 153, 179, 191
correspondence analysis, 86, 92, 103
covariance matrix, 117
covariance method, 116
cover-up rule, 37
criterion of cumulative variance, 83, 86
data update, 21, 146, 149
digamma (psi) function, 133, 164, 211
Dirac δ-function, 44
Dirichlet distribution, 132, 134, 159, 164, 215
discount schedules, 139, 174
distributional estimation, 47
dyad, 115, 181, 190
Dynamic Exponential Family (DEF), 113, 153, 179
Dynamic Exponential Family with Hidden variables (DEFH), 126
Dynamic Exponential Family with Separable parameters (DEFS), 120
dynamic mixture model, 140
economic SVD, 59
EM algorithm, 44, 124
empirical distribution, 46, 152
exogenous variables, 113, 139, 180
exponential family, 112
Exponential Family with Hidden variables (EFH), 126
exponential forgetting, 154
Extended AR (EAR) model, 179, 180
extended information matrix, 115, 181
extended regressor, 115, 131, 180
factor images, 91
factor analysis, 51
factor curve, 91
FAMIS model, 93, 94, 102
Fast Variational PCA (FVPCA), 68
filter-bank, 182, 192, 196
forgetting, 139, 191
forgetting factor, 153
Functional analysis of medical image sequences, 89
Gamma distribution, 212
geometric approach, 128
Gibbs sampling, 61
global approximation, 117, 128
Hadamard product, 64, 98, 120, 160
Hamming distance, 167
Hidden Markov Model (HMM), 158
hidden variable, 124,
182
Highest Posterior Density (HPD) region, 18, 79, 195
hyper-parameter, 19, 62, 95
importance function, 152
Independent Component Analysis (ICA), 94
independent, identically-distributed (i.i.d.) noise, 58
independent, identically-distributed (i.i.d.) process, 168
independent, identically-distributed (i.i.d.) sampling, 46, 152
inferential breakpoint, 49
informative prior, 49
initialization, 136, 174
innovations process, 114, 130, 180
inverse-Wishart distribution, 211
Iterative Variational Bayes (VB) algorithm, 32, 53, 118, 122, 134, 174
Jacobian, 180
Jeffreys’ notation, 16, 110
Jeffreys’ prior, 4, 71
Jensen’s inequality, 32, 77
Kalman filter, 146, 155, 196
KL divergence for Minimum Risk (MR) calculations, 28, 39, 128
KL divergence for Variational Bayes (VB) calculations, 28, 40, 121, 147
Kronecker function, 44
Kronecker product, 99
Kullback-Leibler (KL) divergence, 27
Laplace approximation, 62
LD decomposition, 182
Least Squares (LS) estimation, 18, 116
local approximation, 117
Low-Pass Filter (LPF), 200
Maple, 68
Markov chain, 47, 113, 158, 182
Markov-Chain Monte Carlo (MCMC) methods, 47
MATLAB, 64, 120, 140
matrix Dirichlet distribution, 160, 216
matrix Normal distribution, 58, 209
Maximum a Posteriori (MAP) estimation, 17, 44, 49, 188
Maximum Likelihood (ML) estimation, 17, 44, 57, 69
medical imaging, 89
Minimum Mean Squared-Error (MMSE) criterion, 116
missing data, 124, 132
MixTools, 140
Mixture-based Extended AutoRegressive (MEAR) model, 183, 191, 192, 196
moment fitting, 129
Monte Carlo (MC) simulation, 80, 83, 138, 166
Multinomial distribution, 132, 134, 159, 162, 185, 215
Multinomial distribution of continuous argument, 163, 185, 215
multivariate AutoRegressive (AR) model, 116
multivariate Normal distribution, 209
natural gradient technique, 32
non-informative prior, 18, 103, 114, 172
non-minimal conjugate distribution, 121
non-smoothing restriction, 151, 162, 187
nonparametric prior, 47
normal equations, 116
Normal-inverse-Gamma distribution, 115, 170,
179, 211
Normal-inverse-Wishart distribution, 132, 134, 210
normalizing constant, 15, 22, 113, 116, 134, 169, 187
nuclear medicine, 89
observation model, 21, 102, 111, 112, 146, 159, 179, 183
one-step approximation, 117, 121, 129
One-step Fixed-Form (FF) Approximation, 135
optimal importance function, 165
orthogonal PPCA model, 70
Orthogonal Variational PCA (OVPCA), 76
outliers, 192
parameter evolution model, 21, 146
particle filtering, 47, 145, 152
particles, 152
Poisson distribution, 91
precision matrix, 117
precision parameter, 58, 93
prediction, 22, 116, 141, 181
Principal Component Analysis (PCA), 57
probabilistic editor, 129
Probabilistic Principal Component Analysis (PPCA), 58
probability fitting, 129
probability simplex, 216
ProDaCTool, 140
proximity measure, 27, 128
pseudo-stationary window, 155
Quasi-Bayes (QB) approximation, 43, 128, 133, 140, 150, 186
rank, 59, 77, 83, 99
Rao-Blackwellization, 165
recursive algorithm, 110
Recursive Least Squares (RLS) algorithm, 116, 168
regressor, 111, 180
regularization, 18, 48, 114, 154
relative entropy,
Restricted VB (RVB) approximation, 128, 133, 186
rotational ambiguity, 60, 70
scalar additive decomposition, 3, 37
scalar multiplicative decomposition, 120
scalar Normal distribution, 120, 209
scaling, 92
scaling ambiguity, 48
separable-in-parameters family, 34, 63, 118
shaping parameters, 3, 34, 36
sign ambiguity, 49, 70
signal flowgraph, 114
Signal-to-Noise Ratio (SNR), 201
Singular Value Decomposition (SVD), 59, 213
spanning property, 191
speech reconstruction, 201
static mixture models, 135
Stiefel manifold, 71, 213
stochastic distributional approximation, 46
stressful regime, 136
Student’s t-distribution, 5, 116, 141, 200, 211
sufficient statistics, 19
time update, 21, 149, 153
transition matrix, 159
truncated Exponential distribution, 172, 216
truncated Normal distribution, 73, 211
uniform prior, 18
Variational Bayes (VB) method, 33, 126, 149
Variational EM (VEM), 32
Variational PCA (VPCA), 68
VB-approximation, 29, 51
VB-conjugacy, 149
VB-equations, 35
VB-filtering, 148
VB-marginalization, 33
VB-marginals, 3, 29, 32, 34
VB-moments, 3, 35, 36
VB-observation model, 121, 125, 148, 172, 184
VB-parameter predictor, 148, 184
VB-smoothing, 148
vec-transpose operator, 115
Viterbi-Like (VL) Approximation, 188
von Mises-Fisher distribution, 73, 212

... way in exploiting the VB method in model-based inference, notably in inference for graphical models. It is timely, however, to examine the VB method in the context of signal processing where, to...

... (3.18) using Jensen’s inequality [24]. Minimizing the KL divergence on the right-hand side of (3.17), e.g. using the result of Theorem 3.1, the error in the approximation is minimized. The main computational...

... when updated in the light of incoming data: this leads to feasible on-line tracking algorithms involving the update of fixed- and finite-dimensional statistics. In the language of the Bayesian, conjugacy