DOCUMENT INFORMATION
Basic information
Format
Number of pages: 482
File size: 4.63 MB
Content
Matrix Differential Calculus with Applications in Statistics and Econometrics

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by Walter E. Shewhart and Samuel S. Wilks

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, and Ruey S. Tsay

Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane, and Jozef L. Teugels

The Wiley Series in Probability and Statistics is well established and authoritative. It covers many topics of current research interest in both pure and applied statistics and probability theory. Written by leading statisticians and institutions, the titles span both state-of-the-art developments in the field and classical methods.

Reflecting the wide range of current research in statistics, the series encompasses applied, methodological, and theoretical statistics, ranging from applications and new techniques made possible by advances in computerized practice to rigorous treatment of theoretical approaches. This series provides essential and invaluable reading for all statisticians, whether in academia, industry, government, or research.

A complete list of the titles in this series can be found at http://www.wiley.com/go/wsps

Matrix Differential Calculus with Applications in Statistics and Econometrics

Third Edition

Jan R. Magnus
Department of Econometrics and Operations Research, Vrije Universiteit Amsterdam, The Netherlands

and

Heinz Neudecker†
Amsterdam School of Economics, University of Amsterdam, The Netherlands

This edition first published 2019
© 2019 John Wiley & Sons Ltd

Edition History: John Wiley & Sons (1e, 1988) and John Wiley & Sons (2e, 1999)

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Jan R. Magnus and Heinz Neudecker to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data applied for

ISBN: 9781119541202

Cover design by Wiley
Cover image: © phochi/Shutterstock
Typeset by the author in LaTeX

Contents

Preface

Part One — Matrices

1 Basic properties of vectors and matrices
  1 Introduction
  2 Sets
  3 Matrices: addition and multiplication
  4 The transpose of a matrix
  5 Square matrices
  6 Linear forms and quadratic forms
  7 The rank of a matrix
  8 The inverse
  9 The determinant
  10 The trace
  11 Partitioned matrices
  12 Complex matrices
  13 Eigenvalues and eigenvectors
  14 Schur’s decomposition theorem
  15 The Jordan decomposition
  16 The singular-value decomposition
  17 Further results concerning eigenvalues
  18 Positive (semi)definite matrices
  19 Three further results for positive definite matrices
  20 A useful result
  21 Symmetric matrix functions
  Miscellaneous exercises
  Bibliographical notes

2 Kronecker products, vec operator, and Moore-Penrose inverse
  1 Introduction
  2 The Kronecker product
  3 Eigenvalues of a Kronecker product
  4 The vec operator
  5 The Moore-Penrose (MP) inverse
  6 Existence and uniqueness of the MP inverse
  7 Some properties of the MP inverse
  8 Further properties
  9 The solution of linear equation systems
  Miscellaneous exercises
  Bibliographical notes

3 Miscellaneous matrix results
  1 Introduction
  2 The adjoint matrix
  3 Proof of Theorem 3.1
  4 Bordered determinants
  5 The matrix equation $AX = 0$
  6 The Hadamard product
  7 The commutation matrix $K_{mn}$
  8 The duplication matrix $D_n$
  9 Relationship between $D_{n+1}$ and $D_n$, I
  10 Relationship between $D_{n+1}$ and $D_n$, II
  11 Conditions for a quadratic form to be positive (negative) subject to linear constraints
  12 Necessary and sufficient conditions for $r(A : B) = r(A) + r(B)$
  13 The bordered Gramian matrix
  14 The equations $X_1 A + X_2 B' = G_1$, $X_1 B = G_2$
  Miscellaneous exercises
  Bibliographical notes

Part Two — Differentials: the theory

4 Mathematical preliminaries
  1 Introduction
  2 Interior points and accumulation points
  3 Open and closed sets
  4 The Bolzano-Weierstrass theorem
  5 Functions
  6 The limit of a function
  7 Continuous functions and compactness
  8 Convex sets
  9 Convex and concave functions
  Bibliographical notes

5 Differentials and differentiability
  1 Introduction
  2 Continuity
  3 Differentiability and linear approximation
  4 The differential of a vector function
  5 Uniqueness of the differential
  6 Continuity of differentiable functions
  7 Partial derivatives
  8 The first identification theorem
  9 Existence of the differential, I
  10 Existence of the differential, II
  11 Continuous differentiability
  12 The chain rule
  13 Cauchy invariance
  14 The mean-value theorem for real-valued functions
  15 Differentiable matrix functions
  16 Some remarks on notation
  17 Complex differentiation
  Miscellaneous exercises
  Bibliographical notes

6 The second differential
  1 Introduction
  2 Second-order partial derivatives
  3 The Hessian matrix
  4 Twice differentiability and second-order approximation, I
  5 Definition of twice differentiability
  6 The second differential
  7 Symmetry of the Hessian matrix
  8 The second identification theorem
  9 Twice differentiability and second-order approximation, II
  10 Chain rule for Hessian matrices
  11 The analog for second differentials
  12 Taylor’s theorem for real-valued functions
  13 Higher-order differentials
  14 Real analytic functions
  15 Twice differentiable matrix functions
  Bibliographical notes

7 Static optimization
  1 Introduction
  2 Unconstrained optimization
  3 The existence of absolute extrema
  4 Necessary conditions for a local minimum
  5 Sufficient conditions for a local minimum: first-derivative test
  6 Sufficient conditions for a local minimum: second-derivative test
  7 Characterization of differentiable convex functions
  8 Characterization of twice differentiable convex functions
  9 Sufficient conditions for an absolute minimum
  10 Monotonic transformations
  11 Optimization subject to constraints
  12 Necessary conditions for a local minimum under constraints
  13 Sufficient conditions for a local minimum under constraints
  14 Sufficient conditions for an absolute minimum under constraints
  15 A note on constraints in matrix form
  16 Economic interpretation of Lagrange multipliers
  Appendix: the implicit function theorem
  Bibliographical notes

Part Three — Differentials: the practice

8 Some important differentials
  1 Introduction
  2 Fundamental rules of differential calculus
  3 The differential of a determinant
  4 The differential of an inverse
  5 Differential of the Moore-Penrose inverse
  6 The differential of the adjoint matrix
  7 On differentiating eigenvalues and eigenvectors
  8 The continuity of eigenprojections
  9 The differential of eigenvalues and eigenvectors: symmetric case
  10 Two alternative expressions for $d\lambda$
  11 Second differential of the eigenvalue function
  Miscellaneous exercises
  Bibliographical notes

9 First-order differentials and Jacobian matrices
  1 Introduction
  2 Classification
  3 Derisatives
  4 Derivatives
  5 Identification of Jacobian matrices
  6 The first identification table
  7 Partitioning of the derivative
  8 Scalar functions of a scalar
  9 Scalar functions of a vector
  10 Scalar functions of a matrix, I: trace
  11 Scalar functions of a matrix, II: determinant
  12 Scalar functions of a matrix, III: eigenvalue
  13 Two examples of vector functions
  14 Matrix functions
  15 Kronecker products
  16 Some other problems
  17 Jacobians of transformations
  Bibliographical notes

10 Second-order differentials and Hessian matrices
  1 Introduction
  2 The second identification table
  3 Linear and quadratic forms
  4 A useful theorem
  5 The determinant function
  6 The eigenvalue function
  7 Other examples
  8 Composite functions
  9 The eigenvector function
  10 Hessian of matrix functions, I
  11 Hessian of matrix functions, II
  Miscellaneous exercises

Part Four — Inequalities

11 Inequalities
  1 Introduction
  2 The Cauchy-Schwarz inequality
  3 Matrix analogs of the Cauchy-Schwarz inequality
  4 The theorem of the arithmetic and geometric means
  5 The Rayleigh quotient
  6 Concavity of $\lambda_1$ and convexity of $\lambda_n$
  7 Variational description of eigenvalues
  8 Fischer’s min-max theorem
  9 Monotonicity of the eigenvalues
  10 The Poincaré separation theorem
  11 Two corollaries of Poincaré’s theorem
  12 Further consequences of the Poincaré theorem
  13 Multiplicative version
  14 The maximum of a bilinear form
  15 Hadamard’s inequality
  16 An interlude: Karamata’s inequality
  17 Karamata’s inequality and eigenvalues
  18 An inequality concerning positive semidefinite matrices
  19 A representation theorem for $(\sum a_i^p)^{1/p}$
  20 A representation theorem for $(\operatorname{tr} A^p)^{1/p}$
  21 Hölder’s inequality
  22 Concavity of $\log|A|$
  23 Minkowski’s inequality
  24 Quasilinear representation of $|A|^{1/n}$
  25 Minkowski’s determinant theorem
  26 Weighted means of order $p$
  27 Schlömilch’s inequality
  28 Curvature properties of $M_p(x, a)$
  29 Least squares
  30 Generalized least squares
  31 Restricted least squares
  32 Restricted least squares: matrix version
  Miscellaneous exercises
  Bibliographical notes

Part Five — The linear model

12 Statistical preliminaries
  1 Introduction
  2 The cumulative distribution function
  3 The joint density function
  4 Expectations
  5 Variance and covariance
  6 Independence of two random variables
  7 Independence of n random variables
  8 Sampling
  9 The one-dimensional normal distribution
  10 The multivariate normal distribution
  11 Estimation
  Miscellaneous exercises
  Bibliographical notes

13 The linear regression model
  1 Introduction
  2 Affine minimum-trace unbiased estimation
  3 The Gauss-Markov theorem
  4 The method of least squares
  5 Aitken’s theorem
  6 Multicollinearity
  7 Estimable functions
  8 Linear constraints: the case $\mathcal{M}(R') \subset \mathcal{M}(X')$
  9 Linear constraints: the general case
  10 Linear constraints: the case $\mathcal{M}(R') \cap \mathcal{M}(X') = \{0\}$
  11 A singular variance matrix: the case $\mathcal{M}(X) \subset \mathcal{M}(V)$
  12 A singular variance matrix: the case $r(X'V^+X) = r(X)$
  13 A singular variance matrix: the general case, I
  14 Explicit and implicit linear constraints
  15 The general linear model, I
  16 A singular variance matrix: the general case, II
  17 The general linear model, II
  18 Generalized least squares
  19 Restricted least squares
  Miscellaneous exercises
  Bibliographical notes

14 Further topics in the linear model
  1 Introduction
  2 Best quadratic unbiased estimation of $\sigma^2$
  3 The best quadratic and positive unbiased estimator of $\sigma^2$
  4 The best quadratic unbiased estimator of $\sigma^2$
  5 Best quadratic invariant estimation of $\sigma^2$

Bibliography

Tucker, L. R. (1966). Some mathematical notes on three-mode factor analysis, Psychometrika, 31, 279–311.
van de Velden, M., I. Iodice d’Enza and F. Palumbo (2017). Cluster correspondence analysis, Psychometrika, 82, 158–185.
Von Rosen, D. (1985). Multivariate Linear Normal Models with Special References to the Growth Curve Model, Ph.D. Thesis, University of Stockholm.
Wang, S. G. and S. C. Chow (1994). Advanced Linear Models, Marcel Dekker, New York.
Wilkinson, J. H. (1965). The Algebraic Eigenvalue Problem, Clarendon Press, Oxford.
Wilks, S. S. (1962). Mathematical Statistics, 2nd edition, John Wiley, New York.
Wolkowicz, H. and G. P. H. Styan (1980). Bounds for eigenvalues using traces, Linear Algebra and Its Applications, 29, 471–506.
Wong, C. S. (1980). Matrix derivatives and its applications in
statistics, Journal of Mathematical Psychology, 22, 70–81.
Wong, C. S. (1985). On the use of differentials in statistics, Linear Algebra and Its Applications, 70, 285–299.
Wong, C. S. and K. S. Wong (1979). A first derivative test for the ML estimates, Bulletin of the Institute of Mathematics, Academia Sinica, 7, 313–321.
Wong, C. S. and K. S. Wong (1980). Minima and maxima in multivariate analysis, Canadian Journal of Statistics, 8, 103–113.
Yang, Y. (1988). A matrix trace inequality, Journal of Mathematical Analysis and Applications, 133, 573–574.
Young, W. H. (1910). The Fundamental Theorems of the Differential Calculus, Cambridge Tracts in Mathematics and Mathematical Physics, No. 11, Cambridge University Press, Cambridge.
Yuan, K.-H., L. L. Marshall and P. M. Bentler (2002). A unified approach to exploratory factor analysis with missing data, nonnormal data, and in the presence of outliers, Psychometrika, 67, 95–122.
Yuen, T. P., H. Wong and K. F. C. Yiu (2018). On constrained estimation of graphical time series models, Computational Statistics and Data Analysis, 124, 27–52.
Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, Journal of the American Statistical Association, 57, 348–368.
Zyskind, G. and F. B. Martin (1969). On best linear estimation and a general Gauss-Markov theorem in linear models with arbitrary nonnegative covariance structure, SIAM Journal of Applied Mathematics, 17, 1190–1202.

Index of symbols

The symbols listed below are followed by a brief statement of their meaning and by the number of the page where they are defined.

General symbols =⇒ ⇐⇒ ✷ max sup lim i e, exp δij ! 
≺ |ξ| ξ¯ implies if and only if end of proof minimum, minimize maximum, maximize supremum limit, 79 imaginary unit, 14 exponential Kronecker delta, factorial majorization, 242 absolute value of scalar ξ (possibly complex) complex conjugate of scalar ξ, 14 Sets ∈, ∈ / {x : x ∈ S, x satisfies P } ⊂ ∪ ∩ ∅ B−A Ac IN IR C IRn , IRm×n belongs to (does not belong to), set of all elements of S with property P , is a subset of, union, intersection, empty set, complement of A relative to B, complement of A, {1, 2, }, set of real numbers, set of complex numbers, 108 set of real n × vectors (m × n matrices), 467 Index of symbols 468 IRn+ positive orthant of IRn , 409 S S′ S¯ ∂S B(c), B(c; r), B(C; r) N (c), N (C) M(A) O interior of S, 74 derived set of S, 74 closure of S, 74 boundary of S, 74 ball with center c (C), 73, 104 neighborhood of c (C), 73, 104 column space of A, Stiefel manifold, 396 ◦ Special matrices and vectors I, In ei Kmn Kn Nn Dn Jk (λ) ı identity matrix (of order n × n), null matrix, null vector, elementary vector , 50 commutation matrix, 54 Knn , 54 2 (In + Kn ), 55 duplication matrix, 56 Jordan block, 19 (1, 1, , 1)′ , 44 Operations on matrix A and vector a A′ A−1 A+ A− dg A, dg(A) diag(a1 , , an ) A2 A1/2 Ap A# A∗ Ak (A : B), (A, B) vec A, vec(A) vech(A) r(A) λi , λi (A) µ(A) tr A, tr(A) |A| transpose, inverse, 10 Moore-Penrose inverse, 37 generalized inverse, 44 diagonal matrix containing the diagonal elements of A, diagonal matrix containing a1 , a2 , , an on the diagonal, AA, square root, pth power, 200, 245 adjoint (matrix), 11 complex conjugate, 14 principal submatrix of order k × k, 26 partitioned matrix vec operator, 34 vector containing aij (i ≥ j), 56 rank, ith eigenvalue (of A), 18 (maxi λi (A′ A))1/2 , 18, 266 trace, 11 determinant, 10 Index of symbols A a Mp (x, a) M0 (x, a) A ≥ B, B ≤ A A > B, B < A 469 norm of matrix, 12 norm of vector, weighted mean of order p, 256 geometric mean, 258 A − B positive semidefinite, 24 A − B positive definite, 
24 Matrix products ⊗ ⊙ Kronecker product, 31 Hadamard product, 52 Functions f :S→T φ, ψ f, g F, G g ◦ f, G ◦ F function defined on S with values in T , 78 real-valued function, 192 vector function, 192 matrix function, 192 composite function, 100, 106 Derivatives d d2 dn Dj φ, Dj fi D2kj φ, D2kj fi φ′ (ξ) Dφ(x), ∂φ(x)/∂x′ Df (x), ∂f (x)/∂x′ DF (X) ∂ vec F (X)/∂(vec X)′ ∇φ, ∇f φ′′ (ξ) Hφ(x), ∂ φ(x)/∂x∂x′ differential, 90, 92, 105 second differential, 116, 127 nth order differential, 125 partial derivative, 95 second-order partial derivative, 111 derivative of φ(ξ), 90 derivative of φ(x), 97, 191 derivative (Jacobian matrix) of f (x), 96, 191 derivative (Jacobian matrix) of F (X), 105 derivative of F (X), alternative notation, 105 gradient, 97 second derivative of φ(ξ), 121 second derivative (Hessian matrix) of φ(x), 112, 211 Statistical symbols Pr a.s E var varas cov LS(E) probability, 273 almost surely, 277 expectation, 274 variance (matrix), 275 asymptotic variance (matrix), 362 covariance (matrix), 275 least squares (estimator), 260 470 ML(E) FIML LIML MSE Fn F ∼ Nm (µ, Ω) Index of symbols maximum likelihood (estimator), 348 full information maximum likelihood, 374 limited information maximum likelihood, 378 mean squared error, 282 information matrix, 348 asymptotic information matrix, 348 is distributed as, 279 normal distribution, 279 Subject index Accumulation point, 74, 78, 79, 88 Adjoint (matrix), 11, 47–51, 165, 187 differential of, 172–174, 187 of nonsingular matrix, 11 rank of, 48 Aitken’s theorem, 291 Aitken, A C., 30, 70, 291, 293, 319 Almost surely (a.s.), 277 Anderson’s theorem, 394 Anderson, T W., 283, 394, 419, 420 Approximation first-order (linear), 90–91 second-order, 113, 119 zero-order, 89 Cauchy’s rule of invariance, 103, 106, 425 and simplified notation, 106– 107 Cauchy-Riemann equations, 109 Cayley-Hamilton theorem, 16, 184 Chain rule, 101 for first differentials, 428 for Hessian matrices, 122 for matrix functions, 106, 433 for second 
differentials, 436 Characteristic equation, 14 Closure, 74 Cofactor (matrix), 11, 47 Column space, Commutation matrix, 54–55, 442 as derivative of X ′ , 204 as Hessian matrix of (1/2) tr X , 216 Complement, Complexity, entropic, 29 Component analysis, 395–403 core matrix, 396 core vector, 400 multimode, 400–403 one-mode, 395–399 and sample principal components, 398 two-mode, 399–400 Concave function (strictly), 83 see also Convex function Concavity (strict) of log x, 85, 142, 229 of log |X|, 250 of smallest eigenvalue, 232 see also Convexity Ball convex, 82 in IRn , 73 in IRn×q , 105 open, 75 Bias, 282 of least-squares estimator of σ , 333 bounds of, 333–334 Bilinear form, maximum of, 241, 414–416 Bolzano-Weierstrass theorem, 78 Bordered determinantal criterion, 152 Boundary, 74 Boundary point, 74 Canonical correlations, 414–416 Cartesian product, 471 472 Consistency of linear model, 304 with constraints, 308 see also Linear equations Constant elasticity of substitution, 256 Continuity, 80, 88 of differentiable function, 94 on compact set, 131 Convex combination (of points), 82 Convex function (strictly), 83–86 and absolute minimum, 142 and absolute minimum under constraints, 154 and inequalities, 243, 244 characterization (differentiable), 138, 139 characterization (twice differentiable), 141 continuity of, 84 Convex set, 81–83 Convexity (strict) of Lagrangian function, 154 of largest eigenvalue, 186, 232 Correspondence analysis, 417–418 Covariance (matrix), 275 Critical point, 130, 146 Critical value, 130 Demand equations, 364 Density function, 274 joint, 274 marginal, 277 Dependent (linearly), Derisative, 192–194 Derivative, 90, 92, 105 complex, 108–110 first derivative, 92, 105, 426, 427 first-derivative test, 134 notation, 191, 194–196, 426, 427 partial derivative, 95 differentiability of, 115 existence of, 95 notation, 95 second-order, 111 partitioning of, 197–198 second-derivative test, 136 Subject index Determinant, 10 concavity of log |X|, 250 continuity 
of |X|, 168 derivative of |X|, 201 differential of log |X|, 167, 433 differential of |X|, 165, 187, 433 equals product of eigenvalues, 21 Hessian of log |X|, 214 Hessian of |X|, 214 higher-order differentials of log |X|, 168 minor, see Minor of partitioned matrix, 13, 25, 28, 51 of triangular matrix, 11 second differential of log |X|, 168, 251 Diagonalization of matrix with distinct eigenvalues, 19 of symmetric matrix, 17, 27 Dieudonn´e, J., 86, 110, 159 Differentiability, 92, 97–100, 104 see also Derivative, Differential, Function Differential first differential and infinitely small quantities, 90 composite function, 103, 106 existence of, 97–100 fundamental rules, 163–165, 425, 427, 432 geometrical interpretation, 90, 424 notation, 90, 106–107 of matrix function, 104, 423, 432 of real-valued function, 90, 424 of vector function, 92, 423 uniqueness of, 93 higher-order differential, 125 second differential does not satisfy Cauchy’s rule of invariance, 123 Subject index Differential (Cont’d ) existence of, 116 implies second-order Taylor formula, 119 notation, 116, 126 of composite function, 123– 124, 127 of matrix function, 126 of real-valued function, 116 of vector function, 116, 117, 434 uniqueness of, 117 Disjoint, 4, 63 Distribution cumulative, 273 degenerate, 277 Disturbance, 285 prediction of, 335–340 Duplication matrix, 56–60, 444 Eckart and Young’s theorem, 396 Eigenprojection, 176 continuity of, 176–179 total, 176 continuity of, 178 uniqueness of, 176 Eigenspace, 176 dimension of, 176 Eigenvalue, 14 and Karamata’s inequality, 244 continuity of, 177, 178 convexity (concavity) of extreme eigenvalue, 186, 232 derivative of, 202 differential of, 174–175, 180– 185 alternative expressions, 183– 185 application in factor analysis, 409 with symmetric perturbations, 182 gradient of, 202 Hessian matrix of, 215 monotonicity of, 236 multiple, 177 multiple eigenvalue, 14 473 multiplicity of, 14 of (semi)definite matrix, 16 of idempotent matrix, 15 of singular matrix, 
15 of symmetric matrix, 15 of unitary matrix, 15 ordering, 230 quasilinear representation, 231, 234 second differential of, 186 application in factor analysis, 410 simple eigenvalue, 14, 21, 22 variational description, 232 Eigenvector, 14 (dis)continuity of, 177 column eigenvector, 14 derivative of, 203 differential of, 174, 180–183 with symmetric perturbations, 182 Hessian of, 218 linear independence, 16 normalization, 14, 181 row eigenvector, 14 Elementary vector, 95 Errors-in-variables, 357–359 Estimable function, 286, 295–296, 300 necessary and sufficient conditions, 296 strictly estimable, 302 Estimator, 282 affine, 286 affine minimum-determinant unbiased, 290 affine minimum-trace unbiased, 287–317 definition, 287 optimality of, 292 best affine unbiased, 286–317 definition, 286 relation with affine minimumtrace unbiased estimator, 287 best linear unbiased, 286 best quadratic invariant, 327 474 Estimator (Cont’d ) best quadratic unbiased, 322– 326, 330–332 definition, 322 maximum likelihood, see Maximum likelihood positive, 322 quadratic, 322 unbiased, 282 Euclid, 228 Euclidean space, Expectation, 274, 275 as linear operator, 275, 276 of quadratic form, 277, 283 Exponential of a matrix, 188 differential of, 188 Factor analysis, 404–414 Newton-Raphson routine, 408 varimax, 412–414 zigzag procedure, 407–408 First-derivative test, 134 First-order conditions, 146, 149, 430 Fischer’s min-max theorem, 234 Fischer, E., 225, 233, 234 Full-information ML (FIML), see Maximum likelihood Function, 78 affine, 79, 85, 90, 123 analytic, 109, 125–126, 177, 178 bounded, 79, 80 classification of, 192 component, 88, 89, 93, 115 composite, 89, 100–103, 106, 121–124, 127, 143, 217 continuous, 80 convex, see Convex function differentiable, 92, 97–100, 104 continuously, 100 n times, 125 twice, 114 domain of, 78 estimable (strictly), 286, 295– 296, 300, 302 increasing (strictly), 78, 85 likelihood, 347 limit of, 79 Subject index linear, 79 loglikelihood, 347 matrix, 104 monotonic 
(strictly), 79 range of, 78 real-valued, 78, 87 vector, 78, 87 Gauss, K F., 287, 289, 319 Gauss-Markov theorem, 289 Generalized inverse, 44 Gradient, 97 Hadamard product, 52–53, 69 derivative of, 207 differential of, 164 in factor analysis, 409, 413, 414 Hessian matrix identification of, 211–213, 442 of matrix function, 113 of real-valued function, 112, 203, 211, 213–217, 348, 435 of vector function, 113, 118 symmetry of, 113, 117–118, 435 Identification (in simultaneous equations), 369–373 global, 370, 371 with linear constraints, 371 local, 370, 372, 373 with linear constraints, 372 with nonlinear constraints, 373 Identification table first, 197 second, 212 Identification theorem, first for matrix functions, 105, 197, 441 for real-valued functions, 97, 197, 425 for vector functions, 96, 197, 427 Identification theorem, second for real-valued functions, 119, 212, 435, 441 Implicit function theorem, 157–159, 180 Subject index Independent (linearly), of eigenvectors, 16 Independent (stochastically), 277– 279 and correlation, 278 and identically distributed (i.i.d.), 279 Inequality arithmetic-geometric mean, 148, 229, 258 matrix analog, 268 Bergstrom, 227 matrix analog, 268 Cauchy-Schwarz, 226 matrix analogs, 227228 Cramer-Rao, 348 Hă older, 249 matrix analog, 249 Hadamard, 242 Kantorovich, 267 matrix analog, 268 Karamata, 243 applied to eigenvalues, 244 Minkowski, 252, 260 matrix analog, 252 Schlă omilch, 258 Schur, 228 triangle, 227 Information matrix, 348 asymptotic, 348 for full-information ML, 374 for limited-information ML, 381– 383 for multivariate linear model, 355 for nonlinear regression model, 360, 361, 363 for normal distribution, 352, 447 multivariate, 354 Interior, 74 Interior point, 74, 129 Intersection, 4, 76, 77, 82 Interval, 75 Inverse, 10 convexity of, 251 derivative of, 205 475 differential of, 168, 433 higher-order, 169 second, 169 Hessian of, 219 of partitioned matrix, 12–13 Isolated point, 74, 88 Jacobian, 97, 209–210, 369 involving symmetry, 
210 Jacobian matrix, 97, 105, 209, 435 identification of, 196–197 Jordan decomposition, 18, 49 Kronecker delta, Kronecker product, 31–34, 439 derivative of, 206–207 determinant of, 33 differential of, 164 eigenvalues of, 33 eigenvectors of, 33 inverse of, 32, 440 Moore-Penrose inverse of, 38 rank of, 34 trace of, 32, 440 transpose of, 32, 440 vec of, 55 Lagrange multipliers, 145, 430 economic interpretation of, 155– 157 matrix of, 155, 433 symmetric matrix of, 325, 337, 339, 396, 398, 402, 413 Lagrange’s theorem, 145 Lagrangian function, 145, 154, 430, 433 convexity (concavity) of, 154 first-order conditions, 146 Least squares (LS), 260, 290–291, 431–432 and best affine unbiased estimation, 290–291, 315–318 as approximation method, 290 generalized, 261, 315–316 LS estimator of σ , 332 bounds for bias of, 333–334 476 Least squares (Cont’d ) restricted, 262–265, 316–318, 431 matrix version, 264–265 Limit, 79 Limited-information ML (LIML), see Maximum likelihood Linear discriminant analysis, 418– 419 Linear equations, 41 consistency of, 42 homogeneous equation, 41 matrix equation, 43, 51, 67 uniqueness of, 43 vector equation, 42 Linear form, 7, 117, 434 derivative of, 198, 428 second derivative of, 212 Linear model consistency of, 304 with constraints, 308 estimation of σ , 322–330 estimation of W β, 286–318 alternative route, 311 singular variance matrix, 304– 314 under linear constraints, 296– 304, 307–314 explicit and implicit constraints, 307–310 local sensitivity analysis, 341– 343 multivariate, 354–357, 367 prediction of disturbances, 335– 340 Lipschitz condition, 94 Logarithm of a matrix, 188 differential of, 188 Majorization, 242 Markov switching, 366 Markov, A A., 287, 289, 319 Matrix, adjoint, see Adjoint commuting, 6, 250 complex, 4, 14 complex conjugate, 14 Subject index diagonal, 7, 27 element of, function, 188 Hessian of, 219–220 of symmetric matrix, 27, 188 Gramian, 65–67 MP inverse of, 65 Hermitian, 7, 14 idempotent, 6, 22, 23, 40 eigenvalue of, 15 
    locally, 171
  identity
  indefinite
  inverse, see Inverse
  lower triangular (strictly)
  negative (semi)definite
  nonnegative, 249, 252, 417
  nonsingular, 10
  null
  orthogonal, 7, 14, 179
    always real
  partitioned, 12
    determinant of, 13, 28
    inverse of, 12–13
    product of, 12
    sum of, 12
  permutation, 10
  positive (semi)definite, 8, 23–26
    eigenvalue of, 16
    inequality, 245, 255
  power of, 200, 202, 205, 245
  product
  real
  semi-orthogonal
  singular, 10
  skew-Hermitian, 7, 14
  skew-symmetric, 6, 14, 29
    always real
  square
  square root, 8, 28
  submatrix, see Submatrix
  sum
  symmetric, 6, 14
    always real
  transpose
  triangular
  unit lower (upper) triangular
  unitary, 7, 14
    eigenvalue of, 15
  upper triangular (strictly)
  Vandermonde, 183, 187
Maximum
  of a bilinear form, 241
  see also Minimum
Maximum likelihood (ML), 347–366
  errors-in-variables, 357–359
  estimate, estimator, 348
  full-information ML (FIML), 374–378
  limited-information ML (LIML), 378–388
    as special case of FIML, 378
    asymptotic variance matrix, 383–388
    estimators, 379
    first-order conditions, 378
    information matrix, 381
  multivariate linear regression model, 354–355
  multivariate normal distribution, 348, 445–447
    with distinct means, 354–364
  nonlinear regression model, 360–362
  sample principal components, 394
Mean squared error, 282, 318, 327–330
Mean-value theorem
  for real-valued functions, 103, 124
  for vector functions, 110
Means, weighted, 256
  bounds of, 256
  curvature of, 259
  limits of, 257
  linear homogeneity of, 256
  monotonicity of, 258
Minimum
  (strict) absolute, 130
  (strict) local, 130
  existence of absolute minimum, 131
  necessary conditions for local minimum, 132–134
  sufficient conditions for absolute minimum, 142–143
  sufficient conditions for local minimum, 134–137
Minimum under constraints
  (strict) absolute, 145
  (strict) local, 144
  necessary conditions for constrained local minimum, 145–149
  sufficient conditions for constrained absolute minimum, 154–155
  sufficient conditions for constrained local minimum, 149–153
Minkowski’s determinant theorem, 255
Minor, 11
  principal, 11, 26, 240
Monotonicity, 143
Moore-Penrose (MP) inverse
  and linear equations, 41–43
  definition of, 36
  differentiability of, 169–171
  differential of, 169–171, 188
  existence of, 37
  of bordered Gramian matrix, 65–66
  properties of, 38–41
  uniqueness of, 37
Multicollinearity, 293
Neighborhood, 73
Nonlinear regression model, 359–364
Norm
  convex, 86
  of matrix, 12, 105
  of vector, 6, 11
Normal distribution
  affine transformation of, 281
  and zero correlation, 280
  integrates to one, 210
  marginal distribution, 280
  moments, 280
  of affine function, 281
  of quadratic function, 281, 283, 330
  n-dimensional, 280
  one-dimensional, 144, 279
    moments, 280
  standard-normal, 279, 281
Notation, xviii, 103, 106–107, 191, 424, 426, 427, 435, 441
Observational equivalence, 370
Opportunity set, 144
Optimization
  constrained, 129, 144–157, 430, 433
  unconstrained, 129–144, 429
Partial derivative, see Derivative
Poincaré’s separation theorem, 236
  consequences of, 237–240
Poincaré, H., 225, 236–239, 269
Positivity (in optimization problems), 254, 323, 327, 351, 392
Predictor
  best linear unbiased, 335
  BLUF, 338–341
  BLUS, 336
Principal components (population), 390
  as approximation to population variance, 392
  optimality of, 391
  uncorrelated, 390
  unique, 391
  usefulness, 392
Principal components (sample), 394
  and one-mode component analysis, 398
  as approximation to sample variance, 395
  ML estimation of, 394
  optimality of, 395
  sample variance, 394
Probability, 273
  with probability one, 277
Quadratic form, 7, 117, 434
  convex, 85
  derivative of, 198, 428
  positivity of
    linear constraints, 60–63
    under linear constraints, 152
  second derivative of, 212, 213
Quasilinearization, 231, 246
  of (tr A^p)^{1/p}, 247
  of |A|^{1/n}, 253
  of eigenvalues, 234
  of extreme eigenvalues, 231
Random variable (continuous), 274
Rank
  and nonzero eigenvalues, 21, 22
  column rank
  locally constant, 106, 151, 169–171, 174
    and continuity of Moore-Penrose inverse, 169
    and differentiability of Moore-Penrose inverse, 170
  of idempotent matrix, 22
  of partitioned matrix, 63
  of symmetric matrix, 22
  rank condition, 370
  row rank
Rao, C. R., 30, 45, 70, 283, 319, 344, 348, 365, 420
Rayleigh quotient, 230
  bounds of, 230
  generalized, 418
Saddle point, 130, 137
Sample, 279
  sample variance, 394, 395
Schur decomposition, 17
Score vector, 348
Second-derivative test, 136
Second-order conditions, 149, 152
Sensitivity analysis, local, 341–343
  of posterior mean, 341
  of posterior precision, 343
Set
  (proper) subset
    dense, 167
  bounded, 4, 75, 78
  closed, 75
  compact, 75, 80, 131
  complement
  convex, 81–83
  derived, 74
  disjoint
  element of
  empty
  intersection
  open, 75
  union
Simultaneous equations model, 367–388
  identification, 369–373
  loglikelihood, 369
  normality assumption, 368
  rank condition, 370
  reduced form, 368
  reduced-form parameters, 369–370
  structural parameters, 369–370
  see also Maximum likelihood
Singular-value decomposition, 20
Spectrum, 176
Stiefel manifold, 396
Submatrix, 11
  principal, 11, 231
Symmetry, treatment of, 350–351
Taylor formula
  first-order, 90, 113, 124
  of order zero, 89
  second-order, 113, 114, 119
Taylor’s series, 125
Taylor’s theorem (for real-valued functions), 124, 126
Theil, H., 319, 344, 420
Trace, 11, 432
  derivative of, 199–200
  equals sum of eigenvalues, 21
Uncorrelated, 275, 276, 278, 281
Union, 4, 76, 77
Variance (matrix), 275–277
  asymptotic, 348, 352, 354, 355, 360, 362, 364, 376, 378, 383–388
  generalized, 276, 352
  of quadratic form in normal variables, 281, 283, 330
  positive semidefinite, 276
  sample, 394, 395
Variational description, 232
Vec operator, 34–36, 439
  of Kronecker product, 55
Vector
  column vector
  component of
  elementary, 95
  orthonormal
  row vector
Weierstrass theorem, 131

17 Topics in ...
Part Six: Applications to maximum likelihood estimation
15 Maximum likelihood estimation
Introduction