Härdle and Simar, Applied Multivariate Statistical Analysis (2003)

Applied Multivariate Statistical Analysis

Wolfgang Härdle
Léopold Simar

Version: 29th April 2003

Contents

I Descriptive Techniques

1 Comparison of Batches
  1.1 Boxplots
  1.2 Histograms
  1.3 Kernel Densities
  1.4 Scatterplots
  1.5 Chernoff-Flury Faces
  1.6 Andrews’ Curves
  1.7 Parallel Coordinates Plots
  1.8 Boston Housing
  1.9 Exercises

II Multivariate Random Variables

2 A Short Excursion into Matrix Algebra
  2.1 Elementary Operations
  2.2 Spectral Decompositions
  2.3 Quadratic Forms
  2.4 Derivatives
  2.5 Partitioned Matrices
  2.6 Geometrical Aspects
  2.7 Exercises

3 Moving to Higher Dimensions
  3.1 Covariance
  3.2 Correlation
  3.3 Summary Statistics
  3.4 Linear Model for Two Variables
  3.5 Simple Analysis of Variance
  3.6 Multiple Linear Model
  3.7 Boston Housing
  3.8 Exercises

4 Multivariate Distributions
  4.1 Distribution and Density Function
  4.2 Moments and Characteristic Functions
  4.3 Transformations
  4.4 The Multinormal Distribution
  4.5 Sampling Distributions and Limit Theorems
  4.6 Bootstrap
  4.7 Exercises

5 Theory of the Multinormal
  5.1 Elementary Properties of the Multinormal
  5.2 The Wishart Distribution
  5.3 Hotelling Distribution
  5.4 Spherical and Elliptical Distributions
  5.5 Exercises

6 Theory of Estimation
  6.1 The Likelihood Function
  6.2 The Cramer-Rao Lower Bound
  6.3 Exercises

7 Hypothesis Testing
  7.1 Likelihood Ratio Test
  7.2 Linear Hypothesis
  7.3 Boston Housing
  7.4 Exercises

III Multivariate Techniques

8 Decomposition of Data Matrices by Factors
  8.1 The Geometric Point of View
  8.2 Fitting the p-dimensional Point Cloud
  8.3 Fitting the n-dimensional Point Cloud
  8.4 Relations between Subspaces
  8.5 Practical Computation
  8.6 Exercises

9 Principal Components Analysis
  9.1 Standardized Linear Combinations
  9.2 Principal Components in Practice
  9.3 Interpretation of the PCs
  9.4 Asymptotic Properties of the PCs
  9.5 Normalized Principal Components Analysis
  9.6 Principal Components as a Factorial Method
  9.7 Common Principal Components
  9.8 Boston Housing
  9.9 More Examples
  9.10 Exercises

10 Factor Analysis
  10.1 The Orthogonal Factor Model
  10.2 Estimation of the Factor Model
  10.3 Factor Scores and Strategies
  10.4 Boston Housing
  10.5 Exercises

11 Cluster Analysis
  11.1 The Problem
  11.2 The Proximity between Objects
  11.3 Cluster Algorithms
  11.4 Boston Housing
  11.5 Exercises

12 Discriminant Analysis
  12.1 Allocation Rules for Known Distributions
  12.2 Discrimination Rules in Practice
  12.3 Boston Housing
  12.4 Exercises

13 Correspondence Analysis
  13.1 Motivation
  13.2 Chi-square Decomposition
  13.3 Correspondence Analysis in Practice
  13.4 Exercises

14 Canonical Correlation Analysis
  14.1 Most Interesting Linear Combination
  14.2 Canonical Correlation in Practice
  14.3 Exercises

15 Multidimensional Scaling
  15.1 The Problem
  15.2 Metric Multidimensional Scaling
    15.2.1 The Classical Solution
  15.3 Nonmetric Multidimensional Scaling
    15.3.1 Shepard-Kruskal algorithm
  15.4 Exercises

16 Conjoint Measurement Analysis
  16.1 Introduction
  16.2 Design of Data Generation
  16.3 Estimation of Preference Orderings
  16.4 Exercises

17 Applications in Finance
  17.1 Portfolio Choice
  17.2 Efficient Portfolio
  17.3 Efficient Portfolios in Practice
  17.4 The Capital Asset Pricing Model (CAPM)
  17.5 Exercises

18 Highly Interactive, Computationally Intensive Techniques
  18.1 Simplicial Depth
  18.2 Projection Pursuit
  18.3 Sliced Inverse Regression
  18.4 Boston Housing
  18.5 Exercises

A Symbols and Notation

B Data
  B.1 Boston Housing Data
  B.2 Swiss Bank Notes
  B.3 Car Data
  B.4 Classic Blue Pullovers Data
  B.5 U.S. Companies Data
  B.6 French Food Data
  B.7 Car Marks
  B.8 French Baccalauréat Frequencies
  B.9 Journaux Data
  B.10 U.S. Crime Data
  B.11 Plasma Data
  B.12 WAIS Data
  B.13 ANOVA Data
  B.14 Timebudget Data
  B.15 Geopol Data
  B.16 U.S. Health Data
  B.17 Vocabulary Data
  B.18 Athletic Records Data
  B.19 Unemployment Data
  B.20 Annual Population Data

Bibliography

Index

Preface

Most of the observable phenomena in the empirical sciences are of a multivariate nature. In financial studies, assets in stock markets are observed simultaneously and their joint development is analyzed to better understand general tendencies and to track indices. In medicine, recorded observations of subjects in different locations are the basis of reliable diagnoses and medication. In quantitative marketing, consumer preferences are collected in order to construct models of consumer behavior. The underlying theoretical structure of these and many other quantitative studies of applied sciences is multivariate.

This book on Applied Multivariate Statistical Analysis presents the tools and concepts of multivariate data analysis with a strong focus on applications. The aim of the book is to present multivariate data analysis in a way that is understandable for non-mathematicians and practitioners who are confronted by statistical data analysis. This is achieved by focusing on the practical relevance and through the e-book character of this text. All practical examples may be recalculated and modified by the reader using a standard web browser and without reference to or application of any specific software.

The book is divided into three main parts. The first part is devoted to graphical techniques describing the distributions of the variables involved. The second part deals with multivariate random variables and presents, from a theoretical point of view, distributions, estimators and tests for various practical situations. The last part
is on multivariate techniques and introduces the reader to the wide selection of tools available for multivariate data analysis. All data sets are given in the appendix and are downloadable from www.md-stat.com. The text contains a wide variety of exercises, the solutions of which are given in a separate textbook. In addition, a full set of transparencies on www.md-stat.com is provided, making it easier for an instructor to present the materials in this book. All transparencies contain hyperlinks to the statistical web service so that students and instructors alike may recompute all examples via a standard web browser.

The first section on descriptive techniques is on the construction of the boxplot. Here the standard data sets on genuine and counterfeit bank notes and on the Boston housing data are introduced. Flury faces are shown in Section 1.5, followed by the presentation of Andrews curves and parallel coordinate plots. Histograms, kernel densities and scatterplots complete the first part of the book. The reader is introduced to the concept of skewness and correlation from a graphical point of view.

At the beginning of the second part of the book the reader goes on a short excursion into matrix algebra. Covariances, correlation and the linear model are introduced. This section is followed by the presentation of the ANOVA technique and its application to the multiple linear model. In Chapter 4 the multivariate distributions are introduced and thereafter specialized to the multinormal. The theory of estimation and testing ends the discussion on multivariate random variables.

The third and last part of this book starts with a geometric decomposition of data matrices. It is influenced by the French school of analyse des données. This geometric point of view is linked to principal components analysis in Chapter 9. An important discussion on factor analysis follows, with a variety of examples from psychology and economics. The section on cluster analysis deals with the various cluster techniques and leads naturally to the problem of discriminant analysis. The next chapter deals with the detection of correspondence between factors. The joint structure of data sets is presented in the chapter on canonical correlation analysis, and a practical study on prices and safety features of automobiles is given. Next the important topic of multidimensional scaling is introduced, followed by the tool of conjoint measurement analysis. Conjoint measurement analysis is often used in psychology and marketing in order to measure preference orderings for certain goods. The applications in finance (Chapter 17) are numerous. We present here the CAPM model and discuss efficient portfolio allocations. The book closes with a presentation on highly interactive, computationally intensive techniques.

This book is designed for the advanced bachelor and first year graduate student as well as for the inexperienced data analyst who would like a tour of the various statistical tools in a multivariate data analysis workshop. The experienced reader with a solid knowledge of algebra will certainly skip some sections of the multivariate random variables part but will hopefully enjoy the various mathematical roots of the multivariate techniques. A graduate student might think that the first part on descriptive techniques is well known to him from his training in introductory statistics. The mathematical and the applied parts of the book (II, III) will certainly introduce him to the rich realm of multivariate statistical data analysis modules.

The inexperienced computer user of this e-book is slowly introduced to an interdisciplinary way of statistical thinking and will certainly enjoy the various practical examples. This e-book is designed as an interactive document with various links to other features. The complete e-book may be downloaded from www.xplore-stat.de using the license key given on the last page of this book. Our e-book design offers a complete PDF and HTML file with links to
MD*Tech computing servers. The reader of this book may therefore use all the presented methods and data via the local XploRe Quantlet Server (XQS) without downloading or buying additional software. Such XQ Servers may also be installed in a department or addressed freely on the web (see www.ixplore.de for more information).

B Data

B.16 U.S. Health Data (continued)

DE      2044    622  38.8  404.5  202.8  25.3  16.0  25.0  10.5   1046   14
MD     10460   4392  35.2  366.7    195  23.4  15.8  16.1   9.6  11961   85
VA     40767   5706  37.4  365.3  174.4  22.4  20.3  11.4   9.2   9749  135
WV     24231   1936  46.7  502.7  199.6  35.2  20.1  18.4  10.0   2813   75
NC     52669   6255  45.4  392.6  169.2  22.6  19.8  13.1  10.2   9355  159
SC     31113   3347  47.8  374.4  156.9  19.6  19.2  14.8   9.0   4355   89
GA     58910   5976  48.2  371.4  157.9  22.6  20.5  13.2  10.4   8256  191
FL     58664  11366  46.0  501.8    244  34.0  18.3  16.1  17.2  18836  254
KY     40409   3726  48.8  442.5  194.7  29.8  22.9  15.9   9.1   5189  120
TN     42144   4762  45.0  427.2  185.6  27.0  20.8  12.0   8.3   7572  162
AL     51705   4021  48.9  411.5  185.8  25.5  16.8  16.1   9.1   5157  146
MS     47689   2613  59.3  422.3  173.9  21.7  19.5  14.0   7.1   2883  118
AR     53187   2359  51.0    482  202.1  29.0  22.7  15.0   8.7   2952   97
LA     47751   4481  52.3  390.9  168.1  18.6  15.8  17.8   8.3   7061  158
OK     69956   3301  62.5  441.4  182.4  27.6  24.5  15.3   9.6   4128  143
TX    266807  16370  48.9  327.9  146.5  20.7  17.4  12.1   8.7  23481  562
MT    147046    826  59.0  372.2  170.7  33.4  25.1  14.4  11.1   1058   67
ID     83564   15.0  51.5  324.8  140.4  29.9  22.3  12.4   9.2   1079   52
WY     97809    509  67.6  264.2  112.2  27.7  18.5   9.2   9.2    606   31
CO    104091   3231  44.7  280.2  125.1  29.9  22.8   9.6   9.5   5899   98
NM    121593   1450  62.3  235.6  137.2  28.7  17.8  17.5  13.1   2127   56
AZ      1140   3187  48.3  331.5  165.6  36.3  21.2  12.6  13.1   5137   79
UT     84899   1645  39.3    242   93.7  17.6  14.5  11.1   7.3   2563   44
NV    110561    936  57.3  299.5  162.3  32.3  13.7  11.1  15.4   1272   26
WA     68138   4409  41.4  358.1    171  31.1  21.2  13.0  10.9   7768  122
OR     97073   2687  41.6  387.8  179.4  33.8  23.1  11.2  10.4   4904   83
CA    158706  26365  40.3  357.8    173  26.9  22.2  10.7  16.7  57225  581
AK      5914    521  85.8  114.6   76.1   8.3  12.4   3.4  11.0    545   26
HI      6471   1054  32.5  216.9  125.8  16.0  16.8  12.7   6.2   1953   26

3 3 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 6 6 7 7 8 8 8 8 9 9

B.17 Vocabulary Data

This example of the evolution of the vocabulary of children can be found in Bock (1975). Data are drawn from test results on file in the Records Office of the Laboratory School of the University of Chicago. They consist of scores, obtained from a cohort of pupils from the eighth through eleventh grade levels, on alternative forms of the vocabulary section of the Cooperative Reading Test. It provides the following scaled scores shown for the sample of 64 subjects (the origin and units are fixed arbitrarily).

Subject  Grade 8  Grade 9 Grade 10 Grade 11     Mean
1           1.75     2.60     3.76     3.68     2.95
2           0.90     2.47     2.44     3.43     2.31
3           0.80     0.93     0.40     2.27     1.10
4           2.42     4.15     4.56     4.21     3.83
5          −1.31    −1.31    −0.66    −2.22    −1.38
6          −1.56     1.67     0.18     2.33     0.66
7           1.09     1.50     0.52     2.33     1.36
8          −1.92     1.03     0.50     3.04     0.66
9          −1.61     0.29     0.73     3.24     0.66
10          2.47     3.64     2.87     5.38     3.59
11         −0.95     0.41     0.21     1.82     0.37
12          1.66     2.74     2.40     2.17     2.24
13          2.07     4.92     4.46     4.71     4.04
14          3.30     6.10     7.19     7.46     6.02
15          2.75     2.53     4.28     5.93     3.87
16          2.25     3.38     5.79     4.40     3.96
17          2.08     1.74     4.12     3.62     2.89
18          0.14     0.01     1.48     2.78     1.10
19          0.13     3.19     0.60     3.14     1.77
20          2.19     2.65     3.27     2.73     2.71
21         −0.64    −1.31    −0.37     4.09     0.44
22          2.02     3.45     5.32     6.01     4.20
23          2.05     1.80     3.91     2.49     2.56
24          1.48     0.47     3.63     3.88     2.37
25          1.97     2.54     3.26     5.62     3.35
26          1.35     4.63     3.54     5.24     3.69
27         −0.56    −0.36     1.14     1.34     0.39
28          0.26     0.08     1.17     2.15     0.92
29          1.22     1.41     4.66     2.62     2.47
30         −1.43     0.80    −0.03     1.04     0.09
31         −1.17     1.66     2.11     1.42     1.00
32          1.68     1.71     4.07     3.30     2.69
33         −0.47     0.93     1.30     0.76     0.63
34          2.18     6.42     4.64     4.82     4.51
35          4.21     7.08     6.00     5.65     5.73
36          8.26     9.55    10.24    10.58     9.66
37          1.24     4.90     2.42     2.54     2.78
38          5.94     6.56     9.36     7.72     7.40
39          0.87     3.36     2.58     1.73     2.14
40         −0.09     2.29     3.08     3.35     2.15
41          3.24     4.78     3.52     4.84     4.10
42          1.03     2.10     3.88     2.81     2.45
43          3.58     4.67     3.83     5.19     4.32
44          1.41     1.75     3.70     3.77     2.66
45         −0.65    −0.11     2.40     3.53     1.29
46          1.52     3.04     2.74     2.63     2.48
47          0.57     2.71     1.90     2.41     1.90
48          2.18     2.96     4.78     3.34     3.32
49          1.10     2.65     1.72     2.96     2.11
50          0.15     2.69     2.69     3.50     2.26
51         −1.27     1.26     0.71     2.68     0.85
52          2.81     5.19     6.33     5.93     5.06
53          2.62     3.54     4.86     5.80     4.21
54          0.11     2.25     1.56     3.92     1.96
55          0.61     1.14     1.35     0.53     0.91
56         −2.19    −0.42     1.54     1.16     0.02
57          1.55     2.42     1.11     2.18     1.82
58          0.04     0.50     2.60     2.61     1.42
59          3.10     2.00     3.92     3.91     3.24
60         −0.29     2.62     1.60     1.86     1.45
61          2.28     3.39     4.91     3.89     3.62
62          2.57     5.78     5.12     4.98     4.61
63         −2.19     0.71     1.56     2.31     0.60
64         −0.04     2.44     1.79     2.64     1.71
Mean        1.14     2.54     2.99     3.47     2.53

B.18 Athletic Records Data

This data set provides data on athletic records for 55 countries.

Country       100m   200m   400m   800m   1500m  5000m  10000m  Marathon
              (s)    (s)    (s)    (min)  (min)  (min)  (min)   (min)
Argentina     10.39  20.81  46.84  1.81   3.70   14.04  29.36   137.71
Australia     10.31  20.06  44.84  1.74   3.57   13.28  27.66   128.30
Austria       10.44  20.81  46.82  1.79   3.60   13.26  27.72   135.90
Belgium       10.34  20.68  45.04  1.73   3.60   13.22  27.45   129.95
Bermuda       10.28  20.58  45.91  1.80   3.75   14.68  30.55   146.61
Brazil        10.22  20.43  45.21  1.73   3.66   13.62  28.62   133.13
Burma         10.64  21.52  48.30  1.80   3.85   14.45  30.28   139.95
Canada        10.17  20.22  45.68  1.76   3.63   13.55  28.09   130.15
Chile         10.34  20.80  46.20  1.79   3.71   13.61  29.30   134.03
China         10.51  21.04  47.30  1.81   3.73   13.90  29.13   133.53
Columbia      10.43  21.05  46.10  1.82   3.74   13.49  27.88   131.35
Cook Is       12.18  23.20  52.94  2.02   4.24   16.70  35.38   164.70
Costa Rica    10.94  21.90  48.66  1.87   3.84   14.03  28.81   136.58
Czech         10.35  20.65  45.64  1.76   3.58   13.42  28.19   134.32
Denmark       10.56  20.52  45.89  1.78   3.61   13.50  28.11   130.78
Dom Rep       10.14  20.65  46.80  1.82   3.82   14.91  31.45   154.12
Finland       10.43  20.69  45.49  1.74   3.61   13.27  27.52   130.87
France        10.11  20.38  45.28  1.73   3.57   13.34  27.97   132.30
GDR           10.12  20.33  44.87  1.73   3.56   13.17  27.42   129.92
FRG           10.16  20.37  44.50  1.73   3.53   13.21  27.61   132.23
GB            10.11  20.21  44.93  1.70   3.51   13.01  27.51   129.13
Greece        10.22  20.71  46.56  1.78   3.64   14.59  28.45   134.60
Guatemala     10.98  21.82  48.40  1.89   3.80   14.16  30.11   139.33
Hungary       10.26  20.62  46.02  1.77   3.62   13.49  28.44   132.58
India         10.60  21.42  45.73  1.76   3.73   13.77  28.81   131.98
Indonesia     10.59  21.49  47.80  1.84   3.92   14.73  30.79   148.83
Ireland       10.61  20.96  46.30  1.79   3.56   13.32  27.81   132.35
Israel        10.71  21.00  47.80  1.77   3.72   13.66  28.93   137.55
Italy         10.01  19.72  45.26  1.73   3.60   13.23  27.52   131.08
Japan         10.34  20.81  45.86  1.79   3.64   13.41  27.72   128.63
Kenya         10.46  20.66  44.92  1.73   3.55   13.10  27.80   129.75
Korea         10.34  20.89  46.90  1.79   3.77   13.96  29.23   136.25
P Korea       10.91  21.94  47.30  1.85   3.77   14.13  29.67   130.87
Luxemburg     10.35  20.77  47.40  1.82   3.67   13.64  29.08   141.27
Malaysia      10.40  20.92  46.30  1.82   3.80   14.64  31.01   154.10
Mauritius     11.19  33.45  47.70  1.88   3.83   15.06  31.77   152.23
Mexico        10.42  21.30  46.10  1.80   3.65   13.46  27.95   129.20
Netherlands   10.52  29.95  45.10  1.74   3.62   13.36  27.61   129.02
NZ            10.51  20.88  46.10  1.74   3.54   13.21  27.70   128.98
Norway        10.55  21.16  46.71  1.76   3.62   13.34  27.69   131.48
Png           10.96  21.78  47.90  1.90   4.01   14.72  31.36   148.22
Philippines   10.78  21.64  46.24  1.81   3.83   14.74  30.64   145.27
Poland        10.16  20.24  45.36  1.76   3.60   13.29  27.89   131.58
Portugal      10.53  21.17  46.70  1.79   3.62   13.13  27.38   128.65
Rumania       10.41  20.98  45.87  1.76   3.64   13.25  27.67   132.50
Singapore     10.38  21.28  47.40  1.88   3.89   15.11  31.32   157.77
Spain         10.42  20.77  45.98  1.76   3.55   13.31  27.73   131.57
Sweden        10.25  20.61  45.63  1.77   3.61   13.29  27.94   130.63
Switzerland   10.37  20.45  45.78  1.78   3.55   13.22  27.91   131.20
Tapei         10.59  21.29  46.80  1.79   3.77   14.07  30.07   139.27
Thailand      10.39  21.09  47.91  1.83   3.84   15.23  32.56   149.90
Turkey        10.71  21.43  47.60  1.79   3.67   13.56  28.58   131.50
USA            9.93  19.75  43.86  1.73   3.53   13.20  27.43   128.22
USSR          10.07  20.00  44.60  1.75   3.59   13.20  27.53   130.55
W Samoa       10.82  21.86  49.00  2.02   4.24   16.28  34.71   161.83

B.19 Unemployment Data

This data set provides unemployment rates in all federal states of Germany in September 1999.

No.  Federal state            Unemployment rate
1    Schleswig-Holstein       8.7
2    Hamburg                  9.8
3    Mecklenburg-Vorpommern   17.3
4    Niedersachsen            9.8
5    Bremen                   13.9
6    Nordrhein-Westfalen      9.8
7    Hessen                   7.9
8    Rheinland-Pfalz          7.7
9    Saarland                 10.4
10   Baden-Württemberg        6.2
11   Bayern                   5.8
12   Berlin                   15.8
13   Brandenburg              17.1
14   Sachsen-Anhalt           19.9
15   Thüringen                15.1
16   Sachsen                  16.8

B.20 Annual Population Data

The data show the yearly average population of the old federal states (given in 1000 inhabitants).

Year  Inhabitants  Unemployed
1960  55433         271
1961  56158         181
1962  56837         155
1963  57389         186
1964  57971         169
1965  58619         147
1966  59148         161
1967  59268         459
1968  59500         323
1969  60067         179
1970  60651         149
1971  61302         185
1972  61672         246
1973  61976         273
1974  62054         582
1975  61829        1074
1976  61531        1060
1977  61400        1030
1978  61327         993
1979  61359         876
1980  61566         889
1981  61682        1272
1982  61638        1833
1983  61423        2258
1984  61175        2266
1985  61024        2304
1986  61066        2228
1987  61077        2229
1988  61449        2242
1989  62063        2038
1990  63254        1883
1991  64074        1689
1992  64865        1808
1993  65535        2270
1994  65858        2556
1995  66156        2565
1996  66444        2796
1997  66648        3021

Bibliography

Andrews, D. (1972). Plots of high-dimensional data, Biometrics 28: 125–136.
Backhaus, K., Erichson, B., Plinke, W. and Weiber, R. (1996). Multivariate Analysemethoden, Springer, Berlin.
Bartlett, M. S. (1939). A note on tests of significance in multivariate analysis, Proceedings of the Cambridge Philosophical Society 35: 180–185.
Bartlett, M. S. (1954). A note on multiplying factors for various chi-squared approximations, JRSSB 16: 296–298.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics, Wiley.
Berndt, E. (1990). The Practice of Econometrics: Classic and Contemporary, Addison-Wesley, Massachusetts.
Bock, R. D. (1975). Multivariate Statistical Methods in Behavioral Research, McGraw-Hill, New York.
Bouroche, J.-M. and Saporta, G. (1980). L’analyse des données, Presses Universitaires de France, Paris.
Breiman, L. (1973). Statistics: With a view towards application, Houghton Mifflin Company, Boston.
Chernoff, H. (1973). Using faces to represent points in k-dimensional space graphically, Journal of the American Statistical Association 68: 361–368.
Cook, R. D. and Weisberg, S. (1991). Comment on “sliced inverse regression for dimension reduction”, Journal of the American Statistical Association 86(414): 328–332.
Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis, John Wiley & Sons, New York.
Duan, N. and Li, K.-C. (1991). Slicing regression: A link-free regression method, Annals of Statistics 19(2): 505–530.
Embrechts, P., McNeil, A. and Straumann, D. (1999). Correlation and dependence in risk management: Properties and pitfalls. Preprint ETH Zürich.
Fahrmeir, L. and Hamerle, A. (1984). Multivariate Statistische Verfahren, De Gruyter, Berlin.
Fang, K. T., Kotz, S. and Ng, K. W. (1990). Symmetric Multivariate and Related Distributions, Chapman and Hall, London.
Feller, W. (1966). An Introduction to Probability Theory and Its Application, Vol. 2, Wiley & Sons, New York.
Fengler, M. R., Härdle, W. and Villa, C. (2001). The dynamics of implied volatilities: A common principal components approach, Discussion Paper 38, SFB 373, Humboldt-Universität zu Berlin.
Flury, B. (1988). Common Principal Components Analysis and Related Multivariate Models, John Wiley and Sons, New York.
Flury, B. and Gautschi, W. (1986). An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form, SIAM Journal on Scientific and Statistical Computing 7: 169–184.
Flury, B. and Riedwyl, H. (1988). Multivariate Statistics, A practical Approach, Cambridge University Press.
Franke, J., Härdle, W. and Hafner, C. (2001). Einführung in die Statistik der Finanzmärkte, Springer, Heidelberg.
Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit classification. Unpublished manuscript.
Friedman, J. H. and Tukey, J. W. (1974). A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on Computers C 23: 881–890.
Gibbins, R. (1985). Canonical Analysis. A Review with Application in Ecology, Springer-Verlag, Berlin.
Hall, P. (1992). The Bootstrap and Edgeworth Expansion, Statistical Series, Springer, New York.
Hall, P. and Li, K.-C. (1993). On almost linearity of low dimensional projections from high dimensional data, Annals of Statistics 21(2): 867–889.
Härdle, W. (1991). Smoothing Techniques, With Implementations in S, Springer, New York.
Härdle, W., Kleinow, T. and Stahl, G. (2002). Applied Quantitative Finance, Springer, Heidelberg.
Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2003). Non- and Semiparametric Models, Springer, Heidelberg.
Härdle, W. and Scott, D. (1992). Smoothing by weighted averaging of rounded points, Computational Statistics 7: 97–128.
Harrison, D. and Rubinfeld, D. L. (1978). Hedonic prices and the demand for clean air, J. Environ. Economics & Management 5: 81–102.
Hoaglin, W., Mosteller, F. and Tukey, J. (1983). Understanding Robust and Exploratory Data Analysis, Wiley, New York.
Hodges, J. L. and Lehmann, E. L. (1956). The efficiency of some non-parametric competitors of the t-test, Annals of Mathematical Statistics 27: 324–335.
Hotelling, H. (1935). The most predictable criterion, Journal of Educational Psychology 26: 139–142.
Hotelling, H. (1953). New light on the correlation coefficient and its transform, Journal of the Royal Statistical Society, Series B 15: 193–232.
Huber, P. (1985). Projection pursuit, Annals of Statistics 13(2): 435–475.
Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Analysis, 4th ed., Prentice Hall, Englewood Cliffs, New Jersey.
Jones, M. C. and Sibson, R. (1987). What is projection pursuit? (with discussion), Journal of the Royal Statistical Society, Series A 150(1): 1–36.
Kaiser, H. F. (1985). The varimax criterion for analytic rotation in factor analysis, Psychometrika 23: 187–200.
Kendall, K. and Stuart, S. (1977). Distribution Theory, Vol. 1 of The Advanced Theory of Statistics, Griffin, London.
Klinke, S. and Polzehl (1995). Implementation of kernel based indices in XGobi, Discussion paper 47, SFB 373, Humboldt-University of Berlin.
Kötter, T. (1996). Entwicklung statistischer Software, PhD thesis, Institut für Statistik und Ökonometrie, HU Berlin.
Kruskal, J. B. (1965). Analysis of factorial experiments by estimating a monotone transformation of data, Journal of the Royal Statistical Society, Series B 27: 251–263.
Kruskal, J. B. (1969). Toward a practical method which helps uncover the structure of a set of observations by finding the line transformation which optimizes a new “index of condensation”, in R. C. Milton and J. A. Nelder (eds), Statistical Computation, Academic Press, New York, pp. 427–440.
Kruskal, J. B. (1972). Linear transformation of multivariate data to reveal clustering, in R. N. Shepard, A. K. Romney and S. B. Nerlove (eds), Multidimensional Scaling: Theory and Applications in the Behavioural Sciences, Vol. 1, Seminar Press, London, pp. 179–191.
Lachenbruch, P. A. and Mickey, M. R. (1968). Estimation of error rates in discriminant analysis, Technometrics 10: 1–11.
Li, K.-C. (1991). Sliced inverse regression for dimension reduction (with discussion), Journal of the American Statistical Association 86(414): 316–342.
Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma, Journal of the American Statistical Association 87: 1025–1039.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis, Academic Press, Duluth, London.
Morrison, D. F. (1990). Multivariate Statistical Methods, McGraw-Hill, New York.
Muirhead, R. J. (1982). Aspects of Multivariate Statistics, John Wiley and Sons,
New York.
Nelsen, R. B. (1999). An Introduction to Copulas, Springer, New York.
Olkin, I. and Veath, M. (1980). Maximum likelihood estimation in a two-way analysis with correlated errors in one classification, Biometrika 68: 653–660.
Parzen, E. (1962). On estimating of a probability density and mode, Annals of Mathematical Statistics 35: 1065–1076.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function, Annals of Mathematical Statistics 27: 832–837.
Schott, J. R. (1994). Determining the dimensionality in sliced inverse regression, Journal of the American Statistical Association 89(425): 141–148.
Scott, D. (1985). Averaged shifted histograms: Effective nonparametric density estimation in several dimensions, Annals of Statistics 13: 1024–1040.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis, Vol. 26 of Monographs on Statistics and Applied Probability, Chapman and Hall, London.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges, Publ. Inst. Statist. Univ. Paris, pp. 229–231.
Tufte, E. (1983). The Visual Display of Quantitative Information, Graphics Press.
Volle, V. (1985). Analyse des Données, Economica, Paris.
Whittle, P. (1958). On the smoothing of probability density functions, Journal of the Royal Statistical Society, Series B 55: 549–557.
