Applied statistics and multivariate data analysis for business and economics

Applied Statistics and Multivariate Data Analysis for Business and Economics: A Modern Approach Using SPSS, Stata, and Excel

Thomas Cleff
Pforzheim Business School, Pforzheim University of Applied Sciences, Pforzheim, Baden-Württemberg, Germany

ISBN 978-3-030-17766-9
ISBN 978-3-030-17767-6 (eBook)
https://doi.org/10.1007/978-3-030-17767-6

© Springer Nature Switzerland AG 2014, 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

This textbook, Applied Statistics and Multivariate Data Analysis in Business and Economics: A Modern Approach Using SPSS, Stata, and Excel, aims to familiarize students of business and economics and all other students of social sciences and humanities, as well as practitioners in firms, with the basic principles, techniques, and applications of applied statistics and applied data analysis. Drawing on practical examples from business settings, it demonstrates the techniques of statistical testing and univariate, bivariate, and multivariate statistical analyses. The textbook covers a range of subject matter, from scaling, sampling, and data preparation to advanced analytic procedures for assessing multivariate relationships. Techniques covered include univariate analyses (e.g. measures of central tendency, frequency tables, univariate charts, dispersion parameters), bivariate analyses (e.g. contingency tables, correlation), parametric and nonparametric tests (e.g. t-tests, Wilcoxon signed-rank test, U test, H test), and multivariate analyses (e.g. analysis of variance, regression, cluster analysis, and factor analysis). In addition, the book covers issues such as time series and indices, classical measurement theory, point estimation, and interval estimation. Each chapter concludes with a set of exercises. In this way, it addresses all of the topics typically covered in university courses on statistics and advanced applied data analysis.

In writing this book, I have consistently endeavoured to provide readers with an understanding of the thinking processes underlying complex methods of data analysis.
I believe this approach will be particularly valuable to those who might otherwise have difficulty with the formal method of presentation used by many other textbooks in statistics. In numerous instances, I have tried to avoid unnecessary formulas, attempting instead to provide the reader with an intuitive grasp of a concept before deriving or introducing the associated mathematics. Nevertheless, a book about statistics and data analysis that omits formulas would be neither possible nor desirable. Whenever ordinary language reaches its limits, the mathematical formula has always been the best tool to express meaning. To provide further depth, I have included practice problems and solutions at the end of each chapter, which are intended to make it easier for students to pursue effective self-study.

The broad availability of computers now makes it possible to learn and to teach statistics in new ways. Indeed, students now have access to a range of powerful computer applications, from Excel to various professional statistics programs. Accordingly, this textbook does not confine itself to presenting statistical methods, but also addresses the use of programs such as Excel, SPSS, and Stata. To aid the learning process, datasets have been made available at springer.com, along with other supplemental materials, allowing all of the examples and practice problems to be recalculated and reviewed.

I want to take this opportunity to thank all those who have collaborated in making this book possible. Well-deserved gratitude for their critical review of the manuscript and valuable suggestions goes to Uli Föhl, Wolfgang Gohout, Bernd Kuppinger, Bettina Müller, Bettina Peters, Wolfgang Schäfer (†), Lucais Sewell, and Kirsten Wüst, as well as many other unnamed individuals. Any errors or shortcomings that remain are entirely my own. Finally, this book could not have been possible without the ongoing support of my family. They deserve my very special gratitude. Please do not hesitate to contact me directly with feedback or any suggestions you may have for improvements (thomas.cleff@hs-pforzheim.de).

Pforzheim, Germany
May 2019
Thomas Cleff

Contents

1 Statistics and Empirical Research
   1.1 Do Statistics Lie?
   1.2 Different Types of Statistics
   1.3 The Generation of Knowledge Through Statistics
   1.4 The Phases of Empirical Research
      1.4.1 From Exploration to Theory
      1.4.2 From Theories to Models
      1.4.3 From Models to Business Intelligence
   References

2 From Disarray to Dataset
   2.1 Data Collection
   2.2 Level of Measurement
   2.3 Scaling and Coding
   2.4 Missing Values
   2.5 Outliers and Obviously Incorrect Values
   2.6 Chapter Exercises
   2.7 Exercise Solutions
   References

3 Univariate Data Analysis
   3.1 First Steps in Data Analysis
   3.2 Measures of Central Tendency
      3.2.1 Mode or Modal Value
      3.2.2 Mean
      3.2.3 Geometric Mean
      3.2.4 Harmonic Mean
      3.2.5 The Median
      3.2.6 Quartile and Percentile
   3.3 The Boxplot: A First Look at Distributions
   3.4 Dispersion Parameters
      3.4.1 Standard Deviation and Variance
      3.4.2 The Coefficient of Variation
   3.5 Skewness and Kurtosis
   3.6 Robustness of Parameters
   3.7 Measures of Concentration
   3.8 Using the Computer to Calculate Univariate Parameters
      3.8.1 Calculating Univariate Parameters with SPSS
      3.8.2 Calculating Univariate Parameters with Stata
      3.8.3 Calculating Univariate Parameters with Excel
   3.9 Chapter Exercises
   3.10 Exercise Solutions
   References

4 Bivariate Association
   4.1 Bivariate Scale Combinations
   4.2 Association Between Two Nominal Variables
      4.2.1 Contingency Tables
      4.2.2 Chi-Square Calculations
      4.2.3 The Phi Coefficient
      4.2.4 The Contingency Coefficient
      4.2.5 Cramer's V
      4.2.6 Nominal Associations with SPSS
      4.2.7 Nominal Associations with Stata
      4.2.8 Nominal Associations with Excel
   4.3 Association Between Two Metric Variables
      4.3.1 The Scatterplot
      4.3.2 The Bravais–Pearson Correlation Coefficient
   4.4 Relationships Between Ordinal Variables
      4.4.1 Spearman's Rank Correlation Coefficient (Spearman's Rho)
      4.4.2 Kendall's Tau (τ)
   4.5 Measuring the Association Between Two Variables with Different Scales
      4.5.1 Measuring the Association Between Nominal and Metric Variables
      4.5.2 Measuring the Association Between Nominal and Ordinal Variables
      4.5.3 Association Between Ordinal and Metric Variables
   4.6 Calculating Correlation with a Computer
      4.6.1 Calculating Correlation with SPSS
      4.6.2 Calculating Correlation with Stata
      4.6.3 Calculating Correlation with Excel
   4.7 Spurious Correlations
      4.7.1 Partial Correlation
      4.7.2 Partial Correlations with SPSS
      4.7.3 Partial Correlations with Stata
      4.7.4 Partial Correlation with Excel
   4.8 Chapter Exercises
   4.9 Exercise Solutions
   References

5 Classical Measurement Theory
   5.1 Sources of Sampling Errors
   5.2 Sources of Nonsampling Errors
   References

6 Calculating Probability
   6.1 Key Terms for Calculating Probability
   6.2 Probability Definitions
   6.3 Foundations of Probability Calculus
      6.3.1 Probability Tree
      6.3.2 Combinatorics
      6.3.3 The Inclusion–Exclusion Principle for Disjoint Events
      6.3.4 Inclusion–Exclusion Principle for Nondisjoint Events
      6.3.5 Conditional Probability
      6.3.6 Independent Events and Law of Multiplication
      6.3.7 Law of Total Probability
      6.3.8 Bayes' Theorem
      6.3.9 Postscript: The Monty Hall Problem
   6.4 Chapter Exercises
   6.5 Exercise Solutions
   References

7 Random Variables and Probability Distributions
   7.1 Discrete Distributions
      7.1.1 Binomial Distribution
         7.1.1.1 Calculating Binomial Distributions Using Excel
         7.1.1.2 Calculating Binomial Distributions Using Stata
      7.1.2 Hypergeometric Distribution
         7.1.2.1 Calculating Hypergeometric Distributions Using Excel
         7.1.2.2 Calculating the Hypergeometric Distribution Using Stata
      7.1.3 The Poisson Distribution
         7.1.3.1 Calculating the Poisson Distribution Using Excel
         7.1.3.2 Calculating the Poisson Distribution Using Stata
   7.2 Continuous Distributions
      7.2.1 The Continuous Uniform Distribution
      7.2.2 The Normal Distribution
         7.2.2.1 Calculating the Normal Distribution Using Excel
         7.2.2.2 Calculating the Normal Distribution Using Stata
   7.3 Important Distributions for Testing
      7.3.1 The Chi-Squared Distribution
         7.3.1.1 Calculating the Chi-Squared Distribution Using Excel
         7.3.1.2 Calculating the Chi-Squared Distribution Using Stata
      7.3.2 The t-Distribution
         7.3.2.1 Calculating the t-Distribution Using Excel
         7.3.2.2 Calculating the t-Distribution Using Stata
      7.3.3 The F-Distribution
         7.3.3.1 Calculating the F-Distribution Using Excel
         7.3.3.2 Calculating the F-Distribution Using Stata
   7.4 Chapter Exercises
   7.5 Exercise Solutions
   References

8 Parameter Estimation
   8.1 Point Estimation
   8.2 Interval Estimation
      8.2.1 The Confidence Interval for the Mean of a Population (μ)
      8.2.2 Planning the Sample Size for Mean Estimation
      8.2.3 Confidence Intervals for Proportions
      8.2.4 Planning Sample Sizes for Proportions
      8.2.5 The Confidence Interval for Variances
      8.2.6 Calculating Confidence Intervals with the Computer
         8.2.6.1 Calculating Confidence Intervals with Excel
         8.2.6.2 Calculating Confidence Intervals with SPSS
         8.2.6.3 Calculating Confidence Intervals with Stata
   8.3 Chapter Exercises
   8.4 Exercise Solutions
   References

List of Formulas

Combinatorics

Permutation of N elements without repetition:
$$P_N^N = N!$$

Permutation of N elements with repetition (k different groups of elements exist):
$$P_N^{n_1,\ldots,n_k} = \frac{N!}{n_1!\,n_2!\cdots n_k!}, \quad \text{with } \sum_{i=1}^{k} n_i = N$$

Combination without repetition (order does not matter):
$$C_N^n = \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

Combination with repetition (order does not matter):
$$\tilde{C}_N^n = \binom{N+n-1}{n}$$

Variation without repetition (order matters):
$$V_N^n = \binom{N}{n}\cdot n! = \frac{N!}{(N-n)!}$$

Variation with repetition (order matters):
$$\tilde{V}_N^n = N^n$$

Calculating Probabilities

Addition principle for disjoint events:
$$P\left(\bigcup_{i=1}^{m} A_i\right) = \sum_{i=1}^{m} P(A_i)$$

Complementary events:
$$A \cap B = \{\,\}, \qquad P(A \cap B) = 0$$

Inclusion–exclusion principle for nondisjoint events:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Law of multiplication with independent events:
$$P(A \cap B) = P(A)\cdot P(B)$$

Law of multiplication:
$$P(A \cap B) = P(A \mid B)\cdot P(B)$$

Law of total probability:
$$P(A) = P(A \mid B)\,P(B) + P(A \mid C)\,P(C) + \cdots + P(A \mid Z)\,P(Z)$$

Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\cdot P(A)}{P(B)}$$

Discrete Distributions

Binomial distribution:
$$B(n;k;p) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$$

• Expected value of a random variable in a binomial distribution: E(X) = n · p
• Variance of a random variable in a binomial distribution: Var(X) = n · p · (1 − p)
• Random variables in binomial distributions approximately follow
   – a normal distribution N(n · p; √(n · p · (1 − p))), if n · p · (1 − p) > 9,
   – a Poisson distribution Po(n · p), if n · p ≤ 10 and n ≥ 1500 · p.
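The combinatorial counts and binomial quantities above can be cross-checked numerically. The book itself performs such calculations with Excel, SPSS, and Stata; the short Python/SciPy sketch below is only an illustrative aside, and the probabilities and parameter values in it are arbitrary examples, not taken from the book.

```python
from math import comb, perm, factorial
from scipy.stats import binom

# Combinatorics: N = 5 elements, n = 3 drawn
N, n = 5, 3
print(factorial(N))        # permutations of all N elements: N! = 120
print(comb(N, n))          # combination without repetition: C(5,3) = 10
print(comb(N + n - 1, n))  # combination with repetition: C(7,3) = 35
print(perm(N, n))          # variation without repetition: N!/(N-n)! = 60
print(N ** n)              # variation with repetition: N^n = 125

# Bayes' theorem combined with the law of total probability (hypothetical probabilities)
p_a, p_b_given_a, p_b_given_not_a = 0.01, 0.95, 0.10
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)   # law of total probability
print(p_b_given_a * p_a / p_b)                           # P(A|B) ≈ 0.0876

# Binomial distribution B(n; k; p): P(X = 3) for n = 10 trials, p = 0.2
print(binom.pmf(3, n=10, p=0.2))                         # ≈ 0.2013
print(binom.mean(n=10, p=0.2), binom.var(n=10, p=0.2))   # E(X) = 2.0, Var(X) = 1.6
```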
Hypergeometric distribution:
$$H(N;M;n;x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$

• Expected value of a random variable in a hypergeometric distribution: E(X) = n · M/N
• Variance of a random variable in a hypergeometric distribution: Var(X) = n · (M/N) · (1 − M/N) · (N − n)/(N − 1)
• Random variables in a hypergeometric distribution H(N; M; n) approximately follow
   – a normal distribution
$$N\!\left(n\cdot\frac{M}{N};\ \sqrt{\frac{n\,(N-n)}{N-1}\cdot\frac{M}{N}\left(1-\frac{M}{N}\right)}\right),$$
     if n > 30 and 0.1 < M/N < 0.9,
   – a Poisson distribution Po(n · M/N), if n > 30, n/N < 0.05, and M/N ≤ 0.1 or M/N ≥ 0.9,
   – a binomial distribution B(n; M/N), if n > 10, 0.1 < M/N < 0.9, and n/N < 0.05.

Poisson distribution:
$$P(X = x) = \frac{\lambda^x}{x!}\, e^{-\lambda}$$

• Expected value of a random variable in a Poisson distribution: E(X) = λ
• Variance of a random variable in a Poisson distribution: Var(X) = λ
• The Poisson distribution with λ = n · p is derived from the binomial distribution, which itself can be approximated using the continuous normal distribution. When λ ≥ 10, therefore, the Poisson distribution can also be approximated using the continuous normal distribution N(μ = λ; σ = √λ).

Continuous Distributions

Continuous uniform distribution:
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a < x \le b \\[4pt] 0 & \text{otherwise} \end{cases}$$

• Expected value of a random variable in a continuous uniform distribution: E(X) = (a + b)/2
• Variance of a random variable in a continuous uniform distribution: Var(X) = (b − a)²/12

Normal distribution:
$$f_x(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

• Expected value of a random variable in a normal distribution: E(X) = μ
• Variance of a random variable in a normal distribution: Var(X) = σ²
• Standardized variable (z-transformation):
$$P\left(X_{lower} \le X \le X_{upper}\right) = P\left(\frac{X_{lower}-\mu}{\sigma} \le Z \le \frac{X_{upper}-\mu}{\sigma}\right)$$
• Random variables in a normal distribution are reproductive: the sum of two (or more) normally distributed random variables is again normally distributed (example for two variables):
$$N\left(\mu_1+\mu_2;\ \sqrt{\sigma_1^2+\sigma_2^2}\right)$$

Confidence Interval for the Mean

Calculating confidence intervals for means: see Fig. 8.7.

Length of a two-sided confidence interval for means:
$$E = 2\cdot z_{1-\alpha/2}\cdot \hat{\sigma}_{\bar{x}} = 2\cdot z_{1-\alpha/2}\cdot \frac{S_{theor}}{\sqrt{n}} = 2\cdot z_{1-\alpha/2}\cdot \frac{S_{emp}}{\sqrt{n-1}}$$

Planning the sample size for confidence intervals for means:
$$n = \frac{2^2\cdot z_{1-\alpha/2}^2\cdot S_{theor}^2}{E^2} = \frac{2^2\cdot z_{1-\alpha/2}^2\cdot S_{emp}^2}{E^2} + 1$$

Confidence Interval for Proportions

Calculating confidence intervals for proportions: see Fig. 8.10.

Length of a two-sided confidence interval for proportions:
$$E = 2\cdot z_{1-\alpha/2}\cdot \hat{\sigma}_{\bar{x}} = 2\cdot z_{1-\alpha/2}\cdot \frac{S_{theor}}{\sqrt{n}} = 2\cdot z_{1-\alpha/2}\cdot \frac{S_{emp}}{\sqrt{n-1}}$$

Planning sample sizes for confidence intervals for proportions:
$$E^2 = \frac{2^2\cdot z_{1-\alpha/2}^2\cdot S_{theor}^2}{n} = \frac{2^2\cdot z_{1-\alpha/2}^2\cdot p\,(1-p)}{n}$$

Confidence Interval for Variances

Two-sided:
$$P\left(\frac{(n-1)\,S_{theor}^2}{\chi^2_{1-\alpha/2;\,n-1}} \le \sigma^2 \le \frac{(n-1)\,S_{theor}^2}{\chi^2_{\alpha/2;\,n-1}}\right) = P\left(\frac{n\,S_{emp}^2}{\chi^2_{1-\alpha/2;\,n-1}} \le \sigma^2 \le \frac{n\,S_{emp}^2}{\chi^2_{\alpha/2;\,n-1}}\right) = 1-\alpha$$

One-sided:
$$P\left(\sigma^2 \ge \frac{(n-1)\,S_{theor}^2}{\chi^2_{1-\alpha;\,n-1}}\right) = 1-\alpha \quad\text{or}\quad P\left(\sigma^2 \le \frac{(n-1)\,S_{theor}^2}{\chi^2_{\alpha;\,n-1}}\right) = 1-\alpha, \qquad\text{with } (n-1)\,S_{theor}^2 = n\,S_{emp}^2$$
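The interval formulas above translate directly into a few quantile-function calls. The sketch below is again an editorial illustration in Python/SciPy rather than a reproduction of the book's Excel, SPSS, or Stata procedures; the sample values and the proportion example are made up, and the t-quantile is used in place of z for the mean interval, which is the usual choice for small samples.

```python
import numpy as np
from scipy import stats

# Hypothetical sample for illustration
x = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
n, x_bar = len(x), x.mean()
s_theor = x.std(ddof=1)      # S_theor (denominator n-1); x.std(ddof=0) would be S_emp
alpha = 0.05

# Two-sided confidence interval for the mean
t_q = stats.t.ppf(1 - alpha / 2, df=n - 1)
half = t_q * s_theor / np.sqrt(n)
print(x_bar - half, x_bar + half)

# Two-sided confidence interval for the variance sigma^2
lower = (n - 1) * s_theor**2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = (n - 1) * s_theor**2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(lower, upper)

# Approximate two-sided confidence interval for a proportion (40 successes in 100 trials)
p_hat, m = 0.40, 100
z_q = stats.norm.ppf(1 - alpha / 2)
half_p = z_q * np.sqrt(p_hat * (1 - p_hat) / m)
print(p_hat - half_p, p_hat + half_p)
```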
One-Sample t-Test

One-sample Z-test / one-sample t-test: see Fig. 9.6.

For a two-tailed hypothesis test, the p-value is calculated by:
$$p = 2\cdot\left(1 - P\!\left(t_{n-1} \le |t_{critical}| = \left|\frac{\bar{x}-\mu_0}{\hat{\sigma}_{\bar{x}}}\right|\right)\right)$$

For a one-tailed hypothesis, the p-values are calculated by:
$$p_{left} = P\!\left(t_{n-1} \le t_{critical} = \frac{\bar{x}-\mu_0}{\hat{\sigma}_{\bar{x}}}\right) \quad \text{for } H_1\!:\ \mu < \mu_0$$
$$p_{right} = 1 - P\!\left(t_{n-1} \le t_{critical} = \frac{\bar{x}-\mu_0}{\hat{\sigma}_{\bar{x}}}\right) \quad \text{for } H_1\!:\ \mu > \mu_0$$

Kruskal–Wallis Test (H-Test)

Hypotheses:
$$H_0\!:\ E(\bar{R}_1) = E(\bar{R}_2) = \cdots = E(\bar{R}_k); \qquad H_1\!:\ E(\bar{R}_i) \ne E(\bar{R}_j) \ \text{for at least one } i \ne j$$

Test statistic (with correction for ties):
$$H = \frac{12}{N(N+1)}\sum_{i=1}^{k} n_i\left(\bar{R}_i - \frac{N+1}{2}\right)^2, \qquad H_{corr} = \frac{H}{C} \ \text{ with }\ C = 1 - \frac{\sum_{j=1}^{m}\left(t_j^3 - t_j\right)}{N^3 - N}, \qquad H_{corr} \sim \chi^2_{k-1}$$

Decision: H₀ has to be rejected if H_corr ≥ χ²_{1−α; k−1}.

Chi-Square Test of Independence

Calculation of chi-square:
$$\chi^2_{emp} = \sum_{i=1}^{k}\sum_{j=1}^{m} \frac{\left(n_{ij} - n_{ij}^{e}\right)^2}{n_{ij}^{e}}$$

Decision: H₀ (independence) has to be rejected if χ²_emp > χ²_{1−α; (k−1)·(m−1)}.

Appendices

Appendix A: The Standard Normal Distribution

The table contains abscissa values for z between 0.00 and 3.29, along with the cumulative probabilities Φ(z) = P(Z ≤ z). For negative z values, Φ(−z) = 1 − Φ(z). Example: Φ(2.33) = 0.9901.

Table values: Φ(z) of the standard normal distribution N(0;1) for z = 0.00 to 3.29 in steps of 0.01 (rows z = 0.0, 0.1, …, 3.2; columns 0.00–0.09), ranging from Φ(0.00) = 0.5000 to Φ(3.29) = 0.9995; for example, Φ(1.96) = 0.9750.
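The probabilities tabulated in Appendix A can be reproduced with any statistics package rather than looked up; the book itself demonstrates this with Excel and Stata (Sects. 7.2.2.1 and 7.2.2.2). As a hedged aside, the same values in Python/SciPy:

```python
from scipy.stats import norm

print(norm.cdf(2.33))                       # Φ(2.33) ≈ 0.9901, the example given above
print(norm.cdf(0.00), norm.cdf(1.96))       # 0.5000 and ≈ 0.9750
print(norm.cdf(-1.28), 1 - norm.cdf(1.28))  # Φ(−z) = 1 − Φ(z)
print(norm.ppf(0.975))                      # inverse lookup: z with Φ(z) = 0.975 ≈ 1.96
```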
Appendix B: The Chi-Squared Distribution

The table contains the quantiles χ²_{p;n} corresponding to the cumulative probability p = (1 − α) for a particular number of degrees of freedom n. Example: χ²_{1−0.1;20} = χ²_{90%;20} = 28.412.

Table values: quantiles for p = 0.001, 0.005, 0.025, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995, and 0.999, for n = 1–30, 40, 60, 80, and 100 degrees of freedom; for example, χ²_{95%;4} = 9.488.
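As with Appendix A, these quantiles can be recomputed instead of read from the table; the book shows the corresponding Excel and Stata commands (Sects. 7.3.1.1 and 7.3.1.2). An illustrative Python/SciPy check:

```python
from scipy.stats import chi2

print(chi2.ppf(0.90, df=20))   # ≈ 28.412, the example quoted above
print(chi2.ppf(0.95, df=4))    # ≈ 9.488
print(chi2.ppf(0.999, df=30))  # ≈ 59.703
print(chi2.ppf(0.005, df=10))  # ≈ 2.156 (lower-tail quantile)
```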
Appendix C: The Student's t-Distribution

For a particular number of degrees of freedom, the table contains t-values t_{1−α;n} for the probability p = (1 − α) in the lower tail of the t-distribution. If n > 30, the t-distribution will approximate the normal distribution. Example: t_{1−0.1;30} = t_{90%;30} = 1.310.

Table values: quantiles t_{1−α;n} for p = 0.8, 0.9, 0.95, 0.975, 0.99, 0.995, 0.999, and 0.9995, for n = 1–30, 40, 60, and 120 degrees of freedom (the final row gives the limiting standard normal quantiles); for example, t_{90%;30} = 1.310 and t_{97.5%;10} = 2.228.

Appendix D: Critical Values for the Wilcoxon Signed-Rank Test

Table values: critical values of the Wilcoxon signed-rank statistic for one-tailed significance levels α = 5%, 2.5%, 1%, and 0.5% (two-tailed α = 10%, 5%, 2%, and 1%) and sample sizes n up to 50.
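The t-quantiles of Appendix C are likewise available from any statistics package (the book covers Excel and Stata in Sects. 7.3.2.1 and 7.3.2.2). For Appendix D, note that modern software usually reports the Wilcoxon test statistic together with a p-value rather than requiring a critical-value table. The Python/SciPy sketch below is an aside with made-up paired observations:

```python
from scipy.stats import t, wilcoxon

# Appendix C: t-quantiles
print(t.ppf(0.90, df=30))      # ≈ 1.310, the example given above
print(t.ppf(0.975, df=10))     # ≈ 2.228
print(t.ppf(0.975, df=10**6))  # large df: approaches the normal quantile 1.960

# Appendix D tabulates critical values of the Wilcoxon signed-rank statistic;
# scipy's wilcoxon() returns the test statistic and a p-value directly instead.
before = [72, 75, 71, 80, 68, 77, 74, 73]   # hypothetical paired observations
after = [71, 73, 68, 76, 63, 83, 66, 64]
stat, p_value = wilcoxon(before, after)
print(stat, p_value)
```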
Index

A
Absolute deviation, 51
Absolute scales, 19
Addition rule: for disjoint events, 150; for nondisjoint events, 152
Adjusted R-square, 363, 366–367, 374
Agglomeration schedule, 413, 417, 425
Agglomerative methods, 408
Alpha error, 258
Alternative hypothesis, 257–261
Analysis of covariance (ANCOVA), 306
Analysis of variance (ANOVA), 298–310: with Excel, 309; one-way ANOVA, 299–302; with SPSS, 309; with Stata, 309
Anti-image covariance matrix (AIC), 435
Arithmetic mean, see Mean
Autocorrelation, 374–375
Auxiliary regression, 377–379
Average, see Mean
Average linkage, see Linkage methods

B
Bar chart, 29
Bartlett test, 302, 434–435
Base period, 390, 392, 394
Bayes' theorem, 155–157
Bessel's corrected variance, 52
Beta coefficient of a regression, 355
Beta error, 258
Binomial distribution, 173–177: with Excel, 176; with Stata, 176
Birthday paradox, 160
Biserial rank correlation, 108
Bivariate association, 71–129: strength of, 82, 93
Bivariate centroid, 93, 357, 369
Boxplot, 47
Bravais–Pearson correlation, 90

C
Cardinal scale, 18–21, 23, 31
Causality, 353
Central limit theorem, 227, 302, 324
Central tendency, 261, 278, 311: measures of, 33–47
Centroid linkage, see Linkage methods
Chi-square, 73–77
Chi-squared distribution, 199–202: with Excel, 201; with Stata, 201
Chi-square test of independence, 317–323: with Excel, 322; with SPSS, 320; with Stata, 322
Cluster analysis, 423–424: with SPSS, 424; with Stata, 424
Coefficient of correlation, 90–94
Coefficient of determination, see R-square
Coefficient of determination (adjusted), see Adjusted R-square
Coefficient of variation, 53
Combination: with repetition, 149; without repetition, 148
Combinatorics, 146–150
Communalities, 435–437, 445
Complete linkage, see Linkage methods
Concentration: concentration ratio, 57; Gini coefficient, 58; Herfindahl index, 58; Lorenz curve, 58; measures of, 57–60
Concentration ratio, see Concentration
Conditional frequency, 73
Conditional probability, 153–154
Confidence intervals: with Excel, 243; for the mean, 230–236; for proportions, 239–241; with SPSS, 245; with Stata, 247; for variances, 241–243
Confidence level, 231, 232, 237, 245, 258, 320
Consistent estimation, 230
Contingency coefficient, 79–81
Contingency table, 73, 318
Correlation: with Excel, 112; Pearson, 90–93; Spearman (see Spearman's rank correlation); with SPSS, 110; spurious, 114; with Stata, 110
Correlation matrix, 433–436
Covariable, 306
Covariance, 91, 93, 110
Covariate, 306
Cramer's V, 120
Cross-sectional analyses, 134, 389
Crosstab, 71–73

D
Deflating time series, 399–400
Dendrogram, 417
Density, 31, 185, 191
Density function, 185, 187, 199
Dependent t-test, see t-test for dependent samples
Descriptive statistics
Dichotomous variable, 72
Dispersion parameters, 49–54
Distance matrix, 413
Distribution function, 29, 31–33
Duncan, 306

E
Eigenvalue, 436–438
Empirical standard deviation, 51
Empirical variance, 51
Equidistance, 19, 38, 96, 100
Error probability
Error sum of squares (ESS), 363
Error term, 374
Euclidian distance, 410, 412
Event, 139–146: combined, 140; complementary, 141; elementary, 140; intersection of events, 140
Excel: analysis of variance (ANOVA) with, 309; binomial distribution with, 176; chi-squared distribution with, 201; chi-square test of independence with, 322; confidence intervals with, 243; correlation with, 112; dependent t-test with, 278; F-distribution with, 206; hypergeometric distribution with, 181; independent t-test with, 290; nominal association with, 86–87; normal distribution with, 197; one-sample t-test with, 268; paired t-test with, 278; partial correlation with, 119; Poisson distribution with, 184; regression with, 363; t-distribution with, 204; t-test for independent samples with, 290; univariate parameters with, 62; Wilcoxon signed-rank test with, 283
Excess, 56
Expected counts, 74–77, 83, 120, 323
Expected frequency, 74, 83, 319
Expected relative frequency, 75
Expected value: continuous distribution, 187; discrete distribution, 172
Extreme values, 47

F
Factor analysis, 433–441: with SPSS, 441; with Stata, 441
Factor matrix, 439
Factor score coefficient matrix, 440
F-distribution, 205–208: with Excel, 206; with Stata, 208
Fisher index, see Index
Fourth central moment, 56
Frequency distribution, 29, 63, 171, 190
Frequency table, 27, 71
F-test, 287
Full survey, 3

G
Gauß test, see One-sample Z-test
Geometric mean, 39
Gini Coefficient, see Concentration
Goodness of fit, 366

H
Harmonic mean, 40
Herfindahl index, see Concentration
Heteroskedastic, 375
H1 hypothesis, 257–261
Histogram, 31–33
Homoscedasticity, 375
H test, see Kruskal–Wallis H test
Hypergeometric distribution, 177–182: with Excel, 181; with Stata, 181
Hypothesis: about a relationship, 260; falsification, 257; verification, 257

I
Inability error, 136
Inclusion–Exclusion Principle: for disjoint events, 150; for nondisjoint events, 152
Independent t-test, see t-test for independent samples
Index, 389–405: Laspeyres price index, 392; Paasche price index, 395; price index, 390; sales index, 398; value index, 398
Inductive statistics, 7, 11
Interaction effect, 303–306
Interquartile range, 47
Interval estimation, 230–250
Interval scales, 19
Item non-response, 236

K
Kaiser criterion, 437
Kaiser-Meyer-Olkin measure, 435, 441
Kendall's Tau, 100–105
KMO measure, 435, 441
Kolmogorov–Smirnov test, 324
Kruskal–Wallis H test, 310–317: with SPSS, 316; with Stata, 316
Kurtosis, 56

L
Laspeyres price index, see Index
Law of total probability, 154
Left-skewed, 54–56
Leptokurtic distribution, 56
Level of measurement, 71
Levene's test, 287
Linear relationship, 90–93
Linkage methods, 416–417: average linkage, 416; centroid linkage, 416; complete linkage, 416; single linkage, 416; Ward's method, 416
Longitudinal study, 389
Lorenz curve, see Concentration

M
Mann–Whitney U test, 292–298: with SPSS, 296; with Stata, 296
MANOVA, 308
Marginal frequencies, 73, 75
Mean: arithmetic, 34; geometric, 39; harmonic, 40; trimmed, 36
Mean rank, 97, 100, 112, 313, 329
Measurement error, 137
Measurement theory, 131
Measure of sample adequacy, 435
Median, 43
Median absolute deviation, 50
Mesokurtic distribution, 56
Metric variable, 87, 106–108
Missing values, 22–24
Modal value, see Mode
Mode, 34, 56
Model: symbolic, 10; verbal, 10
