Business Statistics Using Excel


Business Statistics Using Excel®
Second edition
Glyn Davis & Branko Pecar

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom. Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

© Glyn Davis and Branko Pecar 2013. The moral rights of the authors have been asserted. First Edition copyright 2010.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above. You must not circulate this work in any other form and you must impose this same condition on any acquirer.

British Library Cataloguing in Publication Data: data available. ISBN 978-0-19-965951-7. Printed in Italy by L.E.G.O. S.p.A., Lavis TN.

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

Preface

Aims of the book

It has long been recognized that the development of modular undergraduate programmes, coupled with a dramatic increase in student numbers, has led to a reconsideration of teaching practices. This is particularly true in the teaching of statistics and, in response, a more supportive learning process has been developed. A classic approach to teaching statistics, unless one is teaching a class of future professional
statisticians, can be difficult and is often met with very little enthusiasm by the majority of students. A more supportive learning process, based on method application rather than method derivation, is clearly needed. The authors thought that by relying on some commonly available tools, Microsoft Excel 2010 in particular, such an approach would be possible. To this effect, a new programme relying on the integration of workbook-based open learning materials with information technology tools has been adopted. The current learning and assessment structure may be defined as follows:

(a) To help students 'bridge the gap' between school and university.
(b) To enable a student to be confident in handling numerical data.
(c) To enable students to appreciate the role of statistics as a business decision-making tool.
(d) To provide a student with the knowledge to use Excel 2010 to solve a range of statistical problems.

This book is aimed at students who require a general introduction to business statistics that would normally form a foundation-level business school module. The learning material in this book requires minimal input from a lecturer and can be used as a self-instruction guide. Furthermore, three online workbooks are available: two to help students with Excel and practise numerical skills, and an advanced workbook to help undertake factorial experiment analysis using Excel 2010. The growing importance of spreadsheets in business is emphasized throughout the text by the use of the Excel spreadsheet. The use of software in statistics modules is more or less mandatory at both diploma and degree level, and the emphasis within the text is on the use of Excel 2010 to undertake the required calculations.

How to use the book effectively

The sequence of chapters has been arranged so that there is a progressive accumulation of knowledge. Each chapter guides students step by step through the theoretical and spreadsheet skills required. Chapters also contain exercises that give students
the chance to check their progress.

Hints on using the book

(a) Be patient and work slowly and methodically, especially in the early stages when progress may be slow.
(b) Do not omit or 'jump around' between chapters; each chapter builds upon knowledge and skills gained previously. You may also find that the Excel applications described earlier in the book are required to develop applications in later chapters.
(c) Try not to compare your progress with others too much. Fastest is not always best!
(d) Don't try to achieve too much in one session. Time for rest and reflection is important.
(e) Mistakes are part of learning. Do not worry about them. The more you repeat something, the fewer mistakes you will make.
(f) Make time to complete the exercises, especially if you are learning on your own. They are your best guide to your progress.
(g) The visual walkthroughs have been developed to solve a particular statistical problem using Excel. If you are not sure about the Excel solution then use the visual walkthroughs (flash movies) as a reminder.

Brief contents

How to use this book
How to use the Online Resource Centre
1 Visualizing and presenting data
2 Data descriptors
3 Introduction to probability
4 Probability distributions
5 Sampling distributions and estimating
6 Introduction to parametric hypothesis testing
7 Chi-square and non-parametric hypothesis testing
8 Linear correlation and regression analysis
9 Time series data and analysis
Glossary
Index

Detailed contents

How to use this book
How to use the Online Resource Centre

1 Visualizing and presenting data
Overview
Learning objectives
1.1 The different types of data variable
1.2 Tables
1.2.1 What a table looks like
1.2.2 Creating a frequency distribution
1.2.3 Types of data
1.2.4 Creating a table using Excel PivotTable
1.2.5 Principles of table construction
1.3 Graphical representation of data
1.3.1 Bar charts
1.3.2 Pie charts
1.3.3 Histograms
1.3.4 Histograms with
unequal class intervals
1.3.5 Frequency polygon
1.3.6 Scatter and time series plots
1.3.7 Superimposing two sets of data onto one graph
Techniques in practice
Summary
Key terms
Further reading

2 Data descriptors
Overview
Learning objectives
2.1 Measures of central tendency
2.1.1 Mean, median, and mode
2.1.2 Percentiles and quartiles
2.1.3 Averages from frequency distributions
2.1.4 Weighted averages
2.2 Measures of dispersion
2.2.1 The range
2.2.2 The interquartile range and semi-interquartile range (SIQR)
2.2.3 The standard deviation and variance
2.2.4 The coefficient of variation
2.2.5 Measures of skewness and kurtosis
2.3 Exploratory data analysis
2.3.1 Five-number summary
2.3.2 Box plots
2.3.3 Using the Excel ToolPak add-in
Techniques in practice
Summary
Key terms
Further reading

3 Introduction to probability
Overview
Learning objectives
3.1 Basic ideas
3.2 Relative frequency
3.3 Sample space
3.4 The probability laws
3.5 The general addition law
3.6 Conditional probability
3.7 Statistical independence
3.8 Probability tree diagrams
3.9 Introduction to probability distributions
3.10 Expectation and variance for a probability distribution
Techniques in practice
Summary
Key terms
Further reading

4 Probability distributions
Overview
Learning objectives
4.1 Continuous probability distributions
4.1.1 Introduction
4.1.2 The normal distribution
4.1.3 The standard normal distribution (Z distribution)
4.1.4 Checking for normality
4.1.5 Other continuous probability distributions
4.1.6 Probability density function and cumulative distribution function
4.2 Discrete probability distributions
4.2.1 Introduction
4.2.2 Binomial probability distribution

Glossary

Mean absolute percentage error (MAPE): The mean value of all the differences between the actual and forecasted values in the time series. The differences between these values are represented as
absolute percentage values, i.e., the effects of the sign are ignored.
Mean error (ME): The mean value of all the differences between the actual and forecasted values in the time series.
Mean percentage error (MPE): The mean value of all the differences between the actual and forecasted values in the time series. The differences between these values are represented as percentage values.
Mean square error (MSE): The mean value of all the differences between the actual and forecasted values in the time series. The differences between these values are squared to avoid positive and negative differences cancelling each other.
Median: The median is the value halfway through the ordered data set.
Mixed model: The mixed time series model blends both additive and multiplicative components together to identify the actual time series value.
Mode: The mode is the most frequently occurring value in a set of discrete data.
Moving average: Averages calculated for a limited number of periods in a time series. Every subsequent period excludes the first observation from the previous period and includes the one following the previous period. This becomes a series of moving averages.
Moving average trend: The moving average trend is a method of forecasting or smoothing a time series by averaging each successive group of data points.
Multiple regression model: Multiple linear regression aims to find a linear relationship between a dependent variable and several possible independent variables.
Multiplication law: The multiplication law is a result used to determine the probability that two events, A and B, both occur.
Multiplication law for independent events: For independent events, the chance that they both happen simultaneously is the product of the chances that each occurs individually, e.g. P(A and B) = P(A)*P(B).
Multiplication law for joint events: See Multiplication law.
Multiplicative model: The multiplicative time series model is a model whereby the separate components of the time series are
multiplied together to identify the actual time series value.
Multivariate methods: Methods that use more than one variable and try to predict the future values of one of the variables by using the values of other variables.
Mutually exclusive: Mutually exclusive events are ones that cannot occur at the same time.
Nominal scale: A set of data is said to be nominal if the values belonging to it can be assigned a label rather than a number.
Non-parametric: Non-parametric tests are often used in place of their parametric counterparts when certain assumptions about the underlying population are questionable.
Non-seasonal: Non-seasonal is the component of variation in a time series which is not dependent on the time of year.
Non-stationary time series: A time series that does not have a constant mean but oscillates around a moving mean.
Normal approximation to the binomial: If the number of trials, n, is large, the binomial distribution is approximately equal to the normal distribution.
Normal distribution: The normal distribution is a symmetrical, bell-shaped curve, centred at its expected value.
Normal probability plot: A graphical technique to assess whether the data is normally distributed.
Normality of errors: The normality of errors assumption states that the errors should be normally distributed; technically, normality is necessary only for the t-tests to be valid, while estimation of the coefficients only requires that the errors be identically and independently distributed.
Null hypothesis (H0): The null hypothesis, H0, represents a theory that has been put forward but has not been proved.
Observed frequency: In a contingency table the observed frequencies are the frequencies actually obtained in each cell of the table, from our random sample.
One sample test: A one sample test is a hypothesis test for answering questions about the mean (or median) where the data are a random sample of independent observations from an underlying distribution.
One sample t-test for the population mean: A one sample
t-test is a hypothesis test for answering questions about the mean where the data are a random sample of independent observations from an underlying normal distribution whose population variance is unknown.
One sample z-test for the population mean: A one-sample z-test is used to test whether a population parameter is significantly different from some hypothesized value.
One tail test: A one tail test is a statistical hypothesis test in which the values for which we can reject the null hypothesis, H0, are located entirely in one tail of the probability distribution.
Ordinal scale: An ordinal scale is a scale where the values/observations belonging to it can be ranked (put in order) or have a rating scale attached. You can count and order, but not measure, ordinal data.
Ordinal variable: A set of data is said to be ordinal if the values belonging to it can be ranked.
Outcome: An outcome is the result of an experiment or other situation involving uncertainty.
Outlier: An outlier is an observation in a data set which is far removed in value from the others in the data set.
Parametric: Any statistic computed by procedures that assume the data were drawn from a particular distribution.
Pearson's coefficient of correlation: Pearson's correlation coefficient measures the linear association between two variables that have been measured on interval or ratio scales.
Pie chart: A pie chart is a way of summarizing a set of categorical data.
Point estimate: A point estimate (or estimator) is any quantity calculated from the sample data which is used to provide information about the population.
Point estimate of the population mean: Point estimation for the mean involves the use of the sample mean to provide a 'best estimate' of the unknown population mean.
Point estimate of the population proportion: Point estimation for the proportion involves the use of the sample proportion to provide a 'best estimate' of the unknown population proportion.
Point estimate of the population variance: Point estimate
for the variance involves the use of the sample variance to provide a 'best estimate' of the unknown population variance.
Poisson distribution: Poisson distributions model a range of discrete random data variables.
Poisson probability distribution: The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.
Polynomial line: A polynomial line is a curved line whose curvature depends on the degree of the polynomial variable.
Polynomial trend: A model that uses an equation of any polynomial curve (parabola, cubic curve, etc.) to approximate the time series.
Population mean: The population mean is the mean value of all possible values.
Population standard deviation: The population standard deviation is the standard deviation of all possible values.
Population variance: The population variance is the variance of all possible values.
Power trend: A model that uses an equation of a power curve to approximate the time series.
Probability: Probability provides a quantitative description of the likely occurrence of a particular event.
Probability of event A given that event B has occurred: See Conditional probability.
Probable: Probable indicates that an event or events is likely to happen or to be true.
P-value: The p-value is the probability of getting a value of the test statistic as extreme as or more extreme than that observed by chance alone, if the null hypothesis is true.
Q1: Q1 is the lower quartile and is the data value a quarter of the way up through the ordered data set.
Q3: Q3 is the upper quartile and is the data value a quarter of the way down through the ordered data set.
Qualitative variable: Variables can be classified as descriptive or categorical.
Quantitative variable: Variables can be classified using numbers.
Quartiles: Quartiles are values that divide a
sample of data into four groups containing an equal number of observations.
Random experiment: A random experiment is an experiment, trial, or observation that can be repeated numerous times under the same conditions.
Random sample: A random sample is a sampling technique where we select a sample from a population of values.
Random variable: A random variable is a function that associates a unique numerical value with every outcome of an experiment.
Ranks: Listing data in order of size.
Range: The range of a data set is a measure of the dispersion of the observations.
Ratio scale: A ratio scale consists not only of equidistant points but also has a meaningful zero point.
Raw data: Raw data is data collected in original form.
Region of rejection: The range of values that leads to rejection of the null hypothesis.
Regression analysis: Regression analysis is used to model the relationship between a dependent variable and one or more independent variables.
Regression coefficient: A regression coefficient is a measure of the relationship between a dependent variable and an independent variable.
Relative frequency: Relative frequency is another term for proportion; it is the value calculated by dividing the number of times an event occurs by the total number of times an experiment is carried out.
Residual: The residual represents the unexplained variation (or error) after fitting a regression model.
Residuals (R): The differences between the actual and predicted values. Sometimes called forecasting errors. Their behaviour and pattern should be random.
Right-skewed: Right-skewed (or positive skew) indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean.
Robust test: If a test is robust, the validity of the test result will not be affected by poorly structured data. In other words, it is resistant against violations of parametric assumptions.
Sample space: The sample space is an exhaustive list of all the possible outcomes of an experiment.
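The Random experiment and Relative frequency entries above can be illustrated with a short simulation: repeat an experiment many times and divide the number of occurrences of an event by the number of trials. This is a minimal sketch in Python rather than the book's Excel; the function and variable names are illustrative, not from the text.

```python
import random

def relative_frequency(event, trials, experiment):
    """Estimate P(event): occurrences of the event divided by number of trials."""
    hits = sum(1 for _ in range(trials) if event(experiment()))
    return hits / trials

random.seed(42)  # fixed seed so the demonstration is reproducible

# Random experiment: rolling a fair six-sided die.
def roll_die():
    return random.randint(1, 6)

# The relative frequency of rolling a six approaches the theoretical
# probability 1/6 ≈ 0.1667 as the number of trials grows.
estimate = relative_frequency(lambda outcome: outcome == 6, 100_000, roll_die)
print(round(estimate, 3))
```

With 100,000 trials the estimate lands close to 1/6, which is the convergence the glossary's relative-frequency definition relies on.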
Sample standard deviation: A sample standard deviation is an estimate, based on a sample, of a population standard deviation.
Sampling distribution: The sampling distribution describes probabilities associated with a statistic when a random sample is drawn from a population.
Sampling error: Sampling error refers to the error that results from taking one sample rather than taking a census of the entire population.
Sampling frame: A sampling frame is the source material or device from which a sample is drawn.
Scatter plot: A scatter plot is a plot of one variable against another variable.
Seasonal: Seasonal is the component of variation in a time series which is dependent on the time of year.
Seasonal component: A component in the classical time series analysis approach to forecasting that covers seasonal movements of the time series, usually taking place inside one year's horizon.
Seasonal time series: A time series, represented in units of time smaller than a year, that shows a regular pattern in repeating itself over a number of these units of time.
Seasonal variations (S): The seasonal variations of the time series model that show a periodic pattern over one year or less.
Shape: The shape of the distribution refers to the shape of a probability distribution and involves the calculation of skewness and kurtosis.
Significance level, α: The significance level of a statistical hypothesis test is a fixed probability of wrongly rejecting the null hypothesis, H0, if it is in fact true.
Sign test: The sign test is designed to test a hypothesis about the location of a population distribution.
Simple exponential smoothing: Simple exponential smoothing is a forecasting technique that uses a weighted average of past time series values to arrive at smoothed time series values that can be used as forecasts.
Simple index: A simple index is designed to measure changes in some measure over time.
Skewness: Skewness is defined as asymmetry in the distribution of the data values.
Slope: Gradient of
the fitted regression line.
Smoothing constant: The smoothing constant is a parameter of the exponential smoothing model that provides the weight given to the most recent time series value in the calculation of the forecast value.
Spearman's rank coefficient of correlation: Spearman's rank correlation coefficient is applied to data sets when it is not convenient to give actual values to variables but one can assign a rank order to instances of each variable.
Standard deviation: A measure of the dispersion of the observations (the square root of the variance).
Standard error of forecast: The square root of the variance of all forecasting errors, adjusted for the sample size.
Standard error of the estimate (SEE): The standard error of the estimate (SEE) is an estimate of the average squared error in prediction.
Standard error of the mean: The standard error of the mean (SEM) is the standard deviation of the sample mean's estimate of a population mean.
Standard error of the proportion: The standard error of the proportion is the standard deviation of the sample proportion's estimate of a population proportion.
Standard normal distribution: A standard normal distribution is a normal distribution with zero mean (μ = 0) and unit variance (σ² = 1).
Stated limits: The lower and upper limits of a class interval.
Statistic: A statistic is a quantity that is calculated from a sample of data.
Statistical independence: Two events are independent if the occurrence of one of the events gives us no information about whether or not the other event will occur.
Statistical power: The power of a statistical test is the probability that it will correctly lead to the rejection of a false null hypothesis.
Stationary time series: A time series that has a constant mean and oscillates around this mean.
Student's t distribution: The t distribution is the sampling distribution of the t statistic.
Sum of squares for error (SSE): The SSE measures the variation in the modelling errors.
Sum of squares for regression (SSR):
The SSR measures how much variation there is in the modelled values.
Symmetrical: A data set is symmetrical when the data values are distributed in the same way above and below the middle value.
Table: A table shows the number of times that items occur.
Tally chart: A tally chart is a method of counting frequencies, according to some classification, in a set of data.
Test statistic: A test statistic is a quantity calculated from our sample of data.
Tied ranks: Two or more data values share a rank value.
Time period: A unit of time by which the variable is defined (an hour, a day, a month, a year, etc.).
Time series: A variable measured and represented per unit of time.
Time series plot: A chart of the change in a variable against time.
Total sum of squares (SST): The SST measures how much variation there is in the observed data (SST = SSR + SSE).
Trend (T): The trend is the long-run shift or movement in the time series observable over several periods of time.
Trend component: A component in the classical time series analysis approach to forecasting that covers underlying directional movements of the time series.
True or mathematical limits: True or mathematical limits separate one class in a grouped frequency distribution from another.
Two sample tests: A two sample test is a hypothesis test for answering questions about the mean where the data are collected from two random samples of independent observations, each from an underlying distribution.
Two sample t-test for population mean (dependent or paired samples): A two sample t-test for population mean (dependent or paired samples) is used to compare two dependent population means inferred from two samples (dependent indicates that the values from both samples are numerically dependent upon each other; there is a correlation between corresponding values).
Two sample t-test for the population mean (independent samples, equal variance): A two sample t-test for the population mean (independent samples, equal variance) is used when
two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared.
Two sample t-test for population mean (independent samples, unequal variances): A two sample t-test for population mean (independent samples, unequal variances) is used when two separate sets of independent but differently distributed samples are obtained, one from each of the two populations being compared.
Two sample z-test for the population mean: A two sample z-test for the population mean is used to evaluate the difference between two group means.
Two sample z-test for the population proportion: A two sample z-test for the population proportion is used to evaluate the difference between two group proportions.
Two tail test: A two tail test is a statistical hypothesis test in which the values for which we can reject the null hypothesis, H0, are located in both tails of the probability distribution.
Type I error, α: A type I error occurs when the null hypothesis is rejected when it is in fact true.
Type II error, β: A type II error occurs when the null hypothesis, H0, is not rejected when it is in fact false.
Types of trends: The type of trend can include line and curve fits to the data set.
Unbiased: When the mean of the sampling distribution of a statistic is equal to a population parameter, that statistic is said to be an unbiased estimator of the parameter.
Uncertainty: Uncertainty is a state of having limited knowledge where it is impossible to describe exactly the existing state or future outcome of a particular event.
Univariate methods: Methods that use only one variable and try to predict its future value on the basis of the past values of the same variable.
Upper one tail test: An upper one tail test is a statistical hypothesis test in which the values for which we can reject the null hypothesis, H0, are located entirely in the right tail of the probability distribution.
Variable: A variable is a symbol that can take on any of a
specified set of values.
Variance: A measure of the dispersion of the observations.
Variation: Variation is a measure that describes how spread out or scattered a set of data is.
Wilcoxon signed rank sum test: The Wilcoxon signed ranks test is designed to test a hypothesis about the location of the population median (one or two matched pairs).
199–200, 215, 319 box-and-whisker plots 96, 99, 149 box plots 94, 96–9, 105 Brown’s exponential smoothing method 438, 441, 445 see also exponential smoothing bull hits 121, 157–8, 160 C calculated test statistics 255, 259–60, 270, 272, 276–7, 282, 328–30 calculated two-tail p-values 255, 259 categorical/category variables 2, 296, 298, 301–2 categorical data 19, 21–2, 56, 306, 310, 324, 340 CDF see cumulative distribution function central limit theorem 183, 185–6, 205–6, 208, 235, 241, 248 central tendency 33 measures of 83, 90, 104 chance 107, 122, 133, 136, 188 Chart Tools 26, 30 charts 21–2, 24, 26–7, 44, 50–2, 56–7, 96–9 appearance 24, 30, 50, 98 bar 21–7, 30–2, 57 column 37, 96 line 46, 97–8, 344, 421 titles 24, 26, 30, 37, 50, 52, 99 chi-square 294, 296–7, 299, 301, 309, 315–17, 341 critical values 316–17 distribution 136, 153, 183, 297–8, 301, 307, 312 goodness-of-fit test 168, 297, 313–14 and non-parametric hypothesis testing 296– 341 test 246, 296, 303, 306–7, 309, 313, 340–1 of association 297, 300–1, 340 and independent sampes 303–7 McNemar see McNemar test 478 Index CHISQ.DIST 300–1, 305, 315–16 CHISQ.DIST.RT 300, 305–6, 310, 315–16 CHISQ.INV.RT 300–1, 305, 307, 315–16 CHISQ.TEST 300, 305 CI see confidence interval class boundaries 8, 11, 32, 39, 57 lower see lower class boundaries upper 8, 11, 32–5, 42–3, 71, 82, 86–7 class frequency 41–2, 74 class intervals 10, 42–3, 58, 73 unequal 40, 42 class limits 10, 39, 57 class mid-points 43–4, 71, 73–4, 128 class widths 8, 11, 32–3, 35, 39, 41–2, 126 equal 8, 34, 37 standard 41–2 unequal 2, 42, 73 classical time series analysis 419–20, 466 classifications 6, 110–11, 155 cluster sampling 190 clusters 185, 188, 190 COD see coefficients, of determination coefficients 370, 374, 398, 425, 446, 457 correlation see correlation coefficients of determination (COD) 348, 366, 372–4, 387, 395, 399–400, 404–5 Pearson’s see Pearson’s correlation coefficient; Pearson’s coefficient of skewness regression 346, 370 of skewness 
Fisher’s 90 Pearson’s 90, 105 Spearman’s rank 343–4, 347, 356–60, 404–5 of variation 81, 88–9 column charts 37, 96 columns 11–12, 19–21, 24, 29, 297–8, 444–5, 462 totals 17, 298, 301 variables 297–8, 305, 308 COMBIN 160 conditional probability 119 confidence 225, 227, 231, 234–5, 241, 243, 285 level of 235, 241, 287–8, 290, 461 confidence intervals (CIs) 217– 19, 225–8, 230–6, 238–9, 382–3, 387, 458–61 constant 462–3 estimates 217–18, 226–35, 238, 382–3 lower 227, 231, 234, 236 population 225–42 upper 227, 231, 234, 236 confidence measurement see confidence intervals constant values 406, 417–18, 461 constant variance 370–1, 388, 397, 456 see also equal variances constants 363, 371, 387, 408, 423, 432, 448–9 Consumer Price Index see CPI contingency tables 22, 153, 298, 300–1, 303, 305–8 continuous data 2, 10, 60 continuous distributions 176, 219, 246, 313, 319 continuous random variables 136, 183 continuous variables 3, 136 convenience 156, 185, 187, 191 sampling 191 CORREL 349, 353 correlation coefficients 344, 347–8, 350, 352–4, 374, 405, 425 Pearson’s 90, 105, 343–4, 347–9, 351, 355–8, 404–5 Spearman’s rank 343–4, 347, 356–9, 404 correlations 279, 344, 346–7, 350, 355, 357–8, 370 linear see linear correlations negative 350–1, 356, 358 positive 350–1, 356, 358 serial 370 strong 350, 353 COUNT 261, 270, 275, 285, 300, 383–4, 455 COUNTA 305 COUNTIF 320–1, 326, 333 COVAR 347 covariance 347–9, 405 coverage error 185, 193 CPI (Consumer Price Index) 415–18, 465 critical f 287–8, 380 critical tables 143, 460 critical test statistic 251–2, 258–60, 272, 277, 282, 316–17, 329–30 critical values 91–2, 228, 259–60, 288–9, 311, 315–16, 358–60 estimates of 294, 340 lower 287–8 upper 286–9 critical z 228, 255, 323, 325, 329, 336 cross tabulation 22, 57, 343 tables 22 see also contingency tables cumulative distribution function (CDF) 154–5 cumulative frequency 69–70, 74–7, 133 curve 22, 70, 75–6 distributions 56, 69–70, 74 tables 69 curly brackets 455 cyclical variations 419, 466 
D damping factor 440, 444 see also smoothing, constant data analysis 212–14, 252, 263–4, 273, 278, 370–2, 439–40 application of 273, 278, 282 exploratory 59, 94, 250 tool 252, 263–4, 272–3, 278, 282, 289–90, 444 regression solution 385–8 data descriptors 58–105 data distribution 60, 95, 149 data labels 30, 43, 50, 52 data sets 58–63, 84–5, 94–7, 104–5, 343–7, 364–8, 370–2 ordered 59, 62, 64 data types 22, 56, 105, 247, 341 data variables 32, 39, 47, 56, 60, 343–4, 356 discrete random 155, 165 types 2–3 deflating of values 416–19 delta 446–7, 449 denominators 220, 287, 378, 380 dependent samples 246–7, 279, 297, 303, 307, 310, 322 dependent variables 343–6, 348, 350, 362–3, 375, 378–9, 399–400 descriptive statistics 66, 85, 94, 100 descriptors, data 58–105 deviations 220, 378, 425, 436–7, 451, 455–6, 459 mean absolute 453, 455–6, 466 standard see standard deviation unexplained 378 die rolling 108–9, 112–13, 120–1, 136, 195 differences median 322, 326–7 negative 448 non-parametric tests of 246 paired 325, 330 positive 322, 327 significant 244–6, 255, 259–60, 268, 304, 306–7, 311 squared 358, 363 discrete data 10–11, 41, 59, 68 discrete probability distributions 135–6, 155–83 discrete random variables 136, 155, 165–6, 183 discrete variables 32, 155–6, 183, 307 dispersion 33, 40, 56, 58–9, 80–1, 104–5, 127 measures of 58–9, 81, 96, 104 display 1, 11, 16–17, 303 distribution-free tests see nonparametric tests distribution functions 155, 246, 318 cumulative 154–5 distribution shape 59, 325 distributions 80–2, 89–91, 152–3, 182–3, 229, 246–50, 286–8 binomial probability 155, 175–7 chi-square 136, 153, 183, 297–8, 301, 307, 312 continuous 176, 219, 246, 313, 319 F 133, 153, 183, 286–8, 294, 378, 380 frequency see frequency, distributions left-skewed 95, 149, 152 leptokurtic 91, 93, 229 Mann-Whitney 335 mesokurtic 91, 93 non-normal 204, 340 normal 83, 92, 136–40 sampling 254, 262, 267, 271, 276 standard 141, 229, 248 null 153, 286 Poisson 133, 135–6, 155, 165–70, 
173–5, 180–1, 313–17 population 149, 183, 205, 250, 254, 257–8, 262 probability see probability, distributions right-skewed 95, 149, 152 skewed 82–3, 89–90, 92, 95, 149, 152 left-skewed 95, 149, 152 right-skewed 95, 149, 152 standard normal 141, 229, 248 symmetric 90, 92, 94, 149, 205, 330 uniform 154 Durbin-Watson statistic 370 E empirical probability 110 equal chance 136, 188 equal variances 275, 295, 371, 456 error(s) absolute 453, 456 coverage 185, 193 degree of 217, 219–20, 368 forecasting 406, 420, 456, 459, 466 independence of 370, 405 interpretation 455–6 margin error 238–9 margin of 185, 193, 239 mean absolute percentage 453, 455–6, 466 mean percentage 453–6, 466 mean square 378, 381, 386, 448–9, 453, 455–6, 466 measurement error 185, 193 measurement of 51, 450–3, 455, 466 non-response 185, 193 normality of 370, 397 prediction 370 sampling 185–7, 193, 210, 225, 229, 248, 353 sum of squares for error 369, 381, 405 term 52, 383 type I 251, 290, 295 type II 251, 290–1, 295 types of 192–3, 251, 453–5 variance 371 ex-post forecasts 451, 466 expected frequencies 166, 169, 298–301, 305–7, 313, 315–17 expected values 107, 127–8, 131, 136, 218–19 experimental probability see empirical probability experiments 107–10, 112–14, 136, 156, 158–9, 161–2, 164–6 die 109, 197 factorial 246–7, 250, 294, 340 exploratory data analysis 59 exponential curves 392, 423 modified 392 exponential smoothing 248, 406, 432, 436–8, 466 forecasting with 438–45 seasonal 449 simple 436, 438–9, 446 single 437 exponential trends 423, 429 extrapolation 406–7, 419, 466 extreme values 11, 60, 62, 83–4, 90, 95–6, 149–50 F F distribution 133, 153, 183, 286–8, 294, 378, 380 F statistic 286–8 F test 285–90 factorial experiments 246–7, 250, 294, 340 factorials 159, 161, 166 fair die 121, 136, 195 false null hypotheses 251, 290–1 F.DIST 286 finite populations 156, 207 F.INV 286, 288–9 F.INV.RT 286–9, 380 first quartile 64–5, 76, 82–3, 94–7, 105, 149 Fisher-Snedecor distribution see F 
distribution Fisher’s kurtosis 92, 105 Fisher’s skewness coefficient 90 fitted lines 364–5 fitting 313–14, 344–6, 362–3, 365, 404, 420, 425 five-number summary 94–5, 105, 149 fixed costs 130–1 forecast values 420, 429, 436–7, 448–9, 451–3, 455 forecasting 407, 419–20, 423, 431–3, 435, 437–9, 450–1 errors 406, 420, 456, 459, 466 with exponential smoothing 438–45 methods 406, 423, 432, 435–6, 439, 456, 465–6 with moving averages 431–5 freedom, degrees of 229–30, 232, 258–9, 276–7, 287–9, 300–1, 377–8 frequency 3–7, 9–10, 26–7, 32–4, 37–44, 69–76, 199–200 class 41–2, 74 cumulative see cumulative frequency distributions 2–4, 6–8, 67–9, 73–4, 126–7, 167–8, 215 creation 6–10 grouped 6, 8, 10, 35, 57, 71–2, 82 expected 166, 169, 298–301, 305–7, 313, 315–17 high 89 highest 69, 73 histograms 21 low 89 polygons 2, 21–2, 42–6, 74, 96 relative 27, 32, 107, 110, 124–7, 133, 185 tables 9, 31, 36, 69 cumulative 69 grouped 9, 36 total 42, 75–6, 126 Friedman test 246, 340 F.TEST 286 G general addition law 115–16, 133 Gompertz curves 393 goodness-of-fit 153, 246, 297, 313 graphical methods 56, 73–6, 96, 105, 370 graphical representations 42 see also tables graphs 3, 21–2, 30–2, 49, 51–2, 57–8, 150–1 line 46, 97–8, 344, 421 grouped frequency distributions 6, 8, 10, 35, 57, 71–2, 82 tables 9, 36 GROWTH 429 H H0 see null hypotheses H1 see alternative hypotheses histograms 8, 21–2, 56–7, 73–5, 126, 199–200, 213–15 frequency 21 with unequal class intervals 40–2 homoscedasticity see constant variance; equal variances horizontal axis 32, 44, 74 horizontal grid lines 27, 52 hyperbola curves 391 hypotheses 243, 245–6, 248, 251, 257, 288, 294 alternative see alternative hypotheses initial research 244 non-parametric 296–7, 299, 301, 305, 307, 309, 311 null see null hypotheses statistical 186, 241, 298 hypothesis statements 244, 255, 259, 280, 287–8, 302, 307 alternative hypothesis (H1) 244–5 null hypothesis (H0) 244–5 hypothesis tests 228, 243–4, 246–7, 249, 257, 353, 358 see also 
non-parametric tests; parametric tests statistical 185, 244, 248–9 I independence 122, 313, 370, 456 of errors 370, 405 statistical 117, 120, 133 independent events 120–1 independent observations 220, 246, 257, 297 independent samples 246–7, 295–7, 303–7, 319, 331, 334 and chi-square test 303–7 random 266, 332 and unequal variances 274–9 independent variables 343, 345–6, 362–3, 370–1, 374–5, 390, 396–8 index numbers 406, 411–19, 466 indices 406, 413–14, 465 aggregate 415–16, 465–6 simple 412–15 inference measures 366, 368 inference(s) 187, 218, 243, 297, 371 statistical 194 infinite populations 156, 207 inflation 407, 409, 415–16 input data series 24, 29, 37, 50, 52 input ranges 35, 214, 444 INT 276 integers 10, 115–16 odd 115–16 INTERCEPT 365–6, 372, 376, 379, 382, 384, 427 intercepts 365, 405, 426–7, 446 interest rates 4, 409 interpolation, linear 64 interquartile range 58, 81–3, 96, 105 interval measurement scales interval scales 3, 8, 94, 96 intervals 3, 20–2, 35, 225, 237–9, 432–3, 461–3 confidence see confidence intervals estimates 183, 217–19, 225, 228, 462 prediction 383, 385 K Kruskal-Wallis test 246, 340 kurtosis 59, 81, 100, 105 Fisher's 92, 105 measures of 89–93 L labels 4, 8, 24, 29, 35, 37, 52 Layout tool menu 26, 98–9 least squares 348, 362–3, 394, 400 regression 343, 363, 393, 404 left-skewed distributions 95, 149, 152 legends 24, 26–7, 50 leptokurtic distribution 91, 93, 229 level of confidence 235, 241, 287–8, 290, 461 line charts/graphs 46, 97–8, 344, 421 linear associations 347, 349 linear correlations and regression analysis 343–405 significance 353–6 linear interpolation 64 linear models 395, 399 linear regression 372, 394, 407, 436 analysis 405 fitting of line to sample data 364–8 multiple 344, 362, 404 simple 343, 348, 362 linear relationships 343, 348, 362, 375, 377, 380, 383 linear trends 423, 425, 429, 445–6, 466 linearity 370, 388 assumption 371, 388, 397 location 17, 58–9, 89, 131, 246, 318, 324 logarithmic trends 423, 466 
logarithms 166, 423 logistic curves 392 lower class boundaries 8, 11, 32–3, 35, 42–3, 82, 86–7 true 75–6 lower confidence intervals 227, 231, 234, 236 lower critical values 287–8 lower one tail tests 249, 288, 294, 319, 326 lower quartile see first quartile lower tail p-values 254, 288 LQ see first quartile M McNemar test 303, 308–10, 325, 341 see also z tests MAD see mean absolute deviation Mann-Whitney U test 246, 279, 297, 318, 324, 332, 340–1 MAPE see mean absolute percentage error margin errors 238–9 margin of error 185, 193, 239 matched pairs test, Wilcoxon see Wilcoxon signed rank sum test mathematical limits 10–11, 70 MAX 81, 321–2 mean 58–63, 80–92, 194–211, 217–35, 241–8, 253–5, 452–3 arithmetic 59, 78, 105 overall 197 population see population(s), mean samples 194–211, 213–14, 217–20, 222, 224–8, 230–1, 253–5 standard error of the 222, 229, 254, 258 weighted 78 mean absolute deviation (MAD) 453, 455–6, 466 mean absolute percentage error (MAPE) 453, 455–6, 466 mean percentage error (MPE) 453–6, 466 mean square due to regression 378, 381 mean square error (MSE) 378, 381, 386, 448–9, 453, 455–6, 466 measurement 2–3, 31, 194, 220–1, 294, 296, 356 error 185, 193 interval/ratio level of 246, 294 scales 3–4 units of 3, 89 median 58–63, 69–70, 74–6, 89– 90, 94–100, 149, 318–20 classes 72, 75, 87 differences 322, 326–7 population 319, 324, 327 position 72, 87–8 mesokurtic distributions 91, 93 midhinge 95, 149 midrange 95, 149 mixed models 419, 466 modal class 69, 72–4 modal values 60, 63, 74 mode 3, 59–63, 67–9, 72–4, 80, 89–90, 105 model reliability 399 testing 364, 372–4, 387 modelling errors 369 moving averages 406, 430–45 plots 433, 435 trend 423 MPE see mean percentage error MSE see mean square error multiple linear regression 344, 362, 404 multiplication law 117–20, 133 multiplicative model 419, 445–7, 466 multistage sampling 190 multivariate methods 409–10 mutually exclusive events 114– 15, 119–21, 133 N negative correlations 350–1, 356, 358 negative 
differences 448 negative values 83, 92, 347, 453 nominal data 4, 307–8 nominal scales non-linear relationships 344, 390–1, 393 non-normal distributions 204, 340 non-parametric hypothesis 296–7, 299, 301, 305, 307, 309, 311 non-parametric methods 246, 307 non-parametric tests 243, 246, 250, 296–7, 318–19, 331, 340 non-probability sampling methods 190–1 non-proportional quota sampling 192 non-response error 185, 193 non-stationary time series 407–9, 431, 445 non-symmetry 95, 149 normal approximations 175–6, 179–80, 183, 303, 307, 322, 325 probability 177, 179, 181 solution 179, 181 normal curves 137–9, 141–2, 146–8, 154–5, 199, 208–9, 250 normal distributions 92, 135–7, 140–1, 143, 148, 175–6, 229 approximations 136, 185 to binomial distribution 175–9 normal equations 363 normal populations 186, 198–9, 218, 226, 319 normal probability curves 151–2 plots 149–53, 183, 371, 386, 388, 394, 396–7 normal sampling distributions 254, 262, 267, 271, 276 normality 149–53, 325, 370, 388, 456 of errors 370, 397 NORM.DIST 138–40, 142–3, 145–7, 177, 202–4, 206, 208–9 NORM.INV 147–8, 155 NORM.S.DIST 141–7, 202–4, 206, 208–11, 253–4, 262–3, 267–8 NORM.S.INV 147–8, 151, 227, 253, 255, 267, 310–11 null distributions 153, 286 null hypotheses 244–6, 248–9, 251–2, 288, 297–8, 309, 318–19 false 251, 290–1 testing 254, 258, 262, 267, 271, 276, 281 true 251, 290 numerators 153, 287, 378, 380 O observations 2–3, 81, 108, 330–1, 431–6, 438–9, 459–60 first 431–2, 440 independent 220, 246, 257, 297 last 427, 433, 440 paired 319, 324–5, 329, 356 tied 330 observed frequencies 169, 305, 312–13, 315 observed values 141, 155, 334–5, 363, 368, 373 ogive 22, 70, 74–6 one sample t-tests 246–7, 251, 291–2, 294, 319, 324 one sample z-tests 246, 294 one tail p-values 259, 263, 281, 290, 321, 327–9, 336 one tail tests lower 249, 288, 294, 319, 326 upper 249, 262, 268, 280, 288, 295, 326–8 order of size 60, 62–3, 69, 296 ordinal data 3–4, 21–2, 60, 105, 246, 340–1, 343–4 ordinal scales 
ordinal variables 21, 57 outcomes 107–8, 112–13, 116, 120, 124–5, 136, 155–6 possible 108–9, 112, 165 outliers 59–60, 62, 84, 95–6, 104–5, 149–50, 346–7 suspected 95–6, 150 overall mean 197 P p-values 251–2, 254–6, 286–8, 300–1, 305–7, 310–11, 315–17 calculated 300, 306, 316, 323, 387 two tail 255, 259 exact 307, 322–3, 336 lower 286, 333 lower tail 254, 288 measured 294, 340 method 254, 258–9, 262–3, 267–8, 270–1, 276–7, 281 one tail 259, 263, 281, 290, 321, 327–9, 336 two tail 253–5, 258–9, 267–8, 270–2, 276–7, 286–7, 310–11 upper 262, 280, 286 upper tail 254, 288 paired differences 325, 330 paired observations 319, 324–5, 329, 356 paired ranks 327 paired samples 279, 281, 294, 297, 307–8, 319, 324 pairs, matched 324 parabolas 391, 393, 420, 423 parameter conditions 313 parameters 135, 180, 194–5, 218, 393–4, 426–7, 437 population 183, 189, 193–5, 217–20, 225, 241, 246 unknown 217–18, 458 sample 183, 194, 246 parametric tests 243–97, 318–19, 331, 340 patterns 3, 48–50, 58, 157, 159, 370, 456 PDF see probability, density function peakedness 59, 81, 105 PEARSON 349, 353, 355 Pearson's coefficient of skewness 90, 105 Pearson's correlation coefficient 343–4, 347, 348–53, 355–8, 404–5 percentiles 62, 64, 75–6, 358 classes 76 perfect correlations 350–1 pie charts 19, 21–2, 27–30, 297 PivotCharts 11, 17, 19–20 PivotTables 11–20 plots 21, 126, 149, 186, 213, 344, 370–1 box 94, 96–9, 105 box-and-whisker 96, 99, 149 moving average 433, 435 normal probability 149–53, 183, 371, 386, 388, 394, 396–7 residual 370–1, 386, 388, 394, 396–7 scatter 21–2, 47–51, 344–7, 349–51, 364–5, 397, 399 time series 21–2, 47–51, 57, 408, 432, 447–8, 450 point estimates 185–6, 217–19, 222–3, 225, 241–2, 375, 436 Poisson distributions 133, 135–6, 155, 165–70, 173–5, 180–1, 313–17 approximation to binomial distribution 173–5 POISSON.DIST 168, 171–3, 181, 314, 316 polygons, frequency 2, 21–2, 42–6, 74, 96 polynomial curves 423 polynomial lines 411, 466 polynomial trends 423 pooled estimates 
275 population(s) confidence intervals 225–42 distributions 149, 183, 205, 250, 254, 257–8, 262 estimates 185, 222, 224 finite 156, 207 infinite 156, 207 median 319, 324, 327 non-normal 186, 204 normal 186, 198–9, 218, 226, 319 parameters 183, 189, 193–5, 217–20, 225, 241, 246 unknown 217–18, 458 point estimates mean and variance 218–22 proportion and variance 222–4 type of 218 proportion 210–11, 217, 222–4, 236, 242, 246, 295 slope 364, 375, 399, 404 v samples 194 values 153, 185, 199, 217, 228, 353 true 219, 353, 363 variables 153 variances 85–6, 217, 219–20, 241–2, 246–7, 261–2, 286–8 positive correlations 350–1, 356, 358 positive relationships 47, 345, 357 positive values 92, 254, 348 power 245, 251, 292 curves 423 function 423 statistical 251, 290–2, 294 trends 423 precision 189–90 prediction 362, 372, 375, 460–1 errors 370 intervals 383–5 values 375, 462–3 predictor models 381, 387, 390 predictor variables 344, 348, 362, 364, 374–82, 387, 399 presentation 1–57 probability 107–33, 137–46, 154–61, 176–7, 179–81, 201–12, 251 conditional 119 density function (PDF) 95, 138–9, 141–2, 154–5, 301 distributions 107, 124–7, 129–31, 133, 135–83, 185, 249 binomial 175–7 continuous 135–6, 153–5, 183, 286 discrete 135–6, 155–83 Poisson see Poisson distributions empirical 110 frequency definition of 124 laws 107, 114–15, 133 general addition law 115–16, 133 normal approximation 177, 179, 181 samples 188, 190 theoretical 113 theory 135, 243 properties 195, 398, 456 proportional quota sampling 191 purposive sampling 191 Q Q1 see first quartile Q2 see second quartile Q3 see third quartile qualitative variables 2–3, 325 quantitative variables 2–3 QUARTILE.INC 64, 81 quartiles 64–5, 75, 88, 96 first 64–5, 76, 82–3, 94–7, 105, 149 ranges 59 second 63–4 third 64–5, 76, 83, 94–7, 149 quota sampling 191 non-proportional 192 proportional 191 R random experiments 108 random number generation 154, 212–13 random samples 110, 137, 163, 188–90, 193–4, 203–7, 209–13 independent 266, 307, 
332 simple 188, 190 random sampling 186, 188 simple 188–90 stratified 189–90, 192 systematic 189 random variables 136, 154–6, 161, 163, 173, 179, 183 continuous 136, 183 discrete 136, 155, 165–6, 183 rank correlation coefficient, Spearman's see Spearman's rank correlation coefficient RANK.AVG 326–7, 332, 334 ranks 62, 296, 319, 322, 327–8, 332, 334 paired 327 shared 322, 327, 334 tied 329–30, 337, 356 ratios 3–4, 21–2, 57, 88, 105, 109–10, 306–7 raw data 1–2, 4, 8, 12, 57, 105, 279 rectangles 32, 40–2 region of rejection 249, 254–5, 263, 268, 287–8, 310–11, 322–3 regression 343, 362–5, 369–71, 378, 381, 395, 405 analysis 343, 345, 347, 349, 363, 367–71, 387–91 advanced topics 390–405 linear 405 and linear correlation 343–405 assumptions 370–2, 375 coefficients 346, 370 equations 365, 373, 375, 383 least squares 343, 363, 393, 404 linear see linear regression lines 365–6, 368, 372–4, 378, 423 mean square due to 378, 381 models 362, 365, 375, 378 linear multiple 344, 404 multiple 362, 398–400, 404 non-linear 390–7 sum of squares 369, 381, 405, 425 Regression tool 381, 385, 387, 398, 404 rejection 244–5, 249, 251–2, 272, 290, 292, 294 regions/zones of 249, 254–5, 263, 268, 287–8, 310–11, 322–3 relative frequency 27, 32, 107, 110, 124–7, 133, 185 reliability 344, 372, 385, 398 models 399 residual plots 370–1, 386, 388, 394, 396–7 residual values 373, 420 residuals 365, 368, 370–4, 388, 396, 420, 456–7 response variables 348, 362 right-skewed distributions 95, 149, 152 risk 248, 350 rows 11–12, 20–1, 213, 297–8, 301, 307, 454 variables 305, 308 RSQ 373–4, 380 S sample parameters 183, 194, 246 sample size 189–90, 199–200, 204–11, 218–23, 234–6, 238–9, 248–50 calculating 237–9 sample space 107–8, 118–20, 133, 163 sample statistics 194, 217–18, 224, 246, 324 samples 185–210, 212–14, 217–20, 222–31, 246–8, 253–5, 257–62, 330–2 see also sampling averages 261–2 dependent 246–7, 279, 297, 303, 307, 310, 322 independent see independent samples large 218, 232, 459 
mean 194–211, 213–14, 217–20, 222, 224–8, 230–1, 253–5 percentiles 371, 388, 396 proportion 183, 194, 210–11, 222–3, 235–6, 267, 307–8 random see random samples simple random 188, 190 small 233, 250, 324, 347, 459 standardized 201, 211 types of 188–92 v populations 194 variance 85, 91, 219–20, 222, 231–2, 234, 287 sampling 85, 156, 182, 185–7, 191–3, 198–9, 204–8 cluster 190 concept 186–93 convenience 191 distributions 185, 187, 189, 191, 193–5, 197, 248–9 and estimation 185–242 and mean 194–8 normal 254, 262, 267, 271, 276 and proportion 210–12 error 185–7, 193, 210, 225, 229, 248, 353 frame 187, 190, 242 multistage 190 from non-normal population 204–10 non-probability 187, 190 from normal population 198– 204 purposive 191 quota see quota sampling snowball 191–2 stratified 189–90 terminology 187 scales 3–4, 62, 92, 297, 346 interval 3, 8, 94, 96 ordinal y-axis 49–50 scatter plots 21–2, 47–51, 344–7, 349–51, 364–5, 397, 399 scores 3, 6, 32–3, 92, 109, 113, 122 SD see standard deviation SE see standard error seasonal components 48, 419–20, 445–6 seasonal exponential smoothing 449 seasonal forecasts 447, 450 seasonal time series 406–7, 409, 445, 466 seasonal variations 419, 466 second quartile 48–9, 63–4 SEE, see standard error, of estimate semi-interquartile range 58, 82–3 sequential numbers 407–8, 426, 429 serial correlation 370 shared ranks 322, 327, 334 signed rank sum test 246, 279, 297, 318–20, 324–5, 329–30, 340–1 significance 248, 254–5, 271–3, 276–8, 281–2, 286–8, 358–60 level 248–9, 255, 259, 261–3, 287–90, 354–5, 358–60 simple exponential smoothing 436, 438–9, 446 simple linear regression 343, 348, 362 simple random sampling 188–90 single exponential smoothing 437 SIQR 58–9, 81–5, 87–8, 105 skewed distributions 82–3, 89–90, 92, 95, 149, 152 skewness 58–9, 62, 75, 89–92, 96, 100, 105 coefficient of 90, 105 measures of 90 right 95–6, 149 SLOPE 365–6, 372, 376, 379, 382, 384, 427 slopes 365, 375, 382, 405, 426–7, 446 population 364, 375, 399, 404 smoothing 
407, 423, 432, 437, 440, 466 constant 248, 437–8, 442, 444, 446, 466 see also damping factor exponential see exponential smoothing time series 430–45 snowball sampling 191–2 Solver 449 Spearman's rank correlation coefficient 343–4, 347, 356–8, 404–5 critical values 360 spread 33, 40, 58–9, 80–3, 89, 94, 104–5 measures of 82–3 SQRT 86, 92, 145, 147–8, 177–8, 181, 459–60 square roots 35, 81, 83, 386, 425, 459 squared differences 358, 363 squared error 372, 453 squares least see least squares regression sum of 369, 381, 405, 425 sum of 369, 374, 381, 405 SSE see sum of squares, for error SSR see sum of squares, for regression SST see total sum of squares standard class widths (CWs) 41–2 standard deviation 83–9, 139–43, 196–9, 201–10, 219–23, 225–31, 458–9 standard error 197–8, 202–12, 220–5, 227, 229–31, 241–2, 372–3 of estimate 366, 372–3, 386–7 of forecast 459 of the mean 222, 229, 254, 258 population and sample 458–9 of the proportion 223 standard normal distribution 140–1, 183, 229, 248 STANDARDIZE 143 stated limits 10–11, 57 stationary time series 407–9, 431, 433–4, 436, 445 statistical independence 117, 120, 122, 133 statistical power 251, 290–2, 294 statistical tests 185, 192, 241, 244, 248–9, 251, 318 see also parametric tests; non-parametric tests choice of 247 STDEV.P 82, 195 STDEV.S 220, 222, 231, 270, 275–6, 285–6, 349 STEYX 373, 375, 377, 383, 385, 459–61, 463 straight lines 42, 51, 151, 344, 362, 411, 420 strata 189–90 stratification 190 stratified random sampling 189–90, 192 strength of correlations/ associations/ relationships 343, 347–8, 350, 374 Student's t distribution 183 Student's t-test 153, 246, 250, 376, 396 SUM 86–7, 161–4, 167–8, 195–7, 299–300, 304–5, 454–5 sum of squares 369, 374, 381, 405 for error 369, 374, 378, 381, 386, 405, 425 for regression 369, 378, 381, 386, 405, 425 total 369, 374, 378, 381, 386, 405, 425 SUMIF 326, 333 summary, five-number 94–5, 149 SUMPRODUCT 68, 72, 78, 86 SUMXMY2 447, 449, 455, 459–60, 463 suspected 
outliers 95–6, 150 symmetric distributions 90, 92, 94, 149, 205, 330 symmetry 59, 75, 92, 96, 105, 152, 250 T t-tests 278, 281–2, 324–5, 355, 370–1, 374–5, 377–8 model assumptions 250 one-sample see one-sample t-tests paired 247, 319, 324 Student's 153, 246, 250, 376, 396 two sample see two sample t-tests tables 1, 4–6, 11, 56, 175, 297–8, 329 construction 21 contingency 22, 153, 298, 300–1, 303, 305–8 creation using PivotTable 11–20 critical 143, 460 cross tabulation 22 cumulative frequency 69 data types 10–11 grouped frequency 9, 36 tally charts 6, 57 T.DIST 292 T.DIST.2T 258–9, 270–2, 276–8, 377 T.DIST.RT 259, 280–2 test statistics 228–9, 251–2, 286–8, 300–1, 316–17, 322–3, 336–7 calculated 255, 259–60, 270, 272, 276–7, 282, 328–30 critical see critical test statistic third quartile 64–5, 76, 83, 94–7, 149 tied observations 330 tied ranks 329–30, 337, 356 time, units of 407, 409, 446, 465 time periods 48, 188, 370, 407, 415, 419, 424 time points 49–50, 408–10, 421–2, 432–5, 440–2, 447–8, 462–3 time series 406–11, 419–21, 423, 430–1, 433–6, 445, 459 actual 425, 432 analysis 48, 406–66 classical 419–20, 466 data 406, 417, 423, 436, 443, 447 forecast 424–5 graphs 48–50, 408, 465 model 419 non-stationary 407–9, 431, 445 plots 21–2, 47–51, 57, 408, 432, 447–8, 450 seasonal 406–7, 409, 445, 466 short 430–1, 438, 460 smoothing 430–45 stationary 408, 431, 433–4, 436, 445 trend 420 univariate 409, 419, 465 values 407, 419, 436–7 T.INV 259, 280, 292 T.INV.2T 231, 258–9, 270, 276, 355–6, 359, 383–4 total probability 126, 137, 158–9, 161–2, 164 total sample size 111, 118, 191, 298 total sum of squares 369, 374, 378, 381, 386, 405, 425 tree diagrams 107, 123, 129, 136, 156–7 TREND 367, 460 trend chart functions 424–5 trend-fitting see fitting trend lines 51, 365, 367, 420–2, 424–6, 433 Trendline, Add 51, 366, 397, 421, 433, 435 trends 48, 368, 419–20, 423–5, 427–9, 445, 461–3 components 420, 466 fitting to time series 420–3 types of 423–4 trials 108, 110, 124, 
156, 161, 163, 173 true limits see mathematical limits two sample t-tests 246–7, 269, 271, 274–6, 279, 281, 294–5 dependent samples 279–82 two sample z-tests 246, 295 two tail p-values 253–5, 258–9, 267–8, 270–2, 276–7, 286–7, 310–11 two tail tests 244, 249, 254, 258, 267, 286–8, 309–10 type I errors 251, 290, 295 type II errors 251, 290–1, 295 U UCB see upper class boundaries unbiased estimates 219–24, 241 unbiased estimators 195, 197, 210, 217–19, 221 uncertainty 107–8, 133, 191, 406, 450–1, 453, 461 underlying trends 419–20 unequal class intervals 40–2 unequal class widths 2, 42, 73 unequal variances 247, 295 unexplained deviation 378 unexplained variation 365, 369 uniform distribution 154 univariate methods 409–10, 466 univariate time series 409, 419, 465 upper class boundaries 8, 11, 32–5, 42–3, 71, 82, 86–7 upper confidence intervals 227, 231, 234, 236 upper critical values 286–9 upper one tail tests 249, 262, 268, 280, 288, 295, 326–8 upper p-values 262, 280, 286 upper quartile (UQ) see third quartile upper tail 255, 259 p-values 254, 288 UQ see third quartile V validity 243, 250, 319 VAR 82–3, 85, 127–8, 159, 164, 168, 170 variability 89, 198, 371, 373–4 variables 2–3, 21–2, 135–7, 343–5, 347–8, 393–7, 407–9 categorical 2, 296, 298, 301–2 column 297–8, 305, 308 dependent 343–6, 348, 350, 362–3, 375, 378–9, 399–400 discrete 32, 155–6, 183, 307 discrete random 136, 155, 165–6, 183 independent 343, 345–6, 362–3, 370–1, 374–5, 390, 396–8 qualitative 2–3, 325 quantitative 2–3 response 348, 362 row 305, 308 variance of errors assumption 397 variance ratio test 246–7, 294 variance(s) 83–6, 163–4, 167–8, 173–6, 218–19, 221–4, 286–7 analysis of see analysis of variance constant 370–1, 388, 397, 456 equal 275, 295, 371 error 371 population 85–6, 217, 219–20, 241–2, 246–7, 261–2, 286–8 samples 85, 91, 219–20, 222, 231–2, 234, 287 unequal 247, 295 variation 80–1, 83, 88–9, 369, 374, 395, 399–400 coefficient of 81, 88–9 cyclical 419, 466 irregular 419, 466 total 369, 
374 unexplained variation 365, 369 VAR.P 82, 85 VAR.S 85, 219, 222, 231, 234 vertical axes 44, 344 visualization 1–57 W weighted averages 77–8, 436 weighted mean 78 weightings 41, 77, 436 Wilcoxon signed rank sum test 246, 279, 297, 318–20, 324–5, 329–30, 340–1 X x-axis 32–3, 39–40, 42, 49, 350 Y y-axis 32, 39, 350, 411 scales 49–50 Z z distribution see standard normal distribution z tests 246–7, 264, 282, 303, 307, 309, 319 see also McNemar test z-tests one-sample 246 two sample see two sample z-tests z-values 263, 458–9

Pie chart sector angles (degrees; one decimal place):

Category       Frequency   Calculation        Degrees
Conservative         400   (360/1110) × 400     129.7
Labour               510   (360/1110) × 510     165.4
Democrat              78   (360/1110) × 78       25.3
Green                 55   (360/1110) × 55       17.8
Other                 67   (360/1110) × 67       21.7
Total               1110

The principles of table construction adopted are as follows: (a) aim for simplicity; (b) the table must have a comprehensive and explanatory title; (c) the source should be stated; (d) units must be stated clearly; (e) the headings for ...
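The voting example from the Chapter 1 preview computes each pie chart sector as (360 ÷ total frequency) × frequency. The book carries out this calculation with Excel formulas; the sketch below reproduces the same arithmetic in Python (the helper name `pie_angles` is ours, not from the text):

```python
# Pie chart sector angles: each category gets (360 / total) * frequency degrees.
# Frequencies are the voting example from the book's preview pages
# (Conservative 400, Labour 510, Democrat 78, Green 55, Other 67; total 1110).

def pie_angles(frequencies):
    """Return each category's sector angle in degrees, to one decimal place."""
    total = sum(frequencies.values())
    return {category: round(360 * count / total, 1)
            for category, count in frequencies.items()}

votes = {"Conservative": 400, "Labour": 510, "Democrat": 78,
         "Green": 55, "Other": 67}

for party, angle in pie_angles(votes).items():
    print(f"{party:12s} {angle:6.1f}")
```

Up to rounding, the angles sum to 360°, which is a quick check that the sectors cover the whole chart.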

Posted: 23/08/2021, 17:23


Contents

• Cover
• Contents
• How to use this book
• How to use the Online Resource Centre
• 1 Visualizing and presenting data
  • Overview
  • Learning objectives
  • 1.1 The different types of data variable
  • 1.2 Tables
    • 1.2.1 What a table looks like
    • 1.2.2 Creating a frequency distribution
    • 1.2.3 Types of data
    • 1.2.4 Creating a table using Excel PivotTable
    • 1.2.5 Principles of table construction
  • 1.3 Graphical representation of data
    • 1.3.1 Bar charts
    • 1.3.2 Pie charts
    • 1.3.3 Histograms
    • 1.3.4 Histograms with unequal class intervals
    • 1.3.5 Frequency polygon
    • 1.3.6 Scatter and time series plots
    • 1.3.7 Superimposing two sets of data onto one graph
  • Techniques in practice
