Statistics from A to Z

STATISTICS FROM A TO Z
Confusing Concepts Clarified

ANDREW A. JAWLIK

Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data
Names: Jawlik, Andrew.
Title: Statistics from A to Z : confusing concepts clarified / Andrew Jawlik.
Description: Hoboken, New Jersey : John Wiley & Sons, Inc., [2016]
Identifiers: LCCN 2016017318 | ISBN 9781119272038 (pbk.) | ISBN 9781119272007 (epub)
Subjects: LCSH: Mathematical statistics–Dictionaries. | Statistics–Dictionaries.
Classification: LCC QA276.14 J39 2016 | DDC 519.503–dc23
LC record available at https://lccn.loc.gov/2016017318

Printed in United States of America

To my wonderful wife, Jane, who is a Sigma*
* See the article, “Sigma”, in this book.

CONTENTS

OTHER CONCEPTS COVERED IN THE ARTICLES
WHY THIS BOOK IS NEEDED
WHAT MAKES THIS BOOK UNIQUE?
HOW TO USE THIS BOOK
ALPHA, α
ALPHA AND BETA ERRORS
ALPHA, p, CRITICAL VALUE, AND TEST STATISTIC – HOW THEY WORK TOGETHER
ALTERNATIVE HYPOTHESIS
ANALYSIS OF MEANS (ANOM)
ANOVA – PART 1: WHAT IT DOES
ANOVA – PART 2: HOW IT DOES IT
ANOVA – PART 3: 1-WAY (AKA SINGLE FACTOR)
ANOVA – PART 4: 2-WAY (AKA 2-FACTOR)
ANOVA vs. REGRESSION
BINOMIAL DISTRIBUTION
CHARTS/GRAPHS/PLOTS – WHICH TO USE WHEN
CHI-SQUARE – THE TEST STATISTIC AND ITS DISTRIBUTIONS
CHI-SQUARE TEST FOR GOODNESS OF FIT
CHI-SQUARE TEST FOR INDEPENDENCE
CHI-SQUARE TEST FOR THE VARIANCE
CONFIDENCE INTERVALS – PART 1: GENERAL CONCEPTS
CONFIDENCE INTERVALS – PART 2: SOME SPECIFICS
CONTROL CHARTS – PART 1: GENERAL CONCEPTS AND PRINCIPLES
CONTROL CHARTS – PART 2: WHICH TO USE WHEN
CORRELATION – PART 1
CORRELATION – PART 2
CRITICAL VALUE
DEGREES OF FREEDOM
DESIGN OF EXPERIMENTS (DOE) – PART 1
DESIGN OF EXPERIMENTS (DOE) – PART 2
DESIGN OF EXPERIMENTS (DOE) – PART 3
DISTRIBUTIONS – PART 1: WHAT THEY ARE
DISTRIBUTIONS – PART 2: HOW THEY ARE USED
DISTRIBUTIONS – PART 3: WHICH TO USE WHEN
ERRORS – TYPES, USES, AND INTERRELATIONSHIPS
EXPONENTIAL DISTRIBUTION
F
FAIL TO REJECT THE NULL HYPOTHESIS
HYPERGEOMETRIC DISTRIBUTION
HYPOTHESIS TESTING – PART 1: OVERVIEW
HYPOTHESIS TESTING – PART 2: HOW TO
INFERENTIAL STATISTICS
MARGIN OF ERROR
NONPARAMETRIC
NORMAL DISTRIBUTION
NULL HYPOTHESIS
p, p-VALUE
p, t, AND F: “>” OR “<”

OTHER CONCEPTS COVERED IN THE ARTICLES

1-Sided or 1-Tailed: see the articles Alternative Hypothesis and Alpha, α.
1-Way: an analysis that has one Independent (x) Variable, e.g., 1-way ANOVA.
2-Sided or 2-Tailed: see the articles Alternative Hypothesis and Alpha, α.
2-Way: an analysis that has two Independent (x) Variables, e.g., 2-way ANOVA.
68-95-99.7 Rule: same as the
Empirical Rule. See the article Normal Distribution.
Acceptance Region: see the article Alpha, α.
Adjusted R²: see the article r, Multiple R, r², R², R Square, R² Adjusted.
aka: also known as.
Alias: see the articles on Design of Experiments (DOE).
Associated, Association: see the article Chi-Square Test for Independence.
Assumptions: requirements for being able to use a particular test or analysis. For example, ANOM and ANOVA require approximately Normal data.
Attributes data, Attributes Variable: same as Categorical or Nominal data or Variable. See the articles Variables and Chi-Square Test for Independence.
Autocorrelation: see the article Residuals.
Average Absolute Deviation: see the article Variance.
Average: same as the Mean – the sum of a set of numerical values divided by the Count of values in the set.
Bernoulli Trial: see the article Binomial Distribution.
Beta: the probability of a Beta Error. See the article Alpha and Beta Errors.
Beta Error: featured in the article Alpha and Beta Errors.
Bias: see the article Sample, Sampling.
Bin, Binning: see the articles Chi-Square Test for Goodness of Fit and Charts/Graphs/Plots – Which to Use When.
Block, Blocking: see the articles on Design of Experiments (DOE).
Box Plot, Box and Whiskers Plot: see the article Charts/Graphs/Plots – Which to Use When.
Cm, Cp, Cr, or CPK: see the article Process Capability Analysis (PCA).
Capability, Capability Index: see the article Process Capability Analysis (PCA).
Categorical data, Categorical Variable: same as Attribute or Nominal data/Variable. See the articles Variables and Chi-Square Test for Independence.
CDF: see Cumulative Density Function.
Central Limit Theorem: see the article Normal Distribution.
Central Location: same as Central Tendency. See the article Distributions – Part 1: What They Are.
Central Tendency: same as Central Location. See the article Distributions – Part 1: What They Are.
Chebyshev’s Theorem: see the article Standard Deviation.
Confidence Coefficient: same as Confidence Level. See the article Alpha, α.
Confidence Level (aka Level of Confidence, aka Confidence Coefficient): equals 1 – Alpha. See the article Alpha, α.
Confounding: see the articles on Design of Experiments (DOE).
Contingency Table: see the article Chi-Square Test for Independence.
Continuous data or Variables: see the articles Variables and Distributions – Part 3: Which to Use When.
Control, “in” or “out of”: see the article Control Charts – Part 1: General Concepts and Principles.
Control Limits, Upper and Lower: see the article Control Charts – Part 1: General Concepts and Principles.

VARIATION/VARIABILITY/DISPERSION/SPREAD

Explanation

Variation (also known as Variability, Dispersion, and Spread) is one of three major categories of measures describing a Distribution or data set. A fifth synonym is “Scatter.” The other two categories are Central Tendency (Mean, Mode, Median) and Shape (Skew and Kurtosis). The Distribution can be of a Population, Process, Sample, or other data set.

[Figure: two Distributions, one with Larger Variation and one with Smaller Variation]

There are a number of different measures of Variation – each with its pros and cons.

Range
  Effect of very high or very low values: Is defined by highest and lowest values
  Identifies Clustering around Mean: N
  Is in Units of the data: Y
  Use: Least useful in statistics

InterQuartile Range (IQR)
  Effect of very high or very low values: None
  Identifies Clustering around Mean: Y
  Is in Units of the data: Y
  Use: In Box-and-Whiskers Plot

Variance
  Effect of very high or very low values: Overly emphasized by Squaring
  Identifies Clustering around Mean: Y
  Is in Units of the data: N
  Use: For calculating Standard Deviation

Mean Abs Deviation (MAD)
  Effect of very high or very low values: Handled the same as other values
  Identifies Clustering around Mean: Y
  Is in Units of the data: Y
  Use: Least Common

Standard Deviation
  Effect of very high or very low values: Somewhat disproportionate
  Identifies Clustering around Mean: Y
  Is in Units of the data: Y
  Use: Most Common

This is not an exhaustive list.

Range: Range is simply the difference between the highest and lowest values. It may be the least useful measure in statistics. It only tells you about two values, out of the many which may be in the Distribution or data set. It tells you nothing of the values in
between the highest and lowest values.

InterQuartile Range (IQR): The InterQuartile Range provides information on 50% of the data values, which is why it is also called the “middle 50.” It is the Range of the values around the Median which comprise the middle 50% of the total values.

The lower boundary value of the IQR box in the diagram below is called the 25th percentile, and the upper boundary value is called the 75th percentile. This is because 25% (one quarter) of the Distribution’s values are below the lower limit of the IQR, and 25% are above the upper limit of the IQR. The 50th percentile is, by definition, the Median. The IQR is used to define Outliers and Extremes. IQRs are often depicted via Boxplots (also called Box-and-Whiskers Plots), such as the one below.

[Figure: Box-and-Whiskers Plot on a horizontal axis from 0 to 100 cm, showing the IQR Box bounded by the 25th and 75th percentiles, the 50th percentile inside the box, whiskers extending 1.5 Box Lengths on each side, and Outliers beyond the whiskers]

The box defines the boundaries of the “middle fifty.” The IQR (Box Length) in this example is 20 cm (50 – 30). The thickness (height) of the box is meaningless; it just serves to make the rectangular shape that differentiates the box from the “whiskers” to the left and right. 1.5 box lengths is 30 cm.

Outliers and Extremes are any values more than 1.5 box lengths and 3 box lengths, respectively, beyond the 25th and 75th percentiles.

The Box-and-Whiskers Plot is very useful for conveying a lot of information visually. Showing several vertically oriented Boxplots together is a good way to compare the Variations of several data sets. See the article Charts, Graphs, Plots – Which to Use When.

Variance: There is a separate article on Variance. But briefly, it is the average of the squares of the distances of each data value from the Mean. Its units are the square of the data units (e.g., square gallons, square degrees Centigrade, etc.). As a result, it is not very useful by itself. Its main use is as an interim step in calculating the Standard Deviation, which is its square root.

Mean Absolute Deviation (MAD): MAD is the average (unsquared) distance of the data points from the Mean. It is useful when it is desirable to avoid emphasizing the effects of outliers. But it is not very common. It is in the same units as the data.

Standard Deviation: The Standard Deviation is the most commonly used measure of Variation. It is the square root of the Variance. As a result, it is in the same units as the data. See the article Standard Deviation.

Distributions are often succinctly described by stating the Mean (for Central Tendency) and the Standard Deviation (for Variation). The Mean is the most common and most useful measure of Central Tendency, and the Standard Deviation is the most common and most useful measure of Variation. The two are often quoted together to portray a Distribution.

One reason for this is that the percentages of the values which fall within a given number of Standard Deviations from the Mean have been determined. For a Normal Distribution, they can be stated very precisely. However, for any Distribution, lower-bound estimates are known.

Percent of Values Found within this Number of Standard Deviations from the Mean:
  Within 1: Normal Distribution (Empirical Rule) 68.3%
  Within 2: Normal Distribution (Empirical Rule) 95.5%; All Distributions (Chebyshev's Theorem) at least 75%
  Within 3: Normal Distribution (Empirical Rule) 99.7%; All Distributions (Chebyshev's Theorem) at least 88.9%
  Within 4: All Distributions (Chebyshev's Theorem) at least 93.75%

Related Articles in This Book: Distributions – Part 1: What They Are; Variance; Standard Deviation; Charts, Graphs, Plots – Which to Use When

WHICH STATISTICAL TOOL TO USE TO SOLVE SOME COMMON PROBLEMS

There are similar “Which to Use When” articles for Charts/Graphs/Plots, for Control Charts, and for Distributions. They can be found in this book alphabetically by the topic name.

EXPECTED FREQUENCIES vs. OBSERVED COUNTS

Problem/Question/Need: Is our prediction of Expected percentages a good fit with the actual Observed data subsequently collected?
For example, we predict the following allocation of customers at our bar by day of the week: Mon–Thu 12.5% each, Fri 30%, Sat 20%.
Tool (article which describes it): Chi-Square Test for Goodness of Fit (article by the same name)

FITTING A FUNCTION (line or a curve) to DATA

Problem/Question/Need: What is the straight-line (y = bx + a) function (Model) that describes the relationship between one independent (Factor) Variable x and the dependent (Response) Variable y? For example, total crop harvested as a function of acres planted.
Tool (articles which describe it): First, Scatterplot and Correlation analysis to verify linear Correlation (Charts, Plots, and Graphs – Which to Use When; Correlation – Parts 1 and 2). Then, Simple Linear Regression (Regression – Part 2: Simple Linear).

Problem/Question/Need: What is the straight-line (y = b₁x₁ + b₂x₂ + … + bₙxₙ) function (Model) that describes the relationship between multiple independent (Factor) Variables and the dependent (Response) Variable? For example, house price as a function of the number of bedrooms and bathrooms.
Tool (articles which describe it): First, Scatterplots and Correlation analyses to verify linear Correlation between each x Variable and the y Variable, and not between x Variables (Charts, Plots, and Graphs – Which to Use When; Correlation – Parts 1 and 2). Then, Multiple Linear Regression (Regression – Part 4: Multiple Linear).

Statistics from A to Z: Confusing Concepts Clarified, First Edition. Andrew A. Jawlik. © 2016 John Wiley & Sons, Inc. Published 2016 by John Wiley & Sons, Inc.

Problem/Question/Need: What is the nonlinear function (Model) that fits a curve y = f(x) to the data?
Tool (articles which describe it): Simple Nonlinear Regression (Regression – Part 5: Simple Nonlinear)

Problem/Question/Need: How do I validate a Regression Model? Not with the data used to produce it. A controlled experiment must be used to test predictions from the Model with new data.
Tool (articles which describe it): Design of Experiments (Design of Experiments, DOE – Parts 1–3)

INDEPENDENCE of Categorical Variables

Problem/Question/Need: Are the Proportions associated with categories of one Categorical Variable influenced by those of a second Categorical Variable? For example, is the preference for a “flavor of ice cream” (which is the Categorical Variable with values of “chocolate,” “strawberry,” and “vanilla”) influenced by gender (the Categorical Variable with values “male” and “female”)?
Tool (article which describes it): Chi-Square Test for Independence (article by the same name)

MEANS – Measurement/Continuous data

These all assume data which are roughly Normal. For non-Normal data, use the Median.

Problem/Question/Need: Is this Mean different from a specified Mean? For example:
• Is our school’s average test score different from the national average?
• Has the Mean of a measurement or a defect rate in the Process changed from its historical value?
• Does the Mean reduction in blood pressure meet or exceed the target for this new treatment?
Tool (article which describes it): 1-Sample t-test (t-tests – Parts 1 and 2)

Problem/Question/Need: Are these two Means different? For example:
• Are our high school’s test scores different from another school’s?
• Do these two treatments have different effects?
Tool (article which describes it): 2-Sample t-test (t-tests – Parts 1 and 2)

Problem/Question/Need: Are these two Means different for the same subjects? For example, do individuals perform better after this new training than before?
Tool (article which describes it): Paired t-test (t-tests – Parts 1 and 2)

Problem/Question/Need: Is there a difference among several (more than two) Means, compared with each other? For example, there are three types of training given to our workers. Do they result in different effects on worker performance?
Tool (article which describes it): ANOVA (ANOVA – Parts 3 and 4)

Problem/Question/Need: Which of several Means are different from the Overall Mean? For example, which of several production facilities does significantly better or worse than the others?
Tool (article which describes it): ANOM (ANOM)

MEDIANS

For Non-Normal data, use Medians instead of Means. (See the article “Nonparametric.”)

Problem/Question/Need: Is this Median different from a specified Median?
Nonparametric Test: Wilcoxon Signed Rank

Problem/Question/Need: Independent Samples: Are these two Medians different?
Nonparametric Test: Mann–Whitney

Problem/Question/Need: Paired Samples: Are these two Medians different?
Nonparametric Test: Wilcoxon Signed Rank

Problem/Question/Need: 1 Variable: Is there a difference among several Medians?
Nonparametric Test: Kruskal–Wallis

Problem/Question/Need: 2 Variables: Is there a difference among several Medians?
Nonparametric Test: Friedman

PROPORTION

Problem/Question/Need: Confidence Interval estimate of a Proportion from Sample data
Tool (article which describes it): z (Proportion)

Problem/Question/Need: Is there a difference between the Proportions from 2 Populations or Processes? For example, 0.52 of women and 0.475 of men preferred Candidate A.
Tool (article which describes it): z (Proportion)

Problem/Question/Need: Is there a difference among the Proportions from 3 or more Populations or Processes?
Tool (article which describes it): Chi-Square Test for Independence (article by the same name)

VARIATION

Problem/Question/Need:
• Is this Variance (or Standard Deviation) different from a specified Variance (Standard Deviation)? For example, has the Variation in our Process increased from the historical value?
• Are these two Variances different?
For example, two treatments have the same Mean effect. The tie-breaker would be whether one had a significantly smaller Variance, that is, whether it was more consistent.
Tool (article which describes it):
• For comparing a Variance with a specified Variance: Chi-Square Test for the Variance (article by the same name)
• For comparing two Variances: F-test (article: F)

Z

Summary of Keys to Understanding

z is a Test Statistic whose Probabilities do not vary by Sample Size.

z has only one associated Distribution, the Standard Normal Distribution.

A value of z tells you – in units of Standard Deviations – how far away from the Mean an individual data value is.

For Normal or near-Normal Distributions, z can be used to solve problems like:
– Given a value x, what is the Probability of exceeding x? or not exceeding x?
– Conversely, given a Cumulative Probability, what value of x defines its boundary?
z can also be used to calculate a Confidence Interval estimate of a Proportion.

For your convenience, here are the z scores for several common values of Alpha:
α:   0.025    0.05     0.1      0.9     0.95    0.975
z:  −1.960   −1.645   −1.282   1.282   1.645   1.960

For analyzing Means, z has several significant similarities to t and some key differences. Use z only for large Samples (n > 30) and only when the Population Standard Deviation (σ) is known. Otherwise, use t.

Explanation

z is a Test Statistic whose Probabilities do not vary by Sample Size.

A Test Statistic is a Statistic (a property of a Sample) with a known Probability Distribution. That is, for each value of the Test Statistic, we know the Probability of that value occurring.

z has several uses:
• For data distributed roughly like a Normal Distribution, z can give you the Cumulative Probability associated with a given value for x. Conversely, given a Cumulative Probability, z can give you the associated value of x.
• z can also be used in analyzing Means, but t is a better choice for that purpose.
• z can solve problems involving Proportions of 2-category Count data. (See the article Proportion.)

The common Test Statistics other than z (e.g., t, F, and Chi-Square) have a different Distribution for each value of Degrees of Freedom (which is related to Sample Size). Unlike other Test Statistics, there is no “n” for Sample Size in the formulas for z. So z does not vary with Degrees of Freedom.

This is because z has only one associated Distribution, the Standard Normal Distribution.

A Normal Distribution is the familiar bell-shaped curve. The Probabilities of many properties of Populations and Processes approximate the Normal Distribution.

[Figure: The z Distribution, that is, the Standard Normal Distribution (Mean = 0, Standard Deviation = 1). Vertical axis: Point Probability. Horizontal axis: z, in units of Standard Deviations]

The Standard Normal Distribution is an idealized version: the Normal Distribution with a Mean of zero and a Standard Deviation of 1. This is the Distribution for z. z is the Variable along the horizontal axis in the Standard Normal Distribution.

Point Probability
As for any other Distribution, the vertical axis is the Point Probability. So, for any value of z (any point on the horizontal axis), its Point Probability (the relative likelihood of it occurring; strictly, a Probability density, since z is continuous) is the height of the curve above z. In the diagram above, we can see that the Point Probability of z = 0 is about 0.4.

Cumulative Probability
A Cumulative Probability is the total of the Point Probabilities for a range of values. It is represented by the area under the curve over that range. For example, we can see that the Cumulative Probability of the range z > 0 is 50%, half the area under the curve.

Here are useful Cumulative Probabilities for the Standard Normal Distribution:

[Figure: Standard Normal curve marked from –3 to +3 Standard Deviations, with 68% of the area within ±1, 95% within ±2, and 99.7% within ±3]

We see that values of z within plus or minus 1 Standard Deviation of the Mean (1σ) occur 68% of the time. Within 2σ, it’s 95%; and 99.7% for 3σ. These percentages were straightforward to calculate for the idealized z-Distribution. But they can be used for every Normal Distribution, because of the Empirical Rule:

Empirical Rule (aka the 68, 95, 99.7 Rule): Given a distance from the Mean, expressed in Standard Deviations, on the horizontal axis of any Normal Distribution, the percentage of values that fall within that distance is the same for every Normal Distribution.

Next, we’ll see exactly how we use this to find Probabilities for x values in any Normal or near-Normal Distribution.

A value of z tells you – in units of Standard Deviations – how far away from the Mean an individual data value, x, is.

What does an individual value of z tell me? A value of z is often called a “z-score.”
– It tells you how far away the corresponding data point (x) is from the Mean.
– It tells you this in units of Standard Deviations.

Example: The height of adult males in a Population has a Mean, μ, of 175 cm and a Standard Deviation, σ, of 7 cm. We use the formula z = (x − μ) / σ to convert x values into z values (aka z scores). For x = 168 cm, z = (168 − 175)/7 = −1; for x = 175 cm, z = (175 − 175)/7 = 0; for x = 189 cm, z = (189 − 175)/7 = +2.

value of x (Height in cm):                            154   161   168   175   182   189   196
value of z (Distance from the Mean in Std Dev):        –3    –2    –1     0    +1    +2    +3

For Normal or near-Normal Distributions, z can be used to solve problems like:
– Given a value x, what is the Probability of exceeding x? or not exceeding x?
– Conversely, given a Cumulative Probability, what value of x defines its boundary?
z can also be used to calculate a Confidence Interval estimate of a Proportion.

Examples of the kind of problems z can be used to solve:

The Mean lifetime for a brand of light bulb is 1000 hours, with a Standard Deviation of 100 hours. What percentage (Cumulative Probability) of light bulbs can be expected to burn out before 900 hours (x)?
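
The z-score formula and the Cumulative Probability lookup described above can be sketched in a few lines of Python. This is only an illustration, not the book's own procedure: it assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist` in place of a printed z table, and the function and variable names (`z_score`, `cumulative_probability_below`, `x_for_cumulative_probability`, `height_z`, `p_before_900`) are hypothetical names chosen for this sketch.

```python
from statistics import NormalDist

# The Standard Normal Distribution (Mean = 0, Standard Deviation = 1)
# is the single Distribution associated with z.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mean, std_dev):
    """Distance of x from the Mean, in units of Standard Deviations."""
    return (x - mean) / std_dev


def cumulative_probability_below(x, mean, std_dev):
    """Convert x to z, then look up the Cumulative Probability of values below z."""
    return standard_normal.cdf(z_score(x, mean, std_dev))


def x_for_cumulative_probability(p, mean, std_dev):
    """Find the z whose Cumulative Probability is p, then convert z back to x."""
    z = standard_normal.inv_cdf(p)
    return std_dev * z + mean


# Heights example from the text: Mean 175 cm, Standard Deviation 7 cm.
height_z = [z_score(x, 175, 7) for x in (168, 175, 189)]
print(height_z)  # [-1.0, 0.0, 2.0]

# Light bulb question from the text: Mean 1000 hours, Standard Deviation 100 hours.
p_before_900 = cumulative_probability_below(900, 1000, 100)
print(f"{p_before_900:.1%}")  # 15.9%
```

Going the other way, `x_for_cumulative_probability(p_before_900, 1000, 100)` returns approximately 900, which illustrates the converse problem of finding the x that bounds a given Cumulative Probability.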
[Figure: Normal curve of light bulb lifetimes in hours, with the area to the left of x = 900 shaded and labeled 15.9%]

Using z, we can find in a table or via a spreadsheet that the percentage = 15.9%. This percentage of the total area under the curve is represented by the shaded area left of x = 900.

For the same brand of light bulbs, how many hours (x) can we expect 90% (α) of the light bulbs to exceed?

[Figure: Normal curve of light bulb lifetimes in hours, with the area to the right of x = 872 shaded and labeled 90%]

Using z, we find that x = 872 hours. The shaded area to the right of 872 hours comprises 90% of the area under the curve.

How to do it:

To find a Cumulative Probability for a given x, we
1. Convert x to its z value.
2. Find the Cumulative Probability for that z on the Standard Normal Distribution.

To find x, given a Cumulative Probability, we
1. Find the z value corresponding to that Probability on the Standard Normal Distribution.
2. Convert the z to the corresponding x value.

It may help to depict these steps graphically:

If you know x and want to find its Cumulative Probability:
x → Step 1: Convert to z, using z = (x – μ) / σ or z = (x – x̄) / s → z → Step 2: Find α via table lookup or software → Cumulative Probability

If you know a Cumulative Probability and want to find the corresponding x:
Cumulative Probability → Step 1: Find z via table lookup or software → z → Step 2: Convert to x, using x = σz + μ or x = sz + x̄ → x

z can also be used to calculate a Confidence Interval estimate of a Proportion. This is explained in the article Proportion.

For analyzing Means, z has several significant similarities to t and some key differences. Use z only for large Samples (n > 30) and only when the Population Standard Deviation (σ) is known. Otherwise, use t.

Comparison of z and t:
  It is a: Test Statistic (both z and t)
  Probability Distribution(s): z has one; t has one for each value of df. All are Bell-shaped, Symmetrical, and Never touch the horizontal axis.
  Mean = Mode = Median: Yes, and they all = 0 (both z and t)
  Formula: z = (x̄ − μ) / σ; t = (x̄ − μ) / (s/√n)
  Varies with Sample Size: z: No; t: Yes
  Accuracy: z: less; t: more
  Standard Deviation: z: = 1; t: Always > 1

Note that this formula for calculating z from Sample data for use as a Test Statistic is different from the formula for z-score shown earlier.

z is the simplest Test Statistic, and so it is useful to start with it when learning the concept. But it has some significant limitations, due to the fact that it does not take Sample Size into account. There is a much larger risk for errors when a very small Sample Size is used than when a Sample Size of 500 is used, but z treats them equally.

z should NOT be used
– When the Population is not Normal or near-Normal
– When the Population Standard Deviation (Sigma, σ) is not known
– When the Sample Size is small: some say when n < 30; others use a different threshold

HOW TO FIND CONCEPTS IN THIS BOOK

This book is alphabetically organized, like a dictionary or an encyclopedia, so an index is not needed. Readers can quickly find articles on statistical concepts by flipping through the book like a dictionary or a mini-encyclopedia.

The Contents at the beginning of the book lists all the articles on the major concepts. Immediately following the Contents is a Section called “Other Concepts Covered in the Articles.” This lists additional concepts and statistical terms which do not headline an article. For example, Acceptance Region: See the article Alpha, α.
