CHAPTER 9 The Normal Distribution

Introduction

A branch of mathematics that uses probability is called statistics. Statistics is the branch of mathematics that uses observations and measurements called data to analyze, summarize, make inferences, and draw conclusions based on the data gathered. This chapter will explain some basic concepts of statistics such as measures of average and measures of variation. Finally, the relationship between probability and normal distribution will be explained in the last two sections.

Copyright © 2005 by The McGraw-Hill Companies, Inc.

Measures of Average

There are three statistical measures that are commonly used for average. They are the mean, median, and mode. The mean is found by adding the data values and dividing by the number of values.

EXAMPLE: Find the mean of 18, 24, 16, 15, and 12.
SOLUTION: Add the values: 18 + 24 + 16 + 15 + 12 = 85
Divide by the number of values, 5: 85 ÷ 5 = 17
Hence the mean is 17.

EXAMPLE: The ages of 6 executives are 48, 56, 42, 52, 53 and 52. Find the mean.
SOLUTION: Add: 48 + 56 + 42 + 52 + 53 + 52 = 303
Divide by 6: 303 ÷ 6 = 50.5
Hence the mean age is 50.5.

The median is the middle data value if there is an odd number of data values or the number halfway between the two data values at the center, if there is an even number of data values, when the data values are arranged in order.

EXAMPLE: Find the median of 18, 24, 16, 15, and 12.
SOLUTION: Arrange the data in order: 12, 15, 16, 18, 24
Find the middle value: 12, 15, 16, 18, 24
The median is 16.

EXAMPLE: Find the median of the number of minutes 10 people had to wait in a checkout line at a local supermarket: 3, 0, 8, 2, 5, 6, 1, 4, 1, and 0.
SOLUTION: Arrange the data in order: 0, 0, 1, 1, 2, 3, 4, 5, 6, 8
The middle falls between 2 and 3; hence, the median is (2 + 3) ÷ 2 = 2.5.

The third measure of average is called the mode.
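The mean and median examples above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative and do not come from the text.

```python
from statistics import mean, median

# Data sets taken from the worked examples above.
values = [18, 24, 16, 15, 12]
ages = [48, 56, 42, 52, 53, 52]
waits = [3, 0, 8, 2, 5, 6, 1, 4, 1, 0]

mean_values = mean(values)      # sum 85 divided by 5 values
mean_ages = mean(ages)          # sum 303 divided by 6 values
median_values = median(values)  # odd count: the middle value after sorting
median_waits = median(waits)    # even count: halfway between the two middle values

print(mean_values, mean_ages, median_values, median_waits)
```

Note that `median` sorts the data internally, so the values can be passed in their original order.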
The mode is the data value that occurs most frequently.

EXAMPLE: Find the mode for 22, 27, 30, 42, 16, 30, and 18.
SOLUTION: Since 30 occurs twice and more frequently than any other value, the mode is 30.

EXAMPLE: Find the mode for 2, 3, 3, 3, 4, 4, 6, 6, 6, 8, 9, and 10.
SOLUTION: In this example, 3 and 6 occur most often; hence, 3 and 6 are used as the mode. In this case, we say that the distribution is bimodal.

EXAMPLE: Find the mode for 18, 24, 16, 15, and 12.
SOLUTION: Since no value occurs more than any other value, there is no mode.

A distribution can have one mode, more than one mode, or no mode. Also, the mean, median, and mode for a set of values most often differ somewhat.

PRACTICE

1. Find the mean, median, and mode for the number of sick days nine employees used last year. The data are 3, 6, 8, 2, 0, 5, 7, 8, and 5.
2. Find the mean, median, and mode for the number of rooms seven hotels in a large city have. The data are 332, 256, 300, 275, 216, 314, and 192.
3. Find the mean, median, and mode for the number of tornadoes that occurred in a specific state over the last 5 years. The data are 18, 6, 3, 9, and 10.
4. Find the mean, median, and mode for the number of items 9 people purchased at the express checkout register. The data are 12, 8, 6, 1, 5, 4, 6, 2, and 6.
5. Find the mean, median, and mode for the ages of 10 children who participated in a field trip to the zoo. The ages are 7, 12, 11, 11, 5, 8, 11, 7, 8, and 6.

ANSWERS

1.
Mean = 3 + 6 + 8 + 2 + 0 + 5 + ...

Normal Distribution (Pinkie Length)
By: OpenStaxCollege

Class Time:
Names:

Student Learning Outcomes
• The student will compare empirical data and a theoretical distribution to determine if data from the experiment follow a continuous distribution.

Collect the Data
Measure the length of your pinky finger (in centimeters).
Randomly survey 30 adults for their pinky finger lengths. Round the lengths to the nearest 0.5 cm.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Construct a histogram. Make five to six intervals. Sketch the graph using a ruler and pencil. Scale the axes.

Calculate the following
x̄ = _
s = _

Draw a smooth curve through the top of the bars of the histogram. Write one to two complete sentences to describe the general shape of the curve. (Keep it simple. Does the graph go straight across, does it have a v-shape, does it have a hump in the middle or at either end, and so on?)

Analyze the Distribution
Using your sample mean, sample standard deviation, and histogram, what was the approximate theoretical distribution of the data you collected?
• X ~ _( _, _)
• How does the histogram help you arrive at the approximate distribution?

Describe the Data
Using the data you collected, complete the following statements. (Hint: order the data.) Remember: IQR = Q3 – Q1.
• IQR = _
• The 15th percentile is _
• The 85th percentile is _
• Median is _
• What is the theoretical probability that a randomly chosen pinky length is more than 6.5 cm?
• Explain the meaning of the 85th percentile of this data.

Theoretical Distribution
Using the theoretical distribution, complete the following statements. Use a normal approximation based on the sample mean and standard deviation.
• IQR = _
• The 15th percentile is _
• The 85th percentile is _
• Median is _
• What is the theoretical probability that a randomly chosen pinky length is more than 6.5 cm?
• Explain the meaning of the 85th percentile of this data.

Discussion Questions
Do the data you collected give a close approximation to the theoretical distribution? In complete sentences, and comparing the results in the sections titled Describe the Data and Theoretical Distribution, explain why or why not.

Applied Econometrics
Lecture 1: Normal Distribution

For many random variables, the probability distribution is a specific bell-shaped curve, called the normal curve, or Gaussian curve. This is the most common and useful distribution in statistics.

1) Standard normal distribution

The standard normal distribution has the following probability density function:

$Y = P(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} z^{2}}$

Features of the curve are:
1) z² appears in the negative exponent, so as |z| increases, P(z) decreases, approaching 0 symmetrically in both tails.
2) The mean, which is zero (μ = 0), is the balancing point or the center of symmetry.
3) The standard deviation is one (σ = 1).

Example 1.1: If z has a standard normal distribution, find P(-2 < z < 2). [1]
Solution: P(-2 < z < 2) = 1 - P(z < -2) - P(z > 2) = 1 - 2(0.023) = 0.954

2) General normal distribution

The general normal distribution has the following probability density function:

$Y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$

The quantity Y, which is the height of the curve at any point along the scale of X, is known as the probability density of that particular value of the variable quantity, X.
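The two density formulas above, and the table-based answer to Example 1.1, can be checked numerically. The sketch below assumes Python's standard `statistics.NormalDist`; the helper name `normal_pdf` and the values μ = 10, σ = 4 are hypothetical choices made only for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height Y of the normal curve at x, written out from the density formula."""
    z = (x - mu) / sigma  # standardized distance from the mean
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


std = NormalDist()  # standard normal: mu = 0, sigma = 1

# Example 1.1: P(-2 < z < 2) as a difference of cumulative probabilities.
p_within_2 = std.cdf(2) - std.cdf(-2)

# The hand-written density should agree with the library's pdf.
peak_height = normal_pdf(0.0)                           # equals 1 / sqrt(2*pi)
general_height = normal_pdf(12.0, mu=10.0, sigma=4.0)   # illustrative values only

print(round(p_within_2, 4))
```

The computed probability is about 0.9545, which agrees with the table-based 0.954 up to the rounding of the tail area 0.023.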
[1] If z is continuous, P(z ≥ c) = P(z > c). In other words, ≥ and > can be used interchangeably for any continuous random variable.

Written by Nguyen Hoang Bao, May 17, 2004

Example 2.1: The local authorities in a certain city install 2,000 electric lamps in the streets of the city. If these lamps have an average life of 1,000 burning hours, with a standard deviation of 200 hours, what number of the lamps might be expected to fail in the first 700 burning hours?

Solution: In this case, we want to find the probability corresponding to the area of the probability curve below t = (700 - 1,000)/200 = -1.5. By symmetry, the area below -1.5 equals the area above 1.5, so we ignore the sign and enter our table at 1.5 to find that the probability for lives less than 700 hours is P = 0.067. Hence the expected number of failures will be 2,000 × 0.067 = 134.

Example 2.2: What number of lamps may be expected to fail between 900 and 1,300 burning hours?

Solution:
• The number of lamps that will fail under 900 hours: the corresponding value is t = (900 - 1,000)/200 = -0.5. Entering the table with this value of t, we find for the probability of failure below 900 hours: P = 0.309.
• The number of lamps that will fail over 1,300 hours: the corresponding value is t = (1,300 - 1,000)/200 = 1.5. Entering the table with this value of t, we find for the probability of failure over 1,300 hours: P = 0.067.
• Hence the probability of failure outside the limits 900 to 1,300 hours will be 0.309 + 0.067 = 0.376. It follows that the number of lamps we may expect to fail outside these limits is 2,000 × 0.376 = 752. But we were asked to find the number that are likely to fail inside the limits stated. This is 2,000 - 752 = 1,248.

Example 2.3: After what period of burning hours would you expect that 10% of the lamps would have failed?

Solution: What we want here is the value of t corresponding to a probability P = 0.1. Looking along our table, we find that when t = 1.25 the probability is P = 0.106.
This is near enough for our purpose of prediction. Hence we may take it that 10% of the lamps will have failed by 1.25 standard deviations below the mean. Since one standard deviation is equal to 200 hours, it follows that 10% of the lamps will fail before 1,000 - 1.25 × 200 = 1,000 - 250 = 750 hours.

3) Moment-based characteristics of a distribution

First moment
Mean > Median: the distribution is skewed to the right
Mean ≅ Median ≅ Mode: the distribution is symmetric
Mean < Median: the distribution is skewed to the left

Open Access
Available online http://arthritis-research.com/content/11/3/R85
Vol 11 No 3
Research article

Mesenchymal progenitor cell markers in human articular cartilage: normal distribution and changes in osteoarthritis

Shawn P Grogan (1,2), Shigeru Miyaki (1), Hiroshi Asahara (1), Darryl D D'Lima (1,2) and Martin K Lotz (1)

(1) Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California, 92037, USA
(2) Shiley Center for Orthopaedic Research and Education at Scripps Clinic, 11025 North Torrey Pines Road, Suite 140, La Jolla, California, 92037, USA

Corresponding author: Martin K Lotz, mlotz@scripps.edu

Received: 24 Feb 2009. Revisions requested: 1 Apr 2009. Revisions received: 7 May 2009. Accepted: 5 Jun 2009. Published: 5 Jun 2009.

Arthritis Research & Therapy 2009, 11:R85 (doi:10.1186/ar2719)
This article is online at: http://arthritis-research.com/content/11/3/R85
© 2009 Grogan et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Introduction: Recent findings suggest that articular cartilage contains mesenchymal progenitor cells.
The aim of this study was to examine the distribution of stem cell markers (Notch-1, Stro-1 and VCAM-1) and of molecules that modulate progenitor differentiation (Notch-1 and Sox9) in normal adult human articular cartilage and in osteoarthritis (OA) cartilage.

Methods: Expression of the markers was analyzed by immunohistochemistry (IHC) and flow cytometry. Hoechst 33342 dye was used to identify and sort the cartilage side population (SP). Multilineage differentiation assays including chondrogenesis, osteogenesis and adipogenesis were performed on SP and non-SP (NSP) cells.

Results: A surprisingly high number (>45%) of cells were positive for Notch-1, Stro-1 and VCAM-1 throughout normal cartilage. Expression of these markers was higher in the superficial zone (SZ) of normal cartilage as compared to the middle zone (MZ) and deep zone (DZ). Non-fibrillated OA cartilage SZ showed reduced Notch-1 and Sox9 staining frequency, while Notch-1, Stro-1 and VCAM-1 positive cells were increased in the MZ. Most cells in OA clusters were positive for each molecule tested. The frequency of SP cells in cartilage was 0.14 ± 0.05% and no difference was found between normal and OA. SP cells displayed chondrogenic and osteogenic but not adipogenic differentiation potential.

Conclusions: These results show a surprisingly high number of cells that express putative progenitor cell markers in human cartilage. In contrast, the percentage of SP cells is much lower and within the range of expected stem cell frequency. Thus, markers such as Notch-1, Stro-1 or VCAM-1 may not be useful to identify progenitors in cartilage. Instead, their increased expression in OA cartilage implicates involvement in the abnormal cell activation and differentiation process characteristic of OA.

Introduction

The limited repair capacity of adult articular cartilage represents one factor involved in the development of progressive cartilage degeneration and osteoarthritis (OA) following cartilage injury.
This notion was previously related to the absence of an inflammatory response, the putative absence and lack of access to stem cells in cartilage [1,2], and intrinsic limitations of adult human articular chondrocytes (AHAC) to repair tissue damage [3]. Yet, when cultured under appropriate conditions, cells isolated from cartilage can be induced to form cartilage-like tissue in vitro [4] and monolayer-expanded AHAC can form hyaline-like tissue when implanted into cartilage defects in vivo [5].

ABCG2: ATP-binding cassette, sub-family G; AHAC: adult human

Original article

Optimum truncation points for independent culling level selection on a multivariate normal distribution, with an application to dairy cattle selection

V. Ducrocq, J.J. Colleau

Institut National de la Recherche Agronomique, Station de Génétique Quantitative et Appliquée, Centre de Recherches de Jouy-en-Josas, 78350 Jouy-en-Josas, France

(received 1 March 1988, accepted 19 September 1988)

Summary: Independent culling level selection is often practiced in breeding programs because extreme animals for some particular traits are rejected by breeders or because records on which genetic evaluation is based are collected sequentially. Optimizing these selection procedures for a given overall breeding objective is equivalent to finding the combination of truncation thresholds or culling levels which maximizes the expected value of the overall genetic value for selected animals. A general Newton-type algorithm has been derived to perform this maximization for any number of normally distributed traits and when the overall probability of being selected is fixed. Using a powerful method for the computation of multivariate normal probability integrals, it has been possible to undertake the numerical calculation of the optimal truncation points when up to 6 correlated traits or stages of selection are considered simultaneously.
The extension of this algorithm to the more complex situation of maximizing annual genetic response subject to nonlinear constraints is demonstrated using a dairy cattle model involving milk production and a secondary trait such as type. Consideration is given to three of the four pathways of selection: dams of bulls; sires of bulls; and sires of cows.

Independent culling level selection - dairy cattle - multistage selection - genetic gain - multivariate normal distribution

Résumé: Optimal truncation thresholds for independent culling level selection on a multivariate normal distribution, with an application to selection in dairy cattle. Independent culling level selection is often practiced in breeding programs, because animals that are extreme for certain traits are rejected, or because the data used for the genetic evaluation of the animals are collected sequentially. Optimizing these selection rules for a given objective is equivalent to finding the truncation thresholds that maximize the expected value of the selection objective for the retained animals. A general Newton-type algorithm is derived to perform this maximization for any number of traits following a multivariate normal distribution and when the final probability of being retained is fixed. Using a powerful method for computing multivariate normal integrals, it has been possible to compute the truncation thresholds numerically when up to 6 correlated traits or stages of selection are considered simultaneously.
The extension of this algorithm to more complex situations, such as maximizing annual genetic progress under several nonlinear constraints, is illustrated through the computation of optimal selection rules for dams of bulls, sires of bulls and service sires, for milk production and for a secondary trait such as dairy type score, in a typical dairy cattle selection scheme.

independent culling level selection - dairy cattle - multistage selection - genetic progress - multivariate normal distribution

Introduction

It is ...