1.3.6.2. Related Distributions
http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm [5/1/2006 9:57:51 AM]

Probability distributions are typically defined in terms of the probability density function. However, there are a number of probability functions used in applications.

Probability Density Function
For a continuous distribution, the probability density function (pdf), f(x), describes how probability is spread over the possible values x of the variate; it is not itself the probability of the value x. Since for continuous distributions the probability at a single point is zero, probabilities are expressed in terms of an integral of the pdf between two points:

$$\Pr[a \le X \le b] = \int_a^b f(x)\,dx$$

For a discrete distribution, the pdf is the probability that the variate takes the value x.

The following is the plot of the normal probability density function.

Cumulative Distribution Function
The cumulative distribution function (cdf) is the probability that the variable takes a value less than or equal to x. That is

$$F(x) = \Pr[X \le x] = \alpha$$

For a continuous distribution, this can be expressed mathematically as

$$F(x) = \int_{-\infty}^{x} f(\mu)\,d\mu$$

For a discrete distribution taking non-negative integer values, the cdf can be expressed as

$$F(x) = \sum_{i=0}^{x} f(i)$$

The following is the plot of the normal cumulative distribution function.

The horizontal axis is the allowable domain for the given probability function. Since the vertical axis is a probability, it must fall between zero and one. It increases from zero to one as we go from left to right on the horizontal axis.

Percent Point Function
The percent point function (ppf) is the inverse of the cumulative distribution function. For this reason, the percent point function is also commonly referred to as the inverse distribution function. That is, for a distribution function we calculate the probability that the variable is less than or equal to x for a given x.
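The pdf, cdf, and ppf relationships described above can be checked numerically. The following is a minimal sketch using the standard normal distribution from Python's `statistics.NormalDist` (Python 3.8+; an illustration choice, not something the handbook itself uses). The helper `prob_between` is hypothetical: it integrates the pdf with a simple trapezoidal rule so that the integral in the definition is explicit, and the result is compared with the cdf difference F(b) - F(a). The last loop checks that the ppf, G, undoes the cdf.

```python
from statistics import NormalDist


def prob_between(dist: NormalDist, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate Pr[a <= X <= b] by integrating the pdf with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width


std_normal = NormalDist(mu=0.0, sigma=1.0)

# Pr[-1 <= X <= 1] two ways: integrating the pdf, and F(1) - F(-1) from the cdf.
by_pdf = prob_between(std_normal, -1.0, 1.0)
by_cdf = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
print(f"integral of pdf: {by_pdf:.6f}   F(1) - F(-1): {by_cdf:.6f}")

# The ppf G inverts the cdf F, so G(F(x)) recovers x.
for x in (-2.0, 0.0, 1.5):
    alpha = std_normal.cdf(x)
    print(f"x = {x:+.1f}   F(x) = {alpha:.4f}   G(F(x)) = {std_normal.inv_cdf(alpha):+.4f}")
```

The numerical integration is only there to mirror the definition; in practice the cdf is evaluated directly.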
For the percent point function, we start with the probability and compute the corresponding x for the cumulative distribution. Mathematically, this can be expressed as

$$\Pr[X \le G(\alpha)] = \alpha$$

or alternatively

$$x = G(\alpha) = G(F(x))$$

The following is the plot of the normal percent point function.

Since the horizontal axis is a probability, it goes from zero to one. The vertical axis goes from the smallest to the largest value of x, that is, it spans the domain of the cumulative distribution function.

Hazard Function
The hazard function is the ratio of the probability density function to the survival function, S(x):

$$h(x) = \frac{f(x)}{S(x)} = \frac{f(x)}{1 - F(x)}$$

The following is the plot of the normal distribution hazard function.

Hazard plots are most commonly used in reliability applications. Note that Johnson, Kotz, and Balakrishnan refer to this as the conditional failure density function rather than the hazard function.

Cumulative Hazard Function
The cumulative hazard function is the integral of the hazard function:

$$H(x) = \int_{-\infty}^{x} h(\mu)\,d\mu$$

This can alternatively be expressed as

$$H(x) = -\ln\bigl(1 - F(x)\bigr)$$

Although it is sometimes described as the probability of failure at time x given survival until time x, H(x) is not a probability and can exceed one; the conditional failure interpretation, per unit time, belongs to the hazard function h(x).

The following is the plot of the normal cumulative hazard function.

Cumulative hazard plots are most commonly used in reliability applications. Note that Johnson, Kotz, and Balakrishnan refer to this as the hazard function rather than the cumulative hazard function.

Survival Function
Survival functions are most often used in reliability and related fields. The survival function is the probability that the variate takes a value greater than x:

$$S(x) = \Pr[X > x] = 1 - F(x)$$

The following is the plot of the normal distribution survival function.

For a survival function, the y value on the graph starts at 1 and monotonically decreases to zero. The survival function should be compared to the cumulative distribution function, of which it is the complement.

Inverse Survival Function
Just as the percent point function is the inverse of the cumulative distribution function, the survival function also has an inverse function. The inverse survival function can be defined in terms of the percent point function:

$$\Pr[X > Z(\alpha)] = \alpha, \qquad Z(\alpha) = G(1 - \alpha)$$

The following is the plot of the normal distribution inverse survival function.

As with the percent point function, the horizontal axis is a probability. Therefore the horizontal axis goes from 0 to 1 regardless of the particular distribution. The appearance is similar to the percent point function. However, instead of going from the smallest to the largest value on the vertical axis, it goes from the largest to the smallest value.

1.3.6.3. Families of Distributions
http://www.itl.nist.gov/div898/handbook/eda/section3/eda363.htm [5/1/2006 9:57:52 AM]

[...] The Weibull distribution has a relatively simple distributional form. However, the shape parameter allows the Weibull to assume a wide variety of shapes. This combination of simplicity and flexibility in the shape of the Weibull distribution has made it an effective distributional model in reliability applications. This ability to model a wide variety of distributional shapes using a relatively simple distributional form is possible with many other distributional families as well.

PPCC Plots
The PPCC plot is an effective graphical tool for selecting the member of a distributional family with a single shape parameter that best fits a given set of data.

1.3.6.4. Location and Scale Parameters
http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm [5/1/2006 9:57:52 AM]

[...]

Location Parameter
The next plot shows the probability density function for a normal distribution with a location parameter of 10 and a scale parameter of 1. The effect of the location parameter is to translate the graph, relative to the standard normal distribution, 10 units to the right on the horizontal axis. A location parameter of -10 would have shifted the graph 10 units to the left on the horizontal axis. That is, a location parameter simply shifts the graph left or right on the horizontal axis.

Scale Parameter
The next plot has a scale parameter of 3 (and a location parameter of zero). The effect of the scale parameter is to stretch out the graph. The maximum y value is approximately 0.13 as opposed to 0.4 in the previous graphs. The y value, i.e., the vertical axis value, approaches zero at about (+/-) 9 as opposed to (+/-) 3 with the first graph.

[...]

Formulas for Location and Scale Based on the Standard Form
(a is the location parameter, b is the scale parameter, and "0,1" denotes the standard form.)

Cumulative distribution function: $F(x;a,b) = F\left(\frac{x-a}{b};0,1\right)$
Probability density function: $f(x;a,b) = \frac{1}{b}\,f\left(\frac{x-a}{b};0,1\right)$
Percent point function: $G(\alpha;a,b) = a + b\,G(\alpha;0,1)$
Hazard function: $h(x;a,b) = \frac{1}{b}\,h\left(\frac{x-a}{b};0,1\right)$
Cumulative hazard function: $H(x;a,b) = H\left(\frac{x-a}{b};0,1\right)$
Survival function: $S(x;a,b) = S\left(\frac{x-a}{b};0,1\right)$
Inverse survival function: $Z(\alpha;a,b) = a + b\,Z(\alpha;0,1)$
Random numbers: $Y(a,b) = a + b\,Y(0,1)$

1.3.6.5. Estimating the Parameters of a Distribution

1.3.6.5.2. Maximum Likelihood
http://www.itl.nist.gov/div898/handbook/eda/section3/eda3652.htm [5/1/2006 9:57:53 AM]

[...] Dataplot supports MLE for a limited number of distributions.

1.3.6.5.4. PPCC and Probability Plots
The PPCC plot can be [...]
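The PPCC idea described above (choosing the member of a single-shape-parameter family whose quantiles correlate best with the ordered data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the handbook's or Dataplot's implementation: it assumes the Weibull family as the example family, uses Filliben's approximate medians of the uniform order statistics as plotting positions, and scans a hypothetical grid of candidate shape values. Because a correlation coefficient is unchanged by shifting or stretching the data, only the shape parameter needs to be scanned; the function names are hypothetical.

```python
import math


def filliben_medians(n: int) -> list[float]:
    """Filliben's approximation to the medians of the n uniform order statistics."""
    last = 0.5 ** (1.0 / n)
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[0] = 1.0 - last
    medians[-1] = last
    return medians


def weibull_ppf(p: float, shape: float) -> float:
    """Percent point function G(p) of the standard Weibull (location 0, scale 1)."""
    return (-math.log(1.0 - p)) ** (1.0 / shape)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ppcc(data: list[float], shape: float) -> float:
    """Correlation between the ordered data and Weibull quantiles for one shape value."""
    ordered = sorted(data)
    quantiles = [weibull_ppf(p, shape) for p in filliben_medians(len(ordered))]
    return pearson(ordered, quantiles)


def best_shape(data: list[float], shapes: list[float]) -> tuple[float, float]:
    """Scan candidate shapes and return (shape, correlation) at the PPCC maximum."""
    return max(((s, ppcc(data, s)) for s in shapes), key=lambda pair: pair[1])


# Illustrative data: exact Weibull quantiles with shape 2 and scale 3 (hypothetical values).
data = [3.0 * weibull_ppf(p, 2.0) for p in filliben_medians(50)]
shapes = [k / 10 for k in range(5, 51)]  # candidate shapes 0.5, 0.6, ..., 5.0
shape_hat, r_max = best_shape(data, shapes)
print(f"PPCC maximum at shape {shape_hat} (correlation {r_max:.6f})")
```

With real data, the correlation would be plotted against the shape value and the peak read off; running the same scan for several families and comparing their maximum correlations is one way to choose between them.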
1.3.6.4. Location and Scale Parameters

[...] In contrast, the next graph has a scale parameter of 1/3 (=0.333). The effect of this scale parameter is to squeeze the pdf. That is, the maximum y value is approximately 1.2 as opposed to 0.4, and the y value is near zero at (+/-) 1 as opposed to (+/-) 3. The effect of a scale parameter greater than [...] parameter of 1 leaves the pdf unchanged (if the scale parameter is 1 to begin with), and non-positive scale parameters are not allowed.

Location and Scale Together
The following graph shows the effect of both a location and a scale parameter. The plot has been shifted right 10 units and stretched [...]

[...] formulas for converting from the standard form to the form with other location and scale parameters. These formulas are independent of the particular probability distribution.

Formulas for Location and Scale Based on the Standard Form
The following are the formulas for computing [...]

1.3.6.5.1. Method of Moments
http://www.itl.nist.gov/div898/handbook/eda/section3/eda3651.htm [5/1/2006 9:57:52 AM]

The method of moments equates sample moments to the corresponding theoretical (population) moments and solves the resulting equations for the parameter estimates. When moment methods are available, they have the advantage of simplicity. The disadvantage is [...] However, when utilized, the method of moment formulas tend to be straightforward and can be easily implemented in most statistical software programs.

1.3.6.5.2. Maximum Likelihood

[...]
- generate confidence bounds and hypothesis tests for the parameters.

Several popular statistical software packages provide excellent algorithms [...] can be applied to many different MLE problems. The drawback is that you have to specify the maximum likelihood equations to the software. As the functions can be non-trivial, there is potential for error in entering the equations. The advantage of the specific MLE procedures is that [...]

1.3.6.5.4. PPCC and Probability Plots
http://www.itl.nist.gov/div898/handbook/eda/section3/eda3654.htm [5/1/2006 9:57:53 AM]

[...] Comparing the maximum correlation coefficient achieved for each distribution can help in selecting which is the best distribution to use.

Disadvantages
The disadvantages of this method are:
- It is limited to distributions with a single shape parameter.
- PPCC plots are [...]

[...] reliability applications, the hazard plot and the Weibull plot are alternative graphical methods that are commonly used to estimate parameters.

1.3.6.6. Gallery of Distributions

[...] t Distribution, F Distribution, Chi-Square Distribution, Exponential Distribution, Weibull Distribution, Lognormal Distribution, Fatigue Life [...]
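Because the location and scale formulas of section 1.3.6.4 do not depend on the particular distribution, they can be applied to any standardized distribution, such as the exponential distribution listed in the gallery. The following is a minimal sketch under that assumption. The standard-form functions (hypothetical names `f0`, `F0`, `S0`, `G0`, `h0`, `H0`, `Z0`) follow the definitions of section 1.3.6.2 for the standard exponential pdf $e^{-x}$, $x \ge 0$; the location-scale versions are computed only from the standard form; and `exponential_mom` is a hypothetical method-of-moments fit that relies on the two-parameter exponential having mean a + b and standard deviation b.

```python
import math
import random
import statistics


# Standard form (location 0, scale 1) of the exponential distribution.
def f0(x: float) -> float:
    return math.exp(-x) if x >= 0 else 0.0


def F0(x: float) -> float:
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def S0(x: float) -> float:
    return math.exp(-x) if x >= 0 else 1.0


def G0(alpha: float) -> float:
    return -math.log(1.0 - alpha)


def h0(x: float) -> float:
    return f0(x) / S0(x)  # hazard: pdf divided by survival function


def H0(x: float) -> float:
    return -math.log(S0(x))  # cumulative hazard: -ln(1 - F(x))


def Z0(alpha: float) -> float:
    return G0(1.0 - alpha)  # inverse survival: Z(alpha) = G(1 - alpha)


# Location-scale forms computed only from the standard form (a = location, b = scale > 0).
def f(x: float, a: float, b: float) -> float:
    return f0((x - a) / b) / b


def F(x: float, a: float, b: float) -> float:
    return F0((x - a) / b)


def G(alpha: float, a: float, b: float) -> float:
    return a + b * G0(alpha)


def h(x: float, a: float, b: float) -> float:
    return h0((x - a) / b) / b


def H(x: float, a: float, b: float) -> float:
    return H0((x - a) / b)


def S(x: float, a: float, b: float) -> float:
    return S0((x - a) / b)


def Z(alpha: float, a: float, b: float) -> float:
    return a + b * Z0(alpha)


def exponential_mom(data: list[float]) -> tuple[float, float]:
    """Method-of-moments estimates (a, b) for the two-parameter exponential.

    Its mean is a + b and its standard deviation is b, so equating these to the
    sample mean and the (1/n) sample standard deviation gives b = s, a = mean - s.
    """
    s = statistics.pstdev(data)
    return statistics.fmean(data) - s, s


a_true, b_true = 2.0, 3.0
rng = random.Random(2006)
# Random numbers via Y(a, b) = a + b * Y(0, 1), with Y(0, 1) a standard exponential variate.
sample = [a_true + b_true * rng.expovariate(1.0) for _ in range(20_000)]
a_hat, b_hat = exponential_mom(sample)
print(f"method-of-moments estimates: a = {a_hat:.3f}, b = {b_hat:.3f}")
print(f"hazard at x = 5: {h(5.0, a_true, b_true):.4f}")
```

The constant hazard 1/b that `h` returns for every x >= a is a characteristic property of the exponential distribution; the same location-scale wrappers would work unchanged with the standard-form functions of any other distribution.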