Abstract Number: 003-0287

Title: g-Correlation – A Six Sigma Tool to Discover the Correlation between Variables

Conference: Sixteenth Annual Conference of POMS, Chicago, IL, April 29 – May 2, 2005

Authors:

Sarika Tyagi
Laboratory for Responsible Manufacturing, 334 SN, Department of MIE, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
E-mail: sarikatyagi@gmail.com, Phone: 617-373-7635, Fax: 617-373-2921

Prof. Sagar Kamarthi
Laboratory for Responsible Manufacturing, 334 SN, Department of MIE, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
E-mail: sagar@coe.neu.edu, Phone: 617-373-3070, Fax: 617-373-2921

Srikanth Vadde
Laboratory for Responsible Manufacturing, 334 SN, Department of MIE, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
E-mail: srikant@coe.neu.edu, Phone: 617-373-7635, Fax: 617-373-2921

Abstract

In Six Sigma it is important to discover the interconnections between the variables involved in the problem being examined in the Analyze phase of the DMAIC methodology. Understanding the strength of the correlation between pairs of variables helps quality personnel decide how much attention each factor needs in order to improve the sigma level of the process. Normally, the Pearson correlation coefficient (which assumes the variables to be normally distributed or to have a linear/parametric relationship) and the Spearman correlation coefficient (which deals with variables that are measured on an ordinal scale or that have a non-parametric relationship) are employed to determine the correlation between a given pair of variables. Similarly, the Fechner correlation coefficient is used, to a lesser extent, to calculate the correlation between two monotonically related variables. This paper presents a novel correlation measure called g-Correlation, which is more general and less
restrictive. The effectiveness of g-Correlation is demonstrated on two representative examples.

What is Six Sigma

The basic foundation of Six Sigma rests on the concept of the normal curve, introduced by Carl Friedrich Gauss in the 1800s [1]. Six Sigma is a well-structured, data-driven methodology for process improvement. It targets the elimination of a broad range of defects, quality problems, and anything that does not add value [2]. It also aims to improve customer satisfaction by considering customers' needs and requirements while maintaining profitability and competitiveness. It is widely used in a variety of fields: manufacturing, service delivery, management, and other business operations. Six Sigma methodologies integrate principles of business, statistics, and engineering to achieve tangible results. The methodology combines well-established statistical quality control techniques, simple and advanced data analysis methods, and the systematic training of personnel at every level of the organization. The Six Sigma methodology and management strategies together provide an overall framework for organizing company-wide quality control efforts. There are numerous Six Sigma success stories in US-based as well as international companies.

Six Sigma is not defined with a single view in mind [3]. It is perceived as (1) a vision to improve existing processes or to design completely new processes with customers' requirements in mind; (2) a philosophy that emphasizes reduction in variation, customer focus, and data-driven decisions; (3) a symbol for quality improvement; (4) a metric derived from data collection and data analysis; (5) a goal of achieving a process that produces fewer than 3.4 defects in a million opportunities; and (6) a methodology for structured problem solving.

Statistical Explanation: Six Sigma

Six Sigma is a statistical measure [4] for objectively evaluating processes. At many organizations Six Sigma simply means a measure of quality that
strives for near perfection, but the statistical implications of a Six Sigma program go well beyond the qualitative eradication of customer-perceptible defects; it is a methodology that is well rooted in mathematics and statistics. The graph in Figure 1 shows the percentage split of variation expected for processes at different sigma levels. For example, a 1σ process will have (34.13% + 34.13% = 68.26%) of events within the allowable specifications.

[Figure 1: Statistical explanation of Six Sigma [4] — a normal curve between the lower and upper limits, with 34.13% of outcomes on each side of the mean within ±1σ, 13.59% between 1σ and 2σ, 2.14% between 2σ and 3σ, and 0.13% beyond 3σ on each side.]

Processes are prone to being influenced by special and/or assignable causes that affect the overall performance of the process relative to the customers' specifications. This means the long-term performance of a Six Sigma process will always be less than 6σ. The difference between the short-term process capability and the long-term process capability is known as a "shift" and is denoted Z_shift. For a typical process, the value of the shift is 1.5σ. Therefore, the statistical term "Six Sigma" indicates a short-term capability of 6σ and a long-term capability of 4.5σ. As the process sigma value increases from zero to six, the variation of the process around the mean value decreases.
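The coverage percentages and the long-term defect rate discussed above follow directly from the standard normal CDF. The sketch below (not part of the original paper) reproduces them with only the Python standard library, using `math.erf` to build the normal CDF:

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coverage(k: float) -> float:
    """Two-sided probability of an outcome within +/- k sigma of the mean."""
    return phi(k) - phi(-k)

print(f"within 1 sigma: {coverage(1):.2%}")  # 68.27%
print(f"within 3 sigma: {coverage(3):.2%}")  # 99.73%

# With the conventional 1.5 sigma long-term shift, a "Six Sigma" process
# has 6 - 1.5 = 4.5 sigma of one-sided headroom to the nearer limit,
# which yields the well-known figure of about 3.4 defects per million.
dpmo = (1.0 - phi(4.5)) * 1e6
print(f"long-term DPMO: {dpmo:.1f}")
```

Running this confirms the band percentages in Figure 1 (up to rounding) and the 3.4 defects-per-million opportunities figure cited for a Six Sigma process.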
DMAIC Process

Six Sigma implementation follows a well-known methodology called DMAIC (Define, Measure, Analyze, Improve, and Control). It is an improvement system for existing processes that fall below specification limits and need incremental improvement. The methodology should be used when a product or process already exists but either has not been meeting customer specifications or has not been performing adequately. Each step in the cyclical DMAIC process is required to ensure the best possible results [6].

The Define phase is concerned with the definition of project goals, the involvement of core business processes, the identification of customers' expectations, and the identification of issues that must be addressed to achieve a higher sigma level. The tools used in this step include input-process-output diagrams, process flow maps, and quality function deployment.

The Measure phase measures the current process performance by gathering information (baseline data) about the current process and identifies problem areas by calculating the difference between the current and the expected process. Data collection plans/forms and sampling techniques are usually employed at this stage.

The Analyze phase identifies the root causes of quality problems and confirms those causes using appropriate data analysis tools. Cause-and-effect diagrams and failure mode and effects analysis are used for this purpose.

The Improve phase implements solutions that address the root causes identified during the Analyze phase in order to improve the target processes. Brainstorming, mistake proofing, and design of experiments are some of the techniques used in this phase.

The Control phase evaluates and monitors the results at defined intervals of time. Control charts and time series methods are generally used in this phase.

Correlation Methods

After identifying the root causes (the variables that came into play), the Analyze phase of Six Sigma attempts to discover correlations that measure the strength, or closeness, of the relationship between pairs of variables. These correlations can then be used to prioritize and eliminate the causes of an effect. Several correlation measures [7] are used in practice: parametric correlation analysis such as the Pearson correlation coefficient, and non-parametric correlation analysis such as the Spearman correlation coefficient, Kendall's tau correlation, and the Fechner correlation coefficient.

The Pearson correlation coefficient (rp) is a parametric statistic with the following underlying assumptions: the data set is a randomly collected sample; both variables are either interval or ratio; both variables are
more or less normally distributed; and the relationship between the two variables is linear. The Pearson correlation coefficient is given by

    rp = Cov(x1, x2) / (Sx1 · Sx2)

where the numerator expresses the extent to which x1 and x2 covary, and Sx1 and Sx2 are the standard deviations of x1 and x2.

The Spearman correlation coefficient (rs) is a non-parametric statistic and is also known as a rank correlation. Its underlying assumption is that the variables under consideration are measured on an ordinal (rank-order) scale, or that the individual observations can be ranked into two ordered series; in other words, the variables should have a monotonic relationship. The Spearman correlation coefficient is given by

    rs = 1 − (6 Σ D²) / (N (N² − 1))

where rs is the Spearman correlation coefficient, N is the number of pairs, and D is the difference between the ranks of each pair.

Kendall's tau correlation is equivalent to the Spearman correlation coefficient with regard to the underlying assumptions. However, the two give different values because their underlying logic and their computational formulas differ: Kendall's correlation represents a probability, whereas the Spearman correlation does not. The Fechner correlation coefficient is used, though to a lesser extent, to calculate the correlation between two monotonically related variables.

The correlation coefficients reviewed above impose certain assumptions on the variables in question. They work well when all their assumptions are met; however, if the data do not meet those assumptions, these coefficients do not give a correct relational interpretation of the variables in question. This paper therefore presents a more general non-parametric correlation coefficient called g-Correlation, which places no restrictions on the variables being analyzed. g-Correlation can also detect correlations that are neither functional nor monotonic, and it can work better than the existing correlation coefficients when it comes to weaker correlations. It allows one to compute a coarse correlation between pairs of variables in situations where accurate correlations are not possible.
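The Pearson and Spearman formulas above can be sketched directly in Python. The following illustration (not from the paper; the sample data are made up, and the rank helper assumes no ties) implements both coefficients exactly as defined:

```python
from statistics import mean, pstdev

def pearson(x, y):
    """rp = Cov(x, y) / (Sx * Sy), using population statistics."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def ranks(v):
    """Ranks 1..N of the values in v (assumes no ties, for simplicity)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """rs = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D = rank difference."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 8]        # roughly increasing with x
print(round(pearson(x, y), 3))   # 0.901
print(round(spearman(x, y), 3))  # 0.886
```

Because this toy relationship is nearly linear and monotonic, both coefficients are high and close to each other; the point of g-Correlation, introduced next, is the cases where neither assumption holds.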
g-Correlation

Definition: For two random variables X and Y, first consider the line y = μy, where μy is the median of the random variable Y, to divide the space of measurements into two classes [7]:

    C1 := { (x, y) ∈ R² : y ≥ μy }  and  C2 := { (x, y) ∈ R² : y < μy }.

A random variable X is called g-Correlated with another random variable Y if there exists a value c ∈ R such that the criterion x0 ≥ c allows one to assign the coordinate point (x0, y0) of the random vector (X, Y) to the classes C1 and C2 with an accuracy of more than 50%, i.e., if

    P(X ≥ c, (X, Y) ∈ C1) + P(X < c, (X, Y) ∈ C2) ≠ 1/2.

Otherwise, X is not g-Correlated with Y [7].

Properties of g-Correlation

If a random variable X is 100% g-Correlated with a random variable Y, then Y is 100% g-Correlated with X. If Y = f(X) for a strictly monotonic function f and Y has a unique median μy, then X is 100% g-Correlated with Y.

Estimating g-Correlation

To estimate the g-Correlation coefficient, divide the given data set into two subsets: a training set T of size q and an evaluation set E of size (n − q). First, estimates for the separating lines y = μy and x = c, with an appropriate value of c, are computed from the training set. A value c that gives an optimal classification of the training set T with respect to the classes C1 and C2 can be computed with the following algorithm.

Step 1: Sort all pairs in the sequence s = [(x1, y1), (x2, y2), …, (xq, yq)] in ascending order of the x-values.

Step 2: Take the arithmetic means of the x-values of all successive pairs in s as candidates for c. Start the examination with the smallest candidate and proceed successively to the largest.

Step 3: For the first candidate for c, count the number p1 of pairs (xi, yi) in s with xi < c and yi < μy, as well as the number p2 of pairs with xi ≥ c and yi ≥ μy. For all the other candidates, update the numbers p1 and p2 depending on whether the pairs belong to C1 or C2.

Step 4: Store the maximum classification percentage max{p1 + p2, q − p1 − p2}/q achieved for the training set T, along with the corresponding candidate for c.

Finally, the g-Correlation coefficient is approximated from the calculated values μy and c by applying the g-Correlation definition to the evaluation set E.
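Steps 1–4 above can be sketched in Python. The following is an illustrative reading of the procedure, not the authors' implementation: the 50/50 train/evaluate split is an assumed default, the per-candidate counts are simply recomputed rather than incrementally updated as Step 3 suggests (clearer, though O(q²)), and the evaluation-set classification percentage is reported as the estimated g-Correlation coefficient:

```python
from statistics import median

def estimate_g_correlation(pairs, train_frac=0.5):
    """Estimate the g-Correlation coefficient (in %) of a list of (x, y) pairs."""
    q = max(2, int(len(pairs) * train_frac))
    train, evaluate = pairs[:q], pairs[q:]

    # Separating line y = mu_y: median of the training y-values.
    mu_y = median(y for _, y in train)

    # Step 1: sort training pairs by their x-values.
    s = sorted(train, key=lambda p: p[0])

    # Steps 2-4: examine midpoints of successive x-values as candidates
    # for c; keep the candidate with the best classification percentage.
    best_c, best_pct = None, -1.0
    for i in range(len(s) - 1):
        c = (s[i][0] + s[i + 1][0]) / 2
        p1 = sum(1 for x, y in s if x < c and y < mu_y)   # C2 side
        p2 = sum(1 for x, y in s if x >= c and y >= mu_y)  # C1 side
        pct = max(p1 + p2, q - p1 - p2) / q  # allow the flipped labeling too
        if pct > best_pct:
            best_pct, best_c = pct, c

    # Apply the estimated (mu_y, c) to the evaluation set E.
    hits = sum(1 for x, y in evaluate if (x >= best_c) == (y >= mu_y))
    return 100.0 * max(hits, len(evaluate) - hits) / len(evaluate)

data = [(i, 2 * i + 1) for i in range(20)]  # strictly monotonic toy data
print(estimate_g_correlation(data))         # 100.0
```

On strictly monotonic data the estimate is 100%, consistent with the second property stated above.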
Examples

[Figure 2: Scatter plot demonstrating the correlation between crime rate and lower status [11].]

Example 1: This example describes the relationship between the per capita crime rate by town and the percentage of the population with lower status.

[Figure 3: Scatter plot demonstrating the correlation between price and quantity bought.]

Example 2: This example shows the relationship between the price of a commodity and the quantity bought. Here, the sale of the commodity increases over the course of a year even as its price increases. This happens because of seasonal requirements: the commodity represents goods needed during a particular period of the year. For example, on the east coast of the USA, sales of winter clothes are higher during the second half of the year, and hence the price of winter clothes is high during that time of the year.

Conclusion

In Example 1 (see Figure 2), the correlations between crime rate and lower status are: Pearson correlation coefficient = 0.40; Spearman correlation coefficient = 0.44; g-Correlation coefficient = 75%. In Example 2 (see Figure 3), the correlations between price and quantity bought (a hypothetical situation) are: Pearson correlation coefficient = 0.45; Spearman correlation coefficient = 0.26; g-Correlation coefficient = 79.5%. These results show that for data sets where the Pearson and Spearman correlation coefficients fail to indicate the considerable correlation that exists between a pair of variables, g-Correlation can give a clearer and better understanding of the
relationship. Correlations like Pearson and Spearman can fail to give accurate or nearly accurate results for data sets representing seasonal variation, whereas g-Correlation can give a good estimate of the relationship between such variables, as seen in Example 2.

References

1. What is Six Sigma.
2. Hiatt, C. (Oct 9, 2001), "Six Sigma Definition", presentation, Boise State University.
3. Six Sigma perception.
4. Christopher, J. Z. (2004), "Organizing Six Sigma – Statistical Explanation of Six Sigma", presentation, Boise State University.
5. Sheskin, D. J. (2000), Handbook of Parametric and Nonparametric Statistical Procedures, 2nd ed., Boca Raton, FL: Chapman & Hall.
6. GE's DMAIC Approach (Sept 9, 2003).
7. Pittner, S. and Kamarthi, S. V. (2005), "Measures of Correlation and Predictability – Applications to Surface Roughness in Finish Turning", working paper, Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA.
8. Lindley, D. V. (1965), Introduction to Probability and Statistics from a Bayesian Viewpoint, Part 1: Probability, Cambridge University Press.
9. Eckes, G. (2000), The Six Sigma Revolution: How General Electric and Others Turned Process Into Profits, John Wiley & Sons.
10. Pande, P. S., Neuman, R. P., and Cavanagh, R. R. (2000), The Six Sigma Way: How GE, Motorola and Other Top Companies Are Honing Their Performance, New York: McGraw-Hill.
11. UCI Machine Learning Repository.