Chapter 20 - Discriminant, Factor and Cluster Analysis. This chapter covers the following topics: discriminant analysis, the objectives of discriminant analysis, the basic concept, the discriminant function, a graphical illustration of the discriminant function,...
Marketing Research
Aaker, Kumar, Leone and Day
Twelfth Edition Instructor’s

Chapter Twenty
Discriminant, Factor and Cluster Analysis

Discriminant Analysis
• Used to classify individuals into one of two or more alternative groups on the basis of a set of measurements
• Used to identify variables that discriminate between naturally occurring groups
• Major uses: prediction and description

Objectives of Discriminant Analysis
• Determining linear combinations of the predictor variables to separate groups by measuring between-group variation relative to within-group variation
• Developing procedures for assigning new objects, firms, or individuals, whose profiles but not group identities are known, to one of the two groups
• Testing whether significant differences exist between the two groups based on the group centroids
• Determining which variables count most in explaining intergroup differences

Basic Concept
If we can assume that the two populations have the same variance, then the usual value of C is [equation missing], where X_I and X_II are the mean values for the two groups, respectively.
[Figure: distribution of two populations]

Discriminant Function
Z_i = b_1 X_1 + b_2 X_2 + b_3 X_3 + ... + b_n X_n
where
Z = discriminant score
b = discriminant weights
X = predictor (independent) variables
• In a particular group, each individual i has a discriminant score (z_i)
• Centroid (group mean) = the average of the z_i over the individuals in the group
• The centroid indicates the most typical location of an individual from a particular group

Discriminant Function – A Graphical Illustration
[Figure]

Cutoff Score
• Criterion against which each individual’s discriminant score is judged to determine into which group the individual should be classified
• For equal group sizes: [equation missing]
• For unequal group sizes: [equation missing]

Determination of Significance
• Null hypothesis: in the population, the group means of the discriminant function are equal
  H0: μ_A = μ_B
• Generally, predictors with relatively large standardized coefficients contribute more to the discriminating power of the function
• Canonical or discriminant loadings show the variance that the predictor shares with the function

Classification and Validation
Holdout Method
• Uses part of the sample to construct the classification rule; the other subsample is used for validation
• Uses the discriminant weights to generate discriminant scores for the cases in the validation subsample
• Uses the classification matrix and hit ratio to evaluate group classification

Hierarchical Clustering
• Single Linkage
  ▫ Clustering criterion based on the shortest distance
• Complete Linkage
  ▫ Clustering criterion based on the longest distance
• Average Linkage
  ▫ Clustering criterion based on the average distance
• Ward's Method
  ▫ Based on the loss of information resulting from grouping the objects into clusters (minimizes within-cluster variation)
• Centroid Method
  ▫ Based on the distance between the group centroids (the centroid is the point whose coordinates are the means of all the observations in the cluster)

Hierarchical Cluster Analysis Example
[Figure slides, including a dendrogram for hierarchical clustering of bank data]
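The single, complete and average linkage criteria from the Hierarchical Clustering slides can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the bank-data example: the function names and the sample points are hypothetical, Euclidean distance is assumed, and Ward's method and the centroid method are left out.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters under a given linkage criterion."""
    pairwise = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":      # shortest distance between any two members
        return min(pairwise)
    if method == "complete":    # longest distance between any two members
        return max(pairwise)
    if method == "average":     # average distance over all member pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, n_clusters, method="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical two-variable profiles for five objects
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three_clusters = agglomerate(points, 3, method="complete")
```

Recording the merge distance at every step of the loop would give the heights needed to draw a dendrogram.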
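Going back to the Discriminant Function, Cutoff Score and Holdout Method slides, the sketch below strings the steps together: score each individual, compute the group centroids, derive a cutoff, and compute a hit ratio on a holdout sample. Everything numeric here is hypothetical, including the weights, which in practice come from estimating the discriminant function. The cutoff formula is an assumption: the slides' own formulas did not survive, so the code uses the commonly cited textbook form (N_A·Z_B + N_B·Z_A) / (N_A + N_B), which reduces to the midpoint of the two centroids when the groups are the same size.

```python
def discriminant_score(weights, x):
    """Z = b1*X1 + b2*X2 + ... + bn*Xn for one individual."""
    return sum(b * xi for b, xi in zip(weights, x))


def group_centroid(weights, group):
    """Mean discriminant score of the individuals in one group."""
    scores = [discriminant_score(weights, x) for x in group]
    return sum(scores) / len(scores)


def cutoff_score(z_a, z_b, n_a, n_b):
    """Assumed textbook cutoff; equals (z_a + z_b) / 2 when n_a == n_b."""
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)


def classify(weights, x, cutoff, z_a, z_b):
    """Assign to the group whose centroid lies on the same side of the cutoff."""
    z = discriminant_score(weights, x)
    low, high = ("A", "B") if z_a < z_b else ("B", "A")
    return low if z < cutoff else high


def hit_ratio(weights, holdout, cutoff, z_a, z_b):
    """Share of holdout cases whose predicted group matches the actual group."""
    hits = sum(classify(weights, x, cutoff, z_a, z_b) == label for x, label in holdout)
    return hits / len(holdout)


# Hypothetical analysis sample (two predictors) and hypothetical weights
weights = [1.0, 0.5]
group_a = [(1, 2), (2, 2)]
group_b = [(4, 4), (5, 4), (6, 4)]
z_a = group_centroid(weights, group_a)
z_b = group_centroid(weights, group_b)
cutoff = cutoff_score(z_a, z_b, len(group_a), len(group_b))

# Hypothetical holdout sample: (profile, actual group)
holdout = [((1, 1), "A"), ((5, 5), "B"), ((3, 3), "A")]
ratio = hit_ratio(weights, holdout, cutoff, z_a, z_b)
```

With unequal group sizes the weighted cutoff moves toward the centroid of the smaller group, so a borderline case is more likely to be assigned to the larger group.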
Nonhierarchical Clustering
• Sequential Threshold
  ▫ A cluster center is selected and all objects within a prespecified threshold value are grouped
• Parallel Threshold
  ▫ Several cluster centers are selected and objects within the threshold level are assigned to the nearest center
• Optimizing
  ▫ Objects can later be reassigned to clusters on the basis of optimizing some overall criterion measure

Nonhierarchical Cluster Analysis Example
[Figure slides]

Criteria for Determining the Number of Clusters
▫ The number of clusters is specified by the analyst for theoretical or practical reasons
▫ The level of clustering with respect to the clustering criterion is specified
▫ The number of clusters is determined from the pattern of clusters generated. The distances between clusters, or the error variability measure at successive steps, can be used to decide the number of clusters (from the plot of the error sum of squares against the number of clusters)
▫ The ratio of total within-group variance to between-group variance is plotted against the number of clusters, and the point at which an elbow occurs indicates the number of clusters

Methods to Validate a Cluster Analysis Solution
• Apply two or more different clustering approaches to the same data, or use different distance measures, and compare the results
• Split the data randomly into two halves, perform clustering on each half, and then examine the average profile values of each cluster across the subsamples
• Delete various columns (variables) from the original data, compute dissimilarity measures across the remaining variables, and compare these results with the results obtained using the full set
• Using simulation procedures, create a data set with properties matching the overall properties of the original data but containing no clusters. Use the same clustering method on both the original and the artificial data and compare the results

Assumptions and Limitations of Cluster Analysis
• Assumptions
  ▫ The basic measure of similarity on which the clustering is based is a valid measure of the similarity between the objects
  ▫ There is theoretical justification for structuring the objects into clusters
• Limitations
  ▫ It is difficult to evaluate the quality of the clustering
  ▫ It is difficult to know exactly which clusters are very similar and which objects are difficult to assign
  ▫ It is difficult to select a clustering criterion and program on any basis other than availability

End of Chapter Twenty
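As a supplement to the "Criteria for Determining the Number of Clusters" slide, the sketch below computes the error sum of squares for successive numbers of clusters, which is the quantity plotted when looking for an elbow. The slides do not name a specific optimizing algorithm. A plain k-means loop is used here as a stand-in, with hypothetical data and a deliberately simple, deterministic choice of starting centers.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, n_iter=100):
    """Plain k-means: assign each object to its nearest center, then recompute centers."""
    n = len(points)
    # Assumption for reproducibility: evenly spaced objects serve as starting centers
    centers = [points[i * n // k] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[c]  # an empty cluster keeps its old center
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so the solution is stable
            break
        centers = new_centers
    return centers, clusters


def error_sum_of_squares(centers, clusters):
    """Sum of squared distances from each object to its own cluster center."""
    return sum(
        dist(p, center) ** 2
        for center, members in zip(centers, clusters)
        for p in members
    )


# Hypothetical data with two visibly separate groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
ess_by_k = [error_sum_of_squares(*kmeans(points, k)) for k in (1, 2, 3)]
```

On this data the error sum of squares falls sharply from one cluster to two and only slightly from two to three, which is the elbow pattern the slide describes. Real data rarely gives such a clean break, and k-means results depend on the starting centers.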
... to a smaller set of factors
• Common Factor Analysis
  ▫ Uncovers underlying dimensions surrounding the original variables

Factor Analysis Example
[Figure slide]