Understanding Factor and Cluster Analysis in Multivariate Analysis for Effective Market Segmentation
Explore the essentials of factor and cluster analysis in multivariate analysis, focusing on data reduction, segmentation, and classification for marketing strategies.
File
SESSION FACTOR AND CLUSTER ANALYSIS
Added on 09/30/2024

Speaker 1: Hello friends. Today I will talk about a very important topic in multivariate analysis, where I will focus on both factor analysis and cluster analysis. Factor analysis is the most popular interdependence technique. Its purpose is primarily data reduction: it summarizes a large set of observed variables into a few factors. I am giving an example of a departmental store where respondents are given lifestyle statements and respond on a five-point or seven-point scale, and based on those responses we are going to factor the statements. From that we identify the lifestyle factors, which tell us what kind of lifestyle the respondents are living. We're going to come back to this slide later, once we look at the kind of factor analysis we have to run. Remember, as I mentioned earlier, that factor analysis will always be the base for cluster analysis. You will recall from marketing management the idea of segmentation, targeting and positioning. The first step of segmentation, targeting and positioning is the basis of segmentation, and only from the basis of segmentation do you define your segments. So factor analysis always gives the basis of segmentation, and cluster analysis gives the segmentation, the classification. In this case we'll identify the basis of segmentation, and cluster analysis will show you how to segment or classify the people. We'll see that. Let me give you another example, from a bank. Respondents rate the importance of 15 bank attributes on a five-point scale, five being very important and one being not at all important. Then we run a principal component analysis, which is one of the most widely used extraction methods in factor analysis.
They identified that four factors came out of this factor analysis: traditional services, convenience, visibility and competence. This is factor profiling, which is done at the last stage of factor analysis. Similarly, when we do our factor analysis, the last stage will always be factor profiling, and we will do that once we run our analysis. So this bank understood that these are the factors on which it has to design its campaign, ultimately to build its brand image. Where can you use and apply factor analysis in marketing? As I mentioned, primarily for segmentation: factor analysis will always be a base for cluster analysis. Even in product management you can use factor analysis; for developing or launching new products, factor analysis is a very useful tool. When launching a new campaign, factor analysis can often be useful, and it is also useful for understanding media consumption patterns, as well as pricing and price strategy. Going further, I return to the same example I mentioned earlier. In the departmental store, I want to understand consumers who are looking to buy a toothpaste. I am seeking responses from 30 respondents. I have interviewed them on a seven-point scale about their degree of agreement, one being strongly disagree and seven being strongly agree. I was showing you how the factor matrix looks, and how the rotated factor matrix looks after iteration. This is how the rotated factor matrix looks. This will be the final table given to you, with the factor loadings, and based on it you are going to pick the variables for each factor.
You will be able to see which variable belongs to which factor; that is, each variable becomes a member of a particular factor family. Say, for example, variable one has a high correlation value on this factor but a negative value on the other, so it will be clubbed into factor one. Similarly, variable three will be clubbed into factor one. Variable six will not be captured here, because its loading here is very low, so it will be captured in factor two. In the same way you see where each variable is captured. So factor one will capture variable one and variable three. Whatever variables a factor captures, based on those you are going to profile the factor. This is factor profiling. Now look at the eigenvalues: factors whose eigenvalues are more than one are retained. If any factor has an eigenvalue less than one, that factor is immediately eliminated from the factor solution. Here is the communality. The communality tells you whether any of the variables should be eliminated; the eigenvalues tell you whether any of the factors should be eliminated. Simple. You have to understand that part. Once that has been decided, I identified that V1 and V3 have high factor loadings on factor one, so it is labeled the health benefit factor. V2, V4 and V6 have high factor loadings on factor two, so it is labeled the social benefit factor. Let me show you how the analysis looks in an example; I have taken a similar example. In that example my KMO value came out as 0.660. My Bartlett's test of sphericity, with 15 degrees of freedom and a significance value of 0.000, came out to be significant.
That means the null hypothesis that the population correlation matrix is an identity matrix is rejected. Correlations do exist in the matrix, so it makes sense to conduct factor analysis. Further, we ran the KMO test and the value is more than 0.5. So, combining the two, we can conclude that factor analysis is, in a nutshell, an appropriate technique for analyzing this correlation matrix. As I mentioned earlier, SPSS will show you the communalities. If the extraction communalities are close to 1, it is a good signal. If they are close to 0, the variables may be excluded from further analysis. Principal component analysis has been used here as the extraction method. If you look at this table, see the initial eigenvalues here. We are concerned with the eigenvalue and the percentage of variance. The number of components means the number of factors. For each eigenvalue you get the variance explained by the individual factor, and here you get the cumulative variance. The eigenvalues tell you whether a factor will survive or not. See, because their eigenvalues are less than 1, these factors were automatically eliminated, and you got a two-factor solution. Within this two-factor solution, these are the variables clubbed into the two factors. Now the remaining job is to profile the factors, the last job of factor analysis. How to profile? You need to know which variable will be clubbed into which factor, based on the factor loadings.
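
The retention rule described above (keep factors with eigenvalue greater than 1, then read communalities and loadings) can be sketched in a few lines. This is a minimal illustration, not the SPSS procedure: the correlation matrix is hypothetical (it is not the lecture's toothpaste data), `jacobi_eigen` and `hypothetical_corr` are names invented here, the loadings are unrotated principal-component loadings, and the rotation step shown in the lecture is left out.

```python
import math

def jacobi_eigen(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of a and v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v  # eigenvectors are the columns of v

def hypothetical_corr(i, j):
    """Made-up correlations: statements 0, 2, 4 form one group, 1, 3, 5 another."""
    if i == j:
        return 1.0
    if i % 2 == j % 2:
        return 0.7 if i % 2 == 0 else 0.5
    return 0.0

P = 6  # number of statements (variables)
R = [[hypothetical_corr(i, j) for j in range(P)] for i in range(P)]

eigenvalues, vectors = jacobi_eigen(R)
order = sorted(range(P), key=lambda i: eigenvalues[i], reverse=True)
sorted_eigenvalues = [eigenvalues[i] for i in order]
pct_variance = [100 * ev / P for ev in sorted_eigenvalues]  # trace of R equals P
retained = [i for i in order if eigenvalues[i] > 1.0]       # eigenvalue > 1 rule

# Unrotated loading = eigenvector element * sqrt(eigenvalue)
loadings = [[vectors[var][f] * math.sqrt(eigenvalues[f]) for f in retained]
            for var in range(P)]
communalities = [sum(x * x for x in row) for row in loadings]
assigned_factor = [max(range(len(retained)), key=lambda f: abs(row[f]))
                   for row in loadings]

print("eigenvalues:", [round(ev, 3) for ev in sorted_eigenvalues])
print("% of variance:", [round(p, 1) for p in pct_variance])
print("factors retained:", len(retained))
print("communalities:", [round(c, 3) for c in communalities])
print("variable -> factor:", assigned_factor)
```

Each variable is assigned to the factor on which its absolute loading is highest, which is the same reading of the loading table that the profiling step relies on.
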
You look at which factor each variable loads highly on, and factor profiling is done on that basis, as I already mentioned. So you have understood what factor analysis is. I am now switching over to cluster analysis. We already have the basis for cluster analysis, that is, the basis for segmentation. Now I am getting into how to do the segmentation, which is cluster analysis. Cluster analysis is a data analysis technique to classify people: you sort people into groups, called clusters. The basic thumb rule for cluster analysis is high internal homogeneity within a cluster and high external heterogeneity between clusters. That is what you have to achieve. It is a two-step process in SPSS. Always remember that. I'll give an example. Suppose companies in the financial sector operate on the dimensions of risk, return and liquidity, and you put these in as inputs to a cluster analysis. You normally want to know where to invest your income. These are the possible segments, or clusters, which you get based on the levels of risk, return and liquidity. So the clusters give you the classification, the cohorts. There is a high level of homogeneity within each group and a high level of heterogeneity across groups. That is the thumb rule of cluster analysis: classification into groups. Okay, what are the steps of cluster analysis? First, obviously, you have to collect the data. Then comes selection of variables: you need to know on which variables you will do the analysis. Before I go further, remember one thing. This is a very important statement. Factor analysis will club variables.
Cluster analysis will always club cases, or customers. The questions you use in your questionnaire are the same for both: the same questions are used for factor analysis and for cluster analysis. The only difference is that factor analysis clubs the variables, while cluster analysis clubs the cases, the respondents, and classifies the respondents on that basis. So then you select the variables on which you want to analyze. Obviously they are the same variables you used in factor analysis. That's why you run factor and cluster analysis in tandem, because you use the same variables. The second point is that you generate a similarity matrix. We will show you what the similarity matrix is. The third is to decide the number of clusters you want; the first stage will give you an indication of that. Now the last part is validation. We will see how the validation of the cluster solution is done. In hierarchical clustering, the dendrogram is a diagram used to cross-check the agglomeration schedule. The dendrogram also gives you the number of clusters in the form of a diagram, so you can cross-check how many clusters you got. That number of clusters is what you put in as a prerequisite in k-means, or non-hierarchical clustering, to get the final cluster solution. Okay, this is very important. Before you go further in conducting hierarchical cluster analysis, two aspects are very important to understand: what we mean by distance measures and what we mean by linkage rules. You may remember studying coordinate geometry in mathematics; those concepts come back here. Distance measures such as Euclidean distance, squared Euclidean distance and many others come back again. You have to refresh those concepts.
So first you have to measure the distance in the two-dimensional matrix, that is, among the respondents' responses. You have to calculate the distance between the responses using one of the methods given; we will talk about those methods. There are various methods by which distance is measured. Then, based on the distances, you have to link those responses using one of the linkage methods. Okay, examples are shown here. Let me show you the various methods used for distance measures: Euclidean distance, squared Euclidean distance, city block (also known as Manhattan distance), Chebyshev, power distance and percentage distance. The most widely used method is squared Euclidean distance, so I recommend using squared Euclidean distance when you run cluster analysis. This is how they look when calculated manually. But when you open SPSS you get the option of selecting squared Euclidean directly. The second thing is that you have to pick a linkage rule, that is, how you will link those respondents. Not only do you measure the distances; once measured, you link them up. For linking there are various methods: nearest neighbor (the single linkage method), complete linkage (the furthest neighbor method), average linkage, the within-group method and Ward's method. We're going to use Ward's method. Most of the literature and the books recommend using the combination of squared Euclidean distance and Ward's method. So for our analysis we're going to use squared Euclidean distance with Ward's method for linkage.
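
To make the distance measures and linkage rules above concrete, here is a minimal sketch in plain Python. The two respondents and the two small clusters are hypothetical, and the function names are chosen here for illustration. Ward's method is not included in `linkage_distance`, because it merges on the increase in within-cluster sum of squares and not on pairwise distances.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

def city_block(a, b):
    """City block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Chebyshev distance: the largest absolute difference on any variable."""
    return max(abs(x - y) for x, y in zip(a, b))

def linkage_distance(cluster_a, cluster_b, rule):
    """Distance between two clusters under a pairwise linkage rule."""
    pairwise = [squared_euclidean(p, q) for p in cluster_a for q in cluster_b]
    if rule == "single":      # nearest neighbor
        return min(pairwise)
    if rule == "complete":    # furthest neighbor
        return max(pairwise)
    if rule == "average":     # average of all between-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage rule: " + rule)

# Two hypothetical respondents rating six statements on a seven-point scale
respondent_a = [7, 3, 6, 4, 2, 4]
respondent_b = [1, 3, 2, 4, 5, 4]

print(squared_euclidean(respondent_a, respondent_b))    # 61
print(round(euclidean(respondent_a, respondent_b), 4))  # 7.8102
print(city_block(respondent_a, respondent_b))           # 13
print(chebyshev(respondent_a, respondent_b))            # 6

# Two hypothetical clusters of two cases each, measured on two variables
cluster_1 = [[1, 1], [2, 1]]
cluster_2 = [[4, 5], [5, 5]]
for rule in ("single", "complete", "average"):
    print(rule, linkage_distance(cluster_1, cluster_2, rule))  # 20, 32, 25.5
```
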
This is just a pictorial presentation of all the linkage methods. Let me show you the data set. This is the data set of attitudinal variables for clustering, the same variables we had before. Take the cases. What are the cases? Respondents. These are the 20 respondents who have responded on a seven-point scale in the same toothpaste example we discussed. Same variables. I am going to cluster them using hierarchical clustering, with squared Euclidean distance and Ward's method to link the clusters. Now understand how to decide the number of clusters. Cross-check this. This is the dendrogram you get, and these are the fusion coefficients. Okay. In the dendrogram you can see the first cluster, the second cluster, the third cluster, the fourth cluster and the fifth cluster. In the fusion coefficients, the farther the clusters are, the weaker they are; the lower they are, the stronger they are. You have to pick in this manner. So you can take three clusters or two. I am taking two in this case, because at the third one there is a steep drop. So decide the number of clusters. The decision rests entirely with the practitioner, based on how many segments you want to serve. How many segments you want to serve and concentrate on is the major decision of the researcher and practitioner, and it is very important for non-hierarchical clustering as well. Okay. Now we'll go to k-means. K-means is the second stage of clustering. It will be used for interpretation and profiling of the clusters.
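
The hierarchical stage above (Ward's method, fusion coefficients, picking the number of clusters) can be sketched as follows. This is an illustration under stated assumptions, not SPSS output: the eight two-variable cases are hypothetical, the coefficient is taken to be the cumulative within-cluster sum of squares, and `suggest_cluster_count` applies a simple largest-jump heuristic that stands in for reading the dendrogram by eye.

```python
def centroid(members):
    """Mean of each variable across the cases in a cluster."""
    return [sum(col) / len(members) for col in zip(*members)]

def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    na, nb = len(cluster_a), len(cluster_b)
    return na * nb / (na + nb) * sq_dist

def ward_schedule(points):
    """Return [(clusters remaining, cumulative coefficient)] for each merge."""
    clusters = [[tuple(p)] for p in points]
    total = 0.0
    schedule = []
    while len(clusters) > 1:
        # Pick the pair whose merge increases the within-cluster SS the least
        cost, i, j = min((ward_cost(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        total += cost
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        schedule.append((len(clusters), total))
    return schedule

def suggest_cluster_count(schedule):
    """Stop just before the merge that causes the largest jump in the coefficient."""
    previous, best_jump, best_k = 0.0, -1.0, None
    for remaining, coefficient in schedule:
        jump = coefficient - previous
        if jump > best_jump:
            best_jump, best_k = jump, remaining + 1
        previous = coefficient
    return best_k

# Hypothetical cases: two well-separated groups on two variables
cases = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7)]

schedule = ward_schedule(cases)
for remaining, coefficient in schedule:
    print(remaining, "clusters left, coefficient", coefficient)
suggested = suggest_cluster_count(schedule)
print("suggested number of clusters:", suggested)
```

The suggested count is only a starting point; as the lecture stresses, the practitioner still decides how many segments are worth serving, and that number is then passed to k-means.
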
What will you do in k-means? K-means is used to do multiple things. Primarily it captures the distances between the final cluster centers and indicates where each cluster revolves around, and I will also do the profiling of the clusters there. As we did the profiling of factors, we will also do the profiling of clusters. There are four major outputs in the cluster solution: the initial cluster centers, the case listing of cluster membership, the final cluster centers, and the ANOVA table. You are most concerned with the final cluster centers, because they give you the cluster profiling. You must know how to analyze them; I will give you an example. The final cluster centers describe the mean value of each variable in the three clusters. Say, for example, cluster 1 is described by the mean value of variable 1 and of the others: fun, bad for budget, eating out, best buys. In cluster 1 these variables have a higher mean, so they are clubbed in cluster 1. These are the people clubbed in cluster 1; they are economic shoppers, and similarly other people are clubbed in cluster 2. So we have picked up the clusters similarly and profiled them. I am also cross-checking this against their cluster membership. Then there is a table of distances between the clusters. The higher the value, the better the heterogeneity between clusters. The last table you will see in the SPSS k-means output is the ANOVA table, which shows whether the six variables are significantly different across the three clusters at a significance level of 0.10 or 0.05. If they are, the clusters are significantly different from each other. That is what we prove here.
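
The k-means outputs listed above (cluster membership, final cluster centers, distance between centers, and the ANOVA comparison) can be sketched in plain Python. The six respondents, the choice of two clusters and the starting centers are all hypothetical, and the function names are invented here. Only the F ratio is computed; the significance value that SPSS prints alongside it needs the F distribution and is left out.

```python
import math

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, initial_centers, max_iter=100):
    """Assign each case to its nearest center, then recompute centers as means."""
    centers = [list(c) for c in initial_centers]
    membership = None
    for _ in range(max_iter):
        new_membership = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_membership == membership:  # no case changed cluster: converged
            break
        membership = new_membership
        for c in range(len(centers)):
            members = [p for p, m in zip(points, membership) if m == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, membership

def f_ratio(values, membership, k):
    """One-way ANOVA F ratio for one variable across k clusters."""
    groups = [[v for v, m in zip(values, membership) if m == c] for c in range(k)]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (len(values) - k))

# Hypothetical ratings: six respondents, three statements, seven-point scale
respondents = [[6, 2, 7], [7, 3, 6], [6, 2, 6], [2, 6, 1], [1, 7, 2], [2, 6, 2]]
K = 2  # assumed to come from the hierarchical stage

# Starting centers are an arbitrary illustrative choice (cases 0 and 3)
final_centers, membership = k_means(respondents, [respondents[0], respondents[3]])
center_distance = math.sqrt(squared_distance(final_centers[0], final_centers[1]))
f_first_variable = f_ratio([r[0] for r in respondents], membership, K)

print("cluster membership:", membership)
print("final cluster centers:", [[round(x, 2) for x in c] for c in final_centers])
print("distance between final centers:", round(center_distance, 3))
print("F ratio, first statement:", round(f_first_variable, 2))
```

Profiling then means reading each row of the final cluster centers: the statements with high means in a cluster describe the people in it. Because the clusters were formed to be as different as possible, the F ratios are best read as a description of which variables separate the clusters most, not as a formal test.
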
So finally, the interpretation is that we are clubbing the various cases into different clusters. In the end we got three major classifications from profiling the clusters: fun-loving shoppers, economic shoppers and empathetic shoppers.
