Comprehensive Guide to Exploratory Factor Analysis: Step-by-Step Tutorial
Learn how to uncover data structures using exploratory factor analysis. This tutorial covers key concepts, methods, and practical examples for effective analysis.
Exploratory Factor Analysis
Added on 09/30/2024

Speaker 1: This tutorial is about exploratory factor analysis, so let's get started right away. The first question is: what is exploratory factor analysis? Exploratory factor analysis is a method that aims at uncovering structures in your data. If you have a data set with many variables, it is possible that some of them are interrelated, that is, they correlate with each other. These correlations are the basis of factor analysis. The task of factor analysis is to divide the variables into groups of variables that are most strongly correlated with each other. So the goal of factor analysis is to divide variables into groups: variables that are highly correlated should be separated from those that are less correlated with each other. Within the groups, the variables should correlate as highly as possible; between the groups, the correlation should be as low as possible. But first, let's talk about what "factor" means. In factor analysis, the so-called factor can be seen as a hidden variable influencing several actually observed variables. So we have our observed variables on the one side, and behind them we have factors that influence these variables. In other words, several variables are observable phenomena of fewer underlying factors. Now a brief summary for you: factor analysis combines variables that are highly correlated with each other, and it is assumed that this correlation is due to an unmeasurable variable called a factor. Now let's look at an example. One possible question could be: can different personality traits such as outgoing, curious, sociable or helpful be grouped into personality types such as conscientiousness, extraversion or openness? In order to find out, let's say you created a small survey with Datatab Survey. You now want to find out whether some of the traits correlate strongly with each other and whether they can be described by a factor. 
So your question is whether some of the traits outgoing, sociable, hardworking, dutiful, warm-hearted or helpful correlate with each other and can thus be described by an underlying factor. Let's say you have interviewed 20 people and the results are in your Excel spreadsheet. You can find a link to the Excel spreadsheet in the video description. Now let's look at a little preview. At the end of the factor analysis, we could conclude that outgoing and sociable can be described by the factor extraversion, hardworking and dutiful by the factor conscientiousness, and warm-hearted and helpful by the factor agreeableness. But how do we get these results? We have to go through it step by step. In the first step, we calculate the factor analysis for the example, and then I explain the procedure of factor analysis on the basis of these results. For the calculation of the factor analysis, we use Datatab. You can find a link to Datatab in the video description. First, we click on the statistics calculator and then we paste our data into this table. To do this, we simply click on the empty table first and then copy our own data into the table. Now that we have inserted our data into the table, we click on the PCA tab. Down here we see the six variables that we want to use. We want to calculate the factor analysis for all six variables, so we just click on all six: outgoing, sociable, hardworking, dutiful, warm-hearted and helpful. What we now have to do is choose the number of factors. In our example, we choose three. Why we assume that there are three factors will be explained later. First, we take a look at the correlation matrix. In the correlation matrix, we can see how strongly the individual traits correlate with each other. For example, outgoing has a strong correlation with sociable, but outgoing and hardworking do not correlate strongly with each other. 
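This correlation-matrix step can be sketched in Python. The data below are randomly generated stand-ins for the survey (the trait values and the three-factor structure are assumptions for illustration, not the tutorial's actual data set):

```python
import numpy as np

# Hypothetical ratings for 20 respondents, generated from three latent
# factors plus noise. Column order: outgoing, sociable, hardworking,
# dutiful, warm-hearted, helpful.
rng = np.random.default_rng(0)
n = 20
extraversion = rng.normal(size=n)
conscientiousness = rng.normal(size=n)
agreeableness = rng.normal(size=n)
noise = lambda: rng.normal(scale=0.5, size=n)

data = np.column_stack([
    extraversion + noise(), extraversion + noise(),            # outgoing, sociable
    conscientiousness + noise(), conscientiousness + noise(),  # hardworking, dutiful
    agreeableness + noise(), agreeableness + noise(),          # warm-hearted, helpful
])

# 6 x 6 correlation matrix: pairs driven by the same latent factor
# tend to correlate more strongly than pairs from different factors.
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))
```

Because each pair of variables shares one latent factor, the within-pair correlations come out high while the between-pair correlations stay low, which is exactly the pattern factor analysis looks for.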
So with the help of the correlation matrix, we get an overview of the correlations between the traits. In addition, the correlation matrix is the basis for calculating the so-called eigenvalues and eigenvectors, but we'll look at this later. In our example, we have six variables, so we can have a maximum of six factors. In this table, the factors are sorted by size: the first factor is the one that explains most of the variance, and the last one is the one that explains the least. The first factor on its own explains 31.2% of the total variance of the six variables. The second factor on its own explains 24.7%. And finally, the sixth factor alone explains only 3.44% of the total variance. On the right side, we can see the cumulative percentages. If we use the first two factors, we can already explain 55.9% of the total variance of the six variables. And if we use the first three factors, we can already explain 78.6% of the total variance, which is quite a lot. Therefore, if we take just these three factors instead of the six original variables, we can still explain 78.6% of the variance of the original variables. But now we come to the big question: how many factors do we need? Factor analysis does not give a clear answer to this question, but there are two common methods to determine the number of factors needed, and we will go through them right now. For both methods, the eigenvalues are sorted by size and plotted on a graph: on the x-axis, we have the number of factors, 1 to 6, and on the y-axis, we have the eigenvalues. Let's start with the eigenvalue criterion. With the eigenvalue criterion, you simply count how many eigenvalues are greater than 1, which gives you the number of factors. In this case, two eigenvalues are greater than 1; accordingly, the number of factors would be 2. The second method is the so-called scree test, which is a graphical method. 
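The eigenvalue table and the eigenvalue criterion can be reproduced with NumPy. The 6×6 correlation matrix below is a made-up block structure (three pairs of strongly correlated traits), not the tutorial's data:

```python
import numpy as np

# Hypothetical correlation matrix: r = 0.8 within each pair of traits,
# r = 0.1 between pairs (illustrative values only).
block = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
corr = np.kron(np.eye(3), block)  # block-diagonal 6 x 6 matrix
corr[corr == 0.0] = 0.1           # weak correlation between groups

# Eigenvalues of the correlation matrix, sorted from largest to smallest.
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Each eigenvalue's share of the total variance, and the cumulative share.
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)
print(np.round(eigenvalues, 2))   # [2.2 1.6 1.6 0.2 0.2 0.2]
print(np.round(cumulative, 2))

# Eigenvalue criterion: retain one factor per eigenvalue greater than 1.
n_factors = int((eigenvalues > 1.0).sum())
print(n_factors)                  # 3
```

For a correlation matrix, the eigenvalues always sum to the number of variables (here 6), so dividing by that sum turns them directly into the "explained total variance" percentages from the table.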
Here you simply look in the diagram for a kink, the so-called elbow. You can see the elbow at this point, so this method also leads us to the result of two factors. But in practice, of course, the diagram does not always look so nice, and it's not always that easy to interpret. In our case, the diagram of our example looks like this, and here we cannot make out a kink or an elbow, so we should simply go the way of the eigenvalue criterion. We just count how many of the factors have an eigenvalue greater than 1 and use that as the number of factors. In this case, three factors have an eigenvalue greater than 1, and therefore we take 3 as the number of factors. Once the number of factors is determined, the communalities can be calculated. The communalities indicate how much of the variance of the variables can be explained by the three factors. For example, 77.5% of the variance of the variable outgoing can be explained purely by the three factors. Or let's look at the variable sociable: here we can see that 88.3% of the variance can be explained purely by the three factors. So now there are three terms that we have heard over and over again: factor loading, eigenvalue and communality. Before we come to the final interpretation of the results, I would like to go into these three terms using an example. What is a factor loading? The factor loading indicates, for example, how high the correlation between outgoing and extraversion is. An eigenvalue indicates, for example, how much of the variance of all variables can be explained by the factor conscientiousness. And finally, the communalities indicate, for example, how much of the variance of each of the six variables can be explained by the three factors. So now I cheated a bit: we don't really have such a nice result yet. 
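The arithmetic behind communalities and factor eigenvalues is just sums of squared loadings. The loading values below are invented for illustration, not the tutorial's actual results:

```python
import numpy as np

# Hypothetical loading matrix: 6 variables (rows) x 3 factors (columns).
# Row order: outgoing, sociable, hardworking, dutiful, warm-hearted, helpful.
loadings = np.array([
    [0.85, 0.10, 0.05],
    [0.90, 0.08, 0.12],
    [0.07, 0.88, 0.10],
    [0.11, 0.82, 0.06],
    [0.09, 0.05, 0.87],
    [0.12, 0.09, 0.80],
])

# Communality of a variable = sum of its squared loadings across all
# factors, i.e. the share of its variance the factors explain together.
communalities = (loadings ** 2).sum(axis=1)
print(np.round(communalities, 3))

# Eigenvalue of a factor = sum of squared loadings down its column,
# i.e. how much variance that factor explains across all variables.
ssl_per_factor = (loadings ** 2).sum(axis=0)
print(np.round(ssl_per_factor, 3))
```

So the same table of loadings answers both questions: summing squared loadings along a row gives a variable's communality, and summing along a column gives a factor's eigenvalue.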
In order to get such a nice result, where exactly one factor is assigned to each variable, two more steps are necessary, and we look at them now. Let's output the component matrix. The component matrix gives us the loadings of the factors on the variables: this is the first factor, the second factor and the third factor. For example, we can read here that the first factor correlates with the variable outgoing at 0.67, or that the first factor correlates with the variable sociable at 0.8. Since the first factor explains most of the variance, the values of the first factor are the largest in absolute terms. Of course, this is not quite optimal, because we actually want to distribute the individual variables among the three factors, and we do not want a large number of variables to be bundled into the first factor. We want to assign the variables to the factors and thus form the groups; in this case, however, many variables would be assigned to the first factor. Therefore, this component matrix usually cannot be interpreted in a meaningful way, and we need one last step: the rotation with the rotation matrix. There are different methods for this rotation, but the most common is the analytical varimax rotation. Varimax rotation analytically ensures that on each factor certain variables load as highly as possible and the other variables load as low as possible. This result is obtained when the variance of the squared factor loadings per factor is as high as possible. In the rotated table, we can see that outgoing and sociable have the largest values in absolute terms on the first factor and are thus assigned to the first factor. Hardworking and dutiful have the greatest values in absolute terms on the third factor and are therefore assigned to the third factor. And finally, warm-hearted and helpful have the highest loadings on the second factor and are thus assigned to the second factor. 
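Varimax rotation itself is a short algorithm. Below is a minimal NumPy sketch of the standard SVD-based iteration (a common textbook formulation, not Datatab's implementation), applied to an invented unrotated loading matrix where the first factor soaks up most of the variance:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a (variables x factors) loading matrix with the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD of the varimax-criterion gradient yields the best
        # orthogonal rotation update (Kaiser's criterion, SVD form).
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0)))
        )
        rotation = u @ vt
        if s.sum() < d * (1.0 + tol):
            break
        d = s.sum()
    return loadings @ rotation

# Illustrative unrotated component matrix: every variable loads
# substantially on the first factor (not the tutorial's numbers).
unrotated = np.array([
    [0.67,  0.45, -0.30],
    [0.80,  0.35, -0.25],
    [0.55, -0.60,  0.40],
    [0.50, -0.65,  0.35],
    [0.60,  0.10,  0.55],
    [0.58,  0.05,  0.60],
])
rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, the communalities (row sums of squared loadings) are unchanged by the rotation; only how the explained variance is distributed across the factors changes, which is what makes the rotated matrix easier to interpret.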
The factor analysis does not tell us how to name the factors; it is the task of the researcher to determine these names. I've used a simplified form of the so-called Big Five personality traits here, so it is easy to name the factors in this case. For example, we could describe the variables outgoing and sociable with the factor extraversion, hardworking and dutiful with the factor conscientiousness, and finally warm-hearted and helpful with the factor agreeableness. After this procedure, we have now assigned the six personality traits to three factors. Now let's go through the whole procedure again as a short recap. In the first step, we copied our data into the table and initially set the number of factors. The first thing we got was the correlation matrix, which gave us an overview of how the different personality traits correlate. From the correlation matrix, the eigenvalues and eigenvectors can be calculated, and from those we could get the table of explained total variance. Then we used the eigenvalue criterion to determine the number of factors, and here we got the result of 3. If we had used a different value for the number of factors at the beginning, we would simply change it to 3 now. Further, Datatab calculated the communalities and the component matrix for us, and finally we got the rotated component matrix as an output. In the rotated component matrix, we could then see which personality trait can be assigned to which factor. So finally, we found out that outgoing and sociable can be assigned to the first factor, because that is where they have their largest values, hardworking and dutiful to the third factor, and warm-hearted and helpful to the second factor. I hope you enjoyed the video. Bye and see you next time.
