Speaker 1: StatQuest is the best, if you don't think so, then we have different opinions. Hello, I'm Josh Starmer and welcome to StatQuest. Today we're going to be talking about the main ideas behind principal component analysis and we're going to cover those concepts in five minutes. If you want more details than you get here, be sure to check out my other PCA video. Let's say we had some normal cells. Psst. If you're not a biologist, imagine that these could be people, or cars, or cities, or etc. They could be anything. Even though they look the same, we suspect that there are differences. These might be one type of cell, or one type of person, or car, or city, etc. These might be another type of cell, and lastly, these might be a third type of cell. Unfortunately, we can't observe differences from the outside, so we sequence the messenger RNA in each cell to identify which genes are active. This tells us what the cell is doing. If they were people, we could measure their weight, blood pressure, reading level, etc. Okay, here's the data. Each column shows how much each gene is transcribed in each cell. For now, let's imagine there are only two cells. If we just have two cells, then we can plot the measurements for each gene. This gene, gene 1, is highly transcribed in cell 1, and lowly transcribed in cell 2. And this gene, gene 9, is lowly transcribed in cell 1, and highly transcribed in cell 2. In general, cell 1 and cell 2 have an inverse correlation. This means that they are probably two different types of cells, since they are using different genes. Now let's imagine there are three cells. We've already seen how we can plot the first two cells to see how closely they are related. Now we can also compare cell 1 to cell 3. Cell 1 and cell 3 are positively correlated, suggesting they are doing similar things. Lastly, we can also compare cell 2 to cell 3. The negative correlation suggests that cell 2 is doing something different from cell 3. 
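The pairwise comparisons described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expression values are made up (they are not the data shown in the video), and NumPy's `corrcoef` is used here as one convenient way to compute the correlation between each pair of cells.

```python
import numpy as np

# Hypothetical expression levels: one row per gene, one column per cell.
# These numbers are invented for illustration only.
expression = np.array([
    [10.0,  1.0, 9.0],   # highly transcribed in cell 1, lowly in cell 2
    [ 8.0,  2.0, 9.5],
    [ 6.0,  4.0, 5.0],
    [ 4.0,  6.5, 4.5],
    [ 2.0,  8.0, 3.0],
    [ 1.0, 10.0, 1.5],   # lowly transcribed in cell 1, highly in cell 2
])

# np.corrcoef treats each ROW as a variable, so transpose to make each
# cell a variable and each gene an observation.
cell_corr = np.corrcoef(expression.T)

# Report the correlation for every pair of cells.
for a, b in [(0, 1), (0, 2), (1, 2)]:
    r = cell_corr[a, b]
    relation = "positively" if r > 0 else "inversely"
    print(f"cell {a + 1} vs cell {b + 1}: r = {r:.2f} ({relation} correlated)")
```

With these made-up values, cell 1 and cell 2 come out inversely correlated, cell 1 and cell 3 positively correlated, and cell 2 and cell 3 inversely correlated, matching the pattern described in the transcript.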
Alternatively, we could try to plot all three cells at once on a three-dimensional graph. Cell 1 could be the vertical axis, cell 2 could be the horizontal axis, and cell 3 could be depth. We could then rotate this graph around to see how the cells are related to each other. But what do we do when we have four or more cells? Draw tons and tons of two-cell plots and try to make sense of them all? Or draw some crazy graph that has an axis for each cell and makes our brain explode? No, both of those options are just plain silly. Instead, we draw a principal component analysis, or PCA, plot. A PCA plot converts the correlations, or lack thereof, among the cells into a 2D graph. Cells that are highly correlated cluster together. The cells in this cluster are highly correlated with each other, so are these, and so are these. To make the clusters easier to see, we can color code them. Once we've identified the clusters in the PCA plot, we can go back to the original cells and see that they represent three different types of cells doing three different types of things with their genes. Bam. Here's one last main idea about how to interpret PCA plots. The axes are ranked in order of importance. Differences along the first principal component axis, PC1, are more important than differences along the second principal component axis, PC2. If the plot looked like this, where the distance between these two clusters is about the same as the distance between these two clusters, then these two clusters [the pair separated along PC1] are more different from each other than these two clusters [the pair separated along PC2]. Before we go, you should know that PCA is just one way to make sense of this type of data. There are lots of other methods that are variations on this theme of dimension reduction. These methods include heat maps, t-SNE plots, and multidimensional scaling plots. The good news is that I've got StatQuests for all of these, so you can check those out if you want to learn more.
Note, if the concept of dimension reduction is freaking you out, check out the original StatQuest on PCA. I take it nice and slow so it's clearly explained. Hooray. We've made it to the end of another exciting StatQuest. If you like this StatQuest and want to see more of them, please subscribe. And if you have any ideas for additional StatQuests, well, put them in the comments below. Until next time, quest on.
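For readers who want to see the PCA plot idea in code, here is a minimal sketch. It is not the method used in the video, which does not show an implementation. Everything here is an assumption made for illustration: the data is simulated (three hypothetical cell types, 50 genes, 10 cells per type), the helper name `pca` is made up, and PCA is computed with a singular value decomposition of the centered data, which is one standard way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes = 50
cells_per_type = 10

# Simulated data (an assumption, not real measurements): each of three
# hypothetical cell types has its own average expression profile.
type_profiles = rng.normal(loc=0.0, scale=3.0, size=(3, n_genes))
labels = np.repeat(np.arange(3), cells_per_type)

# Each cell is its type's profile plus a little noise.
# Rows are cells, columns are genes.
noise = rng.normal(scale=0.5, size=(labels.size, n_genes))
cells = type_profiles[labels] + noise


def pca(data, n_components=2):
    """Project the rows of `data` onto the top principal components.

    Returns the coordinates of each row on those components and the
    fraction of total variance each component accounts for.
    """
    centered = data - data.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes, and
    # the singular values come back sorted from largest to smallest.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


scores, explained = pca(cells)

# PC1 always accounts for at least as much variance as PC2.
print("fraction of variance on PC1, PC2:", np.round(explained, 3))

# Cells of the same type should land close together in the 2D plot.
for t in range(3):
    centroid = scores[labels == t].mean(axis=0)
    print(f"cell type {t + 1} centroid (PC1, PC2):", np.round(centroid, 2))
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring the points by `labels` would give the color-coded cluster plot described in the transcript. Because the components are ordered by the variance they account for, a gap of a given size along PC1 reflects a bigger difference in the original data than the same gap along PC2.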