Mastering Data Visualization with R: A Comprehensive Guide
Learn to tell compelling stories with your data using R programming. This video covers graphics, plots, and ggplot for effective data visualization.
Visualize your data using ggplot. R programming is the best platform for creating plots and graphs.
Added on 09/08/2024

Speaker 1: Today we're talking about using R programming for data visualization. This is going to be the best video you've ever watched on data visualization, all your money back, guaranteed. Let's do this. Boom shakalaka. Just a reminder, this is part five of a series of videos, right? We started with explore, then clean, manipulate, and describe and summarize. Now we're talking about visualizing your data. After that it's going to be analyze, and then present your data, right? This is the pipeline. This is how you approach your data from beginning to end. In this video, I'm going to talk you through how to tell a story with your data. The human brain is very good at pattern recognition, so let's put something in front of all of those brains out there so they can recognize the patterns and see the story that you're trying to tell with your data. Next, I'm going to walk you through which graphics, which plots, to use given the type of data that you have: numerical, categorical, or combinations of numeric and categorical. And finally, I'm going to walk you through some of the code that I use, right? We're going to walk through the code that I used to produce the graphic at the bottom right-hand side of the screen at the moment. I'm going to teach you about the grammar of graphics (data, mapping, and geometry) using a package called ggplot2, ggplot for short, which is super easy to use and you're going to love it, to produce graphics that tell the story you want to tell. Right? So stick with me. Don't go anywhere. This is going to be a lot of fun. Giddy up.

Speaker 2: If you want to learn about R programming, then you have come to the right place. On this YouTube channel, we're creating R programming videos on everything.

Speaker 1: Now, all of the code that I use in this video is available to you right now on a cheat sheet. I'm going to tell you at the end of this video how to download that cheat sheet so you can replicate everything that I'm doing, experiment with it, and improve on it. The data that I use in this video is also available. It's actually on your computer right now, right? R and its packages come with built-in data sets that you can use to practice, and those are the data sets that I use to produce the graphics in this video. So there's no going and fetching data; everything that I'm doing, you can do at home. Replicate it. Do it better. Booyah shaka. Let's keep going. I just want to use these plots to illustrate the fact that you can use data to tell a story. Right. Top left-hand plot. This is the weight of chickens by feed group. Each of these is a feed group. The small, light dots are the individual observations: each one is a single chicken and its weight. The big dots are the average weight within each group. This line down the middle is the average weight for all of the chickens. And you can immediately see that these three over here are above average, these three over here are below average, and you can see which are at the extremes. It tells a story: looking at the graph, you immediately know what's going on. This next graph is all about temperature. Each of these density plots (and we'll talk more about what a density plot is later on in this video) represents the distribution of temperatures in a given month in the year 2016. Then we compare month on month, and we can see the density plots themselves move along as everything gets warmer. We're also using color here to represent temperature, so the further to the right you are, the higher the temperature, and the more the color turns kind of yellowy-orange, right?
And you can immediately see that as the year unfolds and we go into the summer, everything gets warmer, and then, obviously, things get colder during the winter. Within a month, you can see the same thing: there's a distribution of temperatures. It's easy to see what's going on, it tells a story, it jumps off the page, you know exactly what is going on, right? You got it? Yippee-ki-yay. Let's keep going. This plot down here uses five variables, right? We've got a lot going on here. These two panels are two facets of one plot, so everything at the bottom is really one big plot, okay? What have we got going on here? Life expectancy is the dependent variable; it's a function of something. Something's affecting life expectancy here. We've divided the data into Africa and Europe, and the independent variable, the thing that we think may have an effect on life expectancy, is GDP per capita. In other words, how wealthy a given country is. Each of the dots is a country in a given year. The color of the dot tells us which year that dot is from, and the size of the dot tells us the size of the country's population at that point in time. And of course, we've divided it into Africa and Europe. So think about what we've got here. We've actually got five variables. We've got one categorical, continent, so Africa and Europe. And then we've got four numeric: life expectancy, GDP per capita, year, and population size. So we're representing five different variables on one plot, and it's very easy to see what's happening. We can see that as GDP per capita goes up in Africa, there's a dramatic increase in life expectancy. You see the same thing happening in Europe, but it's a much shallower curve. All right, so the underlying message, the underlying story that you're telling, jumps off the page. It's easy to see, and it's absolutely beautiful, right? You got it? Hot diggity. Let's keep going. Booyah shaka.
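The chicken plot described above can be approximated with R's built-in `chickwts` data set, which records chick weights by feed type across six feed groups. This is a minimal sketch that assumes `chickwts` is the data behind the plot. The jitter width, point size and dashed reference line are illustrative choices and may not match the video exactly. The names `feed_means`, `overall_mean` and `chick_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Average weight within each feed group (the big dots)
feed_means <- chickwts %>%
  group_by(feed) %>%
  summarise(mean_weight = mean(weight))

# Average weight across all chickens (the reference line)
overall_mean <- mean(chickwts$weight)

chick_plot <- ggplot(chickwts, aes(x = feed, y = weight)) +
  # One small, light dot per chicken, jittered sideways so dots don't overlap
  geom_jitter(width = 0.15, alpha = 0.3) +
  # One big dot per feed group at the group mean (x = feed is inherited)
  geom_point(data = feed_means, aes(y = mean_weight), size = 5) +
  # Overall mean for all chickens
  geom_hline(yintercept = overall_mean, linetype = "dashed") +
  labs(x = "Feed group", y = "Weight")
```

Run `print(chick_plot)`, or just type `chick_plot` at the console, to draw it. Plotting the group means from a separate summary data frame keeps the raw observations and the summaries on the same canvas.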
Okay, stop watching the video. Stop watching the video. I want to give a quick, big thank you to Nested Knowledge for supporting the creation of this video. If you've ever gone through the struggle of doing a literature review or a systematic literature review, believe me, I know your pain. I've been there, and what I'm about to tell you is gonna absolutely blow your mind. Nested Knowledge have a platform that supports the entire process, from designing your research question and search parameters to screening, tagging, and extracting the appropriate papers. The platform automatically generates visuals that you can use for qualitative analysis. So check out these interactive sunburst diagrams, used to get an overview of trial endpoints. Or, and this is gonna blow your mind, it can extract study results to do meta-analysis. So check out these publication-ready forest plots. And this next amazing feature is gonna become, I believe, the new standard for systematic reviews: being able to publish a living document that auto-updates as more data becomes available. So check them out by clicking on the link in the description below. Okay, let's get on with the video. Now, to understand which graphic, which plot, to use when, we're just gonna have a look at different variables and different combinations of variables. And to do that, we're gonna take a look at the Star Wars data set. This data set comes with the dplyr package in the tidyverse, so it's on your computer right now and you can view it. I've put it up on the screen over here, and we see we've got all the Star Wars characters, and then we've got a number of variables. We've got some numeric and some categorical variables, right? Height, mass, hair color, skin color, eye color, et cetera, et cetera. We're gonna mostly use height and mass. I think we're gonna use hair color, and we're gonna use gender as well.
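To look at the Star Wars data yourself, it is enough to load dplyr, which ships the `starwars` data set. This is a minimal sketch. `glimpse()` lists every column with its type, so you can see at a glance which variables are numeric and which are categorical. The selected columns are just the ones this section talks about.

```r
library(dplyr)

# One line per column: name, type (<int>, <dbl>, <chr>, ...) and first values
glimpse(starwars)

# The numeric and categorical columns used in the rest of this section
starwars %>%
  select(name, height, mass, hair_color, eye_color, sex, gender)
```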
Okay, and we're gonna look at combinations of categorical and numeric variables and decide what kind of graphics we can use to represent them. We'll start off by looking at a single numeric variable, in this case, height. Typically, we might use a histogram. The data is put into buckets, or bins, and for each bin, we count up the number of observations. That count is represented by the height of the column, right? So we get a sense of the shape of the data. A density plot is basically the same idea. Instead of a count of observations, though, it shows a smoothed estimate of the probability density, so you can read it as how likely an observation is around a given height, in this case. A box plot tells you about the distribution of the data. The box itself represents the middle 50% of the data, the interquartile range, and the line in the middle is the median. The whiskers extend to the furthest data points within 1.5 times the interquartile range of the box, and anything beyond that is plotted as an outlier. A violin plot is very much like a box plot, but it's also similar to a density plot in that it shows the distribution, so you can see the shape of the data. These are often more useful when we're looking at data that's been disaggregated by a categorical variable, and we're gonna look at that later on in this video. Now let's look at one or more categorical variables. Very straightforward: here we've got eye color and gender. If you're just looking at one categorical variable, a bar plot is perfectly fine. The height of each bar is just a count of the number of observations in that category. Very simple. Once you add in another categorical variable, you can disaggregate these bars by the new category, in this case gender, masculine and feminine, and we've used color to show what each bar is made up of. So this is a stacked bar plot.
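The single-variable plots described above map onto one ggplot2 geometry each. This is a minimal sketch using `starwars`. The bin count and the placeholder `x = ""` used to draw a single box or violin are illustrative assumptions, and the object names are hypothetical. Note that `geom_boxplot()` uses the 1.5 × IQR whisker rule by default through its `coef` argument.

```r
library(dplyr)
library(ggplot2)

# Drop characters with no recorded height
sw_height <- starwars %>%
  filter(!is.na(height))

# One numeric variable, four possible geometries
hist_plot <- ggplot(sw_height, aes(x = height)) +
  geom_histogram(bins = 20)          # counts per bin; 20 bins is an arbitrary choice
density_plot <- ggplot(sw_height, aes(x = height)) +
  geom_density()                     # smoothed density estimate
box_plot <- ggplot(sw_height, aes(x = "", y = height)) +
  geom_boxplot()                     # whiskers reach 1.5 x IQR by default (coef = 1.5)
violin_plot <- ggplot(sw_height, aes(x = "", y = height)) +
  geom_violin()                      # mirrored density, shows the shape

# One categorical variable: each bar's height is the count in that category
bar_plot <- ggplot(starwars, aes(x = eye_color)) +
  geom_bar()
```

Only the geom line changes between the first four plots. That is the point of the grammar of graphics: the data and mapping stay the same, and you swap the geometry.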
You can also have a grouped bar plot, where you've disaggregated the bars and put them next to each other. Or you can have a percentage or proportion bar plot, where each bar adds up to one, or 100%, and you can see the relative contribution of each of those categories. Okay, super duper easy, let's keep going. In the last example, we had eye color and gender, but now we're adding in height. So we've got two categorical and one numeric, okay? Let's talk about how we'd represent that. Very often, once you bring a numeric variable into play, the numeric variable really drives what the geometry is. In this case, we're looking at density plots, but you could use histograms. We're looking at box plots, but you could use violin plots. The actual geometry here has been driven by the numeric variable, and we're using the categorical variables to disaggregate that data, to disaggregate those graphics. Okay, let's look at what we've done here. We've got density plots of one numeric and two categoricals. Here we've just got one numeric and one categorical, right? So we've got height, and we've disaggregated it, in this case, by gender, right? So we've got pink and blue. Same idea with the box plots. But we can disaggregate it further, and in this case, we've got eye color built into it. Color distinguishes the different density plots, and then facets divide it out even further by eye color, right? And we've done that with density plots and with box plots over here, okay? Super duper easy, let's keep going. In the last example, we looked at two categoricals and one numeric. What if we had two numerics and one categorical? Super duper easy, don't worry. So here we've got height and mass, right? Two numerics, and then one categorical: I've used sex instead of gender, but it doesn't really matter. Okay, let's look at the graphs. In the first graph, basically, I've just shown you the two numerics, right?
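The stacked, grouped and proportion bar plots differ only in the `position` argument of `geom_bar()`. Adding height then lets the numeric variable drive the geometry while fill and facets split it by the two categoricals. This is a minimal sketch on `starwars`. Restricting to three eye colours is an assumption made to keep the facets readable, and all object names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Keep rows with a recorded gender and a few common eye colours (assumption)
sw <- starwars %>%
  filter(!is.na(gender), eye_color %in% c("blue", "brown", "yellow"))

# Two categorical variables: three ways to split each bar by gender
stacked <- ggplot(sw, aes(x = eye_color, fill = gender)) +
  geom_bar()                          # position = "stack" is the default
grouped <- ggplot(sw, aes(x = eye_color, fill = gender)) +
  geom_bar(position = "dodge")        # bars side by side
propn <- ggplot(sw, aes(x = eye_color, fill = gender)) +
  geom_bar(position = "fill")         # each bar rescaled to sum to 1

# Add a numeric variable: height drives the geometry, categoricals split it
sw_h <- sw %>% filter(!is.na(height))

density_by_gender <- ggplot(sw_h, aes(x = height, fill = gender)) +
  geom_density(alpha = 0.4) +         # fill separates gender
  facet_wrap(~eye_color)              # facets separate eye colour

box_by_gender <- ggplot(sw_h, aes(x = gender, y = height, fill = gender)) +
  geom_boxplot() +
  facet_wrap(~eye_color)
```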
That's before we build in the categorical variable. Okay, two numeric variables: scatter plot. Each point represents one observation, showing that character's mass and height. We can see there's a nice relationship here. Just for the sake of showing you, I've added what's called a smooth, here a linear model fit, with a standard-error band around it. It's a nice way of seeing what's going on with the data, right? Now, when we add in sex, in this case male and female, we can disaggregate all of those data points by mapping color to that variable, so each point gets the color for its sex, male or female, right? We can show that on one canvas, or you can divide the canvas up into facets and look at females and males separately, okay? Super easy, nothing complicated about that. Let's keep going. Okay, I'm gonna talk you through the code that I used to produce this graphic that's on the screen at the moment, and it's not much, it's very easy. It nicely illustrates how ggplot works and how the grammar of graphics works, right? So if you understand this, you'll be able to understand the code that you can download in the cheat sheet that I'm gonna tell you about in a minute. You'll understand all of that code, because it all follows the exact same principles, right? Now, we said earlier on that with the grammar of graphics, firstly you define the data, secondly you define the aesthetics, or the mapping, and then finally you define the geometry. Right, those are the three components. You can add stuff afterwards, like the theme, and you can do a few other things after that. You don't have to stop there, but those are the three principal components, right? The data, the mapping, and then the geometry. And if you've got that, you're good to go. Let's look at how I used those three ideas to produce this graphic. First of all, I wanted to get the right data fed into ggplot, right?
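The two-numeric scatter plot with a linear smooth, and its colour and facet variants, can be sketched like this with `starwars`. This is an illustration under stated assumptions. The filter keeps only characters whose sex is recorded as male or female, matching the description above, and the object names are hypothetical. `geom_smooth(method = "lm", se = TRUE)` draws the linear fit with its standard-error band.

```r
library(dplyr)
library(ggplot2)

# Characters with both numeric values recorded and sex "male" or "female"
sw_hm <- starwars %>%
  filter(!is.na(height), !is.na(mass), sex %in% c("male", "female"))

# Two numeric variables: one point per character plus a linear fit
scatter <- ggplot(sw_hm, aes(x = height, y = mass)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)   # se = TRUE draws the error band

# Add the categorical variable by mapping it to colour on one canvas ...
scatter_by_sex <- ggplot(sw_hm, aes(x = height, y = mass, color = sex)) +
  geom_point()

# ... or split the canvas into one facet per sex
scatter_facets <- scatter_by_sex + facet_wrap(~sex)
```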
But first of all, the tidyverse. Okay, library(tidyverse) gets you ggplot2 and other packages. It brings all of them into play at the same time. So I always use the tidyverse, and ggplot2 comes along with it. Obviously you have to install it first: if you've never installed it, run install.packages("tidyverse"). Also run install.packages("gapminder"), and then you wanna say library(gapminder) as well. Then the data is available to you to use right now, right? So we start off by saying gapminder. That tells R we're using this data. This little pipe operator, %>%, simply means "and then": we pipe whatever is on the left of the pipe into the next line of code, right? I'm not gonna try and teach you how to use dplyr and the tidyverse in this video. I'm assuming you're familiar with that. I'm teaching you how to use ggplot, right? But what's nice about ggplot is that we can use these pipe operators to feed exactly the filtered data set that we want into ggplot, which starts here. So we can actually manipulate the data as it's getting fed into ggplot. We can do things to it, which is lovely. That's why this is so beautiful, right? So first of all, we don't want all of the continents in our data set. So we filter on continent %in% c("Africa", "Europe"), where c() is a concatenation. This isn't a video about how to do filters, so watch another video for that if you're confused. Basically, we're filtering it because we just want Africa and Europe, right? Then we do another filter and say we just want GDP per capita of less than 30,000. Right, that was just to keep the graphic nice and manageable, so it's visually easy to see what's going on. It also shows you that you can manipulate your data as it gets fed into ggplot. I mean, if we changed this number here, the graphic would immediately change.
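The data step described above, installing the packages, loading them and filtering `gapminder` through the pipe, can be sketched as follows. This sketch loads `dplyr` directly instead of the whole tidyverse. That's an assumption made only to keep the example light; `library(tidyverse)` loads dplyr too. The name `africa_europe` is hypothetical.

```r
# Run once if the packages are not installed yet:
# install.packages(c("tidyverse", "gapminder"))

library(dplyr)      # library(tidyverse) also loads dplyr and ggplot2
library(gapminder)

# The pipe %>% reads as "and then"
africa_europe <- gapminder %>%
  filter(continent %in% c("Africa", "Europe")) %>%  # keep two continents
  filter(gdpPercap < 30000)                         # keep the plot manageable
```

Changing `30000` or the continent vector changes what eventually reaches `ggplot()`, which is why the graphic updates as soon as you edit them.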
If we changed the continents there, that would change in the graphic straight away. Okay, so that's quite nice. Then we're feeding this new data frame that we've just defined into ggplot. The first argument of ggplot() is usually data, right? So if we hadn't done all of that, I would have been saying data = gapminder, comma, and then carrying on. But because we're working in the tidyverse, we don't have to write data = anything, because we're feeding the data in with the pipe operator. Okay, so that doesn't apply. Similarly, here we're just starting with aes(), for aesthetics. What's happening in the background, although you don't need to type it, is mapping = aes(...). It's just assumed, right? So we just say aes. Now, there are four different aesthetics that I'm defining in this particular graphic, and we're gonna talk about how we deal with the fifth variable in just a minute. Basically, the mapping says: for any given variable, for each of the observations in that variable, what should R do with that observation? Which visual property should it connect it with? And we've said, okay, the x variable, the x-axis, we want that mapped to GDP per capita. The y-axis, we want that mapped to life expectancy. As we plot the different observations, we want the size of each point to be a function of population size, and we want the color to be a function of the year. And you can add additional aesthetics in this mapping process. You could say you want the shape to be a function of something else. There are multiple things you can do here. The point is, for each variable that you want to represent on your canvas, think about which aesthetic, something you can see, that variable should map to on the canvas, right? Color, shape, size, you know, there are lots of things you can do.
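To show that `mapping = aes(...)` is what happens behind the scenes, the two calls below build the same plot. One spells out `data =` and `mapping =`. The other relies on the pipe and on `mapping` being the second argument of `ggplot()`. This is a small illustrative sketch. The names `gap_mapping`, `p_explicit` and `p_piped` are hypothetical. Note that ggplot2 standardises the American spelling `color` to `colour` internally.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Four aesthetic mappings: each pairs a visual channel with a column
gap_mapping <- aes(x = gdpPercap, y = lifeExp, size = pop, color = year)

africa_europe <- gapminder %>%
  filter(continent %in% c("Africa", "Europe"), gdpPercap < 30000)

# Spelled out in full: ggplot(data, mapping)
p_explicit <- ggplot(data = africa_europe, mapping = gap_mapping)

# Tidyverse style: data arrives through the pipe, mapping matched by position
p_piped <- africa_europe %>% ggplot(gap_mapping)
```

Neither object draws any points yet, because no geometry has been added. That is the third component of the grammar.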
And so in this case, we've defined those four, and these are all numeric variables, right? Then we have a little plus, and after it, geom_point(). This is the next component of our grammar of graphics, which is the geometry, right? geom_point() says we want R to stick a point, or a dot, on the canvas for each of the observations. If this was a histogram, this is where we would tell R that it's a histogram, or a bar plot, or a density plot, or a violin plot, or whatever it is that you want to plot. This is the geometry; this is where you define it. You can add additional information inside these brackets, right? These are the arguments you can put inside there. Sometimes you want to add specific aesthetics to a particular geometry as you're layering one layer upon the next. I'm not going to get into all of that in this video. I have other videos on that that you might want to have a look at. But just to let you know, that's the geometry right there. After that, we've got an opportunity to do something called a facet wrap, facet_wrap(~continent). The little squiggle, the tilde, means "by", so it's by continent. We've said, take all of that data and disaggregate it by continent, and that produces these two facets over here. And then I've just got labels; labs() is for labels. We've given it a title, we've given the x-axis a label and the y-axis a label, and we're done. Right, super duper easy. And if I run that code, which I've already done, it produces the graphic that's on the screen right now. Okay, now if you wanna learn how to create any of these graphics, what you gotta do is practice. Right, so download the cheat sheet. There should be a link on the screen at the moment, and there'll also be a link in the description below. Click on the link, download the cheat sheet, go through the code, and do it for yourself.
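Putting the pieces together, here is a sketch of the complete code as it is described in the walkthrough: data, then mapping, then geometry, then facets and labels. The exact title and axis-label strings are not read out in the video, so the text here is a placeholder. The filtered data is stored as `plot_data`, a hypothetical name, only so that it can be inspected; the video pipes it straight into `ggplot()`.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Data: filter gapminder on the way in
plot_data <- gapminder %>%
  filter(continent %in% c("Africa", "Europe")) %>%
  filter(gdpPercap < 30000)

gap_plot <- plot_data %>%
  # Mapping: the unnamed aes() is the mapping argument
  ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = year)) +
  # Geometry: one point per country-year
  geom_point() +
  # Fifth variable: one facet per continent (~ reads as "by")
  facet_wrap(~continent) +
  # Placeholder label text
  labs(title = "Life expectancy vs GDP per capita",
       x = "GDP per capita",
       y = "Life expectancy")
```

Run `print(gap_plot)`, or type `gap_plot` at the console, to draw it. Because `%>%` binds more tightly than `+`, the piped data reaches `ggplot()` before the layers are added.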
Replicate that code on your computer. You've got the data, you've got the code. Redo these graphics and you'll get used to how to do it and you'll be able to apply it then in your own work to your own data. I've got more detailed lessons on how to use ggplot for data visualization at learnmore365.com. Now, stay and watch the next video on data analysis. And if you haven't subscribed, subscribe to this channel, hit the bell notification if you wanna get notified of future videos. Nice to see you here. Don't ever change. Don't do drugs. Always do your best. See you again soon. Take care. Bye.
