Creating Scientific Graphs in R: Techniques for Box Plots, Histograms, and Linear Regression
Learn basic techniques for creating scientific graphs in R, including box plots, histograms, and linear regression, ideal for research and academic publications.
R Tutorial 33 Create Graphs in R for Scientific Journals and Academic Research
Added on 09/08/2024

Speaker 1: Hi. Recently a friend invited me to create some graphs for a scientific journal in the field of biology or biomedicine. So today I want to share some basic techniques for using R to create graphs for a scientific journal, or for your own research or academic publication.

One of the graphs I created for my friend has a box plot at the top and a histogram at the bottom, which together show the distribution and the range of a sample data set, something like this.

The steps to create this graph are as follows. First of all, you have your data loaded. Here I'm just going to draw 10,000 values with rnorm. Then I have this layout function, whose input, mat, is a matrix. As you can see, this matrix holds two numbers, one and two, in two rows and one column. That is my setting for this plot: two rows and one column, so the graphs are laid out in the form of the matrix. The heights are just the height settings for each graph: the top graph has a height of one and the bottom graph a height of eight. So layout predefines the table that says where you want to place your graphs. You can think of it as a container or a placeholder for the graphs you are about to put in.

Now let me delete these and run the layout function again. Next is a parameter setting that applies only to the top graph. It sets the dimensions, the height and width, of the top graph, as you can see. There are four parameters here. They are not a location or coordinates; they are actually the margin sizes.
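The layout step described so far can be sketched roughly as follows. This is a minimal sketch, not the speaker's exact script: the names `layout_matrix` and `n_regions` are made up for illustration, and the `layout.show()` call is an addition used only to preview where the two regions will sit.

```r
# Two stacked plot regions: region 1 on top, region 2 below.
# The 1-to-8 height ratio follows the talk; adjust it for your own figure.
layout_matrix <- matrix(c(1, 2), nrow = 2, ncol = 1)
n_regions <- layout(mat = layout_matrix, heights = c(1, 8))

# Draw the outline of each region to check the arrangement before plotting.
layout.show(n_regions)
```

Previewing the regions first makes it easier to judge whether the height ratio leaves enough room for the top graph.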
There is one for each of the four sides of the top graph. If you forget what these numbers stand for, you can always change a number, adjust your graph slightly, and find the settings that suit your graph. These numbers are only suitable for my graph. If you want to put a different kind of graph in your figure, you need to adjust the numbers for it.

So this is the rnorm box plot for my sample data. For the first row, which I defined in this matrix, I have the top graph, and there is another row below it. So I have two graphs, top and bottom, in one column. This is the margin size for my top graph. Within the top graph I'm creating a box plot of the data with horizontal equal to TRUE, so the box plot is horizontal. Without this parameter the box plot would be vertical. Then the y limits, which here are just the horizontal axis, run from negative five to positive five. So that's the top graph.

For my bottom graph I have this parameter, which is the margin size for the graph I'm about to make in this region. Here I have a histogram. As you can see, I'm plotting a histogram of the sample data with breaks equal to 60. That's how you break the graph into smaller intervals instead of wider ones. The x limits are the same, from negative five to five. The main argument is actually the title. If I set it and run the code again, you will see a title appear here. Oh, it seems the title overlaps with this one. That's the reason I removed it.
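Putting the pieces above together, the box plot over histogram figure might look like the sketch below. The margin values, the seed, `xaxt = "n"`, `frame = FALSE`, and the axis label text are assumptions chosen for illustration. Only the 10,000 rnorm draws, the layout, `horizontal = TRUE`, the limits of -5 to 5, and `breaks = 60` come from the talk.

```r
set.seed(1)          # assumed seed, only so the figure is reproducible
x <- rnorm(10000)    # 10,000 standard-normal draws, as in the talk

# Two rows, one column; the top region is 1/8 the height of the bottom one.
layout(mat = matrix(c(1, 2), nrow = 2, ncol = 1), heights = c(1, 8))

# Top region: horizontal box plot.
# mar is c(bottom, left, top, right), measured in lines of text (assumed values).
par(mar = c(0, 4.1, 1.1, 2.1))
boxplot(x, horizontal = TRUE, ylim = c(-5, 5),
        xaxt = "n",      # no axis here; the histogram below carries it
        frame = FALSE)   # no box around the plot

# Bottom region: histogram on the same horizontal range.
par(mar = c(4.1, 4.1, 1.1, 2.1))
hist(x, breaks = 60,     # about 60 bins; R treats a single number as a suggestion
     xlim = c(-5, 5),
     main = "",          # empty title so it does not collide with the box plot
     xlab = "Value", ylab = "Frequency")
```

Using the same limits and the same left and right margins in both panels keeps the box plot aligned with the histogram underneath it.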
But even if you remove it, you can use other commands to add it back. The same goes for the x label and the y label. If I do this, you will see the text: the title is controlled by main, and the x label is controlled by xlab. The y label is not showing here because I think my left margin size is too small to reveal it. If I increase the margin, the y label will show, but for now I'll change everything back. So this is one of the graphs I made for my friend's scientific journal. It shows the distribution of some test runs for the data set he has.

Now I'm going to show you another graph, a linear regression graph. Here I have another sample of data, and my y is just one plus three x, plus some noise. I have this par setting here, which says that I want my graph to start from the zero axis. I want the zero on the horizontal axis and the zero on the vertical axis to be connected, so that both axes start from zero. Now you can see something like that. You don't see the axes, and that's because I have axes equal to FALSE here.

This pch argument sets the type of marker. If I change it to 1, I get a different marker, a circle, and 16 is a solid circle. I think there's a table online that you can always check for pch. Each number gives a different type of marker, so look at that table before you choose your favorite marker for a graph. Again, xlim and ylim are only here to control the range of your horizontal x axis and vertical y axis. The cex argument is the size, or weight, of your marker.
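The scatter plot described above could be sketched like this. The sample size, the range of x, the noise level, and the seed are assumptions, and the par call is a guess at what the speaker used: `xaxs = "i"` and `yaxs = "i"` are the standard settings that stop R from padding the axis ranges, which is what makes the two axes meet at zero.

```r
set.seed(42)                          # assumed seed, for reproducibility only
x <- runif(100, min = 0, max = 50)    # assumed sample size and range
y <- 1 + 3 * x + rnorm(100, sd = 12)  # y = 1 + 3x plus noise; the sd is assumed

# Use the axis limits exactly as given, with no extra padding,
# so that both axes start at zero.
par(xaxs = "i", yaxs = "i")

plot(x, y,
     axes = FALSE,                    # suppress the axes and the box for now
     pch = 16,                        # 16 is a solid circle, 1 is an open circle
     cex = 0.8,                       # marker size relative to the default
     xlim = c(0, 50), ylim = c(0, 150))
```

Running `?points` in R lists the marker shapes available for pch.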
If I increase it, I get a heavier marker. Let me go back, since this was my original setting. Now I'm going to show the two axes. As you can see, the axes now meet at the origin. The horizontal axis runs from 0 to 50, as I defined here, and the vertical axis from 0 to 150. Using these two, I can bring the axes back into the graph.

Now I'm going to create a linear regression line for this graph. My linear regression regresses y on x. This argument controls the width, or weight, of the line, and this one controls the color. So I have a red regression line with a weight of 3, which is a slightly thicker line. If I make it 5, I get an even thicker line.

In this row I'm creating a text box in the graph. For the text I have two coordinates, which tell R where I want to put the text. If I run that, you can see that at (15, 15) I put a text box that says y is equal to 1 plus 3x. Again, this is the size of the font, and this controls the color, red.

Or I can create a legend instead. It works the same way: the coordinates of my legend and then the content. Here I added y equal to 1 plus 3x with R squared equal to 92%. The line here is red, but the text is black. These parameters are always for the line: this one is the line type, and this one, I guess, is the width. So I can make the line thicker and change it back to a solid line. That is line type and width, or weight, for the line.

So these are the basic settings for creating a graph in R. There are other packages for creating fancier graphs, for example ggplot2, and there are other plots for specific models.
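The regression figure can be assembled layer by layer, roughly as in the sketch below. The simulated data, the seed, the legend position, and the font size are assumptions. The sketch also computes R-squared from the fitted model instead of typing in a fixed percentage, so the number it prints depends on the simulated sample and need not match the 92% mentioned in the talk.

```r
set.seed(42)                          # assumed seed, for reproducibility only
x <- runif(100, min = 0, max = 50)    # assumed sample size and range
y <- 1 + 3 * x + rnorm(100, sd = 12)  # y = 1 + 3x plus noise; the sd is assumed

par(xaxs = "i", yaxs = "i")           # axes meet at the origin
plot(x, y, axes = FALSE, pch = 16, xlim = c(0, 50), ylim = c(0, 150))
axis(1)                               # bring back the horizontal axis
axis(2)                               # bring back the vertical axis

fit <- lm(y ~ x)                      # regress y on x
abline(fit, col = "red", lwd = 3)     # red fitted line, three times default width

# Text label at the coordinates mentioned in the talk.
text(15, 15, "y = 1 + 3x", cex = 1.2, col = "red")

# Alternatively, a legend; lty and lwd describe the line drawn in the key.
r2 <- summary(fit)$r.squared
legend(2, 145,
       legend = sprintf("Fitted line, R-squared = %.0f%%", 100 * r2),
       col = "red", lty = 1, lwd = 3)
```

In a real figure you would normally keep either the text label or the legend, not both.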
But for me, and for my friend's request, the basic plot function is more than enough to display the content of your data. I prefer it because I can control most of the syntax, the parameters, and the content of the graph.

Let me show you the final output for these two graphs. Here I have a box plot on the top and a histogram on the bottom. These parameters help me adjust the location, the margin sizes, and the other details of the graph besides the data. I can also add the title, the labels, and more text. You can think of each line of the syntax as adding one more thing to your final output.

Let me remove that and redo this one. You can see that I added the dots first: my first layer is all the dots. The second layer is the axes. For the third layer I added a line. For the fourth layer I added either the text or the legend, which explains what the graph is showing. And I can also add, I guess, is there a title function? OK, yes, you can also add a title, or even change the y or x label here.

So these are the basic functions for graphing, and these two formats were used for my friend's scientific journal. If you are doing research or trying to create graphs for your journals, I hope these functions help you make a pretty simple and useful graph for a very basic data structure. In the first one you can show the distribution of your biology data. In the second you can show the relationship between two variables, or a model for your data, for example that there is a positive trend and that a simple linear regression can model it. Thank you very much, and let me know if you have any questions.
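The title function mentioned near the end can be used as one more layer. Here is a minimal sketch, assuming simulated data and made-up label text: `ann = FALSE` tells plot not to draw any title or axis labels, and `title()` then adds them afterwards.

```r
set.seed(7)                           # assumed seed, for reproducibility only
x <- runif(50, min = 0, max = 50)     # assumed sample size and range
y <- 1 + 3 * x + rnorm(50, sd = 12)   # y = 1 + 3x plus noise; the sd is assumed

# First layer: the points only, with no title and no axis labels.
plot(x, y, pch = 16, ann = FALSE)

# Later layer: add the title and both axis labels (label text is hypothetical).
title(main = "Simulated response against x", xlab = "x", ylab = "y")
```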
