Top Websites to Find Data Sets for Projects and Research: A Comprehensive Guide
Discover the best websites to find data sets for personal projects or work, including Kaggle, Google Dataset Search, FiveThirtyEight, data.gov, GitHub, and NASA.
Best Places to Find Datasets for Your Projects
Added on 09/28/2024

Speaker 1: What's going on, everybody? Welcome back to another video. Today, we're going to be talking about the best places to find data sets. Now, why do we need to find data sets at all? Well, you may just want to build a personal project, and you don't have the data for it. So you want data on a country, or a product, or really anything you could want. This is where a data set comes in handy: you can just download it and use it. You may also need a data set for your actual job. This happened all the time for me. I would need a data set on some very specific topic, but my company just didn't have that data. So I would go out, search for it, find it, and use it in the actual work I was doing. So let's jump onto my screen. I'm going to show you some of my favorite websites for getting data sets.

All right, so I'm going to start with some of the more popular ones that I think you might recognize or know. And then we'll get to some that are a little bit more niche or that you may never have heard of.

The first one is, of course, Kaggle. This one is fantastic. I've gotten a ton of data sets from Kaggle. It's an open platform. Anybody can post data sets here. And what's great is you can just download them right from here. So if we come in here, we'll go over to, I don't know, this earthquake data set. And then we'll come down, and it'll give a little description of it. And then you have this earthquake data set right here. You just click on it, you download it, and you have the actual data. It's as simple as that. What's really great is you can actually search these data sets. So if we go back, let's say I want one on sales. So I type in sales, and then we have all these different data sets on sales. Now, some of these are very old, four, five, six, eight years old, but they're still really good. And if you're just trying to build a product or just a project, these could be perfectly fine.
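A minimal sketch, not from the video, of a first look at a CSV file once it has been downloaded, using only Python's standard library. The function name `summarize_csv`, the file name in the comment, and the sample rows are hypothetical; the sample is made-up text standing in for a real download so the snippet runs on its own.

```python
import csv
import io


def summarize_csv(source):
    """Return (column_names, row_count) for CSV data read from a file-like object."""
    reader = csv.DictReader(source)
    row_count = sum(1 for _ in reader)  # consume every data row
    return list(reader.fieldnames or []), row_count


# Made-up sample standing in for a downloaded file; not real earthquake data.
sample = io.StringIO(
    "date,magnitude,place\n"
    "2020-01-01,4.5,Example A\n"
    "2020-01-02,5.1,Example B\n"
)

columns, rows = summarize_csv(sample)
print(columns, rows)

# With a real download, the same function takes an open file instead
# (the file name here is an assumption):
# with open("earthquakes.csv", newline="") as f:
#     columns, rows = summarize_csv(f)
```

Checking the column names and the row count first is a quick way to confirm that the download matches the description on the data set's page.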
I really recommend Kaggle for people who are just starting to build projects because they have such a wide variety of actual data sets.

The next place to find data sets is Google Dataset Search. Now, this is a little bit different than Kaggle, so let's try this out. You can still search for data sets, and let's just try the coronavirus, COVID-19. It prompted us to do it, so why not? What this is going to do is search the web and find data sets for you. So then you can come over here, and you can see it's a CSV, it's a zip file. We have a PDF, and it can even pull in Kaggle data sets as well. But then you can come in here, and you can see where it has found data sets, and you can explore them. So you can find these data sets at these locations. So it's often not as straightforward. You have to do a little bit more digging than with Kaggle, but typically, you're getting more reliable sources of data. So with this, if you're looking for an actual COVID-19 data set, something like data.world is another kind of reputable place to find data sets, although I think you may have to pay for some of them on some of these websites. But you can find data sets from the entire internet, not just one website, and that's what this is really, really good for. So you may have to search around to find it, and then click on these links, and then download it from those websites. But this is kind of an all-in-one place to search, which is really helpful. Again, it's just a little bit different than Kaggle.

Now, the next website is called FiveThirtyEight. If you've never heard of them, they're an analytical news website. So it's like a data-driven website for news. I follow them every so often. I think they're pretty cool, but they also give open access to the data that they use for their news articles, which is just really, really cool. So you can come over here, and let's say I wanted to do NHL predictions, so hockey.
I'll click on this link, and it's going to actually download all of the data in here. Now, you can click in here as well, and you can actually look at the data that you're about to download. And so you can just download a lot of these free data sets, and they're pretty wide and varied. But you can look through and find some really, really interesting data sets. And if you find the news article, you can actually go and look at the exact data that they used, and you can verify what they're posting and talking about.

The next website for finding data sets is data.gov. Now, this one is for people in the US, although anybody can use it. But there should be one very similar to this for almost any local government or agency, and for most countries as well. So this one is just one that I've used. But you can come in here. It can prompt you to search for things. I'll search for healthcare. And what it's going to do is find all the data sets that they have on healthcare. They'll tell you if it's a CSV, if it's an API. That's not an API. But sometimes they'll have PDFs, APIs, JSON, XML. And then you can click into these, like this licensed healthcare facility listing. This isn't a data set that I'm interested in. But then you can go and just download it. Click on this and download that data set. This is definitely more government data. So for this website, it's more US-specific, state-specific, local government-specific, or federal government-specific. But they have really, really good stuff. I've used a lot of data from this website.

The next place that we're going to look is GitHub. Now, GitHub isn't typically known for data sets, but I personally put a ton of data sets in here for free for all my YouTube stuff. And so do a lot of other people. As you can see, if you search for "data set" in GitHub, there are 276,000 repositories that have "data set" in them.
So if you want to search for something like "healthcare data set" or something like that, you can scroll down and you can find stuff for that. So this one's NHSR data sets. It looks like this one's for the R community, but we can also use that. So you just find their working files. And now here's some mortality rates in a CSV. So this one is a little bit more tricky. You have to really understand how GitHub works, but I've found a lot of great data sets on GitHub from people who are doing projects similar to mine. So if you have a project that you want to build, you may be able to find the entire project with code and everything on GitHub, and also be able to download that data set as well.

Now, the last one I'm going to show you is kind of a niche one, right? This is pretty specific, but it's one that I personally am super interested in. And it's another kind of government one. This is data.nasa.gov. So NASA has their own portal for their data. So you can come in here, go to their data sets, and now you can see all of the data that they have just available for free. Most of this is NASA-specific, as you could guess, but it's also super interesting stuff. The data in here is extremely specific and very, very detailed. So if you're just looking for sample data sets, this is not where I'd go. I'd go to Kaggle for that. This one is like real data. It's thousands and millions of rows of actual data that you could use to build a product or do some real research with NASA data. Again, you may not use this data at all, but if you have a really specific niche that you're looking for, something like Google Dataset Search might be the exact thing, because then you can find websites like this for your specific data and niche.

So there you go. Those are some of my favorite websites to get data sets. I have used all of those many times for a lot of different things.
And some of those websites I've even found through Google Dataset Search, just trying to find specific data. And then I'm like, whoa, this is a great place to find a ton of other data. And so, you know, as you go down that rabbit hole, you just kind of find data where it is, and you find different ways and websites to actually collect and get those data sets. So I hope that this was helpful. I hope that these websites help you get really good data sets for your projects or your work. If you liked this video, be sure to like and subscribe below, and I'll see you in the next video.
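A hedged sketch, not from the video, for the GitHub step described above: once you have found a CSV inside a repository, the page you see in the browser is an HTML view, while the file itself is served from raw.githubusercontent.com. The helper below converts one form of URL to the other. The function name `to_raw_url` and the repository in the example are hypothetical, and the sketch assumes a branch name without slashes.

```python
from urllib.parse import urlsplit


def to_raw_url(blob_url):
    """Convert a github.com file-view ("blob") URL into its raw-download URL."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner/repo/blob/branch/path/to/file
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub file URL: {blob_url}")
    owner, repo, _, branch, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


# Hypothetical repository and file, used only to show the URL shapes.
example = "https://github.com/example-user/example-repo/blob/main/data/mortality.csv"
print(to_raw_url(example))
```

The resulting URL can be opened in a browser or passed to any tool that reads a CSV from a URL.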
