Simplifying Data Analysis: Basic Techniques for Effective Data Journalism
Learn how to analyze data using simple math and spreadsheets. Discover techniques like change over time, group comparisons, and identifying outliers.
Introduction to Data Journalism - Analyzing Data
Added on 10/02/2024

Speaker 1: So analyzing data. So we've acquired our data. We've cleaned it. Now what? What do we do? What questions do we ask? So off the bat, it doesn't have to be complicated. You don't have to know statistics. It doesn't have to be some super crazy regression analysis. Many of the best data stories that I've seen, or that I've done, or that I've read are based on simple math, stuff that we all did in high school, right? Addition, division, averages, percent change. So again, I think sometimes the barrier for people getting into data reporting is thinking that, oh, I need to know statistics, or I need to have some skills. But I think that a lot of data reporting can be done with some of these more basic math questions. And again, your best tool when you're doing data journalism is a spreadsheet. It doesn't have to be more complicated than that. It doesn't have to be coding language or anything else.

So here's some common things you can think about.

Change over time. So we talked about that before, right? You've got a data set that's been collected year after year, or maybe it's data that's collected every day. Data on, again, crime reports, or police stops, or something else. How have those things changed over time? Super easy and common way to analyze data.

Compare across different groups. So again, looking at that city sticker ticket data, how they didn't have the actual demographics of the people who got the tickets, but they had location. So can they show, how does it compare within different zip codes across the city? And then based on what we know about those zip codes, what can we say about some of these other potential disparities? So looking at the same data set and breaking it down into different groups. That's a great way to look at census data, because most census data is collected with some of that demographic information, whether it's age, or race, or gender. And that's a good way to look at things.

Compare one geographic area to another.
So you've got data on the whole city. You can look at different geographic areas within the city. If you have national data sets, you can look at data by county, or by city, or by state, and looking at how those geographic areas compare. An important thing when you're looking at different geographic areas is thinking about, is overall population going to affect the data? And if so, trying to normalize it, or account for population differences. So if you're looking at state-by-state data, and the data set is looking at total number of people who have a college degree in every state. Well, there's different numbers of people in every state. If you look at which state has the most people with a college degree, it's going to be New York or Texas, because they're the biggest states with the most people. So normalize for population is to take into account, per population, how many people per 100,000 have a college degree? Or what percent of people have a college degree? And using that as a comparison, because otherwise, you're just going to end up comparing population.

Looking at outliers. So when I talked about cleaning data, I talked about sorting from highest to lowest. That can also be an analysis technique. Assuming that the outliers are correct, and we'll talk about that in a minute. But looking at a list of states or neighborhoods in Chicago and saying, what's got the highest number of police stops or the lowest number of police stops, that's a perfectly valid form of analysis. And all that takes is literally sorting a spreadsheet from highest to lowest. So super easy and can be really compelling.

And then breaking down parts of a whole. So if you take survey results, breaking it down by the different demographics or, yeah, that's good.

And then the last thing I thought was on the slide, you can do a lot of these in connection with each other. So you can look at change over time comparing different racial groups.
Or you could look at different geographic areas by outliers. So things like that, you can combine those things into analysis in different ways. Questions about any of this? Yeah?

Audience member: How would you, if you're someone who doesn't have a ton of data experience, how do you figure out, I guess for certain stories maybe it's not necessary, but figuring out statistical significance and that kind of thing, how do you frame your thinking about that when you're not a statistician?

Speaker 1: Yeah. So I guess my first thing would be don't say it's statistically significant if you don't know that it is. And basically just be honest with your readers. You can do a totally valid data story that looks at differences between groups that may not be statistically significant if you ran that statistical analysis. But it's a difference. And you can say that and just be honest that it's not statistically significant, or you don't know if it is, or this is just an observational analysis and not a statistical one. So I think that's OK as long as you're, again, being open and honest with your audience and your readers about the kind of analysis that you did.

Audience member: Can you talk about when you're analyzing data, going in and looking for something specific versus just kind of going around and deciding what's important? I mean, is that more of a motivational story that you should step in or stop at?

Speaker 1: I think it depends on the story. Sometimes the story starts with a big data set, and you're just mining it to see if there's an interesting story there. Often the first thing I start with is those outliers. Are there things that look like they're outside the norm? But sometimes, like we talked about at the beginning, someone's told you that CPS class sizes have gotten bigger over time. And you get the data and then are trying to see is this true or not. And if it is true, why or what's happening or whatever, if it's not true, then maybe it's not a story.
Or maybe it still is, but you can say, hey, CPS teachers are complaining about class sizes, but actually, 20 years ago, they were worse. That's interesting. Again, no idea if that's true or not.
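
Editor's sketch: the talk recommends a spreadsheet and says no coding is needed, so the following is only an optional illustration of the same arithmetic (percent change, a per-100,000 rate, and sorting from highest to lowest) for readers who prefer to see it written out. It is not the speaker's method. The function names `percent_change` and `rate_per_100k`, the area names, and every number are hypothetical placeholders, not real data.

```python
# All names and figures below are made-up placeholders, not real data.

def percent_change(old, new):
    """Percent change from an earlier value to a later one."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old * 100


def rate_per_100k(count, population):
    """Normalize a raw count by population, expressed per 100,000 people."""
    return count * 100_000 / population


# Hypothetical areas: (name, people with a college degree, total population)
areas = [
    ("Area A", 300_000, 1_000_000),
    ("Area B", 90_000, 200_000),
    ("Area C", 20_000, 100_000),
]

# Sorting by the raw count mostly ranks areas by how many people live there.
by_count = sorted(areas, key=lambda a: a[1], reverse=True)

# Sorting by the normalized rate accounts for population differences.
by_rate = sorted(areas, key=lambda a: rate_per_100k(a[1], a[2]), reverse=True)

print("Change over time:", percent_change(200, 250), "percent")
print("Highest raw count:", by_count[0][0])
print("Highest rate per 100,000:", by_rate[0][0])
```

With these placeholder numbers, the largest area has the highest raw count while a smaller area has the highest rate, which is the population effect the speaker warns about when comparing geographic areas.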
