Understanding Cross-Sectional Studies: Prevalence, Risks, and Limitations
Explore cross-sectional studies, their use in measuring prevalence, and the challenges in interpreting associations between risk factors and outcomes.
File
Week 5 CROSS SECTIONAL STUDIES
Added on 09/30/2024

Speaker 1: Well, welcome back everyone and thanks for coming back. The next type of study design I'd like to discuss is something we've already seen. I want to talk about cross-sectional studies. They're often referred to as prevalence studies because they use data collected at one point in time, where you measure the association between a potential risk factor and an outcome at that point in time. So we're using prevalence data, and we talked about prevalence in the second series of lectures that I've given in this course. And I believe this is perhaps a table we've seen already describing the cross-sectional relationship between the prevalence of two characteristics in the Framingham Heart Study data set. Using data from the 1956 exam, what we're comparing is the prevalence of smoking, where each person was classified as either a smoker or a nonsmoker, and the prevalence of existing coronary heart disease. So let's look at this data again. What we have in this data set are a grand total of 2,181 individuals who claimed to be smokers at the 1956 exam compared to another 2,253 individuals who were nonsmokers. So each person's classified as a smoker or a nonsmoker. In addition, each person at the 1956 exam was asked, have you been diagnosed previously with coronary heart disease? And among the 2,181 smokers, 86 of those people said yes, they had been diagnosed previously with coronary heart disease, compared to 108 of the 2,253 nonsmokers. So this is what we mean by cross-sectional data, data collected at one point in time. The point in time here is the 1956 exam, classifying people in two dimensions: whether they're smokers or nonsmokers, the row variable, and whether they did or did not have coronary heart disease at that time, the column variable. And again, this is prevalence data, not incidence data. We're not following people for 24 years yet to see who develops heart disease. We're just asking at one point in time.
Were you a smoker? Yes or no. And did you have coronary heart disease at that point in time? Yes or no. Well, with these types of cross-sectional data, we can measure prevalence. We can measure the prevalence of existing coronary heart disease among all the smokers. There are 86 smokers who had heart disease out of the 2,181 total smokers. That gives me a prevalence of 0.0394. About 4% of the smokers had coronary heart disease at their 1956 exam. Among the 2,253 nonsmokers, 108 of them had coronary heart disease. Their prevalence is 108 divided by 2,253. That's a prevalence of 0.0479, about 5%. So notice the prevalence of coronary heart disease is lower among the smokers than it is among the nonsmokers. From these two prevalence measures, we can measure an association between smoking and the existence of coronary heart disease at the 1956 exam. We can calculate a prevalence ratio by dividing the prevalence of coronary heart disease among smokers by the prevalence of coronary heart disease among nonsmokers. Divide the 4% by the 5%, using the unrounded prevalences, and we come up with a prevalence ratio of 0.8226, meaning the prevalence of existing coronary heart disease among smokers is about 82% of that of the nonsmokers. Now just as an exercise, you can also calculate the odds of coronary heart disease among smokers and nonsmokers and calculate an odds ratio. If you do that, you get a prevalence odds ratio, which is also about 82%. Why are they giving me similar results? Remember, going back to the earlier part of this course when we first talked about proportions and odds, that's what we're talking about at the bottom of this slide: a ratio of proportions, the prevalence ratio, or a ratio of odds, the prevalence odds ratio. I mentioned a few weeks ago that when you have a rare event, a small proportion, the value for the proportion is very similar to the value for the odds. So the values for the odds associated with proportions of 4% and 5%, to two decimal places, are still 0.04 and 0.05.
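As a minimal sketch of the arithmetic described above, the following Python snippet recomputes the two prevalences, the prevalence ratio, and the prevalence odds ratio from the counts quoted in the lecture. The variable and function names are illustrative assumptions and are not part of the original lecture materials.

```python
# Counts at the 1956 Framingham exam, as quoted in the lecture.
smokers_total, smokers_chd = 2181, 86
nonsmokers_total, nonsmokers_chd = 2253, 108


def prevalence(cases, total):
    """Proportion of the group with existing disease at the exam."""
    return cases / total


def odds(cases, total):
    """Odds of existing disease: cases divided by non-cases."""
    return cases / (total - cases)


prev_smokers = prevalence(smokers_chd, smokers_total)
prev_nonsmokers = prevalence(nonsmokers_chd, nonsmokers_total)

# Ratio of proportions (prevalence ratio) and ratio of odds
# (prevalence odds ratio), smokers relative to nonsmokers.
prevalence_ratio = prev_smokers / prev_nonsmokers
prevalence_odds_ratio = (
    odds(smokers_chd, smokers_total) / odds(nonsmokers_chd, nonsmokers_total)
)

print(f"Prevalence, smokers:    {prev_smokers:.4f}")
print(f"Prevalence, nonsmokers: {prev_nonsmokers:.4f}")
print(f"Prevalence ratio:       {prevalence_ratio:.4f}")
print(f"Prevalence odds ratio:  {prevalence_odds_ratio:.4f}")
```

Because both prevalences are small, each group's odds are close to its proportion, which is why the two ratios come out so close to each other.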
So not surprisingly, the value for the odds ratio in this situation is very similar to the value for the prevalence ratio. That was a little aside that I wanted to do. But let's get back now to this table, this cross-sectional study, and try again to interpret the results we can get from cross-sectional studies. The good news is we've already talked about this interpretation when we talked about prevalence, because a cross-sectional study uses prevalence data like we have in this example. So there's a series of questions we might want to ask ourselves, and maybe the main question is, does this study help us answer the following question? Can we conclude from these data that smokers have different risks than non-smokers? And in particular, given that we saw a lower prevalence of coronary heart disease among smokers than among non-smokers, do these data imply that smokers are at lower risk of developing coronary heart disease in the first place? If so, then this is a very efficient study for identifying risk factors for developing disease. But if we go back two or three weeks to the early parts of this course, when we talked about prevalence data, we were really talking about cross-sectional studies at that time, and we talked about the limitations of cross-sectional studies because of the multiple interpretations, the multiple reasons why one group of people might have a different prevalence of an outcome than another group of people. Well, in this case, why do the smokers have a lower prevalence, 4%, of existing coronary heart disease than the non-smokers, who have a prevalence closer to 5%? What were the possible explanations that we talked about, the possible reasons for finding an association in a cross-sectional study?
And one possible reason is incidence: if one group, in this case the non-smokers, has a higher prevalence of disease than the smokers, it may be because non-smokers are at higher risk of developing coronary heart disease in the first place. We see more heart disease among the non-smokers than we see among the smokers. It might be because heart disease occurs more often among the non-smokers. Well, that's a possibility numerically. But based on what we know now about the relationship of smoking and heart disease, it's very unlikely that the association we saw in this cross-sectional study was due to the non-smokers having higher incidence, higher risk of developing disease. That explanation is unlikely. But it is always a plausible explanation when you have cross-sectional studies. One reason why one group, the non-smokers, might have higher prevalence than another group, the smokers, is that the first group could be at higher risk for developing disease in the first place, have higher incidence of disease. Well, before we move on to the next explanation, let's address that in a little bit more detail. Here's another table you could develop from the Framingham Heart Study. Now we're looking at the incidence of developing coronary heart disease in the future, in the 24 years of follow-up. The analysis here is limited to the first development of coronary heart disease, meaning I've eliminated anybody who had pre-existing heart disease at the 1956 exam. I went back into the Framingham data and eliminated those 86 plus 108 individuals who already had coronary heart disease in 1956. They are no longer at risk for developing coronary heart disease in the future as a first event, a first case of coronary heart disease.
So what I'm going to show you now is an analysis limited to those people who in 1956 did not have coronary heart disease, looking at the incidence of heart disease during the next 24 years to see if the smokers really do have lower incidence, lower risk of developing disease, than the non-smokers. So going back to that table, which I was just at, this is describing the incidence rate of developing coronary heart disease among those smokers and among those non-smokers who are at risk of developing a first case of coronary heart disease. We observed 39,636.77 person-years of observation, almost 40,000 person-years, from those smokers. And during those person-years of follow-up, 531 of those smokers developed coronary heart disease. We can calculate the incidence rate as we've done previously in this class. Divide the 531 by the 39,636.77 person-years and we calculate an incidence rate of developing coronary heart disease of 1.34 cases for every 100 person-years of follow-up. We can do the same thing for the non-smokers. Let's look at the non-smokers who were at risk of developing their first case of coronary heart disease in 1956, follow them for 24 years, and measure their person-time. We observe a little bit more than 41,000 person-years of observation, and during it, 515 of those non-smokers developed coronary heart disease. We can calculate their incidence rate: 515 divided by 41,288.39 person-years gives me an incidence rate of 1.25 cases of coronary heart disease for every 100 person-years of follow-up. So notice the incidence rate among smokers is now higher than the incidence rate among the non-smokers, more or less what we expected. If we calculate an incidence rate ratio by dividing those, we see that the rate of developing heart disease among the smokers is 1.07 times that of the non-smokers.
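The incidence calculation just described can be sketched in a few lines of Python. This is only an illustration under the figures quoted in the lecture; the names `rate_per_100_person_years`, `rate_smokers`, and so on are hypothetical and chosen for readability.

```python
# Follow-up figures quoted in the lecture for people free of coronary
# heart disease at the 1956 exam (first events only).
smokers_cases, smokers_person_years = 531, 39636.77
nonsmokers_cases, nonsmokers_person_years = 515, 41288.39


def rate_per_100_person_years(cases, person_years):
    """Incidence rate expressed as cases per 100 person-years."""
    return 100 * cases / person_years


rate_smokers = rate_per_100_person_years(smokers_cases, smokers_person_years)
rate_nonsmokers = rate_per_100_person_years(
    nonsmokers_cases, nonsmokers_person_years
)

# Incidence rate ratio, smokers relative to nonsmokers.
incidence_rate_ratio = rate_smokers / rate_nonsmokers

print(f"Rate, smokers:        {rate_smokers:.2f} per 100 person-years")
print(f"Rate, nonsmokers:     {rate_nonsmokers:.2f} per 100 person-years")
print(f"Incidence rate ratio: {incidence_rate_ratio:.2f}")
```

The ratio sits above 1 here, while the prevalence ratio from the 1956 snapshot sat below 1, which is the contrast the lecture is drawing between incidence data and prevalence data.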
So it's unlikely that the reason, going back to the cross-sectional study, that we saw a higher prevalence of coronary heart disease among the non-smokers is that non-smokers are at higher risk. These data suggest, in general, smokers are at higher risk. They have a higher incidence rate. Well, when we talked about prevalence data a couple of weeks ago, we said another reason for having a higher prevalence of disease in one population than another is the duration of disease. Maybe the reason that the non-smokers have a higher prevalence of disease in this cross-sectional study, 5% versus only 4% among the smokers, is that the non-smokers who develop heart disease survive longer with it. They have longer durations. So when I come along in 1956 and take a snapshot of this population and see a higher prevalence of heart disease among the non-smokers, it had nothing to do with risk. It had to do with duration of disease. Well, that's possible, but it's probably unlikely that smoking could somehow impact how long you have the disease, impact your survival once you have disease. So it's possible, but I'd still say not the most likely explanation for this association we see in the cross-sectional study. So those are two possible explanations you always have to consider in cross-sectional studies, incidence and duration. Because as I mentioned previously, prevalence is a function of both incidence and duration. But there are more problems, more potential reasons. And the one that might be the real reason for the association in this study is something called reverse causation. We see in this data set that there's less heart disease among the smokers. That's the same thing as saying there's less smoking among the people who in 1956 had heart disease. But it wasn't the smoking that was influencing the heart disease prevalence in 1956. It was the heart disease development prior to 1956 that caused individuals either to stop or not start smoking.
So the reason for the association was that once you were diagnosed with heart disease, your physician said, you better quit smoking, or you better not even consider starting smoking. So the reason we see this association in this cross-sectional study has nothing to do with the risk factor of smoking causing the disease or causing people to have longer or shorter duration with disease. It has to do with the outcome, coronary heart disease, influencing the exposure. That's what we mean by reverse causation. So now we see there are three problems with cross-sectional studies. There are three potential reasons we always have to consider to explain why we see a higher prevalence in one group of people than another group of people. And then on top of that, in any type of study, whether it's cross-sectional or the ones we're going to be talking about in future weeks, the cohort study, the case-control study, the experimental studies, whenever we find an association, one reason for finding that association is that it might really be reflecting truth, which is what we hope to be true. That's what we hope to be able to report, that this study suggests that this factor, say, causes this outcome. But before we can conclude that, we always have to consider three other possible explanations for seeing an association in our study. We could have a biased study. We could have something called confounding. Or it could be just a reflection of chance. And I'd like to talk about each of those three other general reasons for finding an association in any type of study. I'm going to be applying them to this type, the cross-sectional study. So the bottom line is cross-sectional studies are potentially easy to do. They're based on a snapshot of data. But the problem is, whenever you find an association in cross-sectional studies, there are multiple explanations for that association.
And often it's not easy to figure out which of those reasons was the most likely reason for that association. And for that reason, these cross-sectional studies are often done as preliminary work in the development of a series of studies to try to answer a hypothesis. But a cross-sectional study is usually not a definitive study. It usually requires future work using other types of study designs that we're going to talk about in future lectures. But before we do that, what I'd like to talk about now in the next three lectures are these three generic, general reasons for finding an association in any type of study that is not reflecting a true association. I'm going to be referring to cross-sectional studies, but what I have to say in the future lectures holds true for all the other study designs we'll be talking about. So when we come back, I'll talk about the first of these alternative general reasons, namely bias. See you next time.
