Research in the Cloud Textbook, forthcoming with Cambridge University Press

Building a Causal Case: How Scientists Showed that Smoking Kills Without Conducting an Experiment

Aaron Moss, PhD | May 28, 2026 | 6 min read

Historical cigarette advertisement from the mid-20th century, before smoking was proven to cause cancer

In this post:

Why scientists could never run the experiment needed to definitively prove smoking causes cancer and how they built an airtight case anyway

How the tobacco industry exploited the limits of correlational research to sow public doubt for decades

What causal inference actually is and the five strategies researchers use to build a cause-and-effect case when experiments aren’t possible

How ruling out alternative explanations one by one can be almost as powerful as a controlled experiment

If you’re under thirty, here’s something that might be hard to imagine: people used to smoke, everywhere. In airplanes. In movie theaters. In hospital waiting rooms. When I was a kid in the 1990s, restaurants would ask if you wanted to sit in the smoking or non-smoking section. Then they placed you in areas separated by a half-wall, plastic dividers, or a few feet of air—as if cigarette smoke respected boundaries.

Today, smoking is banned in most public places, and the way society used to operate strikes most people as absurd. This transformation occurred because scientists established that smoking causes cancer. That may sound straightforward, but scientists established this connection without ever conducting the experiments normally required to show causation.

The question is: how did they do it?

Experimental Research: The Key to Demonstrating Cause and Effect

Before I explain how scientists convinced people that smoking causes cancer, let’s explore why experiments are important for establishing cause and effect.

Within science, experiments allow researchers to show that one thing causes another. In experiments that involve people, as those in the social and behavioral sciences do, researchers take a group of participants and randomly assign them to either an experimental condition or a control condition. Because assignment is random, the two groups should be similar, on average, in every respect except the manipulation. Then, the researchers measure what effect the manipulation of some variable has on some outcome. The idea is that if two things are causally related, then wiggling the cause (by manipulating a variable) should produce a wiggle in the effect (by changing the outcome).

Figure: Illustration of an experiment. A researcher wiggles a rope labeled “manipulation,” with manipulation and control conditions feeding into a dependent variable outcome. If two things are causally related, then wiggling the cause should create a change in the effect.
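To see why random assignment matters, here is a minimal Python sketch of a simulated experiment. Everything in it is an assumption made for illustration: the function name, the hidden "trait," the effect size of 2.0, and the sample size are all invented, and the data are simulated rather than drawn from any real study. The point is only that a coin flip leaves the two groups nearly identical on a trait the researcher never measured, so the difference in outcomes can be credited to the manipulation.

```python
import random
import statistics


def simulate_experiment(n=20000, true_effect=2.0, seed=1):
    """Simulate coin-flip assignment; every number here is invented."""
    rng = random.Random(seed)
    groups = {True: [], False: []}  # True = manipulation, False = control
    for _ in range(n):
        trait = rng.gauss(0, 1)            # hidden individual difference
        manipulated = rng.random() < 0.5   # assignment ignores the trait
        outcome = 10 + 3 * trait + rng.gauss(0, 1)
        if manipulated:
            outcome += true_effect         # the "wiggle" the researcher adds
        groups[manipulated].append((trait, outcome))

    def gap(index):
        # Mean of the manipulation group minus mean of the control group.
        treated = statistics.mean(row[index] for row in groups[True])
        control = statistics.mean(row[index] for row in groups[False])
        return treated - control

    return gap(0), gap(1)


trait_gap, outcome_gap = simulate_experiment()
print(f"gap in hidden trait: {trait_gap:+.2f}")
print(f"gap in outcome:      {outcome_gap:+.2f}")
```

In this simulated world, the gap in the hidden trait should land close to zero while the gap in the outcome should land close to the effect that was built in. Without random assignment, nothing guarantees the first of those two results.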

The logic of experiments means that to truly show smoking causes cancer, researchers would have needed to randomly assign some people to smoke for years on end while forbidding other people from touching a cigarette. After ten, twenty, or even thirty years, the researchers could compare rates of cancer and other health measures between the two groups. Doing so would reveal whether smoking causes cancer.

Obviously, this kind of study was never conducted. Not only is it impractical, it’s also unethical. No institutional review board would allow researchers to randomly assign people to a behavior that might kill them. So researchers faced a problem. They had good descriptive data about smoking and strong correlational evidence showing that smokers got lung cancer at higher rates than non-smokers. But a correlation doesn’t prove causation, and both the researchers and the tobacco industry knew it.

Correlation Doesn’t Prove Causation: The Tobacco Industry’s Playbook

For decades, cigarette companies funded scientists and paid lawyers who repeated a simple argument: the link between smoking and cancer is merely statistical. The argument was clever because it exploited real methodological concerns with correlational studies.

Maybe, the argument went, people who smoked were different from those who didn’t in other ways—more prone to risk, more stressed, more likely to engage in other unhealthy behaviors. Maybe a genetic factor caused both the urge to smoke and susceptibility to cancer. In other words, multiple “third variables” could explain why smokers got cancer more often.

The industry also raised questions about which direction the relationship ran. Perhaps, they suggested, people who are predisposed to illness are also drawn to smoking as self-medication. If this were true, the correlation between smoking and health would be real, but the causation would run in the opposite direction from what critics of smoking suggested.

These arguments sound weak in retrospect. But at the time, they were effective because they sowed doubt. Internal documents from tobacco companies later revealed that their strategy wasn’t to prove smoking was safe. They just wanted to keep the question open, and the limits of correlational research were their shield, at least for a while.

How Scientists Built the Case

So how did researchers overcome this doubt? How did they build a case for causation that was so strong it changed laws, bankrupted companies, and transformed daily life, all without conducting the definitive experiment?

They did it by accumulating evidence from multiple angles. Each line of evidence eliminated another alternative explanation.

First, scientists established a dose-response relationship. The more cigarettes a person smoked, and the longer they smoked, the higher their risk of lung cancer. This pattern is hard to explain if a third variable is responsible. If smoking cigarettes didn’t cause cancer, why would the risk scale so reliably with the amount people smoked?
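A dose-response check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cohort is simulated, the four exposure levels are hypothetical, and the risk numbers (a 2% baseline plus 3 percentage points per level) are invented rather than taken from any study of smoking. The sketch groups people by exposure, computes the disease rate in each group, and asks whether the rate climbs at every step.

```python
import random


def disease_rate_by_dose(n=50000, seed=2):
    """Simulated cohort: disease rate at each hypothetical exposure level."""
    rng = random.Random(seed)
    doses = [0, 1, 2, 3]                 # hypothetical exposure levels
    cases = {d: 0 for d in doses}
    people = {d: 0 for d in doses}
    for _ in range(n):
        dose = rng.choice(doses)
        risk = 0.02 + 0.03 * dose        # invented: risk grows with exposure
        people[dose] += 1
        if rng.random() < risk:
            cases[dose] += 1
    return {d: cases[d] / people[d] for d in doses}


rates = disease_rate_by_dose()
for dose, rate in rates.items():
    print(f"dose level {dose}: {rate:.1%} developed the disease")

# A dose-response pattern means the rate rises at every step of exposure.
levels = sorted(rates)
rises_with_dose = all(rates[a] < rates[b] for a, b in zip(levels, levels[1:]))
print("risk rises at every step:", rises_with_dose)
```

With real data the same table would come from observed counts rather than a simulation, and researchers would typically pair it with a formal trend test.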

Then they established temporal precedence. Longitudinal studies tracked people over time, showing that smoking preceded the development of cancer—often by decades. People didn’t start smoking because they sensed they were developing cancer. They smoked and then developed cancer. The timeline clearly ran in one direction.

Researchers also statistically controlled for alternative explanations such as diet, alcohol consumption, occupation, socioeconomic status, and family history. Even then, the relationship between smoking and cancer persisted.
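One simple way to control for a third variable is stratification: compare smokers and non-smokers only among people who share the same level of the confound. The Python sketch below is an illustration, not the method of any particular study. It assumes a single hypothetical yes/no third variable that makes people both more likely to smoke and more likely to get sick, and all probabilities are invented. It then simulates two worlds, one where smoking adds no risk and one where it does, and compares the raw gap with the gap inside each stratum.

```python
import random


def smoker_gaps(smoking_effect, n=100000, seed=3):
    """Return (raw gap, controlled gap) in disease rate, smokers minus non-smokers."""
    rng = random.Random(seed)
    # counts[(confound, smoker)] = [cases, people]
    counts = {(c, s): [0, 0] for c in (False, True) for s in (False, True)}
    for _ in range(n):
        confound = rng.random() < 0.5                        # hypothetical third variable
        smoker = rng.random() < (0.6 if confound else 0.2)   # confound raises smoking
        risk = 0.05 + (0.10 if confound else 0.0)            # confound raises disease
        risk += smoking_effect if smoker else 0.0            # smoking's own contribution
        cell = counts[(confound, smoker)]
        cell[1] += 1
        if rng.random() < risk:
            cell[0] += 1

    def rate(cells):
        return sum(c[0] for c in cells) / sum(c[1] for c in cells)

    # Raw comparison ignores the confound entirely.
    raw = (rate([counts[(c, True)] for c in (False, True)])
           - rate([counts[(c, False)] for c in (False, True)]))
    # Controlled comparison: gap inside each stratum, then averaged.
    within = [rate([counts[(c, True)]]) - rate([counts[(c, False)]])
              for c in (False, True)]
    return raw, sum(within) / len(within)


raw_null, controlled_null = smoker_gaps(smoking_effect=0.0)
raw_causal, controlled_causal = smoker_gaps(smoking_effect=0.10)
print(f"third variable only: raw {raw_null:+.3f}, controlled {controlled_null:+.3f}")
print(f"smoking adds risk:   raw {raw_causal:+.3f}, controlled {controlled_causal:+.3f}")
```

In the first world the raw gap is positive but shrinks toward zero once the confound is held constant. In the second, a gap remains after controlling. The persistence described above corresponds to the second pattern, although real analyses adjust for many variables at once and can only control for confounds that were actually measured.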

Meanwhile, laboratory research provided a biological case. Scientists identified carcinogens in cigarette tar, documented how these chemicals caused mutations, and traced a pathway from exposure to the formation of tumors. The correlation wasn’t just statistical; it had a physical explanation.

Finally, the researchers replicated their findings. Studies in the United States, the United Kingdom, Japan, and elsewhere all pointed to the same conclusion. Independent researchers using different methods kept arriving at the same answer.

No single piece of evidence was definitive. But together, the evidence from different studies formed a case that became impossible to dismiss.

What Is Causal Inference?

In the behavioral sciences, causal inference is the process of drawing conclusions about cause-and-effect relationships from data. Experiments are the best way to make causal inferences, but as we have seen, experiments aren’t always possible, ethical, or practical.

When researchers cannot conduct experiments, they use causal inference methods to build the strongest possible case. These methods don’t provide the same certainty as experiments, but they can make alternative explanations increasingly implausible.

The smoking case illustrates the core strategies:

Dose-response relationships: Does more of the cause produce more of the effect?
Temporal precedence: Does the cause come before the effect?
Statistical controls: Does the relationship hold after accounting for confounds and third variables?
Biological/theoretical plausibility: Is there a mechanism that explains the relationship?
Replication: Does the finding hold across studies, samples, and methods?

No single strategy proves causation. But together, they can build a case strong enough to change minds and, in the case of smoking, change the world.

What You’ll Learn in Chapter 6

In Chapter 6 of Research in the Cloud, you will learn the tools and methods scientists use to build a causal case when experiments are not an option.

You’ll start with the fundamental problems that limit correlational research: the directionality problem (which variable influences which?) and the third-variable problem (could something else explain the relationship?).

From there, you’ll learn the methods scientists use to address these problems. You’ll learn to use statistical controls to hold potential confounds constant, asking: would the relationship between A and B still exist if everyone had the same level of C? You’ll also learn to use techniques like ANCOVA and multiple regression to answer these questions with real data. And you’ll learn about longitudinal designs, in which researchers measure variables at multiple points in time. You’ll analyze data from real participants tracked over a full year, testing whether depression predicts future anxiety (or vice versa) even after controlling for baseline levels of both variables.
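To make "controlling for baseline levels" concrete, here is a rough Python sketch of a regression with two predictors. It is not the chapter's dataset or its exact procedure: the data are simulated, the variable names are hypothetical, and the coefficients used to generate the data (0.3 for earlier depression, 0.5 for earlier anxiety) are invented. The sketch compares the slope for earlier depression when it is the only predictor of later anxiety with its slope once earlier anxiety is held constant.

```python
import random
import statistics


def centered(values):
    m = statistics.mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slopes_with_control(y, x1, x2):
    """Least-squares slopes for y ~ x1 + x2, each holding the other constant."""
    y, x1, x2 = centered(y), centered(x1), centered(x2)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2          # solve the 2x2 normal equations
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Simulated two-wave data; all generating numbers are invented.
rng = random.Random(4)
dep_t1, anx_t1, anx_t2 = [], [], []
for _ in range(5000):
    shared = rng.gauss(0, 1)            # makes baseline depression and anxiety overlap
    d1 = shared + rng.gauss(0, 1)
    a1 = shared + rng.gauss(0, 1)
    dep_t1.append(d1)
    anx_t1.append(a1)
    anx_t2.append(0.5 * a1 + 0.3 * d1 + rng.gauss(0, 1))

# Slope of later anxiety on earlier depression, ignoring baseline anxiety.
simple_slope = dot(centered(dep_t1), centered(anx_t2)) / dot(centered(dep_t1), centered(dep_t1))
# The same slope after controlling for baseline anxiety.
b_dep, b_anx = slopes_with_control(anx_t2, dep_t1, anx_t1)
print(f"depression alone:           {simple_slope:.2f}")
print(f"depression, anxiety held:   {b_dep:.2f}")
print(f"baseline anxiety's slope:   {b_anx:.2f}")
```

Because baseline depression and baseline anxiety overlap in this simulation, the uncontrolled slope overstates depression's contribution. Holding baseline anxiety constant should pull the estimate back toward the value that was built in.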

By the end of the chapter, you’ll understand that causal inference isn’t about finding a single definitive answer. It’s about building a case—ruling out alternative explanations, establishing temporal precedence, and accumulating evidence until the data is strong enough to speak for itself.

This post is part of a series exploring the chapters of Research in the Cloud: An Introduction to Modern Methods in Behavioral Science by Aaron Moss, Jonathan Robinson, and Leib Litman.
