Research in the Cloud Textbook, forthcoming with Cambridge University Press

What Is an Experiment? Lessons from a Fake Knee Surgery

Aaron Moss, PhD · June 11, 2026 · 7 min read

[Image: Surgeons in an operating room performing a knee procedure, illustrating the sham knee surgery experiment.]

In this post:

Why a surgeon performed fake knee surgeries and what his results revealed about a multibillion-dollar-a-year procedure

The three essential ingredients that turn a study into an experiment

Why simply observing that patients improve after a treatment isn’t enough to prove the treatment works

How the same experimental logic powers everything from clinical drug trials to Netflix thumbnail testing to government policy evaluation

In a study published in 2002, a surgeon in Houston described something that sounds unethical: he had performed fake knee surgeries. The patients received anesthesia. The surgeon made incisions in their knees. And the operating room staff acted as if they were conducting a normal procedure. But not everything was normal. It was surgery as theatre, performed for the sake of science.

The question is: why?

Dr. Bruce Moseley conducted the study because he wanted to know whether arthroscopic knee surgery actually worked. By the late 1990s, these surgeries were being performed more than 650,000 times per year in the U.S., at a cost of roughly $5,000 each. That means patients and their insurers were spending billions of dollars a year to relieve knee pain, yet no one had ever properly tested whether the procedure worked. So Moseley conducted an experiment.

In the experiment, Moseley and his colleagues randomly assigned 180 patients with knee osteoarthritis to one of three groups. Two groups received real surgery (using one of two different techniques), and a third group received sham surgery (skin incisions only, nothing else). Patients didn’t know which group they were in, and neither did the staff assessing their recovery.

For the next two years, the researchers tracked people’s pain, physical function, and mobility. The results, which were published in the New England Journal of Medicine, showed no difference between groups.

Patients who received the real surgery improved, but so did those who had the fake surgery. In other words, arthroscopic knee surgery appeared to work through the placebo effect, a well-documented phenomenon in which patients improve because they believe a treatment will work, not because it actually does.

What Is an Experiment?

An experiment is a research method in which scientists deliberately manipulate one variable to observe its effect on another, while using random assignment to balance all other factors across conditions.

Three elements define a true experiment:

Manipulation: The researcher actively changes something (the independent variable) and measures its effect on an outcome (the dependent variable)
Random assignment: Participants are assigned to conditions by chance, so the groups are comparable on average
Control condition: A comparison group that doesn’t receive the manipulation

When these elements are present, researchers can make confident causal claims, something no other research method can do as directly.

Why Experiments Are Necessary for Understanding Cause and Effect

The knee surgery study isn’t just a medical curiosity; it’s a demonstration of why experiments are necessary.

Before the study, arthroscopic knee surgery seemed obviously effective. Surgeons performed the procedure, and patients got better. But this reasoning contained a flaw: it assumed that because improvement followed the surgery, the surgery caused the improvement, without ever testing that assumption.

The premise behind arthroscopic knee surgery was undermined when the patients who received the sham procedure did just as well as those who received the real thing. People in the placebo group received the incisions, the anesthesia, the hospital gowns, the recovery instructions, the belief that something had been done to fix their knees. And that was enough.

Without a control group, there was no way to know that the surgery itself wasn’t doing anything. This highlights the fundamental insight behind experimental research: you can’t tell whether something works just by observing that people improve after receiving it. You have to compare people who receive the treatment with people who don’t, under conditions where the groups are otherwise comparable.
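To make that comparison concrete, here is a small Python simulation. It is a hypothetical sketch, not data from the Moseley trial: the pain scale, the 10-point improvement that every patient experiences regardless of treatment (a stand-in for the placebo effect and natural recovery), and the function name `simulate_trial` are all assumptions. When the surgery adds nothing (`surgery_gain=0`), both groups improve, yet the treated-minus-control difference stays near zero.

```python
import random
import statistics

def simulate_trial(n_per_group=60, common_gain=10.0, surgery_gain=0.0, seed=1):
    """Simulate improvement on a hypothetical pain score (higher = worse).

    Every patient improves by `common_gain` points on average (a stand-in for
    the placebo effect and natural recovery); real surgery adds `surgery_gain`.
    Returns the mean improvement in each group.
    """
    rng = random.Random(seed)
    mean_improvement = {}
    for group, extra in (("surgery", surgery_gain), ("sham", 0.0)):
        improvements = []
        for _ in range(n_per_group):
            before = rng.gauss(60, 10)
            after = before - common_gain - extra + rng.gauss(0, 8)
            improvements.append(before - after)  # positive = less pain
        mean_improvement[group] = statistics.mean(improvements)
    return mean_improvement

result = simulate_trial()
print(f"surgery group improved by {result['surgery']:.1f} points")
print(f"sham group improved by    {result['sham']:.1f} points")
print(f"estimated effect of surgery: {result['surgery'] - result['sham']:.1f} points")
```

Looking only at the surgery group would suggest a benefit of roughly 10 points; the sham group shows that the same improvement happens without the procedure.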

The Three Ingredients of an Experiment

There are three essential elements in an effective experiment, all of which can be seen in the fake knee surgery study.

Manipulation: Controlling the Variables

First, there was a manipulation. The researchers didn’t just observe who happened to get surgery and who didn’t. They actively controlled the experience. Two groups of people received real surgery and one group got the sham surgery. This is what separates experiments from observational research. Instead of measuring variables as they naturally occur, experimenters deliberately change something to see what happens.

Random Assignment: Eliminating Confounds

Second, there was random assignment. Each patient had an equal chance of ending up in any of the three groups. The surgeon didn’t know which procedure he would perform until the patient was already in the operating room and an envelope was opened revealing the assignment. This randomization is crucial because it spreads pre-existing differences between patients (their age, the severity of their arthritis, their expectations, their pain tolerance, and anything else) evenly across groups on average. Without random assignment, you can never be sure that the groups were comparable to begin with. With random assignment, the only systematic difference between groups is the manipulation introduced by the researchers.
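A minimal sketch of random assignment in Python, assuming 180 hypothetical patients whose only recorded characteristic is age. The helper `randomize` shuffles the list and deals patients into conditions in turn, which gives each patient an equal chance of every condition while keeping the groups the same size. That equal-size scheme is our assumption; the original study’s exact allocation procedure may have differed. Generating the full list in advance plays the same role as the sealed envelopes: nobody can steer a particular patient into a particular group.

```python
import random
import statistics

CONDITIONS = ["debridement", "lavage", "sham"]

def randomize(patient_ids, conditions, seed=None):
    """Shuffle patients, then deal them into conditions in turn.

    Every patient has the same chance of landing in each condition,
    and group sizes differ by at most one.
    """
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}

# Hypothetical patients: ID -> age (a pre-existing characteristic).
age_rng = random.Random(42)
ages = {pid: age_rng.randint(40, 80) for pid in range(180)}

assignment = randomize(ages, CONDITIONS, seed=7)

# Check balance: group size and mean age in each condition.
for condition in CONDITIONS:
    group_ages = [ages[pid] for pid, c in assignment.items() if c == condition]
    print(f"{condition:12s} n={len(group_ages)}  mean age={statistics.mean(group_ages):.1f}")
```

With a different seed the group means shift slightly, but no group is systematically older, because balance comes from chance rather than from anyone’s choices.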

Control Groups: Establishing a Baseline

Third, there was a control condition. The sham surgery group provided a baseline against which the real procedures could be compared. Without this comparison, or control group, the researchers would have seen patients improve after surgery and wrongly concluded that the surgery was effective. The control group revealed that patients improved whether or not the real procedure was performed.

Together, these three ingredients allowed the researchers to make a causal claim with confidence: the surgery was not responsible for patient improvement.

What Happened Next

You might expect that the knee surgery study transformed medical practice. It didn’t, at least not completely.

The study was criticized. Some argued that the sample was too small or too limited (it was conducted at a VA hospital with mostly male patients). Others questioned the outcome measures. But the finding was later replicated. A 2008 study in the same journal, using different methods and addressing the earlier criticisms, found the same result: for knee osteoarthritis, adding arthroscopic surgery to physical therapy and medication provided no additional benefit. A Cochrane review, widely regarded as a benchmark of medical evidence, concluded there was “gold level evidence” that the procedure provided no benefit. Additional sham-controlled trials of related procedures told similar stories.

And then practice began to change, slowly. The rate of arthroscopic knee surgery for osteoarthritis declined by about 40% in the years following the studies. Guidelines now recommend against the procedure for most patients. But the surgery hasn’t disappeared. Hundreds of thousands of procedures are still performed each year, demonstrating that the gap between evidence and practice can take decades to close.

From Clinical Trials to A/B Testing: Experiments in the Real World

While Bruce Moseley’s sham surgery demonstrates why experiments are important, the logic of experimentation extends far beyond academic settings. In fact, it reaches into many aspects of daily life.

Clinical trials use the same principles to test new drugs. These studies, called randomized controlled trials (RCTs), are the gold standard of medical evidence. Patients are randomly assigned to receive either the treatment or a placebo, and in many trials neither they nor their doctors know which group they’re in (a “double-blind” design). This is how we know which medications actually work and which ones just seem to because patients expect them to.

A/B testing applies experimental logic to the digital world. When Netflix tests different thumbnail images for a movie, it’s running an experiment: randomly assigning users to see different versions and measuring which one generates more clicks. Google reportedly runs over 10,000 such experiments per year, and major tech companies use the same approach to optimize everything from website layouts to the timing of notifications.
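The sketch below illustrates both halves of an A/B test in Python. The click counts and the experiment label are hypothetical, and this is not a description of how Netflix or Google actually implement their systems. `assign_variant` hashes the user ID, one common way to get an assignment that looks random across users but stays the same every time a given user returns; `two_proportion_ztest` is a standard large-sample test for whether two click-through rates differ.

```python
import hashlib
import math

def assign_variant(user_id: str, experiment: str = "thumbnail_test") -> str:
    """Deterministically map a user to variant A or B (roughly 50/50)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Return (rate_b - rate_a, z statistic, two-sided p-value)."""
    rate_a, rate_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Standard normal CDF via the error function; two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return rate_b - rate_a, z, p_value

# Hypothetical results: 10,000 users saw each thumbnail.
diff, z, p = two_proportion_ztest(clicks_a=500, n_a=10_000, clicks_b=580, n_b=10_000)
print(f"lift={diff:.3%}  z={z:.2f}  p={p:.3f}")
```

With these made-up numbers, thumbnail B’s 0.8-percentage-point lift gives z ≈ 2.5 and p ≈ 0.012. Because users were assigned at random, that difference can be attributed to the thumbnail rather than to who happened to see it.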

Policy evaluation increasingly relies on randomized trials. Governments test educational interventions, job training programs, and public health campaigns by randomly assigning some communities or people to receive the program and comparing their outcomes to control groups.

The logic is always the same: manipulate one thing, let randomization balance everything else, and measure what happens. Whether you’re testing a surgery, a website design, or a policy, the experimental method provides the strongest available evidence about whether your intervention actually causes the effects you hope for.

What You’ll Learn in Chapter 7

Chapter 7 of Research in the Cloud teaches you to think like a behavioral scientist conducting an experiment, and it does so by asking you to design and run experiments yourself.

You’ll start with the fundamentals: what makes an experiment an experiment, how random assignment solves the third-variable problem, and why control groups are essential for drawing causal conclusions. You’ll see how even simple manipulations—like changing the order of two questions in a survey—can reveal causal relationships that would be invisible with other methods.

From there, you’ll work with multiple versions of the Heinz dilemma, a classic scenario in moral psychology. You’ll learn to manipulate perspective-taking to see how it affects moral judgments. You’ll build the experiment yourself in Qualtrics or Engage, implement random assignment, collect data, and analyze the results using t-tests and ANOVA.
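As a preview of the analysis step, here is a stdlib-only Python sketch that computes both statistics by hand rather than with a statistics package. The ratings (how wrong Heinz’s theft was, on a hypothetical 1 to 7 scale) and the condition names are invented for illustration and are not the chapter’s materials. The functions return test statistics only; converting them to p-values requires the t and F distributions, which a package such as SciPy provides.

```python
import math
import statistics

def pooled_t(a, b):
    """Student's independent-samples t statistic (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def one_way_anova_f(*groups):
    """F = mean square between groups / mean square within groups."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    k, n = len(groups), len(scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical wrongness ratings (1 = not at all wrong, 7 = extremely wrong).
control = [5, 6, 4, 5, 6, 5, 4, 6]
heinz_perspective = [3, 4, 4, 2, 3, 4, 3, 5]
druggist_perspective = [6, 5, 6, 7, 5, 6, 6, 5]

print("t (control vs. Heinz perspective):", round(pooled_t(control, heinz_perspective), 2))
print("F (all three conditions):", round(one_way_anova_f(control, heinz_perspective, druggist_perspective), 2))
```

With only two groups, the ANOVA F equals the square of the pooled t, which is one way to see that ANOVA generalizes the t-test to more than two conditions.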

You’ll also explore variations on the basic experimental design. In between-subjects experiments, different people experience different conditions, like the knee surgery patients who were assigned to one of two real procedures (débridement or lavage) or to sham surgery. In within-subjects experiments, the same people experience all conditions, serving as their own controls. Each approach has advantages and trade-offs, and you’ll learn when to use which.
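The two designs also call for different analyses. In a within-subjects design, each person’s score in one condition is compared with their own score in the other, and the order of conditions is usually counterbalanced so that practice or fatigue effects don’t masquerade as condition effects. The Python sketch below illustrates both ideas; the function names and scores are hypothetical.

```python
import math
import random
import statistics

def counterbalance(participants, seed=None):
    """Randomly give half the participants the order A->B and half B->A."""
    rng = random.Random(seed)
    ids = list(participants)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {pid: ("A", "B") if i < half else ("B", "A") for i, pid in enumerate(ids)}

def paired_t(scores_a, scores_b):
    """Paired t statistic computed on within-person difference scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

orders = counterbalance(range(20), seed=3)
print(sum(order == ("A", "B") for order in orders.values()), "participants start with A")

# Hypothetical ratings from the same five people under conditions A and B.
a_scores = [5, 6, 4, 7, 5]
b_scores = [4, 5, 4, 5, 4]
print("paired t:", round(paired_t(a_scores, b_scores), 2))
```

Because each person serves as their own control, stable individual differences (someone who rates everything harshly, say) cancel out of the difference scores, which is the main statistical advantage of the within-subjects approach.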

Finally, you’ll tackle factorial designs, which manipulate multiple variables at once. What if you want to know how perspective-taking affects moral judgments and whether that effect depends on how wealthy the person in the scenario is? Factorial designs let you examine these kinds of interactions, revealing patterns that simpler experiments would miss.
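For a 2 × 2 design, the interaction can be computed directly from the four cell means: it is the difference between the effect of one factor at each level of the other. The Python sketch below uses invented means and hypothetical factor labels (perspective: control vs. take; wealth: poor vs. rich); it shows the arithmetic, not the chapter’s results.

```python
def two_by_two_effects(means):
    """means[(perspective, wealth)] -> mean rating for that cell.

    Returns the two main effects and the interaction contrast.
    """
    effect_if_poor = means[("take", "poor")] - means[("control", "poor")]
    effect_if_rich = means[("take", "rich")] - means[("control", "rich")]
    return {
        # Average effect of perspective-taking across both wealth levels.
        "perspective": (effect_if_poor + effect_if_rich) / 2,
        # Average rich-minus-poor difference across both perspective levels.
        "wealth": (means[("control", "rich")] + means[("take", "rich")]
                   - means[("control", "poor")] - means[("take", "poor")]) / 2,
        # Does the perspective effect change with wealth? Nonzero suggests an interaction.
        "interaction": effect_if_rich - effect_if_poor,
    }

cell_means = {
    ("control", "poor"): 4.0, ("take", "poor"): 2.5,
    ("control", "rich"): 4.2, ("take", "rich"): 4.0,
}
print(two_by_two_effects(cell_means))
```

Here perspective-taking lowers wrongness ratings by 1.5 points when the person is poor but by only about 0.2 points when the person is rich, an interaction that the averaged main effect of perspective (-0.85) would hide.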

By the end of the chapter, you won’t just understand experiments conceptually—you’ll have designed, conducted, and analyzed your own.

This post is part of a series exploring the chapters of Research in the Cloud: An Introduction to Modern Methods in Behavioral Science by Aaron Moss, Jonathan Robinson, and Leib Litman. Read Chapter 7 for free here.
