What Does It Mean for Research to Be Statistically Significant?

CloudResearch

Part 1: How is Statistical Significance Defined in Research?

The world today is drowning in data.

That may sound like hyperbole, but consider this: In 2024, humans and machines around the world produced approximately 402.74 million terabytes of data—each day. That's roughly 400 quintillion bytes of information created every 24 hours, adding up to an astounding 147 zettabytes over the course of the year.

According to Domo's latest Data Never Sleeps report, every single minute people conduct almost 5.9 million Google searches, view 138.9 million Reels on Facebook and Instagram, watch 3.4 million YouTube videos, and stream over 362,000 hours of content on Netflix. With 5.52 billion people now connected to the internet globally, these numbers continue their exponential growth, and the world's total data is expected to reach 181 zettabytes by 2025.

For behavioral researchers and businesses, this data represents a valuable opportunity. However, using data to learn about human behavior or make decisions about consumer behavior often requires an understanding of statistics and statistical significance.


What Does it Mean to Be Statistically Significant?

Statistical significance is a measurement of whether an observed difference represents a true effect or is merely the result of chance. In other words, it helps researchers determine if their findings reflect a real relationship between variables or just random noise in the data.

To evaluate whether a finding is statistically significant, researchers conduct a statistical significance test through a process known as null hypothesis significance testing. Null hypothesis significance testing is less of a mathematical formula and more of a logical process for analyzing data and thinking about the strength and legitimacy of a finding.

An Example of Null Hypothesis Significance Testing

Imagine a Vice President of Marketing asks her team to test a new layout for the company website. The new layout streamlines the user experience by making it easier for people to place orders and suggesting additional items to go along with each customer's purchase. After testing the new website, the VP finds that visitors spend an average of $12.63. Under the old layout, visitors spent an average of $12.32. This means the new layout increases average spending by $0.31 per person. The question the VP must answer is whether the difference of $0.31 per person represents a meaningful improvement or is simply the result of random chance.

To answer this question with statistical analysis, the VP begins by adopting a skeptical stance toward her data. This stance is known as the null hypothesis. The null hypothesis assumes that whatever researchers are studying does not actually exist in the population of interest. So, the VP assumes the new website layout does not influence how much people spend.

With the null hypothesis in mind, the VP analyzes her data to see how likely it is that she would obtain the results observed in her study—the average difference of $0.31 per visitor—if the change in website layout actually caused no difference in people's spending (i.e., if the null hypothesis is true). If the probability of obtaining the observed results is low, the VP will reject the null hypothesis and conclude that her finding is statistically significant (i.e., the website change represents a true effect).
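This logic can be sketched in code. The per-visitor spending samples below are hypothetical—the scenario reports only the two group averages—and the permutation test shown here is just one common way to estimate how often chance alone would produce a difference as large as the one observed:

```python
import random
import statistics

random.seed(42)

# Hypothetical per-visitor spending samples; the scenario reports only the
# group averages ($12.32 vs. $12.63), so these raw values are illustrative.
old_layout = [random.gauss(12.32, 2.0) for _ in range(500)]
new_layout = [random.gauss(12.63, 2.0) for _ in range(500)]

observed_diff = statistics.mean(new_layout) - statistics.mean(old_layout)

# Null hypothesis: the layout has no effect, so group labels are
# interchangeable. Shuffle the labels many times and count how often chance
# alone produces a difference at least as large as the observed one.
pooled = old_layout + new_layout
n_new = len(new_layout)
extreme = 0
n_permutations = 2000
for _ in range(n_permutations):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n_new]) - statistics.mean(pooled[n_new:])
    if abs(diff) >= abs(observed_diff):
        extreme += 1

p_value = extreme / n_permutations
print(f"Observed difference: ${observed_diff:.2f}")
print(f"p value: {p_value:.3f}")
```

A small p value means a difference this large rarely arises from label-shuffling alone, which is the evidence the VP needs to reject the null hypothesis.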

Measuring Statistical Significance: The P Value and the Significance Level

Statistically significant findings indicate that the researchers' results are unlikely to be the result of random chance and can therefore be taken as evidence of a true effect or relationship between the variables being studied. However, to avoid mistaking random chance for real findings, researchers set a strict criterion for their tests. This criterion is known as the significance level.

Within the social and behavioral sciences, researchers often adopt a significance level of 5%. This means researchers are only willing to conclude that the results of their study are statistically significant if the probability of obtaining those results if the null hypothesis were true—known as the p value—is less than 5%.

Beyond p values, researchers often calculate confidence intervals to understand statistical significance. A confidence interval provides a range of values within which the true effect likely falls. For example, our VP might find that the average spending increase falls between $0.15 and $0.47 with 95% confidence—meaning she can be reasonably certain the true increase lies within this range rather than at exactly $0.31.
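One simple way to compute such an interval is the bootstrap. The sketch below assumes hypothetical per-visitor spending samples (the scenario reports only the group averages), resamples each group with replacement many times, and reads the 95% interval off the middle 95% of the resampled differences:

```python
import random
import statistics

random.seed(7)

# Hypothetical spending samples; illustrative, not from the article.
old_layout = [random.gauss(12.32, 2.0) for _ in range(500)]
new_layout = [random.gauss(12.63, 2.0) for _ in range(500)]

# Bootstrap: resample each group with replacement and record the difference
# in means. The spread of these differences reflects sampling uncertainty.
diffs = []
for _ in range(2000):
    resampled_new = random.choices(new_layout, k=len(new_layout))
    resampled_old = random.choices(old_layout, k=len(old_layout))
    diffs.append(statistics.mean(resampled_new) - statistics.mean(resampled_old))

diffs.sort()
lower = diffs[int(0.025 * len(diffs))]   # 2.5th percentile
upper = diffs[int(0.975 * len(diffs))]   # 97.5th percentile
print(f"95% CI for the spending increase: ${lower:.2f} to ${upper:.2f}")
```

If the resulting interval excludes $0.00, that agrees with a significant result at the 5% level; an interval that straddles zero suggests the data cannot rule out "no effect."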

Five percent represents a stringent criterion, but there is nothing magical about it. In medical research, significance levels are often set at 1%. In cognitive neuroscience, researchers often adopt significance levels well below 1%. And when astronomers seek to explain aspects of the universe or physicists study new particles like the Higgs boson, they set significance levels several orders of magnitude below .05.

In other research contexts like business or industry, researchers may set more lenient significance levels. However, in all research, the more stringently a researcher sets their significance level, the more confident they can be that their results are not due to random chance.

Key Takeaway: A statistically significant result is one that is unlikely to be the result of chance and is therefore taken as evidence of a true effect. Researchers use a statistical significance test to analyze data and calculate the probability (p value) of obtaining their findings by chance alone.


What Factors Affect the Power of a Hypothesis Test?

Determining whether a result is statistically significant is only one half of the hypothesis testing equation. The other half is ensuring that the statistical tests a researcher conducts are powerful enough to detect an effect if one really exists. That is, if a researcher concludes there is no effect, that conclusion only matters if the study was strong enough to have detected an effect had one actually existed.

The power of a hypothesis test is influenced by multiple factors.

1. Sample size

Sample size, or the number of participants the researcher collects data from, affects the power of a hypothesis test. Larger samples lead to higher-powered tests than smaller samples. In addition, large samples are more likely to produce replicable results because extreme scores that occur by chance are more likely to balance out in a large sample than in a small one.

Example: Imagine you're testing whether a new app feature increases user engagement. If you test with only 20 users, one unusually enthusiastic person could skew your entire dataset. But if you test with 2,000 users, individual quirks average out, giving you a clearer picture of the feature's true effect. The larger sample makes it easier to detect even modest improvements in engagement.

2. Significance level

Although setting a low significance level helps researchers ensure their results are not due to chance, it also lowers their power to detect an effect because it makes rejecting the null hypothesis harder. In this respect, the significance level a researcher selects is often in competition with power.

Example: Think of the significance level as the height of a hurdle your data must clear. Setting it at 1% instead of 5% is like raising the bar higher—it makes you more confident when you do find something, but it also means you might miss real effects that don't quite reach that demanding threshold. A pharmaceutical company testing a new drug might use the stricter 1% level because the stakes are so high, even though it means they need stronger evidence to prove the drug works.

3. Standard deviations

The standard deviation is a measure of variability within data; variability a study cannot explain is known as error. Generally speaking, the more unexplained variability within a dataset, the less power there is to detect a true effect. Unexplained variability can be the result of measurement error, individual differences among participants, or situational noise.

Example: Suppose you're measuring whether a meditation app reduces stress. If you test people in quiet, controlled environments at the same time each day, your measurements will be consistent. But if some people use the app during their morning commute, others during lunch breaks, and others before bed, you've introduced lots of variability that has nothing to do with the app itself. This "noise" makes it harder to hear the "signal" of the app's true effect, reducing your power to detect whether meditation actually helps.

4. Effect size

A final factor that influences power is the size of the effect being studied. As you might expect, big changes in behavior are easier to detect than small ones.

Example: If a new teaching method improves test scores by 30 points, on average, you'll spot that improvement easily even with a modest-sized study. But if the method only improves scores by 2 points, you'll need a much larger sample size and more careful measurement to distinguish that small improvement from random variation.
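All four factors above can be seen in a single calculation. The sketch below uses a standard normal-approximation power formula for a two-sided, two-sample comparison of means; the function name and the specific numbers are illustrative, not from the article:

```python
from statistics import NormalDist


def two_sample_power(n, effect, sd, alpha):
    """Approximate power of a two-sided, two-sample z test comparing group
    means, with n participants per group, a true mean difference of
    `effect`, within-group standard deviation `sd`, and significance
    level `alpha`."""
    z = NormalDist()
    se = sd * (2 / n) ** 0.5           # standard error of the mean difference
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    noncentrality = effect / se
    # Probability the test statistic clears the critical value (ignoring the
    # negligible chance of rejecting in the wrong direction).
    return 1 - z.cdf(z_crit - noncentrality)


baseline = two_sample_power(n=50, effect=0.5, sd=1.0, alpha=0.05)
print(f"baseline:       {baseline:.2f}")
print(f"larger sample:  {two_sample_power(200, 0.5, 1.0, 0.05):.2f}")  # power up
print(f"stricter alpha: {two_sample_power(50, 0.5, 1.0, 0.01):.2f}")   # power down
print(f"noisier data:   {two_sample_power(50, 0.5, 2.0, 0.05):.2f}")   # power down
print(f"bigger effect:  {two_sample_power(50, 1.0, 1.0, 0.05):.2f}")   # power up
```

Changing one argument at a time shows each factor's pull: raising n or the effect size increases power, while tightening alpha or adding noise (a larger sd) decreases it.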


Why is Statistical Significance Important for Researchers?

Statistical significance is important because it allows researchers to be confident that their findings are real, reliable, and not due to random chance. But statistical significance is not equally important to all researchers in all situations. The importance of obtaining statistically significant results depends on what a researcher studies and within what context.


Does Your Study Need to Be Statistically Significant?

Within academic research, statistical significance is often critical because academic researchers study theoretical relationships between different variables and behavior. Furthermore, the goal of analyzing data within academic research is often to publish research reports in scientific journals. The threshold for publishing in academic journals is often a series of statistically significant results.

Outside of academia, statistical significance is less important. Researchers, managers, and decision makers in business may use statistical significance to understand how strongly the results of a study should inform the decisions they make. But, because statistical significance is simply a way of quantifying how much confidence to hold in a research finding, people in industry are often more interested in a finding's practical significance than statistical significance.

Practical Significance vs. Statistical Significance

Imagine you're a candidate for political office. Maybe you have decided to run for local or state-wide office, or, if you're feeling bold, imagine you're running for President.

During the campaign, your team tests messages intended to mobilize voters. Now you and your team must decide which ones to adopt.

If you go with Message A, 41% of registered voters say they are likely to turn out at the polls and cast a ballot. If you go with Message B, the number drops to 37%. As a candidate, should you care whether the difference is statistically significant at a p value below .05?

The answer, of course, is no. What you care about more than statistical significance is practical significance: whether the difference between groups is large enough to be meaningful in real life.

You should ensure there is some rigor behind the difference in messages before you spend money on a marketing campaign, but when elections are sometimes decided by as little as one vote, you should adopt the message that brings more people out to vote. Within business and industry, the practical significance of a research finding is often just as important as, if not more important than, its statistical significance. In addition, when findings have large practical significance, they are almost always statistically significant too.
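To see why sample size, rather than the 4-point gap itself, determines statistical significance here, consider a standard two-proportion z test. The poll sizes below are hypothetical, since the scenario doesn't say how many voters saw each message:

```python
from statistics import NormalDist


def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided z test for a difference between two observed proportions,
    using the pooled proportion to estimate the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same 41% vs. 37% gap, two hypothetical poll sizes.
p_small = two_proportion_p_value(0.41, 200, 0.37, 200)
p_large = two_proportion_p_value(0.41, 2000, 0.37, 2000)
print(f"200 voters per message:  p = {p_small:.3f}")
print(f"2000 voters per message: p = {p_large:.3f}")
```

The identical 4-point difference is not significant in a small poll but is highly significant in a large one, which is exactly why a campaign should weigh the practical stakes rather than fixate on the p value alone.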


Conducting statistically significant research is a challenge, but it's a challenge worth tackling. Flawed data and faulty analyses only lead to poor decisions. Start taking steps to ensure your surveys and experiments produce valid results by using CloudResearch. If you have the team to conduct your own studies, CloudResearch can help you find large samples of high-quality online participants quickly and easily. Regardless of your demographic criteria or sample size, we can help you get the participants you need. If your team doesn't have the resources to run a study, we can run it for you. Our team of expert social scientists, computer scientists, and software engineers can design any study, collect the data, and analyze the results for you. Let us show you how conducting statistically significant research can improve your decision-making today.


Continue Reading: A Researcher's Guide to Statistical Significance and Sample Size Calculations

