How to Reduce Sampling Bias in Research

By Aaron Moss, PhD, Cheskie Rosenzweig, PhD, & Leib Litman, PhD

Online Researcher's Sampling Guide, Part 2:
How to Reduce Sampling Error in Research

Among public opinion pollsters, the year 1936 lives in infamy. That year, the magazine Literary Digest conducted one of the worst public opinion polls in history.

After correctly predicting the previous five presidential contests, the Digest mailed questionnaires to more than 10 million Americans asking who they planned to vote for in the 1936 presidential race. Based on more than 2 million responses, the Digest confidently predicted that Alf Landon would win with 62% of the vote. Yet on election day, it was Franklin Roosevelt who won in a landslide, leaving the Digest to wonder how it got the outcome so wrong.

The answer was a classic example of sampling bias.

Because the Digest identified potential voters using telephone and automobile records, the people it sampled tended to be wealthy. While wealthy Americans preferred Landon over Roosevelt, poor Americans favored Roosevelt by a strong margin. Undersampling poor Americans therefore caused the Digest to get the outcome of the election completely backward. The Literary Digest's mistake illustrates how sampling bias occurs when there's a mismatch between who participates in a study and the population researchers want to understand.


What Are the Different Types of Sampling Bias?

As the Literary Digest example demonstrates, sampling error occurs when the researcher selects a sample that doesn't accurately represent the target population. This often happens when the range of people who can participate in a study is systematically restricted. This sort of sampling error can occur in several ways.

First, the people who participate in a study may be selected in a way that makes them systematically different than the target population (like in the Digest example). When this occurs, it is known as coverage bias (also called sample frame error or selection error). Essentially, coverage bias occurs when the pool the researcher draws participants from doesn't match the population they're trying to understand.

Second, some people who have the opportunity to participate in a study may choose not to. When these people share a common characteristic, such as low trust in institutions, the people who participate may be significantly different than those who don't, leading to non-response bias. Response error, where participants give misleading or incorrect answers, can compound this problem by distorting the data even from those who do participate.

Finally, the opposite of non-response bias, self-selection bias, occurs when the researcher receives responses only from people who share a characteristic that makes them systematically different than those who do not participate. For example, a survey about customer satisfaction might primarily attract extremely satisfied or extremely dissatisfied customers, while moderately satisfied customers ignore it entirely.

Whether or not sampling errors like those above affect the interpretation of a study's results depends on both how researchers gather their data and how they use the data.

There are generally three types of research within the behavioral sciences.

  1. Descriptive Research: Descriptive research describes how often something occurs in a population. This may be a survey of consumer confidence, an opinion poll, or a study to determine which brand of cleaning products consumers use.
  2. Associative Research: Associative research looks for relationships between two or more variables (i.e., correlations). For example, a marketing team may look to see if their recent advertising expenditures are related to sales, a psychologist may investigate whether income predicts happiness, or an app developer may investigate whether features of its app are related to the time people spend using the app.
  3. Experimental Research: Experiments seek to establish cause and effect, so researchers may seek to answer questions, such as: Does message A or B make people more likely to reuse their linens in hotel rooms, are people more likely to vote in elections if their friends vote, and does shaking hands before a negotiation promote deal-making?

Returning to our conversation about sampling bias, the strength of random sampling methods is that they eliminate most sources of bias. When researchers use random sampling, they achieve a representative sample, which mirrors the target population's characteristics. This allows the researchers to reasonably apply their results to the target population, regardless of whether the research aims to describe the frequency of behavior, investigate the association between variables, or test experimental effects.

When researchers use non-random samples, they have to think more carefully about potential sources of bias. For example, one form of non-random sampling is known as convenience sampling. When researchers use convenience sampling, they gather data from whoever is readily available. Because this approach leaves a study open to many sampling errors, it is common for researchers running studies online to try to control or eliminate these issues.

When researchers take this middle ground, they gather what may be called controlled samples, which are a kind of hybrid, falling somewhere between random and non-random samples. Although controlled samples are gathered from sources that are based on convenience, researchers take active measures to eliminate sources of error and maximize the generalizability of their results. Research suggests that these hybrid samples often provide a reasonable trade-off between the high cost, impracticality, and slow nature of random sampling and the unconstrained sources of bias inherent to convenience samples, as researchers often find similar results in both samples.
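A quick simulation can make the contrast between random and convenience sampling concrete. The sketch below is illustrative only, with made-up numbers: it builds a toy population in which a trait of interest is correlated with how often people are "available" online, then compares a simple random sample against a convenience sample of the most-available people.

```python
import random

random.seed(42)

# Toy population: each person has a trait score and an "online availability"
# score that is correlated with the trait.
population = []
for _ in range(10000):
    trait = random.gauss(50, 10)
    availability = trait + random.gauss(0, 10)
    population.append((trait, availability))

def mean_trait(sample):
    return sum(t for t, _ in sample) / len(sample)

# Simple random sample: every member has an equal chance of selection.
random_sample = random.sample(population, 500)

# Convenience sample: take the 500 most-available people.
convenience_sample = sorted(population, key=lambda p: p[1], reverse=True)[:500]

print(round(mean_trait(population), 1))         # true population mean, ~50
print(round(mean_trait(random_sample), 1))      # close to the population mean
print(round(mean_trait(convenience_sample), 1)) # biased upward
```

Because availability is correlated with the trait, the convenience sample overestimates the population mean, while the random sample lands close to it; this is exactly the kind of distortion that controlled sampling tries to correct.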


How Can Sampling Errors Affect an Online Survey?

People typically find it easiest to think about sampling bias within the realm of random samples, when the researcher's goal is to apply the findings from a sample to an entire population. However, most research isn't conducted with random samples. Instead, researchers in academia, business, and market research often run studies that purposefully select participants to control for known sources of bias. They do this because of the ways that sampling error can affect online studies.

Simply put, sampling error with non-random samples can distort research findings. When the demographic variables or other characteristics of the people in a sample are systematically related to the topic being investigated, researchers will obtain findings that are either stronger or weaker than what exists within the target population.

For example, consider a researcher on the East Coast who launches an online study early in the morning. Because most online studies fill up in a matter of hours, people who live on the West Coast are unlikely to take part in the study due to the time difference (Casey et al., 2017).

If, as research suggests, people on the West Coast have different attitudes toward things like social norms (e.g. Plaut et al., 2012) and climate than those on the East Coast, a study investigating behaviors to combat climate change might have a region-specific sampling error. This sampling error could distort the researcher's findings by leading them to believe the relationship between social norms and behaviors to combat climate change is either weaker or stronger than it actually is within a broader population.


Understanding Sample Sizes and Statistical Precision

Beyond avoiding bias, researchers must also ensure their sample sizes are large enough to detect meaningful effects. Larger samples provide more precise estimates and narrower confidence intervals—the range within which the true population value likely falls.

For example, a study with 50 participants might find that 60% prefer Product A, with a confidence interval of ±14%. But a study with 500 participants showing the same 60% preference would have a much tighter confidence interval of roughly ±4%, providing far more actionable insight. When combined with careful attention to reducing sampling errors, adequate sample sizes help ensure research findings are both accurate and reliable.
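The margins of error above follow from the standard normal-approximation formula for a proportion, z·√(p(1−p)/n). A minimal sketch:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion (95% CI by default)."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(0.60, 50), 3))   # small sample: wide interval, ~0.136 (about +/-14%)
print(round(margin_of_error(0.60, 500), 3))  # larger sample: much tighter, ~0.043 (about +/-4%)
```

Note that precision grows with the square root of the sample size: a tenfold increase in participants roughly triples the precision of the estimate.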


Common Causes of Sampling Error in Research

Fortunately for researchers who conduct studies online, many causes of sampling error are well known. They include:

Participant Demographics

Participant demographics are a common source of sampling bias. For some research questions, participant age, gender, ethnicity, religion, political ideology, socioeconomic status or other demographic characteristics might be related to the research question. If so, the researcher may want to control for these variables.

Example:

Research shows that if you ask most Americans to consider how God feels about abortion before reporting their own attitudes, they will become significantly more opposed to abortion than if you asked them to report their own attitudes before considering God's attitude (Epley, Converse, Delbosc, Monteleone, & Cacioppo, 2009). As you might suspect, however, this effect depends on how religious people are. People on Mechanical Turk are, as a whole, less religious than the general U.S. population. Thus, when trying to replicate the “God on Our Side” effect or other questions related to religion, researchers might want to control for the religion of participants.

Platform Factors

Depending on where the researcher gathers data, there may be factors about the platform or website that introduce bias. For example, most online panels work hard to sign up potential participants. Some participants within these panels complete more studies than others and become familiar with common research procedures. For some studies, participant experience or tenure might introduce a form of sample error that researchers want to control.

Example:

Some people on Amazon's Mechanical Turk have a lot of experience with research studies. At times, this experience may reduce the strength of common experimental manipulations. To avoid the potential problems of sampling highly experienced participants, researchers may choose to sample in a way that ensures participants are inexperienced. By choosing to sample inexperienced participants, researchers can control for the potential biasing effect of participant experience.

Factors of the Online Environment

A benefit of online studies is that they are often available anytime that's convenient for research participants. This convenience, however, may also have a drawback. For some research questions, having most participants complete the study at an unusual time of day (e.g., 3:00 a.m.) or on specific days of the week may introduce sampling error. In addition, the ability of participants to easily quit a study may produce problems for experiments that rely on random assignment.

Example:

As originally reported by Kouchaki and Smith (2014), the morning morality effect describes a general tendency for people to act more ethically early in the day and less ethically later in the day, as resources for self-control are taxed. Although the original research reported a general effect of time of day, subsequent research suggested that accounting for people's circadian typology, in addition to time of day, might do a better job of predicting moral behavior than time of day alone (Gunia, Barnes, & Sah, 2014). Thus, studies that investigate phenomena that vary across time of day and fail to account for variation in people's circadian typology may yield incomplete or inaccurate findings.


Strategies for Reducing Sampling Errors in Online Surveys

Avoiding sampling bias requires thoughtful planning and careful execution during data collection. When running studies online, researchers need to think about potential sources of bias and how much of a threat each poses to their research before starting data collection.

Correcting for Participant Demographics

If researchers think participant gender, age, ethnicity, or some other demographic characteristic is a potential source of bias, then they can construct quotas for each identified demographic. Quotas allow researchers to evenly sample people from different demographic groups within the study.

In fact, a commonly used quota system for many online studies is a census-matched template. With census matching, quotas are automatically applied to a study so that the final sample has participants of different ages and ethnicities that are based on each group's representation in the U.S. census. Similar quotas can be used for a variety of other demographic variables.
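As an illustration, the core of any quota system is turning population proportions into per-group target counts. The sketch below uses made-up age-group proportions, not actual census figures:

```python
def census_quotas(total_n, proportions):
    """Allocate per-group sample sizes from population proportions.
    Rounds down, then gives leftover slots to the largest remainders."""
    raw = {g: total_n * p for g, p in proportions.items()}
    quotas = {g: int(v) for g, v in raw.items()}
    leftover = total_n - sum(quotas.values())
    # Hand out remaining slots to groups with the largest fractional parts,
    # so the quotas always sum to the requested total.
    for g in sorted(raw, key=lambda g: raw[g] - quotas[g], reverse=True)[:leftover]:
        quotas[g] += 1
    return quotas

# Illustrative age-group proportions (hypothetical, for the example only)
age_mix = {"18-29": 0.21, "30-44": 0.26, "45-64": 0.33, "65+": 0.20}
print(census_quotas(1000, age_mix))
```

During data collection, each group's quota is then filled independently, and recruitment for a group closes once its target count is reached.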

Understanding Platform Characteristics

The participants available on different online platforms are not all equally representative of the U.S. population. Several publications have, for example, found that participants on crowdsourcing sites are less religious than the overall U.S. population.

Because researchers know this about crowdsourcing sites like MTurk, a researcher examining the influence of religion on attitudes toward capital punishment would know to target people with a wide range of religious beliefs so that the sample would have enough diversity to test the idea. Knowledge about the platform can help avoid bias.

Controlling Platform Factors

Some online platforms give researchers limited control over the data collection process, while others give researchers complete control. On platforms that give researchers control, such as Amazon's Mechanical Turk or CloudResearch Connect, researchers can choose participants based on participation in previous studies or their overall experience on the platform. Each of these groups of participants can be useful for different types of projects.

Minimizing the Influence of Environmental Factors

Similar to a researcher's ability to control factors related to the research platform, the ability to curb the influence of the environment on a participant's responses depends on how much control the platform gives the researcher. For example, with many online platforms, researchers do not have control over when the study is launched or when it is available to participants. Yet in other platforms, such as CloudResearch Connect, they do.


CloudResearch makes it easy to trust your data by giving you the knowledge and tools to control sources of sampling bias. You can use our demographic targeting tools to control sample composition, gather a census-matched sample, or minimize the effect of environmental factors by controlling when your data collection occurs. Our platform helps researchers focus on reducing sampling errors to ensure their findings accurately reflect the target population. Contact us to learn how you can get the research sample you need.


Continue Reading: The Online Researcher's Guide to Sampling
