Introduction
In the summer of 2024, OpenAI—the company that created ChatGPT—collaborated with the MIT Media Lab to conduct a study that would have been nearly impossible just a few years before. The study examined how frequent interactions with AI chatbots affect human psychology.
In the study, researchers followed nearly 1,000 people for a month. Each day, participants completed various tasks and interacted with ChatGPT. Then, at the end of the study, the researchers measured how these interactions affected people's feelings of loneliness and their emotional dependence on AI. One of the study's main findings was that interacting with ChatGPT reduced people's feelings of loneliness, but at the cost of becoming more dependent on AI (Phang et al., 2025). What makes this study interesting, however, isn't only its findings. It is the fact that it happened at all.
Just a few decades ago, following 1,000 people every day for a month would have presented overwhelming logistical challenges. Researchers would have struggled to track the behaviors of so many people and prevent most of them from dropping out of the study. The project would have required enormous amounts of time and money, and even then, it would have faced long odds of success. Times have changed.
Today, online tools make it much easier to find research participants and to conduct a wide range of behavioral studies. In fact, this study was conducted on CloudResearch Connect—the same platform that was used to collect data for all the projects in Part I of this book. Using online platforms like Connect, researchers can find participants, communicate with them throughout the study, offer payments and bonuses that incentivize participation, and monitor each person's progress over the course of the study. All these activities can be accomplished with just a fraction of the time and resources that offline studies require, which is why online research has transformed behavioral science over the last twenty years.
This chapter describes the options for online participant recruitment in the behavioral sciences. In Module 9.1, we will learn about the history of participant recruitment in the behavioral sciences and how online tools have transformed research in the last two decades. We will explore the different sources researchers use to find online participants, beginning with the largest part of the online ecosystem: market research panels. Each year, these panels facilitate approximately 5 billion surveys. We will examine how they operate, which industries they power, and the tradeoffs they present to researchers. Then, we will look at researcher-centric online platforms like Connect and Mechanical Turk, seeking to understand why these platforms host most academic studies.
After exploring where researchers find participants, Module 9.2 discusses sampling and issues of representativeness in online samples. It describes probability and non-probability sampling and the fit-for-purpose framework, which is useful for deciding which sources of participants are the right fit for specific projects. By the end of the chapter, we will understand the different ways to recruit participants online, the advantages of each approach, and how to choose the right participant source for different research questions. Understanding these issues is an important part of successfully conducting research in the modern era.
Chapter Outline
Options for Online Participant Recruitment
Learn about the options for recruiting participants online
A Brief History of Finding Participants
In the summer of 1961, Stanley Milgram began work on what would become one of the most famous studies in the history of psychology. Driven by a pressing question—why do people obey unethical orders?—he designed a study in which ordinary people were asked to deliver increasingly powerful electric shocks to a stranger whenever that person gave incorrect answers on a memory test (the stranger was actually an actor).
In a surprise to many, 80% of participants delivered shocks up to 150 volts—a point at which the person receiving the shocks screamed, complained of a heart condition, and began asking to be released from the study. More than 60% of participants continued all the way to the maximum shock of 450 volts, an area labeled "DANGER SEVERE SHOCK: XXX" on the control panel and long past the point at which the stranger in the other room had stopped responding (Milgram, 1963).
While the results of Milgram's study have been discussed and debated for decades (e.g., Benjamin & Simpson, 2009; Gilbert, 1981; Griggs, 2017), less attention has been paid to how he found participants. To recruit people, Milgram placed an ad in the newspaper (Figure 9.1). The ad sought men between the ages of 20 and 50 who were employed in various occupations (later studies included women). Anyone interested in participating was asked to mail a slip of paper with their contact information, demographic data, and an indication of when they could participate. After receiving the replies, Milgram's team had to call each prospective participant and schedule a time for their participation. They also paid people for their time. Milgram offered each participant $4 plus carfare, about $50 in today's dollars.
This method of community-based recruitment was not unique to Milgram. In fact, it was a standard technique for finding people to participate in research at the time. In addition to running ads in the local newspaper, researchers would often post fliers on bulletin boards, recruit by word of mouth, or send direct mail to potential participants. All these recruitment options were slow and limiting. Recruiting enough people for one study could take several months and researchers were restricted in the diversity of people they could reach.
In the decades following Milgram's work, the challenge of finding participants persisted. Over time, researchers came to rely on undergraduate students to participate in studies. But even though undergraduate students solved one problem, they created another: research became limited to primarily young, well-educated people from Western, industrialized nations (Henrich et al., 2010). For the next four decades, up to 80% of studies within fields like experimental psychology were conducted with student participants, raising concerns about how well the findings generalized to people more broadly (e.g., Sears, 1986).
Then, starting in the early 2010s, the development of online recruitment tools dramatically transformed how behavioral scientists find participants (e.g., Buhrmester et al., 2011; Buhrmester et al., 2018; Chandler et al., 2019). Today, most participants are recruited online (e.g., Zhou & Fischbach, 2016). Online platforms give researchers access to millions of people around the world, allowing them to quickly gather data. This change has revolutionized behavioral science in three fundamental ways.
First, online tools have dramatically expanded access to participants. Researchers can easily sample people from different age groups, geographic regions, cultural backgrounds, life experiences, or any other characteristics that are relevant to the research. The diversity of people online solves many of the concerns about the generalizability of findings based primarily on undergraduate student samples (e.g., Mullinix et al., 2015). It also makes it easier to find specific and diverse groups of interest (e.g., Moss et al., 2023).
Second, online methods have made research faster. What took months to accomplish in the past can now be completed in days or even hours. This accelerated pace of data collection allows researchers to test ideas more rapidly, explore new questions, and respond to emerging social phenomena as they happen (e.g., Gharpure et al., 2020; Rosen, 2024).
Third, online tools provide unprecedented flexibility. Researchers can implement a wide range of studies, from simple surveys to complex experiments with multiple waves of data collection. They can track behavior over time, gather data on rare or hard-to-reach populations, and conduct studies that would be logistically impossible in a traditional laboratory, like the example that opened this chapter (e.g., Moss, 2022; Phang et al., 2025).
In some cases, the flexibility of online tools means the research does not have to remain online. In one study, researchers at UCLA, with recruitment support from the authors of this book, used online platforms to recruit 160 pregnant or postpartum women for an in-person laboratory study. The research focused on how to build a better bottle nipple for babies with tongue-tie syndrome. Online tools made it far easier to identify people who were eligible and interested in the study than would have been possible otherwise.
Sometimes, the flexibility of online tools allows researchers to mix elements of online research with physical participation. For example, in "I-HUT" studies (in-home usage tests), researchers send participants a specific product, such as a fitness tracker or a device prototype, and collect data about people's experiences remotely. Alternatively, researchers sometimes ask participants recruited online to send bio-samples, such as saliva, to a laboratory where they can be processed and paired with survey data provided by the participant. Ultimately, whether researchers are conducting purely online studies, laboratory experiments, or some combination of the two, online recruitment extends what is possible in behavioral science. The goal of this chapter is to explore the options for online participant recruitment and to understand how best to use each one.
Market Research Panels
When you think about online research, you might picture the kinds of projects from Part I of this book. A student or a professor launches a study, a few hundred to a few thousand participants complete it, and within hours the data are ready to analyze. Although common in academia, these studies represent a small fraction of the research that takes place online.
According to industry estimates, about 5 billion online surveys are completed each year, powering multiple industries and fields of study (Figure 9.2). For example, online surveys drive political polling; they fuel public opinion research on everything from consumer confidence to social attitudes; they help public health researchers understand disease patterns and health behaviors; they enable social science research across disciplines like psychology, sociology, and economics; and perhaps most often, they facilitate market research that guides how companies develop products, advertise to consumers, and make business decisions. Where does all this data come from?
Most of it comes from a complex web of businesses and technologies that have been built for market research. You can picture this ecosystem as a three-tiered pyramid (Figure 9.3).
At the base of the pyramid are panels. A panel is a group of people who have agreed (opted-in) to be contacted for future research studies. When people sign up with a panel, they give the company their demographic information, contact details, and sometimes their preferences for which kinds of surveys they want to complete. When a research opportunity arises, the panel provider uses its database to invite qualified participants.
Panels range from small operations with a few thousand participants to large companies with millions of members. There are hundreds of panels in the United States and thousands more worldwide. Some of the larger panels have names like Prodege, TapResearch, and Toluna. They host hundreds of millions of surveys every year.
A step above panels, in the middle of the pyramid, are panel aggregators. These companies do not maintain their own pools of participants. Instead, they aggregate hundreds of different panels to give researchers greater access to participants. Panel aggregators have developed technology that routes surveys to appropriate panels and integrates the responses from different sources into a single study (Moss et al., 2023). This technology is what makes it possible to find enough people from specific groups for much market research. For instance, imagine trying to find 2,000 people who live in Los Angeles, have a household income above $100,000, and shop for a specific breakfast cereal. It is unlikely that any individual panel would be able to provide these participants, but with aggregation these studies become possible. The most widely used panel aggregator is Lucid, which was acquired by Cint. Other aggregators are also commonly used; for example, Prime Panels is a panel aggregator offered by CloudResearch that is common in academia (e.g., Chandler et al., 2019; Moss et al., 2023).
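To make the routing idea concrete, the sketch below shows, in simplified form, how a request for respondents might be spread across several panels until the target sample size is filled. Real aggregator systems are far more sophisticated (they weigh pricing, feasibility estimates, and quality checks), and the panel names and counts here are invented for illustration.

```python
# A minimal, hypothetical sketch of spreading a sample request across panels.
def allocate(requested: int, panel_feasibility: dict) -> dict:
    """Fill the request panel by panel until the target sample size is reached."""
    allocation, remaining = {}, requested
    for panel, available in panel_feasibility.items():
        if remaining <= 0:
            break
        take = min(available, remaining)   # take what the panel can supply
        allocation[panel] = take
        remaining -= take
    return allocation

# Example: 2,000 high-income Los Angeles shoppers, drawn from three panels.
print(allocate(2000, {"Panel A": 600, "Panel B": 900, "Panel C": 1500}))
# {'Panel A': 600, 'Panel B': 900, 'Panel C': 500}
```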
Finally, at the top of the pyramid are research services. Research services sometimes have their own panels or aggregation technology, but far more often they serve as a bridge between researchers and panel providers. When a researcher requests participants from a research service, such as a market research firm, the firm coordinates the request with panel aggregators, who pull participants from individual panels. The process looks like Figure 9.4. Within academia, perhaps the most widely known of these services is Qualtrics Panels, but many others are available, including services from Cint and CloudResearch.
When researchers use an aggregator, a sample of, say, five hundred responses might be drawn from over a hundred different panels, resulting in a supplier list that looks like Table 9.1. Most researchers would not recognize the names of these panels and will never learn about their role in the study. But each panel has its own methods for recruiting participants, addressing issues of data quality, and getting people to engage in studies. This heterogeneity presents both opportunities and challenges for research.
On the side of opportunity, diversity across panels gives researchers unprecedented access to participants. While one panel might focus on specific groups, like people who identify as Hispanic or Latino, others focus on geographic regions like China or France. When panels with different specialties are aggregated, researchers can find participants who would not exist in large enough numbers anywhere else, such as African immigrants living in Texas or people with specific medical conditions. They also have greater ability to target people in specific countries, geographic regions, or zip codes.
Another benefit of aggregating panels is that it allows researchers to gather large samples. While most studies require only a few hundred to a few thousand participants, market research panels routinely deliver 10,000 or more people within just a few days. In one study, researchers gathered data from 250,000 people in under two weeks (Katz et al., 2020).
| Sample Panel Suppliers | | | | |
|---|---|---|---|---|
| Prime Insights Group LLC | Consultancy Services LLC | Payswell, LLC | Beijing Youli Technology Co., Ltd. | MindSumo Inc |
| Make Opinion GmbH | Rewardia.com.au | Reward Holdings | KuRunData | Persona.ly |
| Qmee | dataSpring | GMO Research | B-old.org, Inc. | Grif & Mif LLC |
| SocialLoop | DFour Mobile Research | FusionCash | Splendid Research | Streetbees.com Limited |
| Publishers Clearing House | Market Cube | IPRoyal FZE LLC | Pick Media Ltd | SurveyEveryone |
| Branded Research | URWelcome Technologies | Liidimedia oy | Screen Engine/ASI, LLC | A-K International |
| Bitburst | Verasight | Centiment | Unimrkt Response Inc | Data100 |
| Attapoll | On Device Research | Githaus Research | GAMINGACE TECHNOLOGIES | Ruble Club |
| Trayistats AI | YouThink.io | Pureprofile | Union Street Enterprises, Inc. | SoftRock |
| Besitos Corporation, LLC | Bohemian Research LLC | Catalyse Research | Lux Surveys | Quantish Opinion |
| Unimrkt Research | Innovate | Cre Online | Three Hyphens | Unitedt |
| Tellwut | Datadvise | Market Agent | Bridge Money Inc. | Prime Opinions |
| Dynata US | Research for Good | Hiving | GG2U | SampleBus Market Research Ltd |
| CashInStyle | 99 Ventures | Mo-Web | Idle-Empire | Derrota la Crisis S.L. |
| Slicethepie ltd | Adnitech LLC | ProductLab | FocusGC | Australian Clearing Pty Ltd |
| Surveoo-Link | Aeon Research | TapResearch | Rita Personal Data BV | RevenueClick Media PVT Ltd |
| Rewards1 | BizRate | Testable | Rewardingways | FoxyPanel |
| Drumo | Opivox Panels LLP | Trusted Herd | Samrika Critique Services | Fusion |
| InboxPounds | Neobux | Kazeel | European Research Team | Globee Media - MrSurvey |
| Opinodo | Logit | Gaddin.com | Swash Data Tech Ltd. | Hansa Research |
| Almedia AG | Mypinio GmbH | Insight works | XP Interactive, LLC | Ignite Vision |
| Dale Network | Competition Panel | Survey Pronto | Nomadic Insights, LLC | InBrain.ai |
| Opinaia Panel | Edison Yeti | Syno Rewards | TGM Research FZE | Inginit PTE LTD |
| Alpha Poll | Shopkick | Apex | The Coupon App, LLC | Savvy Technology - Match |
| Purely Research | iSurveyPanel | Bovitz, Inc. | Research on Mobile | Shanghai Wanyan Network |
| Aspen Analytics | Poll Pronto | Daily Rewards | A One Market Research | MadCashSurvey - Match |
| ITC | Maiwen China | Walnut Unlimited | Madai - API | Maholla |
| TheoremReach | iAngelic | Surveyeah | MySoapBox | Mewug GmbH |
| MDQ | GrowthOps | Promio | Offernation | MindMover |
The size of market research panels is one reason they are used so often for polling. Leading up to the 2024 U.S. Presidential election, for example, the Siena College Research Institute (the organization that conducts polling with The New York Times) collaborated with the authors of this book to collect data through Prime Panels, an aggregator offered by CloudResearch (Chandler et al., 2019; Moss et al., 2023). The researchers surveyed 6,000 participants across seven battleground states, with each state's sample matched to the demographic and county distributions of that state (Siena College Research Institute, 2024). This kind of targeting would not be possible without aggregation.
Yet, market research panels have disadvantages, too. The most serious disadvantage is data quality. In a typical study where participants are sourced through market research panels or panel aggregators, over 40% of participants can be expected to provide unusable data (e.g., Litman et al., 2023; Stagnaro et al., 2024; Weber, 2023). Chapters 10 and 11 will discuss data quality in online panels, but the lack of standardization in how panels recruit and vet participants means that if researchers do not take steps to address data quality within their studies, the results can be unusable, misleading, or outright false (e.g., Gharpure et al., 2020; cf. Litman et al., 2023).
Another challenge of market research panels is that researchers cannot control how much participants are paid. In addition, overall compensation tends to be low. In a typical study, participants will receive something equivalent to ten to fifty cents for their time. While this model works for short and simple surveys, it limits how much effort people are willing to invest in longer and more difficult studies.
Finally, a third challenge of market research panels is that researchers cannot directly communicate with participants. Interactions are typically mediated through the panel provider, meaning researchers often cannot give participants special instructions or build the rapport necessary for complicated projects. Participants from market research panels are unlikely to complete studies that require long open-ended responses or complex tasks like downloading and using apps. They are also unlikely to stick with longitudinal studies. About 60% of participants will drop out, or attrit, in the first week of a longitudinal project and that number increases to over 70% in a 30-day period (see Chapter 14 for more detail). The type of study described at the start of this chapter, where 1,000 participants completed daily tasks for 30 days, would be impossible through the typical market research panel.
Due to these limitations, most of the academic research conducted online uses more specialized panels, what this book refers to as researcher-centric online platforms (explored in the next section). Even though researcher-centric online platforms are a better fit for most academic studies than market research panels, it is important to understand the market research ecosystem for several reasons.
First, given how often these panels are used, you will likely encounter data from these sources in published studies, news reports, or policy documents. Second, you may use these panels to reach niche samples that cannot be found elsewhere. Finally, knowing about this ecosystem, and its strengths and weaknesses, will help you critically evaluate the research findings you encounter.
The last thing to note about market research panels is that many companies operate across multiple tiers of the ecosystem presented in Figure 9.3. For example, CloudResearch offers both its own panel platform, Connect, and an aggregator service called Prime Panels. These products serve different research needs. Whereas Connect is a researcher-centric online platform that provides direct access to a carefully vetted pool of high-quality participants, Prime Panels offers access to specialized demographic groups, international sampling, and large sample sizes.
More generally, cross-tier integration is increasingly common because most panels do not operate in isolation but collaborate with other panels. When a market research panel receives a request for participants with characteristics that are underrepresented in their own pool, they might partner with another panel to fulfill the request rather than turn the study down. This interconnectedness helps explain why the boundaries between different parts of the ecosystem often appear blurry. Researchers might work with what they believe is a single panel, unaware that their participants are sourced from multiple providers behind the scenes. By understanding the fundamentals of this ecosystem, you will be better equipped to make informed decisions about your own research. Table 9.2 summarizes the strengths and weaknesses of market research panels and panel aggregators.
| Strengths | Weaknesses |
|---|---|
| Scale and Reach Access to tens of millions of participants globally, making it possible to collect samples of 10,000+ respondents in days. | Data Quality 40% or more of respondents may be low quality or fraudulent without proper quality control measures. |
| Geographic Targeting Ability to collect data from different geographic areas, down to the zip code level. | Limited Engagement Low compensation results in participants who give minimal effort, especially in complex tasks. |
| Demographic Specificity Possible to find niche populations thanks to aggregation across hundreds of panels. | Mediated Communication Researchers cannot directly communicate with participants, limiting the ability to provide special instructions or clarifications. |
| Census Matching Sufficient volume to match samples to census demographics in specific states. | Time Constraints Studies work best when they take 20 minutes or less. |
| Speed Rapid data collection, with very large and niche population studies often completed quickly. | High Attrition Around 60% of participants drop out in the first week of longitudinal studies. |
| Cost Efficiency Relatively affordable compared to traditional research methods like telephone surveys or in-person interviews. | Limited Response Types Not suited for long open-ended responses or complex interactive tasks. |
| Quota Controls Sophisticated tools for setting and managing quotas across demographic characteristics. | Limited Flexibility Researchers can't easily adjust compensation or incentives to improve participation. |
| Accessibility Available to researchers without specialized recruitment expertise or established participant pools. | Technical Limitations Hard to implement studies that require downloads, apps, or specialized software. |
| Research Diversity Suitable for a wide range of descriptive and correlational research across many disciplines and industries. | Longitudinal Challenges Nearly impossible to conduct extended studies with daily participation over weeks or months. |
Researcher-Centric Online Platforms
While there are hundreds of market research panels, most academic research occurs on just a few platforms, such as CloudResearch Connect, Amazon Mechanical Turk (MTurk), or Prolific. These platforms are best thought of as researcher-centric platforms because they allow researchers to control much more of the research process than market research panels.
The first researcher-centric platform to be widely used for behavioral research was Amazon's Mechanical Turk (MTurk; Buhrmester et al., 2011; Paolacci et al., 2010). Although it was not designed specifically for research, MTurk single-handedly moved academic research online (Litman & Robinson, 2020; Moss et al., 2024). Then, beginning in 2018, the quality of data on MTurk declined substantially, causing behavioral scientists to look for new sources of participants (Hauser et al., 2023; Moss & Litman, 2018; see Chapter 10). Some of the alternatives they found, like Connect and Prolific, were specifically built for behavioral research (O'Grady, 2024).
At first glance, researcher-centric platforms appear similar to market research panels, as both maintain a database of participants (a panel) willing to complete tasks for compensation. However, unlike market research panels, researcher-centric platforms put the researcher in full control of the study.
The first factor that differentiates researcher-centric platforms from market research panels is the ability to specify participant compensation. Controlling compensation allows researchers to conduct more complicated projects because when people feel they are fairly compensated for their time, they are more willing to complete long, demanding, or difficult tasks (e.g., Litman & Robinson, 2020; Moss et al., 2023).
For example, Figure 9.5 shows where researchers set compensation on Connect. Because the participants are often motivated by financial rewards, researchers can find people willing to participate in video interviews, test new websites, analyze text, work in teams, discuss political issues, review scientific papers, suggest research ideas, track their experiences over time, and much more, as long as they are well compensated (Arechar et al., 2018; Boynton & Richman, 2014; Campbell & Reiman, 2022; Gallo & Gran-Ruaz, 2021; Garbinsky et al., 2020; Hall et al., 2020; Kittur et al., 2008).
Second, researcher-centric platforms give researchers the ability to communicate with participants. Often, this communication occurs through a messaging or email system like the "Conversations" page on Connect (Figure 9.6).
Communication with participants can enhance the quality of a study. Most tangibly, communication allows participants to tell researchers when something is wrong, such as a broken link, a question that does not make sense, or an issue that prevents participants from doing what the researcher has asked. Communication also allows researchers to provide participants with support, and it can be part of upholding the ethical standards of behavioral research discussed in Chapter 15.
Even so, the most common reason to communicate with participants is to remind them about subsequent rounds of data collection in a longitudinal study. For instance, the message in Figure 9.6 is from a longitudinal study that asked participants to engage in a sexual mindfulness intervention. The text of the message reminds participants to practice sexual mindfulness—being mentally present and nonjudgmental—during intimate moments with their partner, such as while cuddling or holding hands. Reminders such as these increase participant engagement and improve retention rates in longitudinal studies, especially when paired with the ability to control how much participants are compensated (e.g., Hall et al., 2020).
Finally, researcher-centric platforms offer significantly higher data quality than other recruitment options (e.g., Stagnaro et al., 2024), as discussed in more detail in Chapters 10-12. The main reason for this is that researcher-centric platforms vet participants before they can participate in studies. If you created a Connect account in Chapter 2, you experienced this vetting. Research platforms that vet participants produce better data quality than those that do not (e.g., Peer et al., 2023; Stagnaro et al., 2024).
Together, the strengths of researcher-centric platforms—control over compensation, the ability to communicate with participants, and high data quality—allow researchers to conduct a variety of projects that go beyond simple surveys. These include interactive experiments in which participants download software that records reaction times with millisecond precision for cognitive research (e.g., Stewart et al., 2017). They also include video interviews, real-time interactions between participants (e.g., Keller et al., 2023), mock jury trials (e.g., Salerno et al., 2023), complex tasks like content analysis of written texts (e.g., Benoit et al., 2016), and multi-session longitudinal studies (e.g., Hall et al., 2020). In fact, longitudinal studies are so common that Connect offers a feature called Waves for managing these projects (see Chapter 14).
Researchers can use Waves to create studies with dozens or even hundreds of follow-up points and automate the work of launching studies and reminding participants about upcoming sessions. Features like this elicit high engagement from participants.
The study that opened this chapter is a good example of what differentiates a researcher-centric platform like Connect from a typical market research panel. Researchers had to interact with participants every day, choose how much to pay people, and create a workflow for tracking who participated in previous studies. The researcher dashboard on Connect helped facilitate all these tasks.
The main limitation of researcher-centric platforms is that they cannot provide access to as many niche groups as market research panels (e.g., Moss et al., 2023). This is a direct consequence of operating without aggregation. Because researcher-centric platforms operate as a single panel, there are many groups of participants researchers simply cannot reach in the numbers they desire. Often, the biggest limitations arise when sampling small groups in the population and people from specific, harder-to-reach geographic areas such as sparsely populated states or communities.
Despite some limitations on who can be reached, the overall combination of high data quality, researcher control, methodological flexibility, and participant engagement explains why academic researchers have embraced researcher-centric platforms so enthusiastically (for an overview see Litman & Robinson, 2020). For studies requiring thoughtful responses, complex designs, or longitudinal follow-up, the advantages often outweigh the smaller participant pools relative to traditional market research panels. Table 9.3 summarizes the strengths and weaknesses of researcher-centric platforms.
| Strengths | Weaknesses |
|---|---|
| Superior Data Quality Rigorous verification procedures keep fraudulent responses far below the rates of up to 40% seen in traditional market research panels (e.g., Stagnaro et al., 2024; Moss et al., 2023). | Smaller Participant Pools Fewer total participants than major market research panels, which limits sample sizes for very specialized populations. |
| Researcher Control Researchers can set compensation rates, implement custom screening, and directly communicate with participants. | Potential Non-naivety Some participants may have more research experience, affecting the results of some study designs. |
| Methodological Flexibility Supports complex study designs including longitudinal research, interactive experiments, specialized tasks beyond simple surveys, studies requiring downloads of specialized software, video interviews, or interactive studies. | Generally Less Representative Samples Demographics may not naturally match census proportions without implementing specific quotas. |
| Participant Engagement Participants are more attentive, thoughtful, and willing to engage with complex or time-consuming tasks compared to other online sources. | Limited Global Coverage Most sites are limited to just a few countries or a single region. |
| Longitudinal Capabilities Higher retention rates for multi-session studies make extended research designs more feasible. | |
Representativeness in Online Sampling
Examine issues of sampling and how to make online samples more representative
The previous section described the sources for recruiting participants online. In this section, we will learn about sampling and the representativeness of online participants.
The section begins by defining what constitutes a representative sample and why it matters for research. Then, we will learn about the two primary approaches to sampling participants—probability and non-probability—and discuss why most behavioral research uses non-probability sampling. A key focus of the section will be on understanding common practices used to increase the representativeness of online samples, like quota sampling, and how these demographically adjusted samples differ from true probability samples. Finally, we will explore the "fit-for-purpose" framework, which is an essential guide for thinking about sample representativeness.
What is a Representative Sample?
Most studies in the behavioral sciences examine a sample, which is a specific group of people selected from a larger population. A population is the group of people a researcher wishes to understand and draw conclusions about. For instance, if a researcher was interested in the political attitudes of adults in the United States, the population would be all adults in the U.S. The researcher might then survey a sample of a few thousand people.
A population does not have to be the size of an entire country. Instead, students at a specific university or people who drink Coca Cola can be considered a population, so long as those are the people the research is focused on. The critical question for any study with a population of interest then becomes: how well do the people in the sample reflect the population? This is the essence of representativeness. A truly representative sample accurately mirrors the characteristics of the population it is drawn from.
Two Approaches to Sampling: Probability and Non-Probability
Researchers generally use one of two broad strategies for selecting participants: probability sampling or non-probability sampling.
Probability Sampling
Probability sampling is designed to create a sample that is a statistical mirror of the larger population. Its defining feature is that every person in the target population has a known and roughly equal chance of being selected to participate. Think of it like a lottery where every member of the population holds a ticket.
To collect a probability sample, researchers typically start with a nearly complete list of everyone in the target population, known as a sampling frame. A sampling frame might consist of a list of phone numbers of people in the country. Participants are then randomly selected from the sampling frame. If this process is executed correctly, the resulting sample should naturally reflect the distribution of characteristics found in the overall population. For instance, if we were studying social media use in the United States where usage varies from very little to very frequent, a true probability sample would capture heavy users, light users, and everyone in between in proportions similar to their actual presence in the U.S. population.
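A minimal sketch of this selection step is shown below, with an invented list of phone numbers standing in for a real sampling frame; the point is simply that drawing at random from the frame gives every entry the same chance of inclusion.

```python
# A minimal sketch of simple random sampling from a (hypothetical) sampling frame.
import random

random.seed(0)  # fixed seed so the example is reproducible

# A made-up frame of phone numbers standing in for a population list.
sampling_frame = [f"555-01{n:02d}" for n in range(100)]

# Every entry in the frame has an equal chance of ending up in the sample.
sample = random.sample(sampling_frame, k=10)
print(sample)
```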
As with any statistical process, there is some error associated with measuring people's attitudes or behaviors. In probability sampling, this is referred to as the margin of error: how much the results of the survey can be expected to vary by chance. The margin of error in a probability sample is a function of the sample size; the larger the sample, the smaller the margin of error. For example, in a sample of 500 people the margin of error is approximately 4%, in a sample of 1,000 it is about 3%, and in a sample of 2,000 it is about 2%.
Putting all of that together, if a random sample of 1,000 people drawn from the U.S. population showed that a President's approval rating is 40%, it would mean that somewhere between 37% and 43% of people in the United States (40% plus or minus 3%) approve of the President's performance.
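To see where these figures come from, the sketch below applies the standard formula for the 95% margin of error of a proportion, z·√(p(1−p)/n), using the conservative assumption p = 0.5; the function and the printed values are illustrative, not taken from any particular poll.

```python
# A minimal sketch of the 95% margin-of-error calculation for a proportion.
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Return the 95% margin of error for a proportion estimated from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 1000, 2000):
    print(f"n = {n}: +/- {margin_of_error(n):.1%}")
# n = 500: +/- 4.4%
# n = 1000: +/- 3.1%
# n = 2000: +/- 2.2%
```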
Non-Probability Sampling
In contrast to probability sampling, non-probability sampling is a way of selecting participants where not everyone in the population has an equal or known chance of being included. Instead, people might be included simply because they are readily available; such a sample is referred to as a convenience sample. Traditionally, most research in behavioral science has used undergraduate participants in introductory psychology and other classes. These students are a convenience sample.
Online panels are also a source of non-probability samples because participants are self-selected rather than chosen through a random process from the population. People who self-select into panels are generally different demographically from those who do not. For example, they tend to be more technologically knowledgeable, more educated, more likely to be female, and more likely to lean left politically than the general population (see Litman and Robinson, 2020).
Somewhere in-between probability samples and pure convenience samples are quota-based samples, sometimes referred to as purposive samples (see Baker, 2013). In quota sampling, researchers aim to match their sample to the population based on known attributes, such as age, gender, education level, or geographic region.
Quota sampling is a standard practice in much of online research, where researchers set quotas to align their sample demographics with those of the U.S. Census. For instance, if the Census indicates that 51% of the population is female and 13% is African American, a researcher using quota sampling would aim to recruit participants until approximately 51% of their sample is female, 13% is African American, and so on (Chapter 14 demonstrates how this is done). Thus, while the online panel is itself drawn from a non-random pool of people, the sample demographics will match those of the U.S. population.
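A minimal sketch of the bookkeeping behind quota sampling appears below. The census proportions and target sample size are illustrative assumptions, not figures from a specific study; in practice, platforms handle this automatically.

```python
# A minimal sketch of turning census proportions into recruitment quotas.
census_targets = {"female": 0.51, "african_american": 0.13}  # assumed benchmarks
sample_size = 1000

quotas = {group: round(p * sample_size) for group, p in census_targets.items()}
print(quotas)  # {'female': 510, 'african_american': 130}

def quota_open(group: str, counts: dict) -> bool:
    """Return True if the study still needs respondents from this group."""
    return counts.get(group, 0) < quotas[group]

# During data collection, recruitment for a group stops once its quota is filled.
print(quota_open("female", {"female": 510}))  # False: the quota is full
```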
While quota sampling can make a sample look more like the population on the selected demographics, it does not transform a non-probability sample into a true probability sample. The fundamental difference lies in the initial selection process. Probability sampling gives every person in the sampling frame a known and equal chance of being selected before quotas are considered. Quota sampling, on the other hand, selects from an already non-random pool of people and then matches the final proportions to U.S. Census targets.
The key technical characteristic of non-probability samples is that there is no known or agreed-upon way to measure the margin of error (see Baker, 2013). Regardless of where the participants come from, there is no way to know how much the pool of people who could end up in the sample differs from the population. For this reason, it is significantly more difficult to make inferences about how frequently something occurs in the population from a non-probability sample than from a probability sample.
Why Behavioral Research Often Relies on Non-Probability Samples
Despite the strengths of probability sampling, most studies in the social and behavioral sciences use non-probability samples. Think about the two studies described earlier in this chapter: the classic Milgram study on obedience to authority and the longitudinal study by OpenAI. Neither study used a probability sample.
Neither have most studies throughout the history of the behavioral sciences. In 1986, David Sears reported that approximately 85% of studies published in social psychology journals used undergraduate students (Sears, 1986). A decade and a half later, a similar rate was found in consumer research (Peterson, 2001). There are three reasons for this long history of using non-probability samples: (1) in experimental research, internal validity is often prioritized over external validity; (2) in most cases, the experimental effects and associations observed with probability samples are replicated with non-probability samples; and (3) non-probability samples allow for more complex research designs and better access to hard-to-reach groups.
Testing Treatment Effects
Most behavioral science studies are either experiments, such as those described in Chapter 7, or investigations of the relationships between variables, such as those described in Chapters 5 and 6.
In experimental studies, researchers often prioritize internal validity and the ability to detect whether an effect is present over obtaining a perfectly representative sample (see Hayes, 2017). This is because random assignment of participants to different experimental conditions balances pre-existing differences across groups. In other words, even if the participants do not mirror the larger population, random assignment creates equivalent groups, allowing a well-executed study to establish a cause-and-effect relationship.
Often, an experimental effect observed in a non-probability sample is just as informative as one in a probability sample. For instance, consider the process of testing a new drug in a clinical trial. The primary goal is to determine if the drug works. That is, does the drug have the intended therapeutic effect compared to a placebo or other existing treatments? In such studies, the critical factor for establishing causality is random assignment of participants to the treatment and control conditions. When participants are randomly assigned, pre-existing differences between the groups, whether demographic, medical history, or other unmeasured variables, are distributed equally across the treatment conditions. This allows the researchers to attribute any observed differences in outcomes to the drug itself rather than to other factors.
For this reason, initial drug trials are often conducted on samples that do not represent the entire population that might eventually use the drug. Instead, most drug trials rely on participants who are interested, available, and reside in a convenient geographic region that is close to the physical location of the trial. The same logic that works for drug trials works for psychological and behavioral experiments. Thus, many behavioral studies use convenience samples because doing so does not impede their ability to draw meaningful causal conclusions.
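The short simulation below illustrates this logic under assumed values: even in a convenience sample that skews young, randomly assigning people to conditions produces groups with nearly identical age profiles, so the sample's lack of representativeness does not undermine the comparison between conditions.

```python
# A minimal simulation (illustrative assumptions, not data from any study) showing
# that random assignment balances a pre-existing characteristic across conditions.
import random

random.seed(42)

# A convenience sample that skews young: ages drawn from a narrow range.
ages = [random.randint(18, 30) for _ in range(1000)]

# Randomly assign each participant to treatment or control.
assignments = [random.choice(["treatment", "control"]) for _ in ages]

for condition in ("treatment", "control"):
    group = [a for a, c in zip(ages, assignments) if c == condition]
    print(condition, round(sum(group) / len(group), 2))
# The two group means are nearly equal, so any difference in outcomes can be
# attributed to the treatment rather than to age.
```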
Comparisons of Treatment Effects on Probability and Non-probability Samples
In addition to the importance of internal validity, a growing body of research shows that online convenience samples and probability-based, representative samples yield similar results across the vast majority of studies.
In recent years, several large-scale studies have systematically compared experimental treatment effects found in online non-probability samples with those from nationally representative probability-based samples, such as the Time-Sharing Experiments for the Social Sciences (TESS, 2024). In one study, for instance, researchers examined 23 experiments from across the social sciences and found that more than 80% of the 36 treatment effects they examined on Mechanical Turk perfectly replicated those from probability-based samples (Mullinix et al., 2015). Importantly, in none of the experiments were the MTurk results in the opposite direction of the probability-based samples, suggesting a low risk of Type I (finding an effect that doesn't exist) and Type II (not finding an effect that does exist) errors when using online convenience samples.
Similar research has compared 27 different experiments across hundreds of demographic subgroups from MTurk and traditional probability samples and found that both methods pointed to similar conclusions, with only a small percentage of comparisons showing meaningful differences in how strong the effects were (Coppock et al., 2018). Similar consistency has also been found in correlational studies. In one instance, researchers compared the associations between various political attitudes (such as egalitarianism, moral traditionalism, and authoritarianism) in MTurk samples to those found in data from the nationally representative American National Election Studies (ANES; Clifford et al., 2015). Across 72 different effect sizes, the researchers found 68 were similar in both samples, with the few statistically different effects still showing relationships in the same direction. Thus, the overall evidence from these studies suggests that for most experimental and associative research, online convenience samples are likely to yield conclusions similar to those derived from probability samples.
Complex Designs and Better Access to Niche Groups
The final reason non-probability samples are common in the behavioral sciences is because they allow researchers to conduct the complex projects described in Module 9.1 or to reach niche groups. When researchers want to test ideas or study sub-groups in the population, it is often easier to find people from the market research panels or researcher-centric platforms discussed earlier than it is to use a form of probability sampling. In other words, participants from non-probability sources are often a better fit for the kinds of studies most behavioral scientists want to conduct.
The Fit-for-Purpose Framework: Matching Sample to Research Goal
Given all the considerations of sampling participants, how should researchers think about probability and non-probability samples? The fit-for-purpose framework (Baker, 2013) provides a practical approach. This framework suggests that rather than viewing samples as simply "good" or "bad," "representative" or "unrepresentative", researchers should think about how well a sample meets the goals and design of a research project. Under this framework, different questions require different samples.
According to the fit-for-purpose framework, the choice of participant source and sampling strategy should be driven by the specific research question and objectives. Online non-probability samples are generally fit for experimental and associative research, when the primary goal is to understand the nature and strength of a relationship. As described above, comparative studies have shown encouraging consistency in the effects found in non-probability panels versus those in nationally representative samples.
On the other hand, when a study aims to provide a precise estimate of how frequently something occurs in the population, such as in political polls, non-probability panels are generally not the best fit. Because these samples are opt-in, they may differ from the general population and produce biased estimates of how often a behavior occurs.
Nevertheless, non-probability online panels are commonly used in polls that aim to measure the general population. Indeed, most polling is currently conducted with non-probability online panels (see Cohn, 2024; Kennedy, 2018; Kennedy et al., 2020). In these studies, quota sampling is combined with statistical weighting after data collection, which can substantially improve the accuracy of estimates from non-probability samples. It is generally agreed, however, that such methods may not remove all bias (see Baker, 2013; Tourangeau et al., 2013; Stagnaro, 2024).
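The sketch below illustrates the simplest form of such weighting, post-stratification on a single variable, under assumed population and sample proportions. Real polls weight on many variables at once (for example, with raking), so this is only meant to convey the idea, not to describe any pollster's actual procedure.

```python
# A minimal sketch of post-stratification weighting on one variable (gender),
# using assumed benchmark and sample proportions.
population = {"female": 0.51, "male": 0.49}   # assumed census benchmarks
sample     = {"female": 0.60, "male": 0.40}   # assumed (skewed) opt-in sample

# Weight = population share / sample share for each group.
weights = {group: population[group] / sample[group] for group in population}
print(weights)  # {'female': 0.85, 'male': 1.225}

# In a weighted estimate, each respondent's answer counts in proportion to how
# under- or over-represented their group is relative to the population.
```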
Complex Research Designs
Finally, the discussion of sampling methods and representativeness must also consider the practical feasibility of conducting different types of research. Recall the study described at the beginning of this chapter, where OpenAI and the MIT Media Lab investigated the psychological effects of frequent interactions with ChatGPT. That research involved nearly 1,000 participants engaging in daily tasks and interactions with ChatGPT over a one-month period, with researchers measuring outcomes like loneliness and emotional dependence on AI.
Consider the immense challenge of conducting such an intensive longitudinal study using traditional probability sampling methods aimed at achieving a perfectly representative national sample. Recruiting nearly a thousand people through probability sampling and then following their daily engagement with specific tasks and AI interactions for an entire month would be extraordinarily complex and prohibitively expensive.
This is where online convenience samples demonstrate their unique value. These platforms are specifically designed to facilitate complex research studies. They allow researchers to recruit large numbers of participants relatively quickly, manage communication for longitudinal studies, implement systems for daily reminders or task delivery, and incentivize sustained participation in demanding or lengthy projects. Without such infrastructure, research questions requiring intensive longitudinal data collection or complex interactive designs might remain unexplored due to practical and financial barriers.
Summary
Behavioral scientists can access online research participants through multiple channels, each with distinct advantages. Researcher-centered platforms offer versatile tools for various research designs, strong quality controls, and participants willing to engage in diverse tasks. Market research panels excel at complex demographic targeting and large samples.
When the primary goal of research is to test predictions from theories, whether through experiments or the examination of associations, online non-probability samples have demonstrated a strong fit for purpose. Findings regarding experimental effects and correlations from these samples are generally reliable, and their use has become standard practice across many disciplines in behavioral science, particularly when researchers employ strategies like quota sampling where appropriate.
The most effective sampling approach depends on your specific research needs. When selecting a participant source, the fit-for-purpose framework is a helpful and practical guide for deciding which source to use.
Additional Readings
- Buhrmester, M. D., Talaifar, S., & Gosling, S. D. (2018). An evaluation of Amazon's Mechanical Turk, its rapid rise, and its effective use. Perspectives on Psychological Science, 13(2), 149-154.
- Chandler, J., Rosenzweig, C., Moss, A. J., Robinson, J., & Litman, L. (2019). Online panels in social science research: Expanding sampling methods beyond Mechanical Turk. Behavior Research Methods, 51, 2022-2038.
- Coppock, A., & McClellan, O. A. (2019). Validating the demographic, political, psychological, and experimental results obtained from a new source of online survey respondents. Research & Politics, 6(1), 2053168018822174.
- Hartman, R., Moss, A. J., Jaffe, S. N., Rosenzweig, C., Litman, L., & Robinson, J. (2023). Introducing Connect by CloudResearch: Advancing online participant recruitment in the digital age.
- Moss, A. J., Hauser, D. J., Rosenzweig, C., Jaffe, S., Robinson, J., & Litman, L. (2023). Using market-research panels for behavioral science: An overview and tutorial. Advances in Methods and Practices in Psychological Science, 6(2), 25152459221140388.
Frequently Asked Questions
What is the difference between market research panels and researcher-centric platforms?
Market research panels are large databases of participants used primarily for market research, facilitating about 5 billion surveys annually through a three-tiered ecosystem of panels, aggregators, and research services. Researcher-centric platforms like CloudResearch Connect, MTurk, and Prolific are specifically designed for behavioral research, giving researchers control over compensation, direct communication with participants, and offering higher data quality through participant vetting.
What is probability sampling versus non-probability sampling?
Probability sampling gives every person in the target population a known, roughly equal chance of being selected, creating a statistical mirror of the larger population with a calculable margin of error. Non-probability sampling, including convenience samples and quota-based samples, does not give everyone an equal chance of selection. Most behavioral research uses non-probability samples because they are practical for experimental research where random assignment to conditions is more important than population representativeness.
What is the fit-for-purpose framework in sampling?
The fit-for-purpose framework suggests that rather than viewing samples as simply 'good' or 'bad,' researchers should evaluate how well a sample meets the specific goals and design of their research project. Online non-probability samples are generally fit for experimental and associative research where the goal is to understand relationships between variables, while probability samples are better suited for studies aiming to estimate how frequently something occurs in the population.
Why do most behavioral science studies use non-probability samples?
Most behavioral research uses non-probability samples for three reasons: (1) experimental research prioritizes internal validity over external validity, and random assignment to conditions creates equivalent groups regardless of sample representativeness; (2) research shows that experimental effects found in non-probability samples typically replicate those from probability samples; and (3) non-probability samples allow for more complex research designs and better access to hard-to-reach groups.
What are the main advantages of researcher-centric platforms like CloudResearch Connect?
Researcher-centric platforms offer several key advantages: control over participant compensation (enabling more complex and demanding studies), ability to communicate directly with participants (crucial for longitudinal studies and troubleshooting), significantly higher data quality due to participant vetting procedures, and methodological flexibility for diverse study designs including longitudinal research, interactive experiments, video interviews, and studies requiring specialized software.
Key Takeaways
- Online recruitment has revolutionized behavioral research by expanding access to participants, accelerating data collection, and enabling complex study designs
- Market research panels facilitate approximately 5 billion surveys annually through a three-tiered ecosystem of panels, aggregators, and research services
- Panel aggregators combine hundreds of individual panels to provide access to niche populations and enable large-scale sampling that would be impossible with single panels
- Researcher-centric platforms like CloudResearch Connect offer superior data quality, researcher control over compensation, and direct participant communication
- Probability sampling gives everyone in the population a known chance of selection, allowing calculation of margin of error
- Non-probability sampling includes convenience samples and quota-based samples where not everyone has an equal chance of selection
- Quota sampling matches sample demographics to population characteristics but does not transform a non-probability sample into a true probability sample
- Research shows that experimental effects found in non-probability samples replicate those from probability samples for more than 80% of the treatment effects examined
- The fit-for-purpose framework suggests evaluating samples based on how well they meet specific research goals rather than viewing them as simply good or bad
- Non-probability samples are well-suited for experimental and associative research but may produce biased estimates for population-level frequency questions