Introduction
The National Longitudinal Study of Adolescent to Adult Health, abbreviated Add Health, is the largest, most comprehensive survey of its kind ever conducted. Funded by the U.S. government, it began during the 1994-95 school year with a nationally representative sample of over 20,000 students in grades 7-12, and it has continued with five additional waves of data collection to date.
Over the years, Add Health has gathered a stunning amount of information about participants' demographic, social, familial, socioeconomic, behavioral, psychosocial, cognitive, and health characteristics, as well as those of their parents. To this self-reported data, researchers have added information about participants' schools, neighborhoods, and communities. During in-home visits, researchers have gathered physical and biological data, including genetic markers, blood-based assays, body measurements, and information about participants' medications. Simply stated, Add Health is a remarkable achievement of behavioral research that has required hundreds of millions of dollars and the effort of tens of thousands of people. In terms of methods and rigor, it's as good as it gets.
Yet, it is precisely because of that rigor that researchers were surprised to find participants who lied in their survey responses. On several occasions, researchers found significant discrepancies between what participants reported in surveys and what was evident during follow-up visits at participants' homes. In one striking example, researchers identified 253 participants who claimed to be missing limbs and using artificial replacements. But when the researchers visited these people in person, they found that only two actually had missing limbs. The other 99% had lied (Fan et al., 2006).
Unfortunately, this was not the only problem with the data. In another study based on Add Health data, researchers found that adopted children were at higher risk than non-adopted children for failing school, using drugs, getting into fights, lying to their parents, and experiencing physical and mental health problems (Miller et al., 2000). During subsequent in-home visits, however, the researchers discovered that approximately 20% of the students who reported being adopted had lied; they were living with their biological parents (e.g., Fan et al., 2006). When these "mischievous participants" were removed from the dataset, the differences between adopted and non-adopted adolescents disappeared. Similar problems were found for self-reported immigration status and other outcomes (see Fan et al., 2006).
These problems with data quality are not unique to the Add Health database. In fact, they are as old as survey research itself. In the 1970s, for instance, researchers noticed a small but consistent number of people who claimed in surveys to take drugs that did not exist (e.g., Pape & Storvoll, 2006; Petzell et al., 1973). In other studies, researchers have found participants who lied about their sexual orientation, their gender or age, experiences with consumer products, gang affiliation, whether they have a vision impairment, whether they have ever been pregnant, and whether they own pets, among many other things (e.g., Chandler & Paolacci, 2017; Hartman et al., 2023; Robinson-Cimpian, 2014; Wessling et al., 2017). In each instance, removing these unreliable participants changed the study results.
While problems with data quality exist in all research environments, they pose an especially large problem in online studies. Up to 40% of data from market research panels comes from problematic, unreliable, or outright fraudulent participants (e.g., Chandler et al., 2019; Reavey et al., 2025), and while the researcher-centric platforms discussed in Chapter 9 generally produce cleaner data, no online source is immune from problems with data quality. As a result, this chapter explores the scope of data quality problems in online research.
In Module 10.1 we will learn how poor-quality data can produce misleading claims that spread through both academic and public discourse. Then, we will examine the global network of online survey fraud, learning how click farms and other scams operate to take the money offered for participating in studies. As part of this exploration, we will learn the story of Mechanical Turk, where problems with fraud became particularly evident in 2018 when researchers experienced a sudden drop in data quality that threatened research across multiple disciplines. This crisis highlighted how vulnerable online research platforms can be to systematic fraud. In response, specialized platforms have emerged to replace MTurk with rigorous verification procedures and more sophisticated detection systems. These platforms typically reduce fraudulent data to less than 5% of responses, compared to the 30-40% found on MTurk and typical market research panels.
After learning about problems with data quality, Module 10.2 shows how these problems affect descriptive, correlational, and experimental research. Through real-world examples, we will see how poor data quality can inflate point estimates in descriptive studies, create false correlations in associative research, and dilute effect sizes in experiments. We will see how these distortions can lead even careful researchers to draw incorrect conclusions about human behavior.
By understanding both the nature of data quality problems and their consequences for different types of research, you will be better prepared for the next chapter, where we introduce practical solutions for detecting and addressing these issues. The goal is to enable you to conduct online studies that yield reliable, meaningful insights into human behavior on any online platform.
Data Quality in Online Research
Examine the threats to data quality in online research
In recent years, public polling has revealed the following things about the U.S. population:
- 20% of people support political violence (National Public Radio, 2023)
- 30% of Millennials are unsure if the Earth is round (Scientific American, Nguyen, 2018)
- 20% of Millennials think the Holocaust is a myth (The Economist/YouGov, 2023)
- 4% of people drank or gargled bleach to protect against COVID-19 (Centers for Disease Control, CNN, 2020)
- Over 50% of African Americans don't think it's okay to be White (Rasmussen, 2023)
These results are unflattering. They suggest people in the U.S. deny basic science, embrace conspiracy theories, support political violence, are ignorant of history, and are divided by race.
But there is good news: all these findings are false.
Each finding above stems from data quality problems that went undetected. Because the researchers failed to detect the problems, not only were they misled about what people think, but so were the scientific journals and news outlets that published the findings and the citizens who consumed them.
The impact of misleading findings extends far beyond academic circles. Misleading research can shape public opinion and influence policy decisions. For instance, take the claim about Millennials doubting the Earth is round. This finding appeared in Scientific American, one of the most trusted popular science publications in the world. Once there, it gained tremendous credibility. It was later mentioned several times on the popular podcast, The Joe Rogan Experience, where Joe Rogan discussed the finding with renowned astrophysicist Neil deGrasse Tyson. Their exchange, viewed by tens of millions of people, has the potential to legitimize a false narrative about how young Americans view the world.
A similar pattern has played out with other findings. The claim about Millennials denying the Holocaust prompted discussions about education policy in the United States. The statistics about Americans consuming bleach raised concerns about dangerous health practices during a global pandemic. And the finding about racial attitudes fueled already tense conversations about race relations in America, even though the real issue was one of measurement. How did these things happen?
The answer lies in data quality. Remember that about 5 billion surveys are completed online each year, and the majority occur through market research panels. This massive industry has transformed how scientists gather data about human behavior while also creating unprecedented challenges for data quality.
The best estimates today are that around 40% of responses in a typical study conducted with a market research panel (as opposed to a researcher-centric platform) come from unreliable sources (Chandler et al., 2019; Litman et al., 2023; Mercer et al., 2024; Weber, 2023), although the number can sometimes be much higher (see the Lucid data in Stagnaro et al., 2024).
In one study, an industry group formed to examine data quality, called Case for Quality, collected data from over 4,000 respondents from four different online panels. They found that between 30 and 40 percent of data could not be used for analysis, either because of fraud or inattention. Fraud was equally apparent across all panels and took numerous forms. One notable behavior was that respondents made implausible claims in the survey. For example, the study was conducted during the height of the COVID-19 pandemic in the United States, when all major theaters and entertainment events were closed. Yet, many people in the survey claimed to have gone to the opera in the last month.
Similarly implausible claims have been found in other studies. For instance, when researchers examined data quality across three commonly used online panels with a sample of 2,500 respondents, they included a question about whether people had recently purchased a home in McMullen, Alabama. According to the 2020 Census, McMullen has a population of thirty-two people. Nevertheless, 417 participants reported recently purchasing a home there, clearly indicating they were not reading the questions or intentionally lying (Reavey et al., 2025).
There has been considerable speculation about who or what lies behind unreliable online data. The most common explanation has been that problematic data comes from 'bots,' automated computer programs designed to fraudulently take online surveys. Despite this widespread belief, research suggests most problems with data quality come from people (e.g., Jaffe et al., 2025; Kennedy et al., 2020).
Direct evidence for human respondents comes from several sources. In one case, researchers identified participants whose past responses suggested fraud and invited them to a Zoom interview (Jaffe et al., 2025). When these participants joined the video calls, what researchers saw was revealing. As shown in Figure 10.1, the participants appeared in rooms with multiple computers where other people were taking surveys. The people in the room spoke languages other than English and were outside of the United States despite participating in studies restricted to the U.S.
Particularly revealing was how participants in these interviews answered survey questions. During the interviews, the researchers opened a Qualtrics survey and asked participants to complete the survey while sharing their screen. Even though the participants knew they were being observed, they provided implausible answers.
For example, when asked if they had recently purchased a home in McMullen, Alabama—the town with just 32 residents—they answered "yes." When presented with a list of cruise lines and asked which ones they had traveled on recently, the participants selected cruise lines that do not exist. They also reported using fictional products and services and said they had experienced several unlikely events such as filing a recent homeowners' insurance claim due to a lightning strike.
The participant depicted in Figure 10.1 is part of what is commonly called a "click farm." Click farms operate all over the world, in countries like India, Bangladesh, Russia, Nigeria, Venezuela, the United States, and many others. People in these locations use sophisticated tools to circumvent geographic restrictions, including virtual private networks (VPNs), remote desktop services, and IP address rotators. For most of these people, even the modest payments collected from online surveys represent a meaningful amount of money, creating a powerful incentive for fraud.
Yea-Saying: A Common Behavior in Online Fraud
One thing revealed by the surveys and interviews described above is that fraudulent participants say 'yes' to most survey questions, even when those questions ask about impossible or implausible events. The reason for this behavior lies in how participants qualify for various surveys in the market research ecosystem.
Most market research surveys target people who fit a specific profile. Companies want to hear from consumers who use specific products or services, fall into certain demographic categories, or engage in specific behaviors. To identify these people, researchers use screening questions.
Consider how this works. A soft drink company might want feedback from people who regularly consume their products. Their survey begins with questions like, "Do you drink Coca-Cola?" or "Have you purchased any of the following beverages in the past month?" A little later in the survey, the researchers might ask about specific product usage or preferences. Someone whose answers do not fit the profile the researchers are looking for is immediately disqualified. The survey ends and the person receives no compensation for their time.
This system creates a powerful incentive to say 'yes.' Participants quickly learn that saying "yes" to screening questions increases their chances of qualifying for the study and earning money. Over time, the behavior becomes habitual regardless of the question's content or the participant's actual experiences. For participants working in click farms, this yea-saying behavior is even more extreme because it is explicitly taught to maximize income. Indeed, there is an online cottage industry that teaches people how to commit fraud in online surveys.
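To see why saying "yes" pays off, consider a simple back-of-the-envelope sketch. The incidence rates below are hypothetical and not taken from any study in this chapter; the point is only that an honest respondent rarely makes it past a multi-question screener, while a yea-sayer qualifies every time.

```python
# A minimal sketch (hypothetical incidence rates) of why yea-saying pays off.
# Suppose a screener asks three yes/no questions and only "yes" to all three
# qualifies a respondent for the paid survey.
incidence = [0.30, 0.25, 0.20]   # share of honest respondents who truly qualify on each question

p_honest = 1.0
for p in incidence:
    p_honest *= p                # honest respondents qualify only if every answer is truly "yes"

p_yea_sayer = 1.0                # a yea-sayer answers "yes" to everything and always qualifies

print(f"Honest respondent qualifies: {p_honest:.1%}")    # about 1.5%
print(f"Yea-sayer qualifies:         {p_yea_sayer:.0%}") # 100%
```

Under these assumed rates, indiscriminate agreement makes a participant dozens of times more likely to reach the paid portion of a survey, which is exactly the behavior the screening system unintentionally rewards.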
The Global Network of Research Fraud
Survey fraud is perpetuated through extensive online communities. Platforms like YouTube, Facebook, Telegram, and Reddit host thousands of tutorials and discussion groups that provide detailed instructions for how to bypass security measures in online panels. Some YouTube channels have tens of thousands of followers and post daily videos explaining how to create convincing false identities, manipulate location data, and pass common screening questions.
One notable channel, called "Survey Help 360," is based in Bangladesh. It has amassed 44,000 YouTube followers and nearly 6,000 Facebook followers. This channel posts daily videos showing people how to circumvent security measures in online panels. Some videos provide step-by-step instructions for posing as a U.S. citizen, including detailed guidance on using rented U.S. phone numbers and proxy servers to obtain an IP address in the United States. Another video demonstrates how to create a profile and pose as a Black woman, complete with a fake driver's license and techniques to participate in video surveys specifically recruiting Black female participants.
Social media accounts have built substantial followings by teaching these deceptive tactics. They explicitly instruct followers to maximize survey earnings by claiming to fit multiple target demographics and to respond "yes" to as many screening questions as possible. By indiscriminately answering "yes" to these questions, participants increase their chances of qualifying for studies regardless of whether they meet the criteria. Some people have even taken to publishing digital courses that teach others how to qualify for lucrative research opportunities like in-depth interviews, focus groups, ethnographic studies, and video research.
Overall, the global network of 5 billion annual surveys contains a heavy dose of fraudulent data from respondents all around the world. This fraud is fueled by economic opportunity and amplified by social media networks that disseminate information about how to circumvent the protection mechanisms that panels have put in place. But fraud may not be limited to one side of the research equation.
In April 2025, the U.S. Attorney's Office in New Hampshire filed an indictment accusing two market research panels of intentionally directing surveys intended for U.S. participants to people in click farms outside of the U.S. (U.S. Attorney's Office, 2025). By fabricating large amounts of survey data, these platforms are alleged to have defrauded their customers out of more than $10 million. While it is rare for panels to be complicit in the proliferation of fraud, this indictment further underscores the need to be vigilant about data quality online.
The Rise and Fall of Mechanical Turk
The section above describes data quality on market research panels, but the story of one particularly popular platform reveals how an otherwise high-quality source of participants can have its reputation sullied by fraud almost overnight. That platform is Mechanical Turk.
MTurk was created in 2005 as a platform where people could solve problems that computers could not handle efficiently. The platform connected "requesters" (people who need work done) with people MTurk refers to as "workers." It did not take long for academic researchers to suggest MTurk might be good for research. At the time, most behavioral science studies collected data from undergraduate students. As a result, MTurk emerged at an opportune time, and it provided researchers with access to diverse participants at an affordable rate without requiring technical expertise (see Litman and Robinson, 2020; Moss et al., 2024).
Following the publication of an influential paper (Buhrmester et al., 2011), MTurk was rapidly adopted by the scientific community. Numerous papers showed that data quality on MTurk was very good (e.g., Buhrmester et al., 2011; Litman et al., 2015; see Bohannon, 2016). Indeed, the data quality from people on MTurk was often better than that from undergraduate students (Hauser and Schwarz, 2016). As a result, MTurk quickly replaced the undergraduate subject pool as the main source of data in the social and behavioral sciences. By 2016, MTurk data represented nearly 50% of studies reported in top psychology journals (Zhou & Fishbach, 2016). And by 2018, MTurk had been cited in more than 1,000 different journals (Buhrmester et al., 2018).
However, starting in 2018, a dramatic shift occurred. Researchers began seeing unusual patterns in their data, including inconsistent demographic information, nonsensical responses to open-ended questions, and unprecedented failure rates on attention checks (Bai, 2018; Ryan, 2018). These problems emerged almost overnight and caused widespread concern. The issue was so serious that news outlets like Wired and the New Scientist ran headlines such as "Bots on Amazon's Mechanical Turk are ruining psychology studies" (Wired, 2018) and "A Bot Panic HITs Amazon's Mechanical Turk" (New Scientist, 2018).
When more detailed analyses of the problem were conducted, however, researchers found little evidence of bots (e.g., Moss & Litman, 2018). Instead, a significant portion of MTurk participants appeared to be providing low-quality data from outside of the U.S., in some cases reaching 30-50% of responses (Chmielewski & Kucker, 2019; Dennis et al., 2020).
Research revealed that international participants had circumvented MTurk's geographic restrictions using virtual private networks (VPNs) and other technical tools (Dennis et al., 2020). Conversations on social media revealed that people in India and other countries were buying access to accounts created by people in the U.S. (Figures 10.2 and 10.3). An analysis of questionable participants' IP addresses revealed that many supposedly U.S.-based respondents were from countries like Venezuela, India, and Eastern European nations (Moss & Litman, 2018). These participants provided data with a distinct pattern, often including impossibly fast completion times, inconsistent demographic information across studies, and implausible response combinations (Kennedy et al., 2020).
Eventually, a large study found that 65,000 out of 165,000 MTurk workers, or about 40%, provided unusable data, likely because they were fraudulent (Hauser et al., 2023). While the source of the data quality issue was often described as "bots," research painted a different picture. Evidence pointed to human fraud coming from outside of the United States (see Kennedy et al., 2020a, b; Litman et al., 2021; Jaffe et al., 2025). Indeed, it did not take long to see that the data quality issues on Mechanical Turk were an extension of the same problems that had existed for years on market research panels and panel aggregators.
MTurk's fall from grace has led to several shifts in online data collection practices. First, specialized platforms emerged to replace Mechanical Turk. As discussed in the previous chapter, we refer to these as researcher-centric platforms. These platforms implemented rigorous data quality verification procedures. Sites like CloudResearch Connect and Prolific built their systems with data quality as a central focus, reducing the prevalence of fraudulent data to the low single digits compared to the 30-40% or more found on typical market research panels (Stagnaro et al., 2024).
Researchers also put more emphasis on behavioral and technical solutions that could be implemented within studies regardless of which platform was used. Behavioral solutions, including attention checks, instructional manipulation checks, and data validation approaches, became standard practice (e.g., Arndt et al., 2022; see Chapter 13). Technical solutions, such as CloudResearch's Sentry, were developed to identify suspicious IP addresses and patterns among participant accounts, helping researchers filter out potentially fraudulent participants before they entered studies (Litman et al., 2017).
The rise and fall of MTurk as the dominant platform for behavioral research illustrates both the promise and the perils of online data collection. While the platform revolutionized access to research participants and dramatically accelerated the pace of behavioral science, it also demonstrated that data quality cannot be taken for granted. The lessons learned from this period continue to shape how researchers approach online data collection, with increased emphasis on verification, quality screening, and transparent reporting of data quality measures.
High-Quality Online Platforms
We began Chapter 9 with a study conducted by OpenAI and the MIT Media Lab in which researchers followed 1,000 people over 30 days, examining how interactions with ChatGPT affected loneliness. The study was run on CloudResearch Connect, the same platform used throughout this book. Before 2018, the study probably would have been run on Mechanical Turk. But because of the data quality problems described above, researchers have largely replaced MTurk with Connect and other platforms that research shows provide substantially higher-quality data (e.g., Stagnaro et al., 2024).
Researcher-centric platforms like Connect have been built with data quality as the focus. For that reason, they stand out in the online research ecosystem. While MTurk and the typical market research panel may contain 30-40% fraudulent responses, studies conducted on researcher-centric platforms typically contain fewer than 5% problematic respondents, and often far fewer. This improvement stems from a vetting process during participant registration, continuous monitoring of people's behavior on the site, and sophisticated detection systems designed to identify response patterns associated with fraud.
For example, Connect requires people to complete a rigorous onboarding process and submit a valid photo ID before they can take studies. It also evaluates people's activity once they are on the platform and monitors social media for conversations related to fraud. Connect also solicits feedback from researchers who detect problematic data in their studies. Thanks to these and other procedures, researcher-centric platforms like Connect provide significantly better data quality than MTurk or panel aggregators such as Lucid (Stagnaro et al., 2025).
However, current success does not guarantee future performance. Even platforms like Connect need to be monitored for data quality so that potentially problematic data can be removed. Additionally, as mentioned in Chapter 9, there is often a tradeoff between different participant sources, and researchers will sometimes need a market research platform to meet particular research objectives. In the next chapter, we examine ways to identify and remove problematic data when using online platforms.
Before that, however, Module 10.2 explores how problematic data create different problems in descriptive, correlational, and experimental studies. It also introduces a few of the techniques that have been successful in finding and removing problematic respondents in past studies.
How Data Quality Affects Research
Learn how low-quality data distorts descriptive, correlational, and experimental research
Researchers do not want fraudulent data in their studies, but why exactly? What effect does low-quality data have on research findings? In this module, we will learn how low-quality data distort descriptive, correlational, and experimental research findings.
How Bad Data Affect Descriptive Research
In June of 2020, three months into the COVID-19 pandemic in the U.S., researchers from the Centers for Disease Control and Prevention (CDC) published a study claiming that 40% of Americans were engaged in dangerous cleaning practices. Most concerning, about 4% of people reportedly drank or gargled bleach, 4% drank or gargled soapy water, and 4% drank or gargled other household cleaners to prevent COVID-19 infection (Gharpure et al., 2020). You may recall that the start of the pandemic was a frightening time. Yet, if these numbers were accurate, then tens of millions of people were not just misusing cleaning products but doing things that defied common sense.
Perhaps unsurprisingly, an attempt to replicate the CDC's results led to a very different picture (see Litman et al., 2023). This replication used the same questions and sampling procedures as the CDC but included data quality checks to identify potential fraud and inattention (the next chapter will describe these checks in more detail).
In the replication study, researchers separated participants into two groups: those who failed the screening measures ("unreliable") and those who passed ("reliable"). Across two studies, the initial results matched the CDC's findings. However, when the researchers examined the data for reliable and unreliable participants separately, a different picture emerged.
As Figure 10.4 shows, dangerous cleaning behaviors were reported exclusively by unreliable participants. Every report of drinking bleach or other dangerous cleaning products came from participants who failed the quality checks.
What does this reveal about how poor data quality distorts descriptive research? Descriptive studies typically report percentages or averages (point estimates) to characterize populations (see Chapter 3). When the data include responses from inattentive or fraudulent participants who engage in systematic yea-saying, those estimates become inflated. In effect, when twenty percent or more of respondents systematically agree with most items, evidence can be found for the existence of almost anything.
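To see how little contamination it takes, here is a minimal simulation. The numbers are hypothetical, not drawn from the CDC study: a behavior with a true prevalence of 1% appears to have a prevalence of roughly 21% once 20% of the sample are yea-sayers.

```python
# A minimal sketch (hypothetical numbers) of how yea-saying inflates a point estimate.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_prevalence = 0.01        # genuine rate of the behavior in the population
share_yea_sayers = 0.20       # contaminated share of the sample

is_yea_sayer = rng.random(n) < share_yea_sayers
honest_answer = rng.random(n) < true_prevalence
reported = np.where(is_yea_sayer, True, honest_answer)   # yea-sayers always answer "yes"

print(f"True prevalence:     {true_prevalence:.1%}")
print(f"Observed prevalence: {reported.mean():.1%}")      # roughly 0.8 * 1% + 20% ≈ 21%
```

In other words, a rare (or nonexistent) behavior can look common simply because a fraction of respondents say "yes" to everything.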
Many of the polling results at the start of the chapter suffered a similar fate. Follow-up studies with proper data quality measures found much smaller numbers or no effects at all (e.g., Hartman et al., 2023; Holliday et al., 2024; Litman et al., 2023; Mercer et al., 2024). Yet the sensational claims about Americans drinking bleach, Millennials denying the Holocaust, or Black people saying it isn't okay to be White made headlines because they were shocking and because the researchers failed to account for data quality problems.
How Bad Data Affect Correlational Studies
We have seen how poor data quality distorts descriptive research by inflating point estimates. But how do these same problems affect correlational research?
Remember that a correlation shows how two variables relate to each other. As described in Chapter 5, when scores on one variable increase or decrease, scores on the other variable do too in a predictable direction. When participants engage in yea-saying—systematically agreeing with questions—they artificially increase the correlation between measurements. This can make unrelated variables appear correlated or make weak relationships look stronger than they really are.
Let's look at an example. Figure 10.5 shows the correlation between people's education and social anxiety from a study by Chandler et al. (2020). Previous research found these variables have a small negative correlation; as education increases, social anxiety tends to decrease. This makes sense given that succeeding in higher education requires navigating complex social environments. However, the correlation in Figure 10.5 is not negative.
In the figure, green dots represent "reliable" participants who passed data quality checks, while red dots represent "unreliable" participants who failed these checks.
Looking at the overall correlation in the sample (r = .14), there is a positive relationship that contradicts previous findings. But when the data are separated, two different patterns emerge. Among reliable participants, the correlation is slightly negative (r = -.07), matching previous research. Among unreliable participants, it is moderately positive (r = .20). Why the difference?
The scatterplot on the left shows the clear separation between reliable and unreliable participants. The middle and right images reveal the cause: unreliable participants consistently gave higher ratings on both social anxiety (middle) and education (right). Their yea-saying shifted both distributions to the right, creating an artificial positive correlation between variables that should be negatively related.
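The mechanism is easy to reproduce. The sketch below uses hypothetical numbers, not Chandler et al.'s data: it mixes a reliable group with a small negative correlation and an unreliable group that simply rates everything higher, and the pooled correlation turns positive.

```python
# A minimal sketch (hypothetical numbers) of how a contaminated subgroup
# can flip a correlation in the pooled sample.
import numpy as np

rng = np.random.default_rng(1)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

n_reliable, n_unreliable = 800, 200

# Reliable group: weak negative relationship between the two measures
education = rng.normal(3.0, 1.0, n_reliable)
anxiety = -0.1 * education + rng.normal(0.0, 1.0, n_reliable)

# Unreliable group: no real relationship, but elevated scores on both measures
edu_bad = rng.normal(4.5, 1.0, n_unreliable)
anx_bad = rng.normal(1.2, 1.0, n_unreliable)

pooled_x = np.concatenate([education, edu_bad])
pooled_y = np.concatenate([anxiety, anx_bad])

print(f"Reliable only:   r = {corr(education, anxiety):+.2f}")  # small negative, like prior research
print(f"Unreliable only: r = {corr(edu_bad, anx_bad):+.2f}")    # essentially zero
print(f"Pooled sample:   r = {corr(pooled_x, pooled_y):+.2f}")  # positive, an artifact of mixing
```

The positive pooled correlation comes entirely from the unreliable group sitting higher on both variables, not from any real relationship between education and anxiety.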
Something similar happened in the CDC study we examined earlier. After asking about cleaning practices, researchers measured negative health outcomes like skin irritation, dizziness, headaches, and breathing problems. Among reliable participants, there was no correlation between dangerous cleaning behaviors and health problems (r = .03). But among unreliable participants, there was a highly significant correlation (r = .38). When analyzed together, the unreliable responses inflated the overall correlation, leading CDC researchers to wrongly conclude that dangerous cleaning behaviors were associated with health problems.
False correlations pose a serious risk to research. When researchers find surprising correlations that do not match expectations, those results often face scrutiny. But when inflated correlations match what researchers predict, they rarely question the findings. Instead, they see the correlation as evidence supporting their hypothesis. This means flawed findings that confirm expectations are more likely to be published and enter public discourse, and several examples of this phenomenon exist.
Let's consider the Add Health study once again. After identifying participants who lied about immigrating to the U.S., the researchers reanalyzed their data. Table 10.2 tells the story. The first column compares true immigrants to U.S.-born adolescents, showing small differences that generally favored immigrants. The second column compares adolescents who falsely claimed to be immigrants against U.S.-born adolescents, showing large differences suggesting immigrants face more problems.
Participants who lied about immigrating to the US reported far more negative outcomes—more trouble in school, drinking, emotional distress, health problems, and fighting. When researchers analyzed all the data together, they concluded that immigrant students struggled more than U.S.-born students. The finding seemed logical given the challenges of adjusting to a new country, but it was based on false data.
| Outcome Variable | True Non-U.S. Born (n = 863) vs. True U.S. Born (n = 11,550) | False Non-U.S. Born (n = 176) vs. True U.S. Born (n = 11,550) |
|---|---|---|
| School grades (+) | -0.03 | -0.32 |
| School trouble | 0.03 | 0.30 |
| Positive school feelings (+) | 0.01 | -0.64 |
| Skipping school | 0.02 | 1.33 |
| Smoking | -0.29 | 0.63 |
| Drinking | -0.24 | 1.07 |
| Drunk | -0.26 | 1.24 |
| Self-esteem (+) | 0.01 | -0.49 |
| Emotional distress | -0.10 | 0.60 |
| Future hope (+) | -0.03 | -1.32 |
| Health problems | -0.36 | 0.67 |
| Physical problems | -0.15 | 1.62 |
| Sickness | -0.04 | 0.90 |
| Fight | -0.21 | 1.15 |
| Lie to parents | -0.25 | 0.59 |
| Mean of absolute effect size | 0.13 | 0.80 |
Note: Values are effect sizes; variables marked (+) are positively valenced outcomes (higher scores are more desirable). False immigrants reported dramatically worse outcomes, artificially inflating apparent differences between immigrant and U.S.-born youth. From Fan et al. (2006).
Research on LGBTQ youth has shown a similar problem. Because relatively few adolescents identified as LGBTQ in the past, even a small number of mischievous responders could significantly distort findings. Several studies confirm this problem (Robinson & Espelage, 2011, 2012, 2013; Cimpian & Timmer, 2020; Savin-Williams & Joyner, 2014). While LGBTQ youth do face higher risks for some issues like suicidal thoughts and bullying, these risks appear much larger when mischievous participants remain in the data. Other supposed risk factors like drug abuse and fighting disappear almost entirely when mischievous participants are removed.
Finally, another example appears in Figure 10.6, showing the correlation between Facebook use and depression. Once again, there is clear separation between reliable participants (green dots) and unreliable ones (red dots). The correlation among reliable participants is small (r = .11), while among unreliable participants it is moderate (r = .35). Analyzing all data together yields an inflated correlation (r = .32).
A researcher finding this inflated correlation would have few reasons to question it because it aligns with the hypothesized outcome. This is how bad data threaten correlational studies: by giving researchers a false understanding of how variables relate to each other, these artificial relationships can mislead both scientists and the public, potentially driving misguided interventions and policies.
How Bad Data Affect Experimental Studies
We have seen how poor data quality distorts descriptive studies and creates false correlations. In experimental research, poor data quality leads to biased estimates of effect size.
An effect size tells researchers how big the difference is between experimental conditions (see Chapter 7 for a refresher on experimental design). When a manipulation strongly affects people's thoughts, feelings, or behaviors, the effect size is large. When the difference between conditions is small, so is the effect size.
The famous Trolley Dilemma provides an example of how bad data quality can affect experimental results. In the Trolley Dilemma, participants imagine a train barreling down the tracks toward five people who are tied up and unable to move. In one version, participants can pull a lever to divert the train to a different track where it will hit and kill just one person. The critical question is whether participants would pull the lever to save five people or let the train continue its path.
If you are like most people, you might feel there is something wrong with letting five people die instead of one. So, as unpleasant as it may be, 70-80% of people across countries and cultures choose to pull the lever and save the lives of five people at the expense of one (Awad et al., 2020). But as in all experiments, there is another condition.
In the second version of the Trolley Dilemma, people can push a large man off a bridge above the tracks to stop the train, saving the five trapped people but killing the man who was pushed. As in the original version, the critical question is: what would participants do?
If you are like most people, you are less certain about this scenario. In fact, the number of people willing to save five people by pushing the large man under the train often falls by more than half across these scenarios (Awad et al., 2020). The difference between conditions is so reliable that researchers consider it a human universal. When people are paying attention and responding honestly, this effect appears again and again.
The reliability of the Trolley Dilemma makes it useful for examining data quality. In one study, researchers compared two groups: a "reliable" group of participants who had passed a prior screening process, and an "unreliable" group who had failed this screening (Jaffe et al., 2025).
Figure 10.7 shows the results. Reliable participants were much more willing to pull the lever than push the man. But unreliable participants showed no difference between conditions. About 90% of unreliable participants in both conditions said they would act to stop the train, a pattern that contradicts decades of research. These unreliable participants simply agreed with whatever option was presented (i.e., "Yes, I will pull the lever"; "Yes, I will push the man"), regardless of the differences between the conditions.
Another study used the "soda task" to demonstrate how data quality affects experimental results (Hauser et al., 2023). In this task, participants indicate how much they would pay for a soda on a hot day at the beach. Some participants are told the soda comes from a run-down grocery store, while others are told it comes from a fancy resort.
Researchers tested this scenario with three groups: participants with good data quality (who passed prior screening), participants with bad data quality (who failed screening), and a mixed group who had not been screened.
As Figure 10.8 shows, among participants with good data quality, there was a large effect. People were willing to pay significantly more for the resort soda than the grocery store soda. In the mixed quality group, the effect was also present but somewhat smaller. In the bad quality group, however, the effect disappeared completely.
This pattern reveals the main consequence of data quality problems in experiments. When low-quality responses are systematic, such as the consistent yea-saying seen in the Trolley Dilemma and the soda task, they bias results in a specific direction, diluting and distorting the impact of experimental manipulations.
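The dilution is straightforward to demonstrate. The sketch below uses hypothetical numbers loosely patterned on the Trolley Dilemma, not the actual data from Jaffe et al. (2025): reliable participants show a large difference between conditions, while 40% of the sample are yea-sayers who say "yes" about 90% of the time regardless of condition.

```python
# A minimal sketch (hypothetical numbers) of how yea-saying dilutes an experimental effect.
import numpy as np

rng = np.random.default_rng(2)
n_per_cell = 1_000
share_unreliable = 0.40   # similar to estimates for unscreened panels

def simulate(condition_rate_reliable):
    """Return the observed proportion saying 'yes' in one condition."""
    unreliable = rng.random(n_per_cell) < share_unreliable
    says_yes = np.where(
        unreliable,
        rng.random(n_per_cell) < 0.90,                     # yea-sayers: "yes" regardless of condition
        rng.random(n_per_cell) < condition_rate_reliable,  # reliable: true condition effect
    )
    return says_yes.mean()

lever = simulate(0.80)   # "pull the lever" condition among reliable participants
push = simulate(0.35)    # "push the man" condition among reliable participants

print(f"Observed difference with 40% unreliable data: {lever - push:.2f}")   # about 0.27
print(f"True difference among reliable participants:  {0.80 - 0.35:.2f}")    # 0.45
```

Because the yea-sayers respond identically in both conditions, every contaminated response pulls the two conditions closer together and shrinks the estimated effect.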
This has real consequences beyond the lab. If researchers conclude an intervention has a smaller effect than it truly does, promising treatments or policies might be abandoned. And if diluted data mask an effect entirely, researchers might wrongly conclude that their hypothesis was incorrect.
Across all types of research, data quality problems lead to fundamentally misleading conclusions. The good news is that researchers can implement effective strategies to detect and address these issues, as we will explore in the next chapter.
Summary
Data quality is fundamental to the integrity of behavioral research, as this chapter has demonstrated through concrete examples. When participants provide low-quality data, whether through inattention, yea-saying, or deliberate misrepresentation, all forms of research suffer.
In descriptive research, poor data quality distorts point estimates, creating a false impression about the prevalence of attitudes and behaviors, as seen in the CDC bleach drinking study. In correlational research, participant behaviors like yea-saying artificially inflate relationships between variables, as demonstrated in the Facebook-depression and education-anxiety examples. In experimental studies, low-quality responses dilute effect sizes and can obscure real differences between conditions, as shown in the Trolley Dilemma and soda pricing experiments.
While random noise creates challenges, systematic bias poses the greater threat because it consistently distorts findings in predictable ways. This systematic distortion can create relationships where none exist, inflate weak associations into strong ones, or mask experimental effects entirely. Most concerning is when these distortions align with researchers' expectations, making them less likely to be scrutinized.
The Add Health study and other examples throughout this chapter illustrate that data quality problems can affect even the most methodologically rigorous research. The good news is that researchers can implement effective countermeasures. In the next chapter, we will learn about practical strategies to identify and address data quality issues, enabling more accurate and reliable behavioral research.
Frequently Asked Questions
What percentage of online survey data is typically unreliable?
Research estimates that around 40% of responses in a typical study conducted with a market research panel come from unreliable sources, including fraudulent or inattentive participants. However, researcher-centric platforms like CloudResearch Connect and Prolific typically reduce fraudulent data to less than 5% through rigorous verification procedures.
What is yea-saying and why is it a problem in online surveys?
Yea-saying is when participants systematically agree with most survey questions, regardless of the content. This behavior is often learned because saying 'yes' to screening questions increases chances of qualifying for studies and earning money. Yea-saying artificially inflates correlations between variables and distorts research findings.
How does poor data quality affect experimental research?
Poor data quality in experiments leads to biased estimates of effect size. When participants provide random or systematic low-quality responses, it dilutes and distorts the impact of experimental manipulations, potentially causing researchers to miss real effects or underestimate their magnitude.
What are click farms and how do they affect research?
Click farms are operations where people use multiple computers to fraudulently complete online surveys, often from countries outside the target population. They use VPNs and other tools to circumvent geographic restrictions. Click farm participants typically engage in yea-saying and provide implausible answers, seriously compromising data quality.
What happened to data quality on Amazon Mechanical Turk in 2018?
In 2018, researchers experienced a dramatic drop in data quality on MTurk, with unusual patterns including inconsistent demographic information, nonsensical responses, and high attention check failure rates. Investigation revealed that international participants had circumvented geographic restrictions using VPNs, with up to 40% of workers providing unusable data. This led to the emergence of specialized researcher-centric platforms with better data quality controls.
Key Takeaways
- Data quality problems are pervasive in online research, with up to 40% of market research panel responses coming from unreliable sources
- Click farms operate globally, using VPNs and other tools to circumvent geographic restrictions and fraudulently complete surveys
- Yea-saying is a common fraudulent behavior where participants systematically agree with questions to maximize survey earnings
- In descriptive research, poor data quality inflates point estimates, making rare behaviors appear common
- In correlational research, yea-saying artificially inflates relationships between variables, creating false correlations
- In experimental research, poor data quality dilutes effect sizes and can mask real differences between conditions
- Systematic bias is more dangerous than random noise because it consistently distorts findings in predictable ways
- Findings that confirm researcher expectations are especially vulnerable because they face less scrutiny
- Researcher-centric platforms like CloudResearch Connect and Prolific reduce fraudulent data to less than 5% through rigorous verification
- The 2018 MTurk crisis demonstrated how quickly data quality can deteriorate and led to increased emphasis on verification and quality screening