No, Americans Are Not Gargling Bleach: How Bad Survey Data Inflated Estimates in the Latest CDC Report – And How to Prevent This From Happening in the Future

By Leib Litman, PhD & Zohn Rosen, PhD

Online data collection has become standard practice even for major institutions like the CDC, but unless care is taken to ensure that respondents are honest and attentive, the results can be very misleading.

On June 12, 2020, the CDC released a report titled “Knowledge and Practices Regarding Safe Household Cleaning and Disinfection for COVID-19 Prevention – United States, May 2020.” The results were alarming: the report claimed that “39% of Americans engaged in at least one high-risk behavior during the previous month.” These behaviors included using bleach to wash food items (19%), using household cleaning products on skin (18%), and drinking or gargling diluted bleach or another disinfectant (4%).

On its face, the idea that almost 40% of Americans are using cleaning products in a dangerous manner is hard to believe, but it is hard to argue with data, at least when the data come from a trustworthy source. Whether the data behind this report meet that bar is unclear. To test this, we conducted a new study on the same topic using a system designed to improve data quality.

Who Did the CDC Survey to Generate This Report?

The report issued by the CDC used online market research panels, which provide researchers with access to tens of millions of Americans. These people are paid small sums of money to provide answers to questions ranging from the next movie they plan to watch to whether or not they are willing to wear masks during a pandemic.

But for some people taking surveys, other things compete for their attention, including children, TVs, music, conversations, and other distractions. These distractions are amplified during the COVID-19 pandemic, when just about everyone is staying at home. Worse than inattentive respondents are those who put minimal effort into answering questions, effectively selecting answers at random, or who actively give deceptive answers. Unless such respondents are identified and controlled for, they produce low-quality survey data. These problems are not unique to online panels: even face-to-face, lab-based studies can suffer from inaccurate and fabricated responses (Hauser & Schwarz, 2016).

Digging Deeper: How Low-Quality Respondents Inflated the CDC’s Findings

In a new study conducted by CloudResearch, the same market research panels were used to gather data, but an advanced system called Sentry™ was applied. Sentry combines behavioral assessment with fraud-detection technology to flag inattentive and otherwise low-quality respondents so they can be dropped from the final results. The sample, which included over 2,500 people, was virtually identical to the CDC's in terms of demographics and representativeness.

The differences in the outcomes were striking. The vast majority of people who reported engaging in high-risk behaviors were flagged by Sentry as problematic respondents. Once they were removed, only 20% of respondents reported using household cleaning products in a risky manner, 19 percentage points fewer than the CDC reported. For the most dangerous behaviors, such as gargling with cleansers, the numbers drop precipitously: only 0.9% reported gargling with any type of cleanser, and only 0.5% reported doing so with bleach.

Looking deeper, we examined the differences between respondents flagged as problematic (those who failed the Sentry check) and those identified as providing reliable data. Flagged respondents reported high-risk behaviors far more often than those who passed the Sentry screening. Over 31% of flagged respondents reported engaging in the highest-risk behaviors (drinking or gargling a household cleaner), but once their data were removed, just over 1.5% of respondents reported any of the riskier behaviors.
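The comparison described above amounts to computing the same prevalence separately for flagged and unflagged respondents, then for everyone pooled. The Python sketch below is a minimal illustration of that idea, not Sentry's implementation: the `Response` record, its `flagged` field (standing in for whatever quality screen is used), and the toy counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical record: whether a quality screen flagged this respondent,
    # and whether they reported the high-risk behavior of interest.
    flagged: bool
    reported_behavior: bool


def prevalence(responses):
    """Share of responses reporting the behavior (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(r.reported_behavior for r in responses) / len(responses)


def compare_by_flag(responses):
    """Return the prevalence among flagged, unflagged, and all respondents."""
    flagged = [r for r in responses if r.flagged]
    clean = [r for r in responses if not r.flagged]
    return {
        "flagged": prevalence(flagged),
        "clean": prevalence(clean),
        "all": prevalence(responses),
    }


if __name__ == "__main__":
    # Illustrative toy data only, not data from either study:
    # 10 flagged respondents (3 report the behavior), 90 unflagged (1 reports it).
    toy = (
        [Response(flagged=True, reported_behavior=True)] * 3
        + [Response(flagged=True, reported_behavior=False)] * 7
        + [Response(flagged=False, reported_behavior=True)] * 1
        + [Response(flagged=False, reported_behavior=False)] * 89
    )
    print(compare_by_flag(toy))
```

With these toy counts, 30% of flagged respondents report the behavior versus about 1.1% of unflagged ones, and the pooled rate comes out at 4%. The pooled rate is simply a weighted average of the two group rates (10% × 30% plus 90% × about 1.1%), which is why even a modest share of low-quality respondents can dominate the estimate for a rare behavior.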

Clearly, inattentive or deceptive respondents can skew the results of surveys. But why are the results affected so dramatically? The answer lies in simple probability. When respondents do not read questions carefully, they tend to choose randomly among the response options; on a yes/no question, a random responder says “yes” about half the time, regardless of what they actually do. This noise pulls every estimate toward that chance rate, making infrequent practices seem much more common and frequent practices (such as increased hand washing) seem less common. Low-frequency practices such as gargling household cleaners are especially vulnerable to this bias, since a change of even a few percentage points makes a big difference in interpretation. Notably, when the data from respondents flagged as problematic are added back in, the results closely match what the CDC reported.
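The probability argument can be written as a simple mixture: if a share q of respondents answer at random and endorse an item at a chance rate c, while everyone else endorses it at the true rate p, the expected observed rate is (1 − q)p + qc. The Python sketch below assumes uniform random responding on a yes/no item (c = 0.5) and uses hypothetical inputs; real inattentive responding can follow other patterns, such as always picking the first option.

```python
def observed_prevalence(true_rate, random_share, chance_rate=0.5):
    """Expected endorsement rate when a share of respondents answer at random.

    Attentive respondents endorse the item at `true_rate`; random responders
    endorse it at `chance_rate` (0.5 for a yes/no item) regardless of the truth.
    """
    for name, value in (
        ("true_rate", true_rate),
        ("random_share", random_share),
        ("chance_rate", chance_rate),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return (1 - random_share) * true_rate + random_share * chance_rate


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the direction of the bias:
    # a rare, a middling, and a very common behavior, with 10% random responders.
    for true_rate in (0.005, 0.50, 0.95):
        obs = observed_prevalence(true_rate, random_share=0.10)
        print(f"true {true_rate:.1%} -> observed {obs:.1%}")
```

Under these assumptions, a behavior with a true rate of 0.5% shows up at roughly 5.5%, a 50% behavior is unchanged, and a 95% behavior drops to about 90.5%: rare practices are inflated and common ones deflated, exactly the pattern described above.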

How Can Researchers Ensure That They’re Gathering Trustworthy Survey Data?

Flawed survey data not only misinform the public but can also pose risks to public health. In this case, telling tens of millions of people that others are using cleaners in dangerous ways to protect themselves during a pandemic carries some potential for harm. Considerable evidence in social psychology shows that social norm information (information about how other people are behaving) affects individual behavior, sometimes in unintended ways (e.g., Cialdini et al., 2006; Goldstein, Cialdini, & Griskevicius, 2008). It is therefore critical for researchers to ensure that the data they report are trustworthy. Doing so, however, is not always simple.

We recommend approaching data quality assessment in behavioral research from a ‘fit for purpose’ perspective: different surveys require different levels of stringency depending on their measurement goals. In all cases, inattentive and disengaged respondents add “noise” to a dataset, making significant results harder to find. When researchers use survey data to make inferences about unusual, low-frequency behaviors such as drinking bleach, however, screening becomes even more imperative, because even a small bias can produce highly misleading estimates. The present study is a textbook example of when the most stringent filters should be used. When a survey is not measuring low-frequency behavior, less stringent filters may be perfectly adequate to produce accurate and reliable results. In short, the data-cleaning approach has to fit the purpose of the study.
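The ‘fit for purpose’ point follows from the same mixture logic: a given share of random responders shifts an estimate by at most a few percentage points, but that shift relative to the true rate grows sharply as the behavior gets rarer. The Python sketch below assumes a hypothetical 5% share of random responders answering a yes/no item at chance; the numbers are illustrative, not estimates from either study.

```python
def relative_bias(true_rate, random_share, chance_rate=0.5):
    """(observed - true) / true, when `random_share` of respondents answer at random.

    Random responders are assumed to endorse the item at `chance_rate`;
    everyone else endorses it at `true_rate`.
    """
    if not 0.0 < true_rate <= 1.0:
        raise ValueError("true_rate must be in (0, 1]")
    if not 0.0 <= random_share <= 1.0:
        raise ValueError("random_share must be between 0 and 1")
    observed = (1 - random_share) * true_rate + random_share * chance_rate
    return (observed - true_rate) / true_rate


if __name__ == "__main__":
    # Same hypothetical 5% share of random responders at every base rate.
    for true_rate in (0.005, 0.05, 0.30):
        bias = relative_bias(true_rate, random_share=0.05)
        print(f"true {true_rate:.1%}: estimate off by {bias:+.0%}")
```

Under these assumptions, the estimate for a 0.5% behavior is inflated about sixfold (+495%), a 5% behavior by 45%, and a 30% behavior by only about 3%, which is why the most stringent screening matters most when the behavior being measured is rare.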

References

Cialdini, R. B., Demaine, L. J., Sagarin, B. J., Barrett, D. W., Rhoads, K., & Winter, P. (2006). Managing social norms for persuasive impact. Social Influence, 1(1), 3-15.

Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using social norms to motivate environmental conservation in hotels. Journal of Consumer Research, 35(3), 472-482.

Hauser, D. J., & Schwarz, N. (2016). Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behavior Research Methods, 48(1), 400-407.
