By Josef Edelman, MS, Aaron Moss, PhD, & Cheskie Rosenzweig, MS
When a research participant submits a survey on Amazon Mechanical Turk (MTurk), you, as the researcher, have two options. You can accept the HIT (Human Intelligence Task), in which case the participant gets paid and their approval rating increases. Or, you can reject the HIT, in which case the participant is not paid and their reputation suffers (i.e., their approval rating goes down).
Academic researchers are often reluctant to reject participant submissions because of constraints imposed by institutional review boards and concern for the welfare of people on MTurk. Both concerns are reasonable. Yet, when researchers fail to reject clearly fraudulent submissions, bad actors are able to exploit the system. Data from CloudResearch indicate that almost 90% of researchers reject fewer than 1% of submissions (Figure 1), and only 4% of researchers ever reject more than 5% of submissions.
While the practice of accepting all submitted work protects participants from potential mistreatment, never rejecting any submission can harm the platform as a whole. First, many researchers rely on worker reputations to help them avoid low-quality respondents. When workers are rejected, they are not paid and their reputation is downgraded. Multiple rejections downgrade a worker's reputation to the point where the worker becomes ineligible for most HITs, effectively quarantining workers who are not paying attention or are only trying to game the system. Research has shown that workers with low reputation ratings are indeed much more likely to provide random data in psychology studies (Peer et al., 2014). Thus, rejecting such workers protects the research community as a whole by keeping bad actors out of research studies. When such workers are not rejected, their reputations remain intact and their fraudulent behavior is continuously reinforced.

Rejections are particularly important for the growing number of studies on Mechanical Turk that pay workers substantial compensation, sometimes totaling over $500. Such studies require extra protection because, without accountability, they are certain to attract fraud. Having to pay all workers regardless of whether the task was completed properly would make running such studies online impossible (Litman & Robinson, 2020).
Ironically, the low rate of rejection continues at a time when researchers are increasingly concerned about fraudulent responses. Evidence indicates that a growing number of people outside the US have used virtual private networks (VPNs) and virtual private servers (VPSs) to access studies meant for people within the US. These submissions can often be identified by a variety of low-quality and fraudulent responses, and tools exist that can prevent these individuals from entering your survey. Still, when low-quality participants do access a study, researchers often want to err on the side of caution when issuing rejections. Below, we provide guidelines for when we believe there are valid grounds to reject a participant's submission.
Straight-lining occurs when a respondent selects the same answer, in a straight line, for every question, showing a lack of attention to the task.
Because straight-lining is so easily spotted by researchers, participants who are not paying attention seldom provide straight-line responses. However, straight-lining is clear grounds for rejection when it shows that the participant was not reading the items in the study.
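As a rough illustration, straight-lining can be flagged directly in the data. The minimal sketch below assumes a pandas DataFrame with one row per participant and hypothetical Likert item columns q1 through q5; the column names and the all-identical-answers rule are assumptions for illustration, not a CloudResearch tool.

```python
import pandas as pd

# Hypothetical survey data: one row per participant, Likert items q1..q5
df = pd.DataFrame({
    "worker_id": ["A1", "A2", "A3"],
    "q1": [4, 2, 3],
    "q2": [4, 5, 3],
    "q3": [4, 1, 3],
    "q4": [4, 4, 3],
    "q5": [4, 2, 3],
})

likert_items = ["q1", "q2", "q3", "q4", "q5"]

# A participant straight-lines if every item in the battery has the same value,
# i.e., the number of unique responses across the items is 1.
df["straight_lined"] = df[likert_items].nunique(axis=1).eq(1)

print(df[["worker_id", "straight_lined"]])
```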
Speeding occurs when a respondent completes a task at an unrealistic pace, another sign of the respondent's lack of attention to the task.
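One way to spot speeders is to flag respondents who finish far faster than everyone else. The sketch below uses hypothetical completion times and an assumed cutoff of one third of the median duration; the exact cutoff is an assumption and should be tuned to your own survey, ideally alongside other quality measures.

```python
import pandas as pd

# Hypothetical completion times in seconds for a survey that pilots at ~10 minutes
times = pd.DataFrame({
    "worker_id": ["A1", "A2", "A3", "A4"],
    "duration_sec": [640, 150, 580, 95],
})

# Flag respondents who finish in less than a fraction of the median completion time.
# The 1/3 cutoff here is an assumption, not a standard.
cutoff = times["duration_sec"].median() / 3
times["speeder"] = times["duration_sec"] < cutoff

print(times)
print(f"Speeding cutoff: {cutoff:.0f} seconds")
```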
When a respondent either provides gibberish or fails to follow a clear set of instructions for an open-ended response question, this can be grounds for rejection.
Other grounds for rejection include copied-and-pasted responses, responses that bear no relation to the question prompt, and generic replies such as "NICE," "GOOD," or "GOOD SURVEY," which often indicate inattentive or otherwise low-quality participants.
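These patterns are straightforward to screen for programmatically. The minimal sketch below uses hypothetical response data and assumed rules (a five-word minimum, a small list of generic replies, and identical text across workers as a proxy for copy-and-paste); adapt the rules to your own task before using any flag as grounds for rejection.

```python
import pandas as pd

# Hypothetical open-ended responses; column names are illustrative
responses = pd.DataFrame({
    "worker_id": ["A1", "A2", "A3", "A4"],
    "answer": [
        "I chose the first option because it matched my experience at work.",
        "GOOD SURVEY",
        "NICE",
        "I chose the first option because it matched my experience at work.",
    ],
})

generic_replies = {"nice", "good", "good survey", "great", "interesting"}

answer = responses["answer"].str.strip()
responses["too_short"] = answer.str.split().str.len() < 5         # assumed 5-word minimum
responses["generic"] = answer.str.lower().isin(generic_replies)   # canned, content-free replies
responses["duplicate"] = answer.duplicated(keep=False)            # identical text from multiple workers

responses["flag_open_ended"] = responses[["too_short", "generic", "duplicate"]].any(axis=1)
print(responses[["worker_id", "flag_open_ended"]])
```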
Failing a valid attention check also signals a respondent's lack of attention to the task.
We recommend using only attention checks that have previously been validated as accurate measures of attentiveness and data quality. Although it may seem that all attention check questions are created equal, research has shown that writing good check questions is not so simple. Questions intended as attention checks do not always measure attention or data quality; some may instead measure memory, vocabulary, or culturally specific knowledge that participants should not be expected to have. Because of this, pass rates on some attention checks are correlated with demographic factors such as socioeconomic status, education, and race. In addition, research has shown that some attention checks are failed even by careful and attentive participants who simply interpret a question differently than the researchers intended.
We also do not recommend using questions written in a crafty way to catch participants who do not read every single word with extreme care. These "gotcha" questions are challenging even for well-meaning and attentive respondents, and they ignore the fact that respondents, no matter how they are recruited, are human beings whose attention naturally shifts.
What attention check questions actually measure can vary greatly, and their difficulty is not always straightforward. We therefore do not recommend rejecting participants for failing a single attention or manipulation check. However, when participants fail multiple straightforward, simple attention checks, there may be grounds for rejection. Like speeding or straight-lining, performance on attention checks is often best considered alongside other measures of data quality.
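If each attention check is scored as pass or fail, tallying failures per participant makes the "multiple failures" rule easy to apply. The sketch below assumes three hypothetical check columns scored 1 for pass and 0 for fail, and an assumed threshold of two or more failures before a submission is even considered for rejection.

```python
import pandas as pd

# Hypothetical pass/fail results (1 = passed) on three simple, validated attention checks
checks = pd.DataFrame({
    "worker_id": ["A1", "A2", "A3"],
    "check_1": [1, 0, 1],
    "check_2": [1, 0, 1],
    "check_3": [1, 1, 1],
})

check_cols = ["check_1", "check_2", "check_3"]
checks["checks_failed"] = (checks[check_cols] == 0).sum(axis=1)

# Only treat multiple failures as a potential ground for rejection
checks["rejection_candidate"] = checks["checks_failed"] >= 2
print(checks[["worker_id", "checks_failed", "rejection_candidate"]])
```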
It is important that participants understand potential rejection criteria before entering a study. Some rejections may rest on data that are obviously fraudulent or on clear instances where participants did not complete the task, but whenever possible, give participants very clear instructions that guide how you would like them to complete your HIT. Not only does this create an objective standard for workers to aim for, but clear instructions are a best practice that will help you get better data. For example, if you are collecting data with an open-ended HIT, you can set clear criteria for what you consider an adequate open-ended response. These criteria should appear in the HIT instructions and be repeated in the study itself; for example, you might specify a minimum number of words, a minimum number of sentences, and a concrete example of an acceptable response. We also recommend giving workers an opportunity to redo the study if that is at all feasible. Whenever submitted work does not meet the study's criteria, you can email workers, explain why they were rejected, and offer an opportunity to try again.
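If you announce concrete criteria such as a minimum number of words and sentences, those same criteria can be checked mechanically before any rejection decision. The function below is a sketch that assumes a 30-word, two-sentence minimum and a naive sentence splitter; both are illustrative assumptions, not recommendations.

```python
import re

# Hypothetical criteria announced in the HIT instructions: at least 30 words and 2 sentences
MIN_WORDS = 30
MIN_SENTENCES = 2

def meets_announced_criteria(text: str) -> bool:
    """Check a response against the word and sentence minimums stated in the HIT."""
    words = len(text.split())
    # Naive sentence count: split on runs of ., !, or ? and keep non-empty pieces
    sentences = len([s for s in re.split(r"[.!?]+", text) if s.strip()])
    return words >= MIN_WORDS and sentences >= MIN_SENTENCES

example = ("I found the product easy to use, although the setup instructions were confusing. "
           "I would recommend it to a friend who is comfortable with technology, but not to "
           "someone who needs a lot of guidance.")
print(meets_announced_criteria(example))  # True: this response clears both minimums
```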
As mentioned throughout this blog, the strongest case for issuing rejections often comes from considering several pieces of information together. Participants who show evidence of inattention across multiple measures of quality are good candidates for rejection. And whenever there is clear evidence of fraudulent behavior, researchers should weigh not only the consequences of rejecting the submission but also the consequences of approving fraudulent responses.
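In practice, this means combining the individual indicators into a per-participant tally and reviewing only the submissions where problems converge. The sketch below assumes the hypothetical flags from the earlier examples and an assumed threshold of two or more problems before a submission is reviewed for possible rejection.

```python
import pandas as pd

# Hypothetical per-participant quality flags gathered from the checks above
flags = pd.DataFrame({
    "worker_id": ["A1", "A2", "A3"],
    "straight_lined": [False, True, False],
    "speeder": [False, True, False],
    "failed_multiple_checks": [False, True, False],
    "low_quality_open_ended": [True, True, False],
})

flag_cols = ["straight_lined", "speeder", "failed_multiple_checks", "low_quality_open_ended"]
flags["n_problems"] = flags[flag_cols].sum(axis=1)

# Assumed threshold: review for rejection only when several indicators converge
flags["review_for_rejection"] = flags["n_problems"] >= 2
print(flags[["worker_id", "n_problems", "review_for_rejection"]])
```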
We hope these tips help you in your research! If you are tired of dealing with bad data, we have many solutions to help. You can use our CloudResearch-Approved Workers to get very clean data on MTurk. Additionally, you can contact our Managed Research team and we can carry out all parts of your study, from setup to data cleaning to analysis. Our team is staffed with experts in recruitment methods and data quality and can help you get the most out of your research.