Best Recruitment Practices: Working With Issues of Non-Naivete on MTurk

Leib Litman, PhD

It is important to consider how many highly experienced workers there are on Mechanical Turk. As discussed in previous posts, the pool of active workers numbers in the thousands, but it is far from inexhaustible. A small group of workers completes a very large share of the HITs posted to MTurk, and these workers are highly experienced and have seen the measures commonly used in the social and behavioral sciences. Research has shown that repeated exposure to the same measures can harm data collection: it can change the way workers perform, create practice effects, give participants insight into the purpose of some studies, and in some cases reduce the effect sizes of experimental manipulations. This issue is referred to as non-naivete (Chandler et al., 2014; Chandler et al., 2016).

The current standard approaches to recruitment on MTurk actually compound this problem. When recruiting workers on Mechanical Turk, requesters can selectively recruit workers based on specific criteria, such as the number of HITs previously approved and the worker’s approval rating: the percentage of a worker’s completed HITs that were approved. A commonly used standard is to select workers who have approval ratings above 95% (see Peer et al., 2014). On its own, however, this is not quite enough, because MTurk assigns a 100% approval rating to all workers who have completed between 1 and 100 HITs, regardless of how many were actually approved. Only after workers complete 100 HITs does their approval rating accurately reflect the percentage of HITs that were approved. It is therefore recommended, and common practice, to recruit only workers who have approval ratings above 95% and who have also completed at least 100 HITs. When researchers use the approval rating as a qualification for a study, the CloudResearch system by default adds the qualification that workers must have previously completed at least 100 HITs in order to address this issue (researchers of course retain manual control over this).
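For researchers who post HITs directly through the MTurk API rather than through CloudResearch, these two criteria can be expressed as qualification requirements attached to the HIT. Below is a minimal sketch using the boto3 MTurk client; the qualification type IDs are MTurk’s built-in system qualifications for approval rate and number of HITs approved, and everything else (title, reward, survey file, assignment counts) is a placeholder, so check the details against the current MTurk documentation before relying on it.

```python
import boto3

# Connect to the MTurk requester API (use the sandbox endpoint when testing).
mturk = boto3.client("mturk", region_name="us-east-1")

# MTurk's built-in system qualifications (IDs as listed in the MTurk docs):
PERCENT_APPROVED = "000000000000000000L0"  # Worker_PercentAssignmentsApproved
NUMBER_APPROVED = "00000000000000000040"   # Worker_NumberHITsApproved

qualification_requirements = [
    {
        # Approval rating above 95%.
        "QualificationTypeId": PERCENT_APPROVED,
        "Comparator": "GreaterThan",
        "IntegerValues": [95],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    },
    {
        # At least 100 approved HITs, so the approval rating is meaningful.
        "QualificationTypeId": NUMBER_APPROVED,
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [100],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    },
]

# Placeholder HIT; the question file, reward, and durations are illustrative only.
hit = mturk.create_hit(
    Title="Short academic survey (placeholder)",
    Description="A brief survey for research purposes.",
    Reward="1.00",
    MaxAssignments=100,
    AssignmentDurationInSeconds=30 * 60,
    LifetimeInSeconds=3 * 24 * 60 * 60,
    Question=open("external_question.xml").read(),
    QualificationRequirements=qualification_requirements,
)
print("HIT created:", hit["HIT"]["HITId"])
```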

By selectively recruiting workers with a high approval rating and a high number of previously completed HITs, a requester can be more confident that workers in their sample will follow instructions and pay attention to tasks. Indeed, many researchers choose to recruit participants who have high approval ratings and have completed a large number of previous studies. The approval rating system also serves as a constant motivating factor that encourages workers to pay attention to each task, which helps researchers collect high-quality data. However, this practice excludes workers who have completed few HITs, even if they would provide good data but have not yet had the chance to “prove” it. Relying only on workers with high approval ratings therefore has a downside: it is a selection criterion that favors more experienced workers, who are less naive to the measures used on the MTurk platform, bringing the issue of non-naivete to the fore.


Solutions

CloudResearch is introducing a new tool that allows requesters to exclude workers who are extremely active, making it possible to selectively recruit workers who are less active and therefore more naive to commonly used measures. We believe this can substantially improve data collection if researchers choose to use it. Another option is Prime Panels, whose participants are more naive to commonly used measures because of the size of the platform and its primary use for marketing research surveys, which typically have very different data collection goals and use different tools than those used on MTurk.
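The CloudResearch tool handles activity-based exclusion automatically. A related, narrower technique available directly through the MTurk API is to tag workers who have already completed one of your studies with a custom qualification and require that the qualification not be present on future HITs; this limits repeated exposure to your own measures, although it cannot screen out workers who are highly active on other requesters’ HITs. The sketch below uses boto3, and the qualification name and worker IDs are purely illustrative.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# One-time setup: create a custom qualification used purely as an exclusion flag.
# The name and description are illustrative.
exclusion_qual = mturk.create_qualification_type(
    Name="Completed prior study (example lab)",
    Description="Assigned to workers who already completed one of our studies.",
    QualificationTypeStatus="Active",
)
exclusion_qual_id = exclusion_qual["QualificationType"]["QualificationTypeId"]

# After each study, flag the workers who participated (worker IDs are placeholders).
for worker_id in ["A1EXAMPLEWORKER", "A2EXAMPLEWORKER"]:
    mturk.associate_qualification_with_worker(
        QualificationTypeId=exclusion_qual_id,
        WorkerId=worker_id,
        IntegerValue=1,
        SendNotification=False,
    )

# On future HITs, require that the exclusion qualification does NOT exist,
# so previous participants cannot discover, preview, or accept the new study.
exclude_previous_participants = {
    "QualificationTypeId": exclusion_qual_id,
    "Comparator": "DoesNotExist",
    "ActionsGuarded": "DiscoverPreviewAndAccept",
}
# Add exclude_previous_participants to the HIT's QualificationRequirements list.
```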


References

Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112-130.

Chandler, J., Paolacci, G., Peer, E., Mueller, P., & Ratliff, K. A. (2016). Using nonnaive participants can reduce effect sizes. Psychological Science, 26(7), 1131-1139.

Peer, E., Vosgerau, J., & Acquisti, A. (2014). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior Research Methods, 46(4), 1023-1031.
