Best Practices That Can Affect Data Quality on MTurk

By Cheskie Rosenzweig, MS, Aaron Moss, PhD, & Leib Litman, PhD

Of late, researchers have reported a decrease in data quality on Mechanical Turk (MTurk). To combat the issue, we recently developed several data quality solutions, described in detail in a previous blog post. Here, we outline seven best practices for setting up and launching studies on MTurk. Following these best practices will help you gather better data (Litman & Robinson, 2020) and avoid low-quality participants in your study.


1. Target High-Quality Participants

Researchers have seen an influx of low-quality responses on MTurk, so we recommend opening most studies only to high-quality participants. Through a large-scale screening process, CloudResearch has identified these participants and offers access to them through our MTurk Toolkit. We recommend using "CloudResearch-Approved Participants" in your research.


2. Provide Participants With Clear Instructions

Tell participants what you need them to do in terms that are as clear as possible. The clearer your instructions, the easier it is for participants to understand what they should do, and the easier it is for you to decide whether they have done it.

Don’t assume that people know what you want within a survey. For example, if you want participants to write 4-6 sentences in response to a prompt, don’t ask them to “write a paragraph”; ask them to write 4-6 sentences. Clarity of instructions is important both within your survey and in how you describe your HIT.


3. Describe Your HIT Accurately

It is important that the way you advertise and describe your HIT to participants is accurate. Underestimating details such as a study’s length annoys participants at a minimum and, at worst, leads people to drop out or speed through your study. To gauge your study’s length accurately, we recommend running a pilot before launching the full study. If the pilot shows that the study takes people significantly longer than estimated, you can pause the study, edit the details, and then resume.

Another aspect of your HIT that is important to describe accurately is anything out of the ordinary you need participants to do. For example, if your study requires participants to listen to audio, say so in the HIT description. Doing so will prevent you from receiving bad data from people who couldn’t or didn’t want to turn on their audio. Similarly, letting people know that the study requires them to download reaction-time software like Inquisit allows them to decide whether the task is something they want to do before accepting it.


4. Do Not Underpay Participants

Data from four or five years ago (a long time in the world of MTurk) indicate that pay does not affect data quality for multiple-choice survey data (e.g., Litman, Robinson, & Rosenzweig, 2015). However, this is not true for all tasks. Make sure not to pay participants too little for the time and effort you expect them to put into your HIT. For survey data, we recommend 12 cents per minute as a baseline. For long studies, complex studies, and studies that require ingenuity and idea generation, we recommend paying more.

Pay is most likely to affect people’s willingness to spend time writing or engaging in other creative activities.
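As a rough illustration, here is how the arithmetic works out in a minimal Python sketch. The function name, the default baseline rate, and the multiplier for more demanding tasks are our own illustrative choices, not part of any MTurk or CloudResearch API:

```python
def suggested_reward(estimated_minutes, cents_per_minute=12, multiplier=1.0):
    """Return a suggested HIT reward in dollars.

    estimated_minutes: median completion time from your pilot.
    cents_per_minute: baseline rate (12 cents/minute for simple surveys).
    multiplier: raise this for long, complex, or creative tasks.
    """
    cents = estimated_minutes * cents_per_minute * multiplier
    return round(cents / 100, 2)

# A 10-minute survey at the baseline rate -> $1.20
print(suggested_reward(10))
# A 20-minute writing task paid 50% above baseline -> $3.60
print(suggested_reward(20, multiplier=1.5))
```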


5. Monitor Your MTurk Researcher Reputation

If your MTurk requester account has a history of rejecting many workers, being slow to pay, or paying below platform standards, some high-quality workers may avoid your studies. Additionally, your reputation on platforms like Turkopticon can influence some workers’ interest in taking your HIT (Robinson, Rosenzweig, & Litman, 2020). When high-quality participants steer clear of requesters who treat workers poorly, lower-quality participants become more likely to take your HIT.


6. Follow Proper Pre-Screening Procedures

When you want to target a specific group of people on MTurk (Military Veterans, for example), do not launch a HIT that lists the eligibility criteria in the title (e.g., “Only for Military Veterans”). Research has shown that in these cases low-quality workers may lie about their eligibility for the chance to earn the reward associated with your study, and even if most people are honest, a small percentage of impostors can have an outsized effect on your results (Chandler & Paolacci, 2017).

Instead of listing eligibility criteria in the title, use MTurk or CloudResearch qualifications to target specific workers. Alternatively, you can run your own two-study screen: use the first study to pre-screen participants on your variable of interest (“Are you a Military Veteran?”), then invite only the people who meet your criteria to the second, follow-up study.
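If you manage HITs programmatically, the qualification route can be set up through the MTurk API. The sketch below uses Python and boto3 and assumes your AWS credentials are already configured; the qualification name, worker IDs, file names, and HIT parameters are placeholders, not recommendations:

```python
# A minimal sketch of qualification-based targeting with boto3.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# 1. Create a private qualification for workers who passed the pre-screen.
qual = mturk.create_qualification_type(
    Name="Prescreen: eligible participants",      # not shown as a study title
    Description="Passed our pre-screening survey",
    QualificationTypeStatus="Active",
)
qual_id = qual["QualificationType"]["QualificationTypeId"]

# 2. Assign the qualification to workers identified in the first study.
for worker_id in ["A1EXAMPLEWORKERID"]:           # replace with your own list
    mturk.associate_qualification_with_worker(
        QualificationTypeId=qual_id,
        WorkerId=worker_id,
        IntegerValue=1,
        SendNotification=False,
    )

# 3. Launch the follow-up HIT so only qualified workers can see and accept it.
mturk.create_hit(
    Title="Follow-up survey",                     # no eligibility criteria here
    Description="A 10-minute survey about your experiences",
    Keywords="survey, research",
    Reward="1.20",
    MaxAssignments=100,
    AssignmentDurationInSeconds=3600,
    LifetimeInSeconds=7 * 24 * 3600,
    Question=open("question.xml").read(),         # ExternalQuestion XML
    QualificationRequirements=[{
        "QualificationTypeId": qual_id,
        "Comparator": "EqualTo",
        "IntegerValues": [1],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    }],
)
```

Because the qualification is private and the title says nothing about eligibility, ineligible workers never see the follow-up HIT, which removes the incentive to misrepresent themselves.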


7. Avoid Misleading or Unfair Data Quality Checks

We recommend that data quality measures require the same level of attention and skill as the other items in a survey. If your study is written at a 6th-grade English level, make sure your data quality checks don’t demand a much higher vocabulary. Additionally, many data quality checks that may initially appear to measure only attention, basic engagement, and language comprehension actually measure a host of other state and trait characteristics, including memory, need for cognition, conscientiousness, and spelling ability.

Rejections should not be based on a single failed attention check, because doing so can be unfair to workers. In addition, research has shown that such rejections are often biased and come at the expense of sample diversity. At the same time, measuring data quality is critical to conducting good research. Hence, we recommend considering performance on attention check questions in the context of other data quality measures, including response consistency, open-ended response quality, and the specifics of your study.
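One way to operationalize this advice is to flag participants only when multiple, independent quality signals converge, and then review those cases by hand rather than auto-rejecting. The column names and thresholds in this Python sketch are illustrative assumptions, not validated cutoffs:

```python
# Combine several data quality signals instead of acting on a single check.
import pandas as pd

df = pd.DataFrame({
    "worker_id":        ["W1", "W2", "W3"],
    "attention_failed": [1, 0, 2],            # number of failed attention checks
    "consistency_ok":   [True, True, False],  # e.g., a re-asked item matches
    "open_ended_words": [42, 3, 5],           # crude proxy for open-text effort
})

# Count independent quality concerns per participant.
df["flags"] = (
    (df["attention_failed"] >= 2).astype(int)
    + (~df["consistency_ok"]).astype(int)
    + (df["open_ended_words"] < 10).astype(int)
)

# Review (not auto-reject) only participants with multiple converging flags.
print(df.loc[df["flags"] >= 2, ["worker_id", "flags"]])
```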


References

Chandler, J. J., & Paolacci, G. (2017). Lie for a dime: When most prescreening responses are honest but most study participants are impostors. Social Psychological and Personality Science, 8(5), 500-508.

Chandler, J., Paolacci, G., & Hauser, D. J. (2020). Data quality issues on MTurk. In L. Litman & J. Robinson (Eds.), Conducting Online Research on Amazon Mechanical Turk and Beyond (pp. 95-120). Thousand Oaks, CA: Sage Academic Publishing.

Litman, L., & Robinson, J. (2020). Conducting Online Research on Amazon Mechanical Turk and Beyond. Thousand Oaks, CA: Sage Academic Publishing.

Robinson, J., Rosenzweig, C., & Litman, L. (2020). The Mechanical Turk ecosystem. In L. Litman & J. Robinson (Eds.), Conducting Online Research on Amazon Mechanical Turk and Beyond (pp. 27-47). Thousand Oaks, CA: Sage Academic Publishing.
