Solving the Challenges of Managing Data Quality in Online Research

By Aaron Moss, PhD & Leib Litman, PhD

The CloudResearch Guide to Data Quality, Part 3:
How to Enhance Survey Data Quality on Amazon MTurk and Online Panels

Technology has transformed behavioral science research. Researchers today can quickly access participants from all over the world and collect data in ways not possible in the past. Key to this transformation has been online participant recruitment platforms like Mechanical Turk (MTurk) and market research panels. Although these platforms offer the opportunity to conduct research quickly and efficiently, they also pose unique challenges for managing data quality. Not all platforms are built the same way or are equally valid for different types of research. As we will see, finding participants “fit for purpose” for a research study is a big part of managing data quality in online studies.

Amazon’s Mechanical Turk has Revolutionized Online Research

In academia and for many businesses, Amazon’s Mechanical Turk (MTurk) has been at the heart of the web-based research revolution. MTurk was launched in 2005 as a platform where “requesters” could post small jobs to people known as “workers.” All the tasks posted on MTurk require human intelligence — transcribing data, categorizing images, moderating website content, and completing behavioral science studies.

Initially, many requesters were interested in using MTurk to build model data that could be used to train machine-learning algorithms. Then, a few years later, behavioral scientists co-opted MTurk as a research tool.

MTurk and Participant Panels Enable a New Frontier of Behavioral Research

The popularity of MTurk among behavioral scientists increased dramatically in 2011. That year, researchers demonstrated that high-quality data for human behavioral research could be collected on MTurk quickly and inexpensively (Buhrmester et al., 2011). In addition, companies like Qualtrics and SurveyMonkey dramatically lowered the technical skills required to program web-based experiments. Thus, the stage was set for the research revolution: MTurk made it possible for researchers to locate and pay participants, while a whole host of online tools made it easy for nearly anyone to create sophisticated online surveys and experiments.

Within a few years of MTurk’s adoption by academic researchers, it was clear much of social science was at a tipping point. MTurk made it possible for researchers to collect data in a fraction of the time required for lab-based experiments, and the participants on MTurk often were more diverse than students in university subject pools. In addition, because MTurk was more affordable than other online alternatives, academics invested resources in understanding issues significant to MTurk: data quality; participant representativeness; replicability of established experimental findings; factors associated with participant availability; and characteristics associated with non-naive MTurk participants. Shortly after MTurk’s adoption by academic researchers, a majority of articles published in the top journals of some disciplines contained data collected from Mechanical Turk.

Requester Reputation: A Key Tool for Improving Response Quality on MTurk

Perhaps one of the strongest tools for maintaining data quality on MTurk is something Mechanical Turk built into its platform from the start: a reputation mechanism. On MTurk, requesters have complete discretion over whether to accept or reject submissions from workers. When a worker’s submission is rejected, the worker is not paid and their reputation suffers.
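To make the mechanism concrete, here is a minimal sketch of how accept and reject decisions could roll up into a worker's reputation and then gate access to a study. It is a toy model, not MTurk's implementation: the `WorkerRecord` class, the `is_eligible` function, and the 95% and 100-submission thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerRecord:
    """Running tally of one worker's accepted and rejected submissions (hypothetical)."""
    worker_id: str
    approved: int = 0
    rejected: int = 0

    def record(self, accepted: bool) -> None:
        # Each requester decision updates the worker's history.
        if accepted:
            self.approved += 1
        else:
            self.rejected += 1

    @property
    def approval_rate(self) -> Optional[float]:
        # Share of submissions that were accepted; None if there is no history yet.
        total = self.approved + self.rejected
        return self.approved / total if total else None


def is_eligible(worker: WorkerRecord,
                min_rate: float = 0.95,
                min_submissions: int = 100) -> bool:
    """Admit only workers with enough history and a high enough approval rate.

    Both thresholds are illustrative assumptions, not platform defaults.
    """
    total = worker.approved + worker.rejected
    rate = worker.approval_rate
    return rate is not None and total >= min_submissions and rate >= min_rate


# Usage: a worker with 97 accepted and 3 rejected submissions.
worker = WorkerRecord("W_EXAMPLE")
for accepted in [True] * 97 + [False] * 3:
    worker.record(accepted)
print(worker.worker_id, worker.approval_rate, is_eligible(worker))
```

Because a rejection lowers the rate that later studies filter on, workers have an incentive to submit careful work, which is the point of the reputation mechanism described above.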

Other MTurk Features that Help Maintain Data Quality

In addition to MTurk’s reputation mechanism, a number of other features allow researchers to get more participant engagement than is typical on online platforms. One is the ability of researchers to set compensation amounts for each task. When a researcher needs participants to engage in an arduous task, the researcher can pay more money or offer a bonus. Another platform feature useful for researchers is that each worker has a unique worker ID. Worker IDs can be used to recontact workers, making longitudinal or follow-up studies possible.

Although MTurk makes it possible for researchers to conduct a wide variety of online studies, running those studies is not always fast or efficient.

Because MTurk was not built as a platform for social science research, there are a number of common research tasks that are challenging, time-consuming, or impossible to accomplish without interacting with MTurk’s application programming interface. CloudResearch’s MTurk Toolkit helps researchers manage MTurk studies and ensure data quality by simplifying the setup and execution of MTurk studies.

The Expanding World of Online Participant Panels

As online data collection practices expanded among basic researchers in academia, they were also being developed in applied areas of behavioral science, like market research. However, because applied research often takes place within industry, where researchers are better funded, these industry researchers developed platforms for online participant recruitment faster than the academics who relied on MTurk.

Many of the online panels used in the industry began in the early 2000s, developed to meet the demands of market researchers. Because market researchers often want to collect data from large samples or segment the population to learn what specific groups think about specific brands or products, online panel providers created large participant pools similar to MTurk in providing access to people willing to take online studies, but different from MTurk in terms of their focus and structure.

Online Panels vs. MTurk: Major Differences for Researchers

In terms of focus, online panels sought to sign up as many potential participants as possible. Building panels with tens of millions of people worldwide gave panel providers the ability to fill requests for studies seeking thousands of participants.

In addition, large panels allowed researchers to reach diverse groups of people, home in on specific demographic criteria, and create samples stratified by important demographic characteristics like ethnicity or wealth.
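As an illustration of what stratifying a sample involves, the sketch below splits a target sample size across demographic groups in proportion to each group's size in the population. The function name `stratum_quotas`, the group labels, and the counts are hypothetical, and proportional allocation with largest-remainder rounding is only one of several ways a panel provider might set quotas.

```python
from typing import Dict


def stratum_quotas(population_counts: Dict[str, int], sample_size: int) -> Dict[str, int]:
    """Allocate sample_size across strata in proportion to population_counts.

    Uses integer arithmetic so the quotas always sum exactly to sample_size.
    """
    total = sum(population_counts.values())
    if total <= 0:
        raise ValueError("population_counts must contain at least one person")

    # Floor of each stratum's proportional share.
    quotas = {g: sample_size * n // total for g, n in population_counts.items()}
    # Fractional part lost by flooring, kept as an exact integer remainder.
    remainders = {g: sample_size * n % total for g, n in population_counts.items()}

    # Hand the leftover seats to the strata with the largest remainders.
    leftover = sample_size - sum(quotas.values())
    for g in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[g] += 1
    return quotas


# Usage with made-up income brackets and made-up population counts.
made_up_population = {"lower income": 290, "middle income": 520, "upper income": 190}
print(stratum_quotas(made_up_population, sample_size=1000))
```

A large panel matters here because every quota has to be filled: the smaller the group, the more panel members are needed to find enough eligible participants.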

These online panels also differed from MTurk in terms of structure. Whereas participants on MTurk are people who opt in to work on the platform, participants in online panels might opt in, but they also might have been recruited while engaged in other activities on the Internet. Because participants in online panels are recruited in various ways, not all participants are integrated into the platform or possess the same level of motivation to complete studies. This has a big effect on the type of data researchers can expect.

The biggest challenge of using online panels for research is that many participants are inattentive. Research studies show that the trade-off for recruiting tens of millions of participants is that many of these people are not motivated to provide quality data. Fortunately, research has also shown that inattentive participants in online panels can be screened and directed away from a study before they enter the study and contribute low-quality data.
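The screening step described above can be pictured as a short gate that runs before the study itself. The sketch below is one simple way to do it, assuming a handful of attention-check items with known correct answers. The item names, the answer key, and the `max_misses` threshold are hypothetical, and this is not a description of any particular provider's screening method.

```python
from typing import Dict


def passes_screener(answers: Dict[str, str],
                    answer_key: Dict[str, str],
                    max_misses: int = 0) -> bool:
    """Return True if the participant missed at most max_misses screener items.

    Unanswered items count as misses. Comparison ignores case and outer spaces.
    """
    misses = sum(
        1
        for item, expected in answer_key.items()
        if answers.get(item, "").strip().lower() != expected.strip().lower()
    )
    return misses <= max_misses


def route(answers: Dict[str, str], answer_key: Dict[str, str]) -> str:
    """Send attentive participants to the study and everyone else away from it."""
    return "study" if passes_screener(answers, answer_key) else "screened_out"


# Usage with made-up screener items.
ANSWER_KEY = {
    "select_blue": "blue",        # "Please select 'blue' to show you are reading."
    "synonym_of_big": "large",    # "Which word means the same as 'big'?"
}
print(route({"select_blue": "Blue", "synonym_of_big": "large"}, ANSWER_KEY))
print(route({"select_blue": "red", "synonym_of_big": "large"}, ANSWER_KEY))
```

Running the gate before the study begins means that responses from participants who fail it never enter the dataset, so they do not need to be cleaned out afterward.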

Finally, researchers on MTurk choose the amount participants will be compensated, while on other online platforms, panel providers are in charge of participant compensation. Although research shows compensation does not affect data quality when participants are asked to answer survey questions, compensation does affect people’s willingness to engage in long, challenging, or complex tasks requiring effort and ability. Therefore, researchers are often able to get stronger participant engagement on MTurk than through online panels.
