The Fulfilled and Unfulfilled Promises of Amazon Mechanical Turk
Over the past fifteen years, Amazon Mechanical Turk (MTurk) has become a substantial source of research participants for the behavioral sciences and has served as a new template for how researchers can connect with research participants. From the beginning, the potential for MTurk to transform research by offering equal access to large, high-quality, and diverse samples was clear. In this talk, Dr. Chandler reviews some of the expected and unexpected ways the platform has changed research for the better, as well as some of the opportunities that have been overlooked or underappreciated.
Behavioral research requires participants who are attentive and engaged when taking studies. Yet, there are multiple reasons to question how attentive and engaged many online research participants are. The talks in this session highlight both the problems of “bad-faith” respondents and some possible solutions. The first talk discusses management scholars’ perceived prevalence of and experiences with bad-faith survey data. The second talk outlines a method for separating engaged participants from unengaged ones by checking reading and comprehension of task instructions. And, finally, the third talk describes how to keep fraudulent participants out of your studies, regardless of which online source those participants originate from.
Dr. Barbara Larson
Executive Professor of Management, Northeastern University
Ensuring high-quality data has become more challenging over time. "Bad-faith responses" may include intentionally careless responses (e.g., participants responding without reading items or responding randomly), fraudulent participants (e.g., individuals lying about who they are on study inclusion items), the same person completing multiple surveys, and bots (computer algorithms or programs) completing multiple surveys or otherwise producing fake data. We discuss management scholars' perceived prevalence of, and experiences with, bad-faith survey data.
Yefim Roth
Lecturer in Human Services, University of Haifa
The Role of Attention in Checking Decisions.
Participants are assumed to Read instructions, Understand them, and Believe them (the RUB assumption). We investigated the validity and necessity of the RUB assumption in online experiments, specifically whether responses from participants who read the instructions differ from those of participants who do not. Based on ten studies (N = 2,500), we show that attentive participants behave very similarly to physical-lab participants, while inattentive participants differ substantially. Interestingly, even using highly qualified participants does not eliminate this bias.
Leib Litman
Co-CEO & Chief Research Officer, CloudResearch
Collecting High-Quality Data on Any Online Platform: No Ifs, Ands, or Bots.
Maintaining data quality in online studies has always been a concern for researchers. This concern has been exacerbated in recent years due to bots and problematic respondents on platforms like Mechanical Turk and Lucid. In this talk Leib describes tools that virtually eliminate problematic data on Mechanical Turk and significantly reduce problematic data on online market research platforms like Lucid and Qualtrics.
In a second series on data quality, three talks explore advanced methods for measuring and maintaining quality in online studies. Two talks address ways to leverage machine learning and natural language processing systems to identify high- and low-quality open-ended text responses, and a third talk examines the value of item-timing data.
Matt Lease
Associate Professor of Computer Science, University of Texas at Austin
Automated Models for Quantifying Centrality of Survey Responses.
When collecting data online, an automated method to quantify the relative centrality of participant responses can provide insights for quality assurance. Given a set of textual responses to a survey question, which responses are most normative and which are the greatest outliers? Over a set of questions, which participants provide the most normative or most outlying responses overall? How might such automated quantitative assessment inform analysis of responses and participants? In this talk I will present our open-source library, which enables such centrality measures to be computed in a general way across arbitrary question types.
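To make the idea concrete, here is a minimal sketch of one way a centrality score could be computed for free-text responses, using off-the-shelf sentence embeddings and cosine similarity to the group centroid. The model name and scoring scheme are illustrative assumptions; this is not the API of the library presented in the talk.

```python
# Illustrative sketch only (not the library's actual API): score each
# free-text response by its cosine similarity to the centroid of all
# responses, so higher scores = more "normative", lower = more outlying.
# The embedding model name below is an assumption.
import numpy as np
from sentence_transformers import SentenceTransformer

def centrality_scores(responses):
    """Return one centrality score per response (higher = more normative)."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vecs = model.encode(responses, normalize_embeddings=True)
    centroid = vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    return vecs @ centroid  # cosine similarity to the centroid

responses = [
    "I usually shop online because it saves time.",
    "Shopping online is convenient and fast for me.",
    "banana keyboard purple seven",  # likely outlier / low-quality answer
]
for text, score in sorted(zip(responses, centrality_scores(responses)),
                          key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {text}")
```

The same per-response scores can then be averaged within participants to flag people whose answers are consistently outlying across a set of questions.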
Josh Waxman
Assistant Professor of Computational Linguistics, Yeshiva University
Machine Learning Approaches to Data Quality.
For online survey and task systems such as Mechanical Turk and CloudResearch, data quality is a hard requirement. A psychological survey is worse than worthless if the answers were filled out randomly or inattentively. If an open-ended question is answered cursorily, with plagiarized text, or with gibberish (so as to finish the survey and collect payment), the answer is of little use. We will discuss some of the machine learning/NLP approaches we have applied to free-form responses to predict respondent quality.
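As a rough illustration of this kind of pipeline, the sketch below extracts a few simple text features and fits a small classifier to flag low-effort or gibberish answers. The features, vocabulary, and model choice are assumptions for demonstration only, not the presenters' method.

```python
# Illustrative sketch only: hand-crafted features feeding a simple classifier
# that flags low-effort or gibberish open-ended answers. Features, vocabulary,
# and model are assumptions, not the approach described in the talk.
import re
from sklearn.linear_model import LogisticRegression

COMMON_WORDS = {"i", "the", "and", "to", "because", "it", "is", "like",
                "shop", "online", "easy", "fast", "store", "good"}

def response_features(text):
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    return [
        n,                                              # very short answers are suspect
        len(set(words)) / n if n else 0.0,              # lexical diversity (repetition check)
        sum(w in COMMON_WORDS for w in words) / n if n else 0.0,  # dictionary hit rate
    ]

# Toy labeled examples (1 = acceptable, 0 = low quality) to show the workflow.
texts = ["I like to shop online because it is fast and easy.",
         "The store I like is online and it is good.",
         "asdf qwer zxcv lkjh",
         "good good good good good good"]
labels = [1, 1, 0, 0]
clf = LogisticRegression().fit([response_features(t) for t in texts], labels)
print(clf.predict([response_features("I shop online because it is easy.")]))
```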
Patrick Dubois
Lecturer in Psychology, University of British Columbia
Item Timing can Capture Careless Personalities.
Several existing approaches to mitigating careless responding rely on response content (e.g., bogus items, semantic or psychometric synonyms or antonyms, Mahalanobis distance) yet may introduce systematic biases. I present data from a behavioral approach, measuring individual item timing and persistent, willful rushing, and find support for carelessness as a broader personality trait that should be measured and controlled for, not discarded.
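For readers unfamiliar with timing-based screening, the following minimal sketch shows how a per-participant "rushing rate" could be computed from per-item response times. The data layout and the one-second cutoff are assumptions for illustration, not the presenter's values.

```python
# Minimal sketch with an assumed data layout and an assumed cutoff: compute
# each participant's rate of "rushed" items from per-item response times.
# The 1-second floor below is illustrative, not the presenter's threshold.
import pandas as pd

timings = pd.DataFrame({          # long format: one row per participant x item
    "participant": ["p1", "p1", "p1", "p2", "p2", "p2"],
    "item":        ["q1", "q2", "q3", "q1", "q2", "q3"],
    "rt_seconds":  [4.2, 3.8, 5.1, 0.6, 0.7, 0.5],
})

RUSH_CUTOFF = 1.0  # seconds per item (assumption)
rush_rate = (timings.assign(rushed=timings["rt_seconds"] < RUSH_CUTOFF)
                    .groupby("participant")["rushed"]
                    .mean())
print(rush_rate)  # proportion of rushed items, a trait-like index per person
```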
When designing a new study, many of us rely on the tried and true: methods we've used before or that are common in our field. The talks in this session highlight innovative ways of collecting rich data about human behavior. From methods that interrupt consumers' daily routines to nudge them toward new habits, to a chat-bot that finds online trolls, to a voice-recording system that adds depth and complexity to self-reported data, these talks present research methods that are outside the current box.
Evan Hanover
Director of Research, Conifer
Innovation Adores a Vacuum: Using Deprivation Research to Understand Everyday Behavior.
Our most common habits and routines are repeated so often that they demand little thought or reflection from us. This can be a challenge for researchers, who often must unpack the most minute everyday details in the search for opportunity. In this talk, I will discuss how deprivation, in combination with remote research methods, can be used to dislodge people from their routines and thus give us a view into the 'why' of today's most rote routines and the keys to nudging people toward adopting new behaviors tomorrow.
Vasileia Karasavva
Master’s Student in Psychology, University of British Columbia
Hunting for Trolls: Capturing Objective, Subjective, and Self-Reported Trolling.
To capture trolling behavior, we created an interactive chat-bot. Participants, who were told they would be interacting with another person, were given three options per interaction cycle: a positive response, a neutral response, and a trolling response. We measured objective trolling as the number of trolling responses selected in the chat-bot. We found evidence of three different types of trolls: (1) self-aware trolls, (2) fake trolls, and (3) secret trolls. To the best of our knowledge, this is the first study to comprehensively examine trolling from multiple points of view and one of the first to employ interactive chat-bots. Finally, we discuss the next steps in this line of work and ways to capture instances of online antisocial behavior more organically.
David Ferris
Co-founder, Phonic.ai
Voice, Video, and the Wild West of Remote Research.
What problems and opportunities exist in scaling remote research? We’ll answer this question with four unique, post-pandemic case studies.
The talks in this session continue the theme of innovative data collection: using participants' webcams to observe face-touching behavior remotely, using online interviews to help validate a new clinical measure of racial trauma, and conducting preferential looking studies with infants online.
Lucas Keller
Postdoctoral Researcher in Psychology, University of Konstanz
Observing Face-Touching Behavior in a Remote Setting
Reducing face touching could help slow COVID-19’s spread, but it can be difficult to do. We tested whether implementation intentions to reduce either the frequency or duration of face-touching effectively reduced the behavior. In this pre-registered online study, we utilized a novel way to collect behavioral data during a pandemic by using the participants’ webcams to obtain video recordings of them performing three engaging tasks for four minutes each. This procedure allowed us to observe (face-touching) behavior remotely.
Jade Gallo
Research Assistant, University of Connecticut
Sophia Gran-Ruaz
Doctoral Student in Psychology, University of Ottawa
We utilized a novel online interview method to assist in validating a new clinical tool for the measurement of trauma symptoms arising from the race-based maltreatment of people of color. We will discuss how we worked with CloudResearch to maximize and prioritize diversity and representation in our racial trauma study for a stronger real-world impact.
Lorijn Zaadnoordijk
Postdoctoral Researcher in Neuroscience, Trinity College Dublin
Conducting Preferential Looking Studies with Infants Online.
Looking behaviour has been a valuable measure in infant studies across various domains of development. Recently, there has been increasing interest in conducting such studies online rather than in the lab. There are many benefits to acquiring infant data via online studies, but the method has unique challenges that must be addressed. In this presentation, I will talk about the benefits, challenges, and potential solutions involved in acquiring infant data via online measurements.
At its core, behavioral research often seeks to improve the world by solving some social problem or contributing to basic human knowledge. The talks in this session embody that ethos: they describe how positivity can nudge consumers toward healthy behaviors in an online weight-loss community, explore which kinds of messages lead people to join an environmentally sustainable bank, and provide recommendations for how researchers can respectfully and ethically interact with online participants. The final talk touches on the practical application of online ethics and issues that commonly arise during IRB review.
Khoa Le Nguyen
Interim Manager of Applied Behavioral Science, Weight Watchers
Positivity resonance, an interpersonal experience characterized by shared positive affect, is hypothesized to build social resources that potentially support health behavior change. This research examines how perceptions of positivity resonance emerge and influence consumers’ health behaviors in a commercial online weight-loss community.
Tim Silva
Senior Lead Qualitative UX Researcher, Bank of the West
Becoming an Environmental Social Identity Brand: How to Effectively Convey Your Values.
What motivates people to join an environmentally sustainable bank? Drawing on social identity theory and cognitive dissonance theory, three online experiments examined how webpage messaging could be made more persuasive. Studies 1 and 2 demonstrated that using meta-contrast (contrasting our bank's environmental values with other banks' environmental values) was more effective than stating our bank's environmental values alone. Study 3 demonstrated that using induced hypocrisy was more effective for those with stronger vs. weaker environmental identities.
Aaron Moss
Senior Research Scientist, CloudResearch
Is it Ethical to Use Mechanical Turk for Behavioral Research?
In the last decade, MTurk emerged as a flexible, affordable, and reliable source of human participants and was widely adopted. Yet some have questioned whether researchers should continue using the platform on ethical grounds. Their concern is that people on MTurk are financially insecure, subject to abuse, and earn inhumane wages. We conducted two representative probability surveys of the U.S. MTurk population. The surveys revealed that the financial situation of people on MTurk mirrors the general population, most participants do not find MTurk stressful or requesters abusive, and MTurk offers flexibility and benefits that most people value above other options.
Once only whispered about in the halls of academia, careers for PhDs in industry are having a bit of a moment. In this featured event, Ryne Sherman of Hogan Assessments and Kate Rogers of Zillow share their journeys to careers outside of academia. Current PhD student Rachel Hartman will moderate, so this session promises to answer many of the questions that students thinking about careers in industry often have, from how to start looking for a non-academic job to the trade-offs of choosing industry over academia.
Ryne Sherman
Chief Science Officer, Hogan Assessments
Kate Rogers
Senior Behavioral Scientist, Zillow
The initial promise of online research was access to a more diverse group of participants than is often available for in-person lab studies. Yet, with time, researchers have grown increasingly interested in reaching more targeted samples. The first two talks in this session outline recent findings from a study that recruited older adults from various online sources and a study that sought veterans willing to participate in a psychotherapy program designed to address suicidality. The third talk provides insight into the platforms researchers often use to recruit participants. By documenting a gender pay gap in an anonymous online marketplace, this talk will help researchers better understand the dynamics of participant recruitment platforms and online research participants.
Michael Cohen
Postdoctoral Fellow, University of Pennsylvania
Aging-Related Cognitive and Personality Changes Among Online Research Participants.
We measured cognitive performance in participants from three self-recruitment websites (MTurk, CloudResearch MTurk Toolkit, and Prolific) and three panel recruitment websites (Lucid, CloudResearch PrimePanels, and Qualtrics Panels). Consistent with established norms, on all six platforms, aging was positively correlated with vocabulary performance, agreeableness, conscientiousness, and conservatism, and negatively related to processing speed and neuroticism. There were some differences between the self-recruited and panel samples.
Yosef Sokol
Research Health Science Specialist, Veterans Affairs
Over 15 million individuals in the US will need to find a path towards recovery following a suicide attempt. The VA has been at the forefront of integrating theory into mental health care, and is developing a novel recovery-oriented psychotherapy, Continuous Identity Cognitive Therapy (CI-CT), which aims to treat veterans struggling with suicidality. We will discuss our collaboration with CloudResearch to identify a group of Post-Acute Suicidal Episode veterans and pay them to collaborate on CI-CT treatment development.
Francesca Manzi
Postdoctoral Fellow in Psychology, Utrecht University
A Gender Pay Gap Despite Gender-Blindness: The Hidden Effect of Pay Expectations.
We use a completely gender-blind online work setting to examine the effect of a covert source of gender inequality: differential pay expectations. Despite the absence of many traditional barriers to gender equality in this online work setting, female workers earn less than male workers. Our findings further reveal that differences in earnings are largely driven by differential pay expectations: Women’s salary expectations are lower than those of men, and these lower pay expectations lead to reduced earnings.
In the last 20 months, most behavioral scientists have had to adjust their approach to research because of COVID-19. The talks in this session highlight unique opportunities for data collection that were a result of the pandemic. The first talk focuses on rapid data collection with a sample of older adults, the second talk investigates how paranoia unfolded during the pandemic, and the third talk describes a randomized controlled trial to increase happiness and meaning with prosocial acts of kindness.
Phil Corlett
Associate Professor of Psychiatry, Yale University
Paranoia and Belief Updating During a Crisis.
COVID-19 has made the world seem unpredictable. We investigate paranoia and belief updating in an online sample in the U.S. We demonstrate that the pandemic increased individuals' self-rated paranoia and rendered their task-based belief updating more erratic. State-mandated mask-wearing increased paranoia and induced erratic behavior, particularly in states where adherence to mask-wearing rules was poor but where rule following is typically more common. This paranoia may help explain the lack of compliance with masks.
Nitin Verma
Doctoral Student in Information Studies, University of Texas at Austin
In this talk, I will discuss methodological strategies for conducting rigorous and efficient survey research with older adults recruited through online panel aggregators. This talk will focus on: (1) the design of a survey instrument that is both comprehensible and usable by older adults; (2) rapid collection (within hours) of data from a large number of older adults; and (3) validation of the data using attention checks, independent validation of age, and detection of careless responses to ensure data quality.
Join us in celebrating the winners of our $10,000 CloudResearch Grant Awards, each of whom will receive $2,500 for their research. The winning projects aim to develop better facial stimuli for social research on stereotyping and prejudice, understand how imaginary motion affects time perception, examine group reasoning and polarization in an ecologically valid online context, and investigate the decision-making that leads venture capitalists to routinely underinvest in Black start-ups. We can't wait to see the impact this research has, and we're thrilled to play a small role in making it easier to conduct!
Art Marsden
Syracuse University
Michiel Spapé
University of Helsinki
Nick Byrd
Stevens Institute of Technology
(team: Simon Cullen & Philipp Chapkovski)
Megan Burns
New York University
Social media represents a huge opportunity for data collection, but it is an opportunity fraught with challenges. In these talks, presenters will share mistakes and successes using Reddit for research, the unique reach of sites like Facebook and Reddit, and some data on the representativeness of comments within online news stories.
Raymond Luong
Doctoral Student in Psychology, McGill University
Evaluating Reddit as a Crowdsourcing Platform for Psychology Research Projects.
This talk will cover data collection using Reddit, specifically my experiences (1) assessing the data quality of r/SampleSize relative to Amazon Mechanical Turk and (2) using r/SampleSize for my own applied research study (currently under review). Beyond the different strategies I used to assess data quality, I will also discuss some of the demographic advantages of sampling from r/SampleSize, practical advice for optimizing data collection on Reddit based on experience and the literature, and mistakes I made along the way.
George Kuhn
President, Drive Research
Emily Carroll
Marketing Manager, Drive Research
Using Social Media to Find & Recruit Participants for Market Research.
Using social media to find research participants is not the first sampling method research firms think to use. However, in our experience, there are several instances where using Facebook, Instagram, or LinkedIn to find qualified participants is better than using a panel or call list. In this presentation, we will share best practices and real-world examples of using paid Facebook ads to find the most difficult-to-source audiences in market research.
Seung Woo Chae
Doctoral Student in Media Arts and Sciences, Indiana University Bloomington
Representativeness of Comments under Online News.
While comment threads appear under most online news stories these days, it is difficult to know how well they represent the views of the whole audience. This study addresses the representativeness of online comments using natural language processing (NLP). Two types of text data were collected: real online comments and comments written by survey participants, which were assumed to represent the views of the whole audience group. The extent of similarity between the real comments and the survey comments was calculated for each political group using BERT, a state-of-the-art NLP technique.
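As a rough sketch of this kind of comparison (not the study's actual pipeline), one could embed both comment sets with a BERT-based sentence-embedding model and compare their mean embeddings; the specific model below is an assumption.

```python
# Rough sketch, not the study's pipeline: quantify how similar real news
# comments are to survey-written comments by comparing mean sentence
# embeddings from a BERT-based model. The model choice is an assumption.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def group_similarity(real_comments, survey_comments):
    """Cosine similarity between the mean embeddings of the two comment sets."""
    real = model.encode(real_comments, normalize_embeddings=True).mean(axis=0)
    survey = model.encode(survey_comments, normalize_embeddings=True).mean(axis=0)
    return float(real @ survey / (np.linalg.norm(real) * np.linalg.norm(survey)))

real = ["This policy will hurt small businesses.", "Great reporting, thank you."]
survey = ["I worry about the impact on local shops.", "Solid article overall."]
print(f"similarity: {group_similarity(real, survey):.3f}")
```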
One of the largest opportunities presented by online research is the ability to collect longitudinal data with less burden placed on participants. In the studies reported in these talks, researchers will describe a large-scale longitudinal study investigating political polarization and an intensive daily diary study in which 200 participants completed a survey each day for one week, yielding more than 1,300 daily entries. Attendees of this session will learn practical tips and best practices for carrying out a variety of longitudinal research projects.
Brittany Shoots-Reinhard
Senior Research Associate, University of Oregon
In two large-scale longitudinal datasets we investigated ability-related political polarization in responses to the COVID-19 pandemic. We observed more polarization with greater ability in emotional responses and risk perceptions across five waves of data collection with a diverse, high-quality convenience sample of MTurk workers recruited with CloudResearch. The results from our study suggest that polarization may be a function of the amount and/or application of verbal knowledge rather than selective application of quantitative reasoning skills. The results also demonstrate that convenience samples of MTurk workers recruited with CloudResearch are fast, cost-effective, and comparable to larger, representative samples.
Michael Maniaci
Associate Professor of Psychology, Florida Atlantic University
Collecting Intensive Longitudinal Data.
Intensive longitudinal methods (e.g., experience sampling or daily diary studies that involve collecting repeated measurements from participants over several days) offer notable benefits over cross-sectional designs: reducing retrospective recall bias, distinguishing within-person and between-person processes, and examining predictors of within-person change. Compared to typical convenience samples, crowdsourcing services can afford greater confidentiality while providing access to a relatively large and diverse participant population. This talk will discuss the benefits and challenges of collecting intensive longitudinal data through crowdsourcing, including strategies to promote compliance and enhance data quality while protecting participant confidentiality. It will also cover an example study of daily social interaction in which 200 participants recruited from MTurk completed daily surveys over one week, yielding more than 1,300 daily responses.
Even casual observers of the political process know that in the last two Presidential elections the polls had some problems. Although there is a lot of disagreement about what, exactly, those problems were, it seems clear that online polling will be a part of any future solutions. In the two talks in this session, Emerson College Polling and AtlasIntel will describe what went wrong in many 2020 Presidential polls and how a combination of methods allowed both of these organizations to produce accurate polls in the last cycle.
Spencer Kimball
Director, Emerson College Polling
Isabel Holloway
Program Assistant, Emerson College Polling
Efficacy of Mixed-Mode National Pre-Election Polling in the 2020 U.S. Presidential Election.
During 2019-2020, Emerson College Polling conducted a series of monthly national polls tracking the presidential nomination contest and the eventual race for the White House. Throughout the year, ECP used a mixed-mode approach, combining IVR, online panels, and SMS-to-web data in varying proportions. ECP ended the election cycle with one of the most accurate national polls: the October national poll, consisting of IVR and an online panel provided by Amazon MTurk, predicted a 4.2-point Democratic margin in the popular vote, compared with the final margin of 4.4 points.
Andrei Roman
Founder and CEO, AtlasIntel
Polling Error: Why Were Polls so Wrong in 2020 and What Can We Do About It?
This talk will discuss the key challenges to pollster accuracy in the 2020 presidential cycle, drawing on AtlasIntel's methodological and practical insights. It will also explain how AtlasIntel worked to ensure sample representativeness and correct sources of potential bias.