Amazon's Mechanical Turk Has Reinvented Research

Why do hurricanes with female names have more victims? Does intelligence predict voting? Is it ethical to pay people to participate in medical experiments?

In the past several years, researchers have embraced a new tool for answering questions like these: crowdsourcing armies of online survey respondents from Amazon’s Mechanical Turk, Google Surveys, and a wide range of other fee-for-service survey panels.

I understand the appeal. After leaving a job that gave me regular access to a large pool of survey respondents, I felt borderline incapacitated by the lack of easy, reliable options for getting a quick, representative sample. Then I discovered Mechanical Turk, and my data superpowers were restored. Suddenly I was once again able to get quick feedback on such burning questions as how people really feel about Slack.

What makes online survey samples so revolutionary for social science?

Let’s look at Amazon’s Mechanical Turk in particular, since it’s the option that is most widely studied. For those unfamiliar with MTurk (as it’s often abbreviated), political science scholar Adam J. Berinsky et al.’s “Evaluating Online Labor Markets for Experimental Research” offers a very useful overview:

To initiate a survey using MTurk, a researcher (a “Requester” in Amazon’s vernacular) establishes an account (http://www.mturk.com), places funds into her account, and then posts a “job listing” using the MTurk Web interface that describes the Human Intelligence Task (HIT) to be completed and the compensation to be paid (Amazon assesses Requesters a 10% surcharge on all payments [ed: this surcharge has now increased and is typically 20-40%]). Each HIT has a designated number of tasks and the requester can specify how many times an individual MTurk “Worker” can undertake the task. Researchers can also set requirements for subjects, including country of residence and prior “approval rate,” which is the percent of prior HITs submitted by the respondent that were subsequently accepted by Requesters. When MTurk Workers who meet these eligibility requirements log onto their account, they can review the list of HITs available to them and choose to undertake any task for which they are eligible.

Since “turkers” (as MTurk workers are known) can be paid as little as pennies per task, it may seem strange that there are so many people ready, willing, and able to undertake tasks like survey completion. In an article on the implications of crowdsourcing for labor law, scholar Alek Felstiner shares a few quotes from turkers that help to clear up the mystery:

I am a retired senior citizen on a limited income…The extra income becomes even more important now [with] higher gas prices, and the grocery bill becoming more costly each week.

No available jobs in my area, have applied to over 40 jobs no calls so far been 3 months. Do it to pay my bills which includes rent and diapers for my kids until I find work again.

I am working as teacher and my salary is not enough to fullfil my needs so I am looking for some more money. That is why i am participating on Mechanical Turk.

Turkers—and the larger online survey ecosystem of which they are a part—have helped address several methodological problems that have long bedevilled academic researchers. First, online survey respondents give academic researchers an alternative to the tried-and-tired practice of treating undergraduates as guinea pigs. As psychology scholars Michael Buhrmester et al. note in their 2011 assessment of Mechanical Turk data, “[c]ommentators have long lamented the heavy reliance on American college samples in the field of psychology and more generally those from a small sector of humanity.”

Second, online respondents are more representative than other comparably affordable sources of “sample” (a.k.a. a group of survey respondents). As political science scholars Erin C. Cassese et al. note in “Socially Mediated Internet Surveys: Recruiting Participants for Online Experiments,” “[r]esearchers find that relative to other convenience samples, MTurk participants are generally more diverse and seem to respond to experimental stimuli in a manner consistent with the results of prior research.”

Finally, Mechanical Turk respondents generally do a pretty thorough job with their survey responses (though, as I’ll explain later, this can be a mixed blessing). As Berinsky et al. write, “MTurk respondents may generally pay greater attention to experimental instruments and survey questions than do other subjects.”

A golden age in survey research

The sudden availability of a conscientious, diverse, non-student source of survey respondents has led to something of a golden age in survey research. There is no shortage of interesting, creative social science that has tapped into the power of the crowd. Scholar Stig Hebbelstrup Rye Rasmussen’s article “Cognitive Ability Rivals the Effect of Political Sophistication on Ideological Voting,” discusses a thirty-to-forty-minute survey of 2,566 MTurk workers, each of whom was paid $1.10. Based in part on this sample, Rasmussen was able to conclude that “the impact of cognitive ability on ideological voting rivals the importance of political sophistication, the classically most important predictor of ideological voting.”

Economists are getting in on the action, too. To understand perceptions of “repugnant transactions” like “[p]aid kidney donation, prostitution, and paid participation in medical trials,” economists Sandro Ambuehl, Muriel Niederle, and Alvin E. Roth turned to Mechanical Turk:

We presented 1,445 subjects on Amazon Mechanical Turk with a fictitious medical trial that compensates participants with $50, $1,000, or $10,000. We described it as a test for side effects of a vaccine that requires a total of 40 hours of a participant’s time, and characterized it as low but nonzero risk. Each respondent was randomly displayed one of the three payment amounts and answered several questions, including how they would decide as a member of the IRB [institutional review board] responsible for approving the experiment, before answering the same questions for each of the remaining amounts.

They found that “the model implies that in-kind incentives will be judged as most ethical.” The authors do not specify the dollar-value incentive they needed to elicit survey responses.

And Mechanical Turk also helped an interdisciplinary team delve into the differential response to hurricanes with male names versus those with female names. Noting that “analyses of archival data on actual fatalities caused by hurricanes in the United States (1950-2012) indicate that severe hurricanes with feminine names are associated with significantly higher death rates,” scholars Kiju Jung et al. ran a series of experiments to unpack the phenomenon, in part by using Mechanical Turk:

One hundred forty-two participants were given a scenario and a weather map on which either Hurricane Christopher or Hurricane Christina was displayed and reported their evacuation intentions on three items (e.g., 1 = definitely will evacuate, 7 = definitely will stay home)… A measurement of perceived risk showed… Hurricane Christopher was perceived to be riskier than Hurricane Christina.

While Mechanical Turk has facilitated eclectic and engaging research projects like these, it’s not without its problems. First, there are still significant questions about the validity of results obtained through MTurk surveys. As John Bohannon notes in his article “Social Science for Pennies,” one particular issue is with “super-Turkers,” people who are essentially professional workers on MTurk, some of them logging more than 20 hours per week. Many social science experiments rely on the subjects not knowing the researchers’ intentions. Berinsky says super-Turkers could potentially skew experiments if they try too hard to please researchers. There is incentive to do that because MTurk uses a reputation system.

Another issue is the way the sheer facility of MTurk may change the practice of social science. Writing in the American Sociological Review, sociologist David Peterson frets that “by embracing this remote source of data-gathering, social psychologists have committed themselves to filtering all aspects of their experiment through the digital medium of the Internet. They lose all ‘hands on’ access to the object of their inquiry.”

Economic injustice and ethical quandary

Finally, and perhaps most problematic, there is the ethical quandary of employed, tenured academics exploiting low-wage work by people at the opposite end of the pay and job security scale. As Felstiner notes, “workers tend to receive extremely low pay for their cognitive piecework, on the order of pennies per task. They usually earn no benefits and enjoy no job security.”

Academics are not above exploiting the opportunities this low-wage norm provides. Writing about their (admittedly early) experiments on MTurk, Berinsky et al. state that “even the highest pay rate we have used on MTurk of $.50 for a 5-min survey (an effective hourly rate of $6.00) is still associated with a per-respondent cost of $.55 (including Amazon.com’s 10% surcharge) or $.11 per survey minute. By contrast, per subject costs for typical undergraduate samples are about $5-10, for nonstudent campus samples about $30 (Kam, Wilking, and Zechmeister 2007), and for temporary agency subjects between $15 and $20.”
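The arithmetic in that passage is easy to reproduce. What follows is only an illustrative sketch: the function name survey_costs and its parameters are my own assumptions, not anything taken from Berinsky et al. or from Amazon, and the inputs are simply the figures quoted above ($.50 reward, a 5-minute survey, a 10% surcharge).

```python
from decimal import Decimal


def survey_costs(reward, minutes, fee_rate):
    """Return (worker hourly rate, requester cost per respondent,
    requester cost per survey minute), all in dollars.

    Hypothetical helper for illustration; Decimal keeps the cents exact.
    """
    reward = Decimal(str(reward))
    minutes = Decimal(str(minutes))
    fee_rate = Decimal(str(fee_rate))

    hourly_rate = reward * 60 / minutes            # what the worker earns per hour
    cost_per_respondent = reward * (1 + fee_rate)  # reward plus platform surcharge
    cost_per_minute = cost_per_respondent / minutes
    return hourly_rate, cost_per_respondent, cost_per_minute


# Figures quoted in the passage: $.50 reward, 5-minute survey, 10% surcharge.
hourly, per_respondent, per_minute = survey_costs("0.50", 5, "0.10")
print(f"worker's effective hourly rate: ${hourly:.2f}")
print(f"requester's cost per respondent: ${per_respondent:.2f}")
print(f"requester's cost per survey minute: ${per_minute:.2f}")
```

Rerunning it with a fee_rate of 0.20 or 0.40, the range the editorial note earlier gives for the current surcharge, raises the requester's cost but leaves the worker's hourly rate unchanged, since the surcharge goes to the platform rather than to the respondent.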

To my mind, it is this economic injustice, rather than methodological concerns, that should be of greatest concern to would-be MTurk researchers. Indeed, the exploitation of MTurk workers is already drawing scrutiny in a way that reflects poorly on the academic community. While academics may lament shrinking research funding and increased competition for grants, neither funding scarcity nor the importance of academic research is a truly acceptable excuse for paying survey-takers rates that are often a tiny fraction of minimum wage.

The good news is that it’s hardly an insuperable issue. Every MTurk task asks the requester to estimate how long their task will take; every completed task includes a report on how long the turker actually required in order to complete it. Researchers simply need to deploy a round of test surveys, calculate the actual time to complete each survey, and price their tasks so that their pay rate is at least equivalent to minimum wage. In my own MTurk surveys, I now price all tasks at the equivalent of at least 25 cents a minute, which works out to the $15-per-hour that many labor activists now advocate as a reasonable minimum wage.
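As a rough sketch of that pricing step, the snippet below turns a set of pilot completion times into a per-task reward. Everything in it is an assumption made for illustration: the function name fair_reward_cents is hypothetical, the pilot times are invented, and using the median completion time is just one reasonable choice (slower-than-median workers would still earn less than the target, so a cautious researcher might price off a higher percentile instead).

```python
import math
from statistics import median


def fair_reward_cents(pilot_seconds, hourly_wage_dollars):
    """Reward per task in whole cents, rounded up, so that a worker who
    takes the median pilot time earns at least the target hourly wage.

    Hypothetical helper for illustration only.
    """
    typical_seconds = median(pilot_seconds)
    hourly_wage_cents = hourly_wage_dollars * 100
    # cents owed = seconds worked * (cents per hour / 3600 seconds per hour)
    return math.ceil(typical_seconds * hourly_wage_cents / 3600)


# Invented completion times (in seconds) from a small round of test surveys.
pilot_seconds = [250, 310, 290, 405, 330]
reward = fair_reward_cents(pilot_seconds, 15)
print(f"pay at least {reward} cents per task")
```

At a $15 target, a survey that takes exactly five minutes comes out to 125 cents, which is the same 25 cents a minute described above.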

The use of crowdsourced survey platforms is likely to increase in the years ahead, so now is the time to entrench research practices that ensure fair wages for online survey respondents. Peer-reviewed journals, academic publishers, and universities can all play a part in promoting ethical treatment of online respondents, simply by requiring full disclosure of payment rate and task time allocation as part of any study that uses a crowdsourced workforce. We already expect academic researchers to disclose their sample size; we should also expect them to disclose whether their respondents earned a dollar for a five-minute survey, or a quarter for a half-hour survey.

Crowdsourced labor markets are steadily transforming both the ease and power of survey and experimental research. But that transformation will only represent a true elevation of the field if it is built on justice rather than exploitation.
