Amazon.com’s Mechanical Turk (MTurk) is a web-based platform that started in 2005 as a service allowing researchers to “crowdsource” labor-intensive tasks, which workers registered on the site complete for compensation.1, 2 MTurk has rapidly become a source of subjects for experimental research and of survey data for academic work, as its representativeness, speed, and low cost appeal to researchers.2, 3 Researchers post links to surveys and experiments and use MTurk to crowdsource the survey, collect the data, and compensate workers.4 A Google Scholar search for “Amazon Mechanical Turk” returned 15,000 results published between 2006 and 2014,3 and 17,400 results by mid-2017. MTurk is the largest online crowdsourcing platform,4 with about one-third of its tasks related to academic work.5 The growing popularity of MTurk has led to questions about its soundness as a subject pool; MTurk is the most studied nonprobability sample available to researchers.3

The MTurk pool of potential workers is vast, diverse, and inexpensive. MTurk has 500,000 registered users,3 with 15,000 individual US workers at any given time.6 MTurkers have been paid as little as $0.05 to complete 10- to 15-minute tasks.4 Researchers can collect data from samples large enough to provide sufficient statistical power at one-tenth of the cost of traditional methods.4 The MTurk population is more representative of the population at large than the samples in other online surveys, and it produces reliable results.2, 3, 6,7,8,9,10,11

There is a rapidly growing literature exploring how well MTurk responses generalize to other data collection methods. Data obtained via MTurk surveys and experiments are at least as reliable as those obtained via traditional methods and are attractive for conducting internally and externally valid experiments; overall, the advantages outweigh the disadvantages.3, 8,9,10,11,12,13,14,15 However, the benefits and drawbacks of using MTurk in the health and medical literature are largely unexplored beyond a taxonomy of how MTurk has been used in health and medical research.16 This article is the first synthesis to assess the peer-reviewed literature in which a study objective is to analyze MTurk as a research tool in a health services research and medical context and in which MTurk is used for part or all of the results. The results from this synthesis can guide academic researchers as they explore the strengths and weaknesses of employing MTurk as an academic research platform.

METHODS

A literature search was performed for articles published between 2005 and mid-February 2017 using Google Scholar and PubMed databases. Searches for variations of the terms Mechanical Turk, MTurk, health, healthcare, clinic*, and medic yielded an initial total of 331 non-duplicative articles. Two reviewers (TH and KM) screened the articles first by title review, eliminating those that did not pertain to health as defined by the World Health Organization,17 leaving 181 articles. After abstract and full-text review, 35 articles were included in the final analysis.
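The screening counts reported above can be tallied with a short calculation. The following is a minimal sketch rather than part of the original review protocol: the stage labels and the function name `screening_flow` are illustrative assumptions, and only the three article counts (331, 181, and 35) come from the text.

```python
# Article counts reported in the Methods section; the labels are illustrative.
STAGES = [
    ("identified (non-duplicative)", 331),
    ("retained after title review", 181),
    ("included after abstract and full-text review", 35),
]


def screening_flow(stages):
    """Return (label, count, excluded_at_stage, share_of_initial) for each stage."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for label, count in stages:
        # Articles dropped at this stage, and the fraction of the initial pool remaining.
        rows.append((label, count, previous - count, count / initial))
        previous = count
    return rows


for label, count, excluded, share in screening_flow(STAGES):
    print(f"{label}: n={count}, excluded={excluded}, {share:.1%} of initial")
```

Under these counts, title review excluded 150 articles and abstract and full-text review excluded a further 146, so roughly 11% of the articles initially identified were included in the final analysis.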

RESULTS

The 35 articles that met the inclusion criteria (a primary peer-reviewed article, MTurk used for part or all of the results, and a study objective of analyzing MTurk as a research tool in a health services research and medical context) are described briefly in Table 1.12, 18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51 A number of strengths of using MTurk in an academic health services setting were identified in the literature. The studies were overwhelmingly supportive of the economical, cost-effective nature of MTurk.18, 19, 23, 25, 26, 29,30,31,32,33,34, 37, 38, 40, 41, 45,46,47,48,49, 51 Additional strengths include time savings, reliability, and high quality. Consistent conclusions in the literature were that MTurk results are accurate,34 effective,29, 30, 51 comparable in quality to the performance of medical experts,18, 26, 33, 34, 39, 41, 43, 48 subject to high verification,42 reliable,27, 31 objective,32 statistically equivalent to data from other samples,12, 22, 24, 38, 49, 50 diverse,19, 21, 47, 49 viable,28, 36 and of high quality,35, 42, 45 among other strengths.

Table 1 Summary of Research Findings

The weaknesses are outweighed by the identified strengths but are important to note. Four studies20, 36,37,38 noted three caveats: (1) researchers should exercise caution when generalizing MTurk findings to the US population;20, 36 (2) despite a high degree of inter-rater reliability in the MTurk sample, it is unknown whether the accuracy of the data is comparable to evaluations by trained ophthalmologic experts;37 and (3) the data were not validated against a sample using face-to-face interview techniques.38 The literature overwhelmingly concludes that MTurk is an efficient, reliable, cost-effective tool for a variety of tasks, with results comparable to those collected via more conventional means. However, results from surveys on MTurk should not be generalized to the US population.