Abstract
Purpose
Amazon Mechanical Turk is an increasingly popular data source in the organizational psychology research community. This paper presents an evaluation of MTurk and provides a set of practical recommendations for researchers using MTurk.
Design/Methodology/Approach
We present an evaluation of methodological concerns related to the use of MTurk and potential threats to validity inferences. Based on our evaluation, we also provide a set of recommendations to strengthen validity inferences using MTurk samples.
Findings
Although MTurk samples can overcome some important validity concerns, there are other limitations researchers must consider in light of their research objectives. Researchers should carefully evaluate the appropriateness and quality of MTurk samples based on the different issues we discuss in our evaluation.
Implications
There is no one-size-fits-all answer to whether MTurk is appropriate for a research study. The answer depends on the research questions and on the data collection and analytic procedures adopted. The quality of the data is defined not by the data source per se, but by the decisions researchers make during study design, data collection, and data analysis.
Originality/Value
The current paper extends the literature by evaluating MTurk more comprehensively than prior reviews. Past review papers focused primarily on internal and external validity, with less attention paid to statistical conclusion validity and construct validity, which are equally important for making accurate inferences about research findings. This paper also provides a set of practical recommendations for addressing validity concerns when using MTurk.
Notes
The ten methodological concerns are presented in no particular order; their sequence does not indicate the importance or prevalence of each concern.
In our own data collections, we have allowed MTurk participants a maximum of two attempts and have received positive reviews from participants about offering them a second chance.
Cite this article
Cheung, J.H., Burns, D.K., Sinclair, R.R. et al. Amazon Mechanical Turk in Organizational Psychology: An Evaluation and Practical Recommendations. J Bus Psychol 32, 347–361 (2017). https://doi.org/10.1007/s10869-016-9458-5