Journal of Business and Psychology, Volume 32, Issue 4, pp 347–361

Amazon Mechanical Turk in Organizational Psychology: An Evaluation and Practical Recommendations

  • Janelle H. Cheung
  • Deanna K. Burns
  • Robert R. Sinclair
  • Michael Sliter
Original Paper

Abstract

Purpose

Amazon Mechanical Turk (MTurk) is an increasingly popular data source in the organizational psychology research community. This paper presents an evaluation of MTurk and provides a set of practical recommendations for researchers using it.

Design/Methodology/Approach

We present an evaluation of methodological concerns related to the use of MTurk and potential threats to validity inferences. Based on our evaluation, we also provide a set of recommendations to strengthen validity inferences using MTurk samples.

Findings

Although MTurk samples can overcome some important validity concerns, there are other limitations researchers must consider in light of their research objectives. Researchers should carefully evaluate the appropriateness and quality of MTurk samples based on the different issues we discuss in our evaluation.

Implications

There is not a one-size-fits-all answer to whether MTurk is appropriate for a research study. The answer depends on the research questions and the data collection and analytic procedures adopted. The quality of the data is not defined by the data source per se, but rather the decisions researchers make during the stages of study design, data collection, and data analysis.
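The data-collection and analysis decisions referred to above include screening out careless or insufficient-effort responses. As a minimal, hypothetical sketch of such a screening step (the column names, the attention-check answer, and the minimum completion time are all assumptions, not procedures prescribed by the paper):

```python
# Hypothetical data-screening step of the kind the paper discusses:
# drop survey rows that fail an attention-check item or were completed
# implausibly fast. All column names and thresholds are illustrative.
import pandas as pd

def screen_responses(df: pd.DataFrame,
                     attention_col: str = "attention_check",
                     expected_answer: int = 3,
                     time_col: str = "duration_sec",
                     min_seconds: float = 120.0) -> pd.DataFrame:
    """Keep only rows that pass the attention check and took long enough."""
    passed_check = df[attention_col] == expected_answer
    plausible_time = df[time_col] >= min_seconds
    return df[passed_check & plausible_time].reset_index(drop=True)

raw = pd.DataFrame({
    "attention_check": [3, 3, 1, 3],     # item instructed "select 3"
    "duration_sec":    [300, 90, 400, 250],
    "job_satisfaction": [4, 5, 2, 3],
})
clean = screen_responses(raw)
# Rows 0 and 3 survive: row 1 was too fast, row 2 failed the check.
```

Which screening rules are appropriate, and how strict their thresholds should be, is itself one of the study-design decisions the authors argue determines data quality more than the data source does.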

Originality/Value

The current paper extends the literature by evaluating MTurk more comprehensively than prior reviews. Past review papers focused primarily on internal and external validity, with less attention paid to statistical conclusion and construct validity—which are equally important in making accurate inferences about research findings. This paper also provides a set of practical recommendations for addressing validity concerns when using MTurk.

Keywords

Amazon Mechanical Turk · MTurk · Validity · Best practices · Recommendations

Supplementary material

Supplementary material 1 (DOCX 48 kb)


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Janelle H. Cheung (1)
  • Deanna K. Burns (1)
  • Robert R. Sinclair (1)
  • Michael Sliter (2)
  1. Department of Psychology, Clemson University, Clemson, USA
  2. FurstPerson, Chicago, USA
