Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers
- 4.7k Downloads
Crowdsourcing services—particularly Amazon Mechanical Turk—have made it easy for behavioral scientists to recruit research participants. However, researchers have overlooked crucial differences between crowdsourcing and traditional recruitment methods that provide unique opportunities and challenges. We show that crowdsourced workers are likely to participate across multiple related experiments and that researchers are overzealous in the exclusion of research participants. We describe how both of these problems can be avoided using advanced interface features that also allow prescreening and longitudinal data collection. Using these techniques can minimize the effects of previously ignored drawbacks and expand the scope of crowdsourcing as a tool for psychological research.
KeywordsCrowdsourcing Internet research Data quality Longitudinal research Mechanical Turk MTurk
Jesse Chandler, Postdoctoral Research Associate, Woodrow Wilson School of Public Policy, Princeton University (firstname.lastname@example.org), Pam Mueller, Graduate Student, Department of Psychology, Princeton University (email@example.com); Gabriele Paolacci, Assistant Professor, Department of Marketing Management, Rotterdam School of Management, Erasmus University (firstname.lastname@example.org).
Jesse Chandler is now at PRIME Research, Ann Arbor, MI and The Institute for Social Research, University of Michigan.
The authors wish to thank John Myles White for help developing and testing the API syntax and Elizabeth Ingriselli for her help coding data.
Correspondence concerning this article can be addressed to any of the authors.
- Amazon Mechanical Turk Requester Tour. (n.d.). Retrieved from https://requester.mturk.com/tour
- Anderson, N. H. (1968). Likableness ratings of 555 personality-trait words. Journal of Personality and Social Psychology, 9(3), 272Google Scholar
- Chandler, J., Paolacci, G., & Mueller, P. (2013). Risks and rewards of crowdsourcing marketplaces. In P. Michelucci (Ed.) Handbook of Human Computation. New York: Sage.Google Scholar
- Chilton, L. B., Horton, J. J., Miller, R. C., & Azenkot, S. (2009). Task search in a human computation market. In Proceedings of the ACM SIGKDD workshop on human computation (pp. 1–9). In P. Bennett, R. Chandrasekar, M. Chickering, P. Ipeirotis, E. Law, A. Mityagin, F. Provost, & L. von Ahn (Eds.), HCOMP ’09: Proceedings of the ACM SIGKDD Workshop on Human Computation (77–85). New York: ACM. doi: 10.1145/1837885.1837889 Google Scholar
- Cooper, S., Khatib, F., Treuille, A., Barbero, J., Lee, J., Beenan, M., . . . Foldit Players (2010). Predicting protein structures with a multilayer online game. Nature, 466, 756–760. doi: 10.1038/nature09304
- Downs, J. S., Holbrook, M., & Peel, E. (2012). Screening Participants on Mechanical Turk: Techniques and Justifications. Vancouver: Paper presented at the annual conference of the Association for Consumer Research. October 2012.Google Scholar
- Downs, J. S., Holbrook, M. B., Sheng, S., & Cranor, L. F. (2010). Are your participants gaming the system? Screening Mechanical Turk workers. In Proceedings of the 28th international conference on Human factors in computing systems (pp. 2399–2402). New York: ACM. doi: 10.1145/1753326.1753688 Google Scholar
- Edlund, J. E., Sagarin, B. J., Skowronski, J. J., Johnson, S. J., & Kutter, J. (2009). Whatever happens in the laboratory stays in the laboratory: The prevalence and prevention of participant crosstalk. Personality and Social Psychology Bulletin, 35, 635–642. doi: 10.1177/0146167208331255 PubMedCrossRefGoogle Scholar
- Fiske, S. T., & Taylor, S. E. (1984). Social cognition. New York: Random HouseGoogle Scholar
- Ellsworth, P. C., & Gonzalez, R. (2003). Questions and comparisons: Methods of research in social psychology. In M. Hogg & J. Cooper (Eds.), The Sage Handbook of Social Psychology (pp. 24–42). London: Sage Publications, Ltd.Google Scholar
- Goldin, G., Darlow, A. (2013). TurkGate (Version 0.4.0) [Software]. Available from, http://gideongoldin.github.com/TurkGate/
- Goodman, J. K., Cryder, C. E., & Cheema, A. (2012). Data Collection in a Flat World: The Strengths and Weaknesses of Mechanical Turk Samples. Journal of Behavioral Decision Making.Google Scholar
- Ipeirotis, P. (2010). Demographics of Mechanical Turk. CeDER-10–01 working paper, New York University.Google Scholar
- Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. In Proceedings of the ACM conference on human factors in computing systems (pp. 453–456). New York: ACM.Google Scholar
- Lintott, C. J., Schawinski, K., Slosar, A., Land, K., Bamford, S., Thomas, D., . . . Vandenberg, J. (2008). Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society, 389(3), 1179-1189Google Scholar
- Mata, A., Fiedler, K., Ferreira, M. B., & Almeida, T. (2013). Reasoning about others’ reasoning. Journal of Experimental Social Psychology.Google Scholar
- Mueller, P., & Chandler, J. (2012). Emailing Workers Using Python (March 3, 2012). Available at SSRN: http://ssrn.com/abstract=2100601
- Munson, S. A., & Resnick, P. (2010). Presenting diverse political opinions: How and how much. In E. Mynatt, G. Fitzpatrick, S. Hudson, K. Edwards, & T. Rodden (Eds.), Proceedings of the 28th International Conference on Human Factors in Computing Systems (pp. 1457–1466). New York: Association for Computing Machinery. doi: 10.1145/1753326.1753543 Google Scholar
- Paolacci, G., Chandler, J., & Ipeirotis, P. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5, 411–419.Google Scholar
- Peer, E., Paolacci, G., Chandler, J., & Mueller, P. (2012). Selectively Recruiting Participants from Amazon Mechanical Turk Using Qualtrics (May 2, 2012). Available at SSRN: http://ssrn.com/abstract=2100631
- Ribisl, K. M., Walton, M. A., Mowbray, C. T., Luke, D. A., Davidson, W. S., & Bootsmiller, B. J. (1999). Minimizing participant attrition in panel studies through the use of effective retention and tracking strategies: Review and recommendations. Evaluation and Program Planning, 19, 1–25. doi: 10.1016/0149-7189(95)00037-2 CrossRefGoogle Scholar
- Shapiro, D. N., Chandler, J. J., & Mueller, P. A. (2013). Using Mechanical Turk to Study Clinical and Subclinical Populations.Google Scholar
- Summerville, A., & Chartier, C. R. (2012). Pseudo-dyadic “interaction” on Amazon’s Mechanical Turk. Behavior Research Methods, 1-9. doi: 10.3758/s13428-012-0250-9