
Behavior Research Methods, Volume 46, Issue 1, pp 95–111

Collecting response times using Amazon Mechanical Turk and Adobe Flash

  • Travis Simcox
  • Julie A. Fiez

Abstract

Crowdsourcing systems such as Amazon's Mechanical Turk (AMT) allow data to be collected from a large sample of people in a short amount of time, a capability that has garnered considerable interest from behavioral scientists. So far, most experiments conducted on AMT have focused on survey-type instruments, because of difficulties inherent in running many experimental paradigms over the Internet. This study investigated the viability of presenting stimuli and collecting response times by using Adobe Flash to run ActionScript 3 code in conjunction with AMT. First, the timing properties of Adobe Flash were measured with a phototransistor on two desktop computers under several conditions mimicking those that may be present in research using AMT. This experiment revealed both strengths and weaknesses of the timing capabilities of the method. Next, a flanker task and a lexical decision task implemented in Adobe Flash were administered to participants recruited through AMT, and the expected effects in both tasks were replicated. Power analyses describe the number of participants needed to replicate these effects. A questionnaire was used to investigate the previously undescribed computer use habits of 100 AMT participants. We conclude that a Flash program used in conjunction with AMT can successfully run many experimental paradigms that rely on response times, provided that experimenters understand the limitations of the method.
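
To make the measurement technique concrete, below is a minimal ActionScript 3 sketch of a single response-time trial, assuming a document class and key-press responses; the class name RTTrial and its structure are illustrative, not the code used in this study. Stimulus onset is timestamped with getTimer(), which reports milliseconds elapsed since the Flash Player started, and the response time is the difference between that timestamp and the time at which the keyboard event fires.

    package {
        import flash.display.Sprite;
        import flash.events.Event;
        import flash.events.KeyboardEvent;
        import flash.utils.getTimer;

        // Illustrative single-trial response-time measurement (hypothetical
        // class; not the experiment code reported in the article).
        public class RTTrial extends Sprite {
            private var stimulusOnset:int;

            public function RTTrial() {
                // The stage is only available once the object is added to it.
                addEventListener(Event.ADDED_TO_STAGE, onAddedToStage);
            }

            private function onAddedToStage(e:Event):void {
                // ... draw or reveal the stimulus here ...
                // Timestamp the onset in ms. getTimer() offers roughly
                // millisecond resolution and is further constrained by the
                // frame rate and the host operating system's scheduling.
                stimulusOnset = getTimer();
                stage.addEventListener(KeyboardEvent.KEY_DOWN, onKeyDown);
            }

            private function onKeyDown(e:KeyboardEvent):void {
                var rt:int = getTimer() - stimulusOnset;
                stage.removeEventListener(KeyboardEvent.KEY_DOWN, onKeyDown);
                // In an AMT experiment, rt and e.keyCode would be accumulated
                // across trials and submitted with the HIT results.
                trace("Response time (ms): " + rt + ", keyCode: " + e.keyCode);
            }
        }
    }

Note that the recorded interval includes keyboard and input-pipeline latency introduced by the browser and the Flash runtime, which is part of what the phototransistor experiment was designed to characterize.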

Keywords

Response times · Crowdsourcing · Amazon Mechanical Turk · Adobe Flash · ActionScript · Stimulus presentation · Web experiment · Rich media · Timing

Notes

Author Note

Travis Simcox, Department of Psychology, University of Pittsburgh; The Center for the Neural Basis of Cognition, Pittsburgh; Learning Research and Development Center, University of Pittsburgh. Julie A. Fiez, Department of Psychology, University of Pittsburgh; The Center for Neuroscience, University of Pittsburgh; The Center for the Neural Basis of Cognition, Pittsburgh; Learning Research and Development Center, University of Pittsburgh.

This research was supported by NIH R01 HD060388 and NSF 0815945.

Supplementary material

13428_2013_345_MOESM1_ESM.gif (64 kb)
Supplemental table (GIF 64 kb)


Copyright information

© Psychonomic Society, Inc. 2013

Authors and Affiliations

  1. Department of Psychology, University of Pittsburgh, Pittsburgh, USA
