Methods to detect low quality data and its implication for psychological research

  • Erin M. Buchanan
  • John E. Scofield


Web-based data collection methods such as Amazon’s Mechanical Turk (AMT) are an appealing option for recruiting participants quickly and cheaply for psychological research. Although concerns about data quality have emerged with AMT, several studies have shown that data collected via AMT are as reliable as data from traditional college samples and are often more diverse and more representative of noncollege populations. The development of methods to screen for low quality data, however, has been less explored. Omitting participants on the basis of simple screening methods used in isolation, such as response time or attention checks, may not be adequate, because these methods cannot reliably distinguish high effort from low effort participants. Additionally, problematic survey responses may arise from survey automation techniques such as survey bots or automated form fillers. The current project developed low quality data detection methods that overcome previous screening limitations. Multiple checks were employed, such as page response times, the distribution of survey responses, the number of scale options used from a given range, click counts, and manipulation checks. This method was tested on a survey completed by an easily available plug-in survey bot and compared with data collected from human participants providing both high effort answers and randomized, or low effort, answers. Identified cases can then be used in sensitivity analyses to determine whether exclusion from further analyses is warranted. This algorithm is a promising tool for identifying low quality or automated data collected via AMT or other online data collection platforms.
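The idea of combining several checks, rather than relying on any one in isolation, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `screen_page`, the flag labels, and every threshold (`min_seconds`, `min_options_used`, `max_share`) are hypothetical values chosen for illustration, and the distribution check in particular is a crude stand-in for a proper test of the response distribution.

```python
from collections import Counter


def screen_page(responses, seconds, clicks, scale_points, check_passed,
                min_seconds=10.0, min_options_used=3, max_share=0.25):
    """Return the names of the screening checks that one survey page fails.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    flags = []

    # Page response time: faster than an assumed minimum time to read the page.
    if seconds < min_seconds:
        flags.append("too_fast")

    # Click count: fewer clicks than items answered (assumed sign of form filling).
    if clicks < len(responses):
        flags.append("too_few_clicks")

    # Number of utilized choices: very few distinct scale options were used.
    if len(set(responses)) < min_options_used:
        flags.append("few_options_used")

    # Distribution of responses (crude heuristic): every option is used and no
    # option dominates, which resembles uniform random responding.
    counts = Counter(responses)
    top_share = max(counts.values()) / len(responses)
    if len(counts) == scale_points and top_share <= max_share:
        flags.append("uniform_like")

    # Manipulation check: the participant missed an instructed response.
    if not check_passed:
        flags.append("failed_manipulation_check")

    return flags


# Hypothetical 14-item page on a 7-point scale.
human_like = screen_page([6, 7, 6, 5, 7, 6, 6, 5, 7, 6, 4, 6, 7, 5],
                         seconds=45.0, clicks=16, scale_points=7,
                         check_passed=True)
bot_like = screen_page([1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
                       seconds=2.0, clicks=0, scale_points=7,
                       check_passed=False)
print(human_like)
print(bot_like)
```

Returning the list of failed checks, rather than a single yes/no decision, keeps the flagged cases available for sensitivity analyses: results can be compared with and without participants who fail some chosen number of checks.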


Amazon Mechanical Turk, Survey automation, Participant screening, Data quality



Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  1. Department of Psychology, Missouri State University, Springfield, USA
  2. Department of Psychological Sciences, University of Missouri, Columbia, USA
