Methods to detect low-quality data and its implications for psychological research
Web-based data collection methods such as Amazon’s Mechanical Turk (AMT) are an appealing option for recruiting participants quickly and cheaply for psychological research. Although concerns about data quality have emerged with AMT, several studies have shown that data collected via AMT are as reliable as data from traditional college samples and are often more diverse and more representative of noncollege populations. The development of methods to screen for low-quality data, however, has received less attention. Omitting participants on the basis of simple screening methods used in isolation, such as response times or attention checks, may be inadequate, because these checks cannot reliably distinguish high-effort from low-effort participants. In addition, problematic survey responses may arise from survey automation techniques such as survey bots or automated form fillers. The current project developed low-quality data detection methods that overcome these previous screening limitations. Multiple checks were employed: page response times, the distribution of survey responses, the number of scale options used from the available range, click counts, and manipulation checks. The method was tested on a survey completed with an easily available plug-in survey bot and was compared with data from human participants who provided either high-effort or randomized (low-effort) answers. Identified cases can then be examined in sensitivity analyses to determine whether exclusion from further analyses is warranted. This algorithm is a promising tool for identifying low-quality or automated data collected via AMT or other online data collection platforms.
Keywords: Amazon Mechanical Turk, Survey automation, Participant screening, Data quality
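To make the screening idea concrete, the sketch below combines several of the indicators named in the abstract (page response time, response variability, number of distinct scale options used, click counts, and a manipulation check) into a per-participant flag count. This is not the authors' implementation: the data fields, thresholds, and the two-flag exclusion rule are hypothetical placeholders chosen only for illustration.

```python
# Illustrative sketch of multi-check screening for low-quality survey data.
# All field names and cutoffs are assumptions, not values from the paper.
from statistics import pstdev

# Hypothetical per-participant records: Likert responses (1-7), seconds on
# the survey page, recorded clicks, and a manipulation-check result.
participants = {
    "p01": {"responses": [4, 5, 3, 6, 4, 5], "page_seconds": 95, "clicks": 14, "check_passed": True},
    "p02": {"responses": [4, 4, 4, 4, 4, 4], "page_seconds": 12, "clicks": 6,  "check_passed": False},
    "p03": {"responses": [1, 7, 2, 6, 1, 7], "page_seconds": 20, "clicks": 7,  "check_passed": True},
}

MIN_SECONDS = 30       # implausibly fast page completion (assumed cutoff)
MIN_SD = 0.5           # near-zero variability suggests straight-lining
MIN_OPTIONS_USED = 2   # only one scale point used across all items
MIN_CLICKS = 7         # fewer clicks than items suggests automated filling
FLAGS_TO_REVIEW = 2    # flag count treated as "low quality" in this sketch

def screen(record):
    """Return the list of screening flags raised for one participant."""
    flags = []
    if record["page_seconds"] < MIN_SECONDS:
        flags.append("fast_page_time")
    if pstdev(record["responses"]) < MIN_SD:
        flags.append("low_response_variability")
    if len(set(record["responses"])) < MIN_OPTIONS_USED:
        flags.append("few_scale_options_used")
    if record["clicks"] < MIN_CLICKS:
        flags.append("low_click_count")
    if not record["check_passed"]:
        flags.append("failed_manipulation_check")
    return flags

for pid, record in participants.items():
    flags = screen(record)
    status = "review/exclude" if len(flags) >= FLAGS_TO_REVIEW else "retain"
    print(f"{pid}: {status} ({', '.join(flags) if flags else 'no flags'})")
```

Requiring multiple flags before review mirrors the abstract's point that any single indicator (e.g., a fast response time alone) cannot reliably separate high-effort from low-effort or automated responding; flagged cases would then feed sensitivity analyses rather than being dropped automatically.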