
Methods to detect low quality data and its implication for psychological research

  • Erin M. Buchanan
  • John E. Scofield

Abstract

Web-based data collection methods such as Amazon’s Mechanical Turk (AMT) are an appealing option for recruiting participants quickly and cheaply for psychological research. While concerns regarding data quality have emerged with AMT, several studies have shown that data collected via AMT are as reliable as traditional college samples and are often more diverse and representative of noncollege populations. The development of methods to screen for low quality data, however, has been less explored. Omitting participants based on a single screening method in isolation, such as response time or attention checks, may not adequately identify low quality data, because such methods cannot reliably distinguish high-effort from low-effort participants. Additionally, problematic survey responses may arise from survey automation techniques such as survey bots or automated form fillers. The current project developed low quality data detection methods that overcome these previous screening limitations. Multiple checks were employed, including page response times, the distribution of survey responses, the number of scale options used from the available range, click counts, and manipulation checks. This method was tested on a survey completed with an easily available plug-in survey bot and compared with data collected from human participants providing either high-effort or randomized, low-effort answers. Identified cases can then be used in sensitivity analyses to determine whether exclusion from further analyses is warranted. This algorithm is a promising tool for identifying low quality or automated data collected via AMT or other online data collection platforms.
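The abstract lists several per-respondent checks (page response times, the distribution of responses, the number of scale options used, click counts, and manipulation checks). As a rough illustration only, and not the authors' implementation, the Python sketch below shows one way such checks could be combined into a per-row flag; every column name (e.g., t_page1, clicks, manip_pass) and every cutoff (min_page_seconds, min_scale_points, min_clicks_per_item) is a hypothetical placeholder.

    # Hypothetical sketch: flagging suspect survey rows with simple heuristics
    # (page timing, response spread, number of scale points used, click counts,
    # and a manipulation check). Column names and cutoffs are illustrative only.

    import pandas as pd


    def flag_low_quality(df,
                         item_cols,          # Likert-type item columns
                         page_time_cols,     # per-page response times (seconds)
                         click_col,          # total recorded clicks
                         manip_col,          # 1 = passed manipulation check
                         min_page_seconds=2.0,
                         min_scale_points=2,
                         min_clicks_per_item=1.0):
        """Return the input frame with one boolean column per check plus a
        'suspect' column that is True when any individual check fails."""
        out = df.copy()

        # Check 1: any page answered faster than a plausible reading speed.
        out["fast_pages"] = (df[page_time_cols] < min_page_seconds).any(axis=1)

        # Check 2: responses clustered on a single scale point (zero spread),
        # consistent with straight-lining or some automated form fillers.
        out["no_spread"] = df[item_cols].std(axis=1).fillna(0) == 0

        # Check 3: too few distinct scale options used across all items.
        out["few_options"] = df[item_cols].nunique(axis=1) < min_scale_points

        # Check 4: fewer clicks than items answered suggests automation.
        out["low_clicks"] = df[click_col] < min_clicks_per_item * len(item_cols)

        # Check 5: failed manipulation / attention check.
        out["failed_manip"] = df[manip_col] != 1

        checks = ["fast_pages", "no_spread", "few_options",
                  "low_clicks", "failed_manip"]
        out["suspect"] = out[checks].any(axis=1)
        return out


    if __name__ == "__main__":
        demo = pd.DataFrame({
            "q1": [4, 3, 3], "q2": [5, 5, 3], "q3": [4, 2, 3],
            "t_page1": [35.0, 41.2, 1.1], "t_page2": [28.4, 30.0, 0.9],
            "clicks": [14, 11, 2], "manip_pass": [1, 1, 0],
        })
        flagged = flag_low_quality(demo,
                                   item_cols=["q1", "q2", "q3"],
                                   page_time_cols=["t_page1", "t_page2"],
                                   click_col="clicks",
                                   manip_col="manip_pass")
        print(flagged[["suspect"]])

Consistent with the abstract, rows flagged as suspect would feed a sensitivity analysis (analyses run with and without those rows) rather than being excluded automatically.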

Keywords

Amazon Mechanical Turk; Survey automation; Participant screening; Data quality


Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  1. Department of Psychology, Missouri State University, Springfield, USA
  2. Department of Psychological Sciences, University of Missouri, Columbia, USA
