Are we measuring the same health constructs? Amazon’s Mechanical Turk versus a community sample

Current Psychology

Abstract

Amazon’s Mechanical Turk (MTurk) platform has become increasingly popular because of its affordability and efficiency. However, studies comparing MTurk respondents with community respondents have produced mixed results. The purpose of the present study was to compare an MTurk sample and a community sample to determine whether the psychometric properties of a measure completed in the two formats were comparable. There were 957 MTurk participants and 837 community participants, with approximately equal numbers of males and females in each. Participants read a scenario depicting a family with a sick child and then completed a questionnaire measuring their perceived likelihood of hiring a Health Care Advocate (HCA). The results indicated some demographic differences between the MTurk and community participants. In the MTurk sample, there was an effect of medical condition: participants reported a greater perceived likelihood of hiring an HCA for a child with leukemia than for a child with cystic fibrosis (p = .008). In the community sample, by contrast, there was an effect of conception difficulty: participants reported a greater perceived likelihood of hiring an HCA for a child who took 2 months to conceive than for a child who took 5 years to conceive (p = .012). Despite some psychometric similarities between the two samples, there were differences in the constructs measured in each. Future researchers should continue to evaluate the reliability and validity of paper-and-pencil measures administered online.
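The abstract does not state which reliability statistic was used to compare the two formats. As an illustration only, the following Python sketch computes Cronbach's alpha, a common internal-consistency index, separately for each sample so the values can be set side by side. The function name `cronbach_alpha` and all response data are hypothetical; this is not the authors' analysis.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    using sample variances throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])  # variance of summed scale scores
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical 3-item responses; not data from the study.
mturk = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
community = [[4, 2, 5], [3, 4, 2], [5, 3, 4], [2, 2, 3]]

for name, data in (("MTurk", mturk), ("community", community)):
    print(f"{name}: alpha = {cronbach_alpha(data):.2f}")
```

Similar alpha values in the two samples would not by themselves show that respondents interpret the items the same way; they only indicate comparable internal consistency.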

Fig. 1

Acknowledgments

We would like to acknowledge Kai Givogue for his assistance in transferring the methodology from the community study to the MTurk study and in conducting literature reviews.

Funding

Research reported in this publication was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R25GM058906. The content is solely the responsibility of the authors, and does not necessarily represent the official views of the National Institutes of Health.

Author information

Corresponding author

Correspondence to Terry A. Cronan.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Data Sharing and Data Accessibility

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Thompson, L.M., Van Liew, C., Patrus, A. et al. Are we measuring the same health constructs? Amazon’s Mechanical Turk versus a community sample. Curr Psychol 41, 6700–6711 (2022). https://doi.org/10.1007/s12144-020-01176-3

  • DOI: https://doi.org/10.1007/s12144-020-01176-3