Abstract
Amazon’s Mechanical Turk (MTurk) platform has become increasingly popular because of its affordability and efficiency. Studies comparing MTurk respondents to community respondents have produced mixed results. The purpose of the present study was to compare an MTurk sample and a community sample to determine whether the psychometric properties of a measure were comparable across the two administration formats. There were 957 MTurk participants and 837 community participants, with approximately equal numbers of males and females. Participants read a scenario depicting a family with a sick child and then completed a questionnaire measuring their perceived likelihood of hiring a Health Care Advocate (HCA). The results indicated some demographic differences between MTurk and community participants. In the MTurk sample, there was an effect of medical condition: participants reported a greater perceived likelihood of hiring an HCA for a child with leukemia than for a child with cystic fibrosis (p = .008). In the community sample, however, there was an effect of conception difficulty: participants reported a greater perceived likelihood of hiring an HCA for a child who took 2 months to conceive than for a child who took 5 years to conceive (p = .012). Despite some psychometric similarities, the constructs measured in the two samples differed in some respects. Future researchers should continue to evaluate the reliability and validity of paper-and-pencil measures adapted for online administration.
Acknowledgments
We would like to acknowledge Kai Givogue for his assistance in transferring the methodology from the community study to the MTurk study and in conducting literature reviews.
Funding
Research reported in this publication was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R25GM058906. The content is solely the responsibility of the authors, and does not necessarily represent the official views of the National Institutes of Health.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Data Sharing and Data Accessibility
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
About this article
Cite this article
Thompson, L.M., Van Liew, C., Patrus, A. et al. Are we measuring the same health constructs? Amazon’s Mechanical Turk versus a community sample. Curr Psychol 41, 6700–6711 (2022). https://doi.org/10.1007/s12144-020-01176-3
DOI: https://doi.org/10.1007/s12144-020-01176-3