Are Relational Inferences from Crowdsourced and Opt-in Samples Generalizable? Comparing Criminal Justice Attitudes in the GSS and Five Online Samples

  • Andrew J. Thompson
  • Justin T. Pickett



Similar to researchers in other disciplines, criminologists increasingly are using online crowdsourcing and opt-in panels for sampling, because of their low cost and convenience. However, online non-probability samples’ “fitness for use” will depend on the inference type and outcome variables of interest. Many studies use these samples to analyze relationships between variables. We explain how selection bias—when selection is a collider variable—and effect heterogeneity may undermine, respectively, the internal and external validity of relational inferences from crowdsourced and opt-in samples. We then examine whether such samples yield generalizable inferences about the correlates of criminal justice attitudes specifically.
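The collider mechanism described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' analysis: the variables `x` and `y`, the cutoff of 1.0, and the function names are all hypothetical. Two traits are generated independently in the population, and a person enters the sample only when the sum of the two traits is high, so selection depends on both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation computed with the standard library only."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, seed=0, cutoff=1.0):
    """Return (population correlation, correlation among selected cases)."""
    rng = random.Random(seed)
    # x and y are independent by construction, so their true association is zero.
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Hypothetical selection rule: joining the sample depends on BOTH traits,
    # which makes selection a collider on the path between x and y.
    selected = [(x, y) for x, y in population if x + y > cutoff]
    r_population = pearson(*zip(*population))
    r_selected = pearson(*zip(*selected))
    return r_population, r_selected


r_pop, r_sel = simulate()
print(f"population r = {r_pop:.3f}, selected-sample r = {r_sel:.3f}")
```

In this toy setup the correlation is near zero in the full population but clearly negative among the selected cases, which is the kind of spurious relationship that conditioning on a collider can produce.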


We compare multivariate regression results from five online non-probability samples drawn either from Amazon Mechanical Turk or an opt-in panel to those from the General Social Survey (GSS). The online samples include more than 4500 respondents nationally and four outcome variables measuring criminal justice attitudes. We estimate identical models for the online non-probability and GSS samples.


Regression coefficients in the online samples are normally in the same direction as the GSS coefficients, especially when they are statistically significant, but they differ considerably in magnitude; more than half (54%) fall outside the GSS’s 95% confidence interval.
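The two comparisons reported above (agreement in direction, and whether an online coefficient falls inside the GSS 95% confidence interval) can be sketched as follows. This is an illustrative sketch only: it assumes a normal-approximation interval of the coefficient plus or minus 1.96 standard errors, which may differ from the authors' exact procedure, and the function names and the numbers in `rows` are made up for demonstration rather than taken from the study.

```python
from dataclasses import dataclass

Z_95 = 1.96  # normal critical value for a two-sided 95% interval (assumption)


@dataclass
class Comparison:
    same_direction: bool  # online and reference coefficients share a sign
    inside_ci: bool       # online coefficient lies within the reference 95% CI


def compare(ref_coef, ref_se, online_coef):
    """Compare one online-sample coefficient with its reference-sample estimate."""
    lower = ref_coef - Z_95 * ref_se
    upper = ref_coef + Z_95 * ref_se
    return Comparison(
        same_direction=(ref_coef * online_coef) > 0,
        inside_ci=lower <= online_coef <= upper,
    )


def share_outside(rows):
    """Fraction of online coefficients that fall outside the reference 95% CI."""
    results = [compare(*row) for row in rows]
    return sum(not r.inside_ci for r in results) / len(results)


# Hypothetical (reference coef, reference SE, online coef) triples.
rows = [
    (0.30, 0.05, 0.33),    # same direction, inside the interval
    (0.30, 0.05, 0.55),    # same direction, outside the interval
    (-0.20, 0.04, -0.05),  # same direction, outside the interval
    (0.10, 0.06, -0.05),   # opposite direction, outside the interval
]

print(f"share outside reference CI: {share_outside(rows):.2f}")
```

A coefficient can agree in sign with the reference estimate and still fall outside its interval, which is why direction and magnitude are assessed separately.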


Online non-probability samples appear useful for estimating the direction but not the magnitude of relationships between variables, at least absent effective model-based adjustments. However, adjusting only for demographics, either through weighting or statistical control, is insufficient. We recommend that researchers conduct both a provisional generalizability check and a model-specification test before using these samples to make relational inferences.


Web survey; Selection bias; Collider variable; Amazon Mechanical Turk; Opt-in panel



The authors thank Jasmine Silver, Sean Roche, Luzi Shi, Megan Denver, and Shawn Bushway for their help collecting data.

Supplementary material

Supplementary material 1 (DOCX 86 kb)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Criminal Justice, University at Albany, SUNY, Albany, USA
