Are Relational Inferences from Crowdsourced and Opt-in Samples Generalizable? Comparing Criminal Justice Attitudes in the GSS and Five Online Samples

Abstract

Objectives

Similar to researchers in other disciplines, criminologists are increasingly using online crowdsourcing and opt-in panels for sampling because of their low cost and convenience. However, the “fitness for use” of online non-probability samples depends on the type of inference and the outcome variables of interest. Many studies use these samples to analyze relationships between variables. We explain how selection bias (which arises when selection is a collider variable) and effect heterogeneity may undermine, respectively, the internal and external validity of relational inferences from crowdsourced and opt-in samples. We then examine whether such samples yield generalizable inferences about the correlates of criminal justice attitudes specifically.

Methods

We compare multivariate regression results from five online non-probability samples, drawn either from Amazon Mechanical Turk or from an opt-in panel, to those from the General Social Survey (GSS). The online samples include more than 4500 respondents nationwide, and the analysis covers four outcome variables measuring criminal justice attitudes. We estimate identical models in the online non-probability samples and the GSS.

Results

Regression coefficients in the online samples are generally in the same direction as the GSS coefficients, especially when they are statistically significant, but they differ considerably in magnitude; more than half (54%) fall outside the GSS’s 95% confidence interval.

Conclusions

Online non-probability samples appear useful for estimating the direction but not the magnitude of relationships between variables, at least absent effective model-based adjustments. However, adjusting only for demographics, either through weighting or statistical control, is insufficient. We recommend that researchers conduct both a provisional generalizability check and a model-specification test before using these samples to make relational inferences.


Notes

  1. There may also be fewer errors of observation in online web surveys because of the elimination of interviewer effects, less potential for social desirability bias, and higher quality responding (Chang and Krosnick 2009; Weinberg et al. 2014; Yeager et al. 2011). However, issues such as respondent nonnaïveté may lead to unique types of errors of observation that are especially problematic for these surveys (Chandler et al. 2014).

  2. Although selection on X does not introduce bias, it does reduce efficiency and statistical power (Berk 1983). It also has different consequences for the bivariate correlation (rYX) and regression coefficient (bYX), because both variables are outcomes for the correlation (Sackett and Yang 2000). The Pearson correlation between two variables, X and Y, is simply the geometric mean of the slopes (bYX and bXY) from regressing Y on X and then X on Y.
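     In symbols, with sX and sY denoting the sample standard deviations (this is simply a restatement of the relationship just described, not notation used in the article):

     $$b_{YX} = r_{YX}\frac{s_Y}{s_X}, \qquad b_{XY} = r_{YX}\frac{s_X}{s_Y}, \qquad \text{so}\quad r_{YX} = \operatorname{sign}(b_{YX})\sqrt{b_{YX}\, b_{XY}}.$$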

  3. The reason is that typically there are more possible sources of confounded sampling than of endogenous sampling. In an online study of death penalty support, for example, all common causes of SONS and death penalty support (e.g., race, gender, political ideology) would be confounders, whereas the only potential source of endogenous sampling would be death penalty support (or a variable caused by death penalty support).

  4. The response rate for the 2016 GSS is not yet available; however, GSS response rates have consistently hovered around 70% since 2000.

  5. For more information about how the questionnaire items are administered, see Appendix Q of the General Social Survey (GSS), retrieved from http://gss.norc.org/DOCUMENTS/CODEBOOK/Q.pdf.

  6. The analytic samples for the models estimated with the SurveyMonkey sample are much smaller than the full sample for two reasons. First, several hundred cases have item-missing data on education. SurveyMonkey measured this variable at the profile stage of panel recruitment and provided it to us; changes in the profiling process before our survey left several hundred panelists without data on this pre-recorded variable. These data appear to be missing at random with respect to both outcomes: neither outcome differs significantly between respondents with and without item-missing data on education. Second, 288 respondents answered “don’t know” to the cappun question and 101 to the fear question; these responses are treated as missing in the analysis.

  7. There is one small presentational difference in the law enforcement spending question asked in the MTurk17 and GSS samples. In the GSS respondents are asked, “We are faced with many problems in this country, none of which can be solved easily or inexpensively. I’m going to name some of these problems, and for each one I’d like you to tell me whether you think we're spending too much money on it, too little money, or about the right amount. First (READ ITEM A)… are we spending too much, too little, or about the right amount on (ITEM)?” Respondents are then asked to decide their spending preferences on a variety of issues. In the MTurk17 sample, it is a standalone question with the same introduction (i.e., respondents are not asked about spending on other topics).

  8. For presentational purposes, we divided the continuous age variable by 50. This approach, suggested by one reviewer, makes it easier to see the differences across samples by widening the otherwise small confidence intervals.

  9. To weight the GSS data we used the “WTSSALL” variable and adjusted for the geographic clustering of respondents with the “VSTRAT” and “VPSU” variables. We did this in Stata 15 using the following command: svyset [weight = wtssall], strata(vstrat) psu(vpsu) singleunit(scaled).
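     For context, once a survey design has been declared with svyset, weighted models are estimated by adding the svy prefix to the usual estimation command. The sketch below is illustrative only: the outcome recode and the predictors shown are placeholders based on common GSS mnemonics, not the authors' exact specification.

       * Illustrative follow-on to the svyset declaration above (not the authors' exact model):
       * recode the GSS capital-punishment item to a 0/1 indicator, then fit a design-adjusted logit
       gen byte favor_cap = (cappun == 1) if !missing(cappun)
       svy: logit favor_cap age educ i.sex i.race polviews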

  10. As Page and Shapiro (1992, p. 422) explain, “the evidence indicates that ‘house effects’ are mostly limited to one specific area: ‘don’t knows’ … Thus it is generally safe to compare identical questions across survey organizations, so long as one excludes ‘don’t knows’”.

  11. Typically, to compare logistic regression coefficients across samples, we would need to use heterogeneous choice models to control for the confounding effects of group differences in residual variation (Williams 2009). But as one reviewer pointed out, the GSS and online samples are assumed to represent the same population, and thus we should not expect differences in residual variation across the samples absent selection bias.
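     As a point of reference (a sketch of the model form Williams (2009) describes, not a model estimated in this article), the heterogeneous choice logit allows residual variation to differ across groups indexed by covariates z_i:

     $$\Pr(y_i = 1 \mid x_i, z_i) = \Lambda\!\left(\frac{x_i'\beta}{\exp(z_i'\gamma)}\right),$$

     where $\Lambda(\cdot)$ is the logistic CDF; when the samples share a common residual variance, $\gamma = 0$ and the model reduces to the ordinary logit.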

  12. In addition to the figures, tables comparing the weighted GSS and unweighted online estimates can be found in the online supplementary materials.

  13. If we reverse the comparison and focus instead on the confidence intervals in the online samples, we find that 22 of the 56 (39%) exclude the GSS regression coefficient.

  14. We also estimated supplementary models with both the GSS and online samples unweighted (see Tables C1–C7 in the online supplement). The substantive conclusions were unchanged.

References

  • Ansolabehere S, Rivers D (2013) Cooperative survey research. Annu Rev Polit Sci 16:307–329

  • Baker R, Blumberg SJ, Brick JM, Couper MP, Courtright M, Dennis JM, Dillman D, Frankel MR, Garland P, Groves RM, Kennedy C, Krosnick JA, Lavrakas PJ, Lee S, Link M, Piekarski L, Rao K, Thomas RK, Zahs D (2010) Research synthesis: AAPOR report on online panels. Public Opin Q 74:711–781

  • Baker R, Brick JM, Bates NA, Battaglia M, Couper MP, Dever JA, Gile KJ, Tourangeau R (2013) Summary report of the AAPOR task force on non-probability sampling. J Surv Stat Methodol 1:90–143

  • Berk RA (1983) An introduction to sample selection bias in sociological data. Am Sociol Rev 48:386–398

  • Berk RA, Ray SC (1982) Selection biases in sociological data. Soc Sci Res 11:352–398

  • Berryessa CM (2018) The effects of psychiatric and “biological” labels on lay sentencing and punishment decisions. J Exp Criminol 14:241–256

  • Bhutta C (2012) Not by the book: Facebook as a sampling frame. Sociol Methods Res 41:57–88

  • Blair J, Czaja RF, Blair EA (2013) Designing surveys: a guide to decisions and procedures. Sage, Thousand Oaks

  • Bollen KA, Biemer PP, Karr AF, Tueller S, Berzofsky ME (2016) Are survey weights needed? A review of diagnostic tests in regression analysis. Annu Rev Stat Appl 3:375–392

  • Brandon DM, Long JH, Loraas TM, Mueller-Phillips J, Vansant B (2013) Online instrument delivery and participant recruitment services: emerging opportunities for behavioral accounting research. Behav Res Account 26:1–23

  • Brown EK, Socia KM (2017) Twenty-first century punitiveness: social sources of punitive American views reconsidered. J Quant Criminol 33:935–959

  • Bullock JG, Green DP, Ha SE (2010) Yes, but what’s the mechanism? (don’t expect an easy answer). J Pers Soc Psychol 98:550–558

  • Callegaro M, Villar A, Krosnick J, Yeager D (2014) A critical review of studies investigating the quality of data obtained with online panels. In: Callegaro M, Baker R, Bethlehem J, Goritz A, Krosnick J, Lavrakas P (eds) Online panel research: a data quality perspective. Wiley, New York, pp 23–53

  • Callegaro M, Manfreda KL, Vehovar V (2015) Web survey methodology. Sage, Thousand Oaks

  • Casey LS, Chandler J, Levine AS, Proctor A, Strolovitch DZ (2017) Intertemporal differences among MTurk workers: time-based sample variations and implications for online data collection. SAGE Open 7:1–15

  • Chandler J, Shapiro D (2016) Conducting clinical research using crowdsourced convenience samples. Annu Rev Clin Psychol 12:53–81

  • Chandler J, Mueller P, Paolacci G (2014) Nonnaïveté among Amazon Mechanical Turk workers: consequences and solutions for behavioral researchers. Behav Res Methods 46:112–130

  • Chang L, Krosnick JA (2009) National surveys via RDD telephone interviewing versus the internet: comparing sample representativeness and response quality. Public Opin Q 73:641–678

  • Couper MP (2011) The future of modes of data collection. Public Opin Q 75:889–908

  • Denver M, Pickett JT, Bushway SD (2017) Criminal records and employment: a survey of experiences and attitudes in the United States. Justice Q 35:584–613

  • Dum CP, Socia KM, Rydberg J (2017) Public support for emergency shelter housing interventions concerning stigmatized populations. Criminol Public Policy 16:835–877

  • DuMouchel WH, Duncan GJ (1983) Using sample survey weights in multiple regression analyses of stratified samples. J Am Stat Assoc 75:535–543

  • Elliott MR, Valliant R (2017) Inference for nonprobability samples. Stat Sci 32:249–264

  • Elwert F, Winship C (2014) Endogenous selection bias: the problem of conditioning on a collider variable. Annu Rev Sociol 40:31–53

  • Enns PK, Ramirez M (2018) Privatizing punishment: testing theories of public support for private prison and immigration detention facilities. Criminology 56:546–573

  • ESOMAR 28: SurveyMonkey Audience (2013) European Society for Opinion and Marketing Research, Amsterdam. https://www.esomar.org/

  • Gelman A (2007) Struggles with survey weighting and regression modeling. Stat Sci 22:153–164

  • Gelman A, Carlin JB (2002) Poststratification and weighting adjustments. In: Groves RM, Dillman DA, Eltinge JL, Little RJA (eds) Survey nonresponse. Wiley, New York, pp 289–302

  • Gottlieb A (2017) The effect of message frames on public attitudes toward criminal justice reform for nonviolent offenses. Crime Delinq 63:636–656

  • Greenland S (2003) Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology 14:300–306

  • Groves RM, Fowler FJ, Couper MP, Lepkowski J, Singer E, Tourangeau R (2009) Survey methodology, 2nd edn. Wiley, Hoboken

  • Holbert RL, Shah DV, Kwak N (2004) Fear, authority, and justice: crime-related TV viewing and endorsements of capital punishment and gun ownership. Journal Mass Commun Q 81:343–363

  • Horton JJ, Rand DG, Zeckhauser RJ (2011) The online laboratory: conducting experiments in a real labor market. Exp Econ 14:399–425

  • Hox JJ, De Leeuw ED, Zijlmans EA (2015) Measurement equivalence in mixed mode surveys. Front Psychol 6:1–10

  • Johnson D (2009) Anger about crime and support for punitive criminal justice policies. Punishm Soc 11:51–66

  • Johnson D, Kuhns JB (2009) Striking out: race and support for police use of force. Justice Q 26:592–623

  • Jones DN, Olderbak SG (2014) The associations among dark personalities and sexual tactics across different scenarios. J Interpers Violence 29:1050–1070

  • Keeter S, McGeeney K, Mercer A, Hatley N, Pattern E, Perrin A (2015) Coverage error in internet surveys. Pew Research Center, Washington. Retrieved from https://www.pewresearch.org/methods/2015/09/22/coverage-error-in-internet-surveys/

  • King RD, Wheelock D (2007) Group threat and social control: race, perceptions of minorities and the desire to punish. Soc Forces 85:1255–1280

  • Lageson SE, McElrath S, Palmer KE (2018) Gendered public support for criminalizing “Revenge Porn”. Feminist Criminol 14:560–583

  • Lehmann PS, Pickett JT (2017) Experience versus expectation: economic insecurity, the Great Recession, and support for the death penalty. Justice Q 34:873–902

  • Levay KE, Freese J, Druckman JN (2016) The demographic and political composition of Mechanical Turk samples. SAGE Open 6:1–17

  • Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, New York

  • Mercer AW, Kreuter F, Keeter S, Stuart EA (2017) Theory and practice in nonprobability surveys: parallels between causal inference and survey inference. Public Opin Q 81:250–271

  • Mercer A, Lau A, Kennedy C (2018) For weighting online opt-in samples, what matters most? Pew Research Center, Washington

  • Morgan SL, Winship C (2015) Counterfactuals and causal inference. Cambridge University Press, Oxford

  • Mullinix KJ, Leeper TJ, Druckman JN, Freese J (2015) The generalizability of survey experiments. J Exp Polit Sci 2:109–138

  • Nicolaas G, Calderwood L, Lynn P, Roberts C (2014) Web surveys for the general population: How, why and when? National Centre for Research Methods, Southampton. Retrieved from http://eprints.ncrm.ac.uk/3309/3/GenPopWeb.pdf

  • Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251):aac4716

  • Page BI, Shapiro RY (1992) The rational public: fifty years of trends in Americans’ policy preferences. University of Chicago Press, Chicago

  • Pasek J (2016) When will nonprobability surveys mirror probability surveys? Considering types of inference and weighting strategies as criteria for correspondence. Int J Public Opin Res 28:269–291

  • Pasek J, Krosnick JA (2010) Measuring intent to participate and participation in the 2010 census and their correlates and trends: comparisons of RDD telephone and non–probability sample internet survey data. Statistical Research Division of the US Census Bureau, Washington. Retrieved from https://www.mod.gu.se/digitalAssets/1456/1456661_pasek-krosnick-mode-census.pdf

  • Peer E, Vosgerau J, Acquisti A (2014) Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behav Res Methods 46:1023–1031

  • Peffley M, Hurwitz J (2007) Persuasion and resistance: race and the death penalty in America. Am J Pol Sci 51:996–1012

  • Peytchev A (2009) Survey breakoff. Public Opin Q 73:74–97

  • Peytchev A (2011) Breakoff and unit nonresponse across web surveys. J Off Stat 27:33–47

  • Pfeffermann D (1993) The role of sampling weights when modeling survey data. Int Stat Rev 61:317–337

  • Pickett JT (2016) On the social foundations for crimmigration: latino threat and support for expanded police powers. J Quant Criminol 32:103–132

  • Pickett JT, Mancini C, Mears DP (2013) Vulnerable victims, monstrous offenders, and unmanageable risk: explaining public opinion on the social control of sex crime. Criminology 51:729–759

  • Pickett JT, Cullen F, Bushway SD, Chiricos T, Alpert G (2018) The response rate test: nonresponse bias and the future of survey research in criminology and criminal justice. Criminologist 43:7–11

  • Rivers D (2007) Sampling for web surveys. Joint Statistical Meetings, Salt Lake

  • Roche SP, Pickett JT, Gertz M (2016) The scary world of online news? Internet news exposure and public attitudes toward crime and justice. J Quant Criminol 32:215–236

  • Ross J, Irani L, Silberman M, Zaldivar A, Tomlinson B (2010) Who are the crowdworkers? Shifting demographics in Mechanical Turk. In: Edwards K, Rodden T (eds) Proceedings of the ACM conference on human factors in computing systems. ACM, New York

  • Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688

  • Sackett PR, Yang H (2000) Correction for range restriction: an expanded typology. J Appl Psychol 85:112–118

  • Shadish W, Cook TD, Campbell DT (2002) Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin, Boston

  • Shadish WR, Clark MH, Steiner PM (2008) Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. J Am Stat Assoc 103:1334–1344

  • Sheehan KB, Pittman M (2016) Amazon’s Mechanical Turk for academics: The HIT handbook for social science research. Melvin and Leigh, Irvine

  • Silver JR, Pickett JT (2015) Toward a better understanding of politicized policing attitudes: conflicted conservatism and support for police use of force. Criminology 53:650–676

  • Silver JR, Silver E (2017) Why are conservatives more punitive than liberals? A moral foundations approach. Law Human Behav 41:258–272

  • Simmons AD (2017) Cultivating support for punitive criminal justice policies: news sectors and the moderating effects of audience characteristics. Soc Forces 96:299–328

  • Simmons AD, Bobo LD (2015) Can non-full-probability internet surveys yield useful data? A comparison with full-probability face-to-face surveys in the domain of race and social inequality attitudes. Sociol Methodol 45:357–387

  • Solon G, Haider SJ, Wooldridge JM (2015) What are we weighting for? J Hum Resour 50:301–316

  • Stewart N, Chandler J, Paolacci G (2017) Crowdsourcing samples in cognitive science. Trends Cogn Sci 21:736–748

  • Tourangeau R, Yan T (2007) Sensitive questions in surveys. Psychol Bull 133(5):859

  • Tourangeau R, Conrad FG, Couper MP (2013) The science of web surveys. Oxford University Press, Oxford

  • Unnever JD, Cullen FT (2010) The social sources of Americans’ punitiveness: a test of three competing models. Criminology 48:99–129

  • Unnever JD, Cullen FT, Jonson CL (2008) Race, racism, and support for capital punishment. Crime Justice 37:45–96

  • Valliant R, Dever JA (2011) Estimating propensity adjustments for volunteer web surveys. Sociol Methods Res 40:105–137

  • Vaughan TJ, Holleran LB, Silver J (2019) Applying moral foundations theory to the explanation of capital jurors’ sentencing decisions. Justice Q. https://doi.org/10.1080/07418825.2018.1537400

  • Wang W, Rothschild D, Goel S, Gelman A (2015) Forecasting elections with non-representative polls. Int J Forecast 31:980–991

  • Weinberg JD, Freese J, McElhattan D (2014) Comparing data characteristics and results of an online factorial survey between a population-based and a crowdsource-recruited sample. Sociol Sci 1:292–310

  • Williams R (2009) Using heterogeneous choice models to compare logit and probit coefficients across groups. Sociol Methods Res 37:531–559

  • Winship C, Radbill L (1994) Sampling weights and regression analysis. Sociol Methods Res 23:230–257

  • Yeager DS, Krosnick JA, Chang L, Javitz HS, Levendusky MS, Simpser A, Wang R (2011) Comparing the accuracy of RDD telephone surveys and internet surveys conducted with probability and non-probability samples. Public Opin Q 75:709–747

  • Zhou H, Fishbach A (2016) The pitfall of experimenting on the web: how unattended selective attrition leads to surprising (yet false) research conclusions. J Pers Soc Psychol 111:493–504

Acknowledgements

The authors thank Jasmine Silver, Sean Roche, Luzi Shi, Megan Denver, and Shawn Bushway for their help collecting data.

Author information

Corresponding author

Correspondence to Andrew J. Thompson.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 86 kb)

About this article

Cite this article

Thompson, A.J., Pickett, J.T. Are Relational Inferences from Crowdsourced and Opt-in Samples Generalizable? Comparing Criminal Justice Attitudes in the GSS and Five Online Samples. J Quant Criminol 36, 907–932 (2020). https://doi.org/10.1007/s10940-019-09436-7
