Abstract
Objectives
Similar to researchers in other disciplines, criminologists increasingly are using online crowdsourcing and opt-in panels for sampling, because of their low cost and convenience. However, online non-probability samples’ “fitness for use” will depend on the inference type and outcome variables of interest. Many studies use these samples to analyze relationships between variables. We explain how selection bias—when selection is a collider variable—and effect heterogeneity may undermine, respectively, the internal and external validity of relational inferences from crowdsourced and opt-in samples. We then examine whether such samples yield generalizable inferences about the correlates of criminal justice attitudes specifically.
Methods
We compare multivariate regression results from five online non-probability samples drawn either from Amazon Mechanical Turk or an opt-in panel to those from the General Social Survey (GSS). The online samples include more than 4500 respondents nationally and four outcome variables measuring criminal justice attitudes. We estimate identical models for the online non-probability and GSS samples.
Results
Regression coefficients in the online samples are generally in the same direction as the GSS coefficients, especially when they are statistically significant, but they often differ considerably in magnitude; more than half (54%) fall outside the GSS’s 95% confidence interval.
Conclusions
Online non-probability samples appear useful for estimating the direction but not the magnitude of relationships between variables, at least absent effective model-based adjustments. However, adjusting only for demographics, either through weighting or statistical control, is insufficient. We recommend that researchers conduct both a provisional generalizability check and a model-specification test before using these samples to make relational inferences.
Notes
There may also be fewer errors of observation in online web surveys because of the elimination of interviewer effects, less potential for social desirability bias, and higher quality responding (Chang and Krosnick 2009; Weinberg et al. 2014; Yeager et al. 2011). However, issues such as respondent nonnaïveté may lead to unique types of errors of observation that are especially problematic for these surveys (Chandler et al. 2014).
Although selection on X does not introduce bias, it does reduce efficiency and statistical power (Berk 1983). It also has different consequences for the bivariate correlation (rYX) and regression coefficient (bYX), because both variables are outcomes for the correlation (Sackett and Yang 2000). The Pearson correlation between two variables, X and Y, is simply the geometric mean of the slopes (bYX and bXY) from regressing Y on X and then X on Y.
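The identity in this note can be checked numerically. The sketch below (illustrative only, using simulated data rather than anything from the study) verifies that the Pearson correlation equals the geometric mean of the two OLS slopes, i.e., r² = bYX × bXY:

```python
import numpy as np

# Simulated bivariate data (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

b_yx = np.polyfit(x, y, 1)[0]  # slope from regressing Y on X
b_xy = np.polyfit(y, x, 1)[0]  # slope from regressing X on Y
r = np.corrcoef(x, y)[0, 1]    # Pearson correlation

# |r| equals the geometric mean of the two slopes.
print(np.isclose(abs(r), np.sqrt(b_yx * b_xy)))  # True
```

Because both slopes share the covariance in their numerator, the identity holds exactly for any bivariate sample, which is why range restriction affects rYX and bYX differently.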
The reason is that typically there are more possible sources of confounded sampling than of endogenous sampling. In an online study of death penalty support, for example, all common causes of SONS and death penalty support (e.g., race, gender, political ideology) would be confounders, whereas the only potential source of endogenous sampling would be death penalty support (or a variable caused by death penalty support).
The response rate for the 2016 GSS sample is not yet available; however, GSS response rates have consistently hovered around 70% since 2000.
For more information about how the questionnaire items are administered, see Appendix Q of the General Social Survey (GSS), retrieved from http://gss.norc.org/DOCUMENTS/CODEBOOK/Q.pdf.
The analytic samples for the models estimated with the SurveyMonkey sample are much smaller than the full sample for two reasons. First, several hundred cases have item-missing data on education. SurveyMonkey measured this variable at the profile stage of panel recruitment and provided it to us; changes in the profiling process before our survey left several hundred panelists without data on this pre-recorded variable. These data appear to be missing at random with respect to both outcomes—neither outcome differs significantly between respondents with and without item-missing data on education. Second, 288 respondents answered “don’t know” to the cappun question, and 101 to the fear question; these responses are treated as missing in the analysis.
There is one small presentational difference in the law enforcement spending question asked in the MTurk17 and GSS samples. In the GSS respondents are asked, “We are faced with many problems in this country, none of which can be solved easily or inexpensively. I’m going to name some of these problems, and for each one I’d like you to tell me whether you think we're spending too much money on it, too little money, or about the right amount. First (READ ITEM A)… are we spending too much, too little, or about the right amount on (ITEM)?” Respondents are then asked to decide their spending preferences on a variety of issues. In the MTurk17 sample, it is a standalone question with the same introduction (i.e., respondents are not asked about spending on other topics).
For presentational purposes, we divided the continuous age variable by 50. This approach, suggested by one reviewer, makes it easier to see the differences across samples by widening the otherwise small confidence intervals.
To weight the GSS data we used the “WTSSALL” variable and adjusted for the geographic clustering of respondents with the “VSTRAT” and “VPSU” variables. We did this in Stata 15 using the following command: svyset [weight = wtssall], strata(vstrat) psu(vpsu) singleunit(scaled).
As Page and Shapiro (1992, p. 422) explain, “the evidence indicates that ‘house effects’ are mostly limited to one specific area: ‘don’t knows’ … Thus it is generally safe to compare identical questions across survey organizations, so long as one excludes ‘don’t knows’”.
Typically, to compare logistic regression coefficients across samples, we would need to use heterogeneous choice models to control for the confounding effects of group differences in residual variation (Williams 2009). But as one reviewer pointed out, the GSS and online samples are assumed to represent the same population, and thus we should not expect differences in residual variation across the samples absent selection bias.
In addition to the figures, tables comparing the weighted GSS and unweighted online estimates can be found in the online supplementary materials.
If we reverse the comparison, and focus instead on the confidence intervals in the online samples, we find that 22 out of 56 (or 39%) excluded the GSS regression coefficient.
We also estimated supplementary models with both the GSS and online samples unweighted (see Tables C1–C7 in the online supplement). The substantive conclusions were unchanged.
References
Ansolabehere S, Rivers D (2013) Cooperative survey research. Annu Rev Polit Sci 16:307–329
Baker R, Blumberg SJ, Brick JM, Couper MP, Courtright M, Dennis JM, Dillman D, Frankel MR, Garland P, Groves RM, Kennedy C, Krosnick JA, Lavrakas PJ, Lee S, Link M, Piekarski L, Rao K, Thomas RK, Zahs D (2010) Research synthesis: AAPOR report on online panels. Public Opin Q 74:711–781
Baker R, Brick JM, Bates NA, Battaglia M, Couper MP, Dever JA, Gile KJ, Tourangeau R (2013) Summary report of the AAPOR task force on non-probability sampling. J Surv Stat Methodol 1:90–143
Berk RA (1983) An introduction to sample selection bias in sociological data. Am Sociol Rev 48:386–398
Berk RA, Ray SC (1982) Selection biases in sociological data. Soc Sci Res 11:352–398
Berryessa CM (2018) The effects of psychiatric and “biological” labels on lay sentencing and punishment decisions. J Exp Criminol 14:241–256
Bhutta C (2012) Not by the book: Facebook as a sampling frame. Sociol Methods Res 41:57–88
Blair J, Czaja RF, Blair EA (2013) Designing surveys: a guide to decisions and procedures. Sage, Thousand Oaks
Bollen KA, Biemer PP, Karr AF, Tueller S, Berzofsky ME (2016) Are survey weights needed? A review of diagnostic tests in regression analysis. Annu Rev Stat Appl 3:375–392
Brandon DM, Long JH, Loraas TM, Mueller-Phillips J, Vansant B (2013) Online instrument delivery and participant recruitment services: emerging opportunities for behavioral accounting research. Behav Res Account 26:1–23
Brown EK, Socia KM (2017) Twenty-first century punitiveness: social sources of punitive American views reconsidered. J Quant Criminol 33:935–959
Bullock JG, Green DP, Ha SE (2010) Yes, but what’s the mechanism? (don’t expect an easy answer). J Pers Soc Psychol 98:550–558
Callegaro M, Villar A, Krosnick J, Yeager D (2014) A critical review of studies investigating the quality of data obtained with online panels. In: Callegaro M, Baker R, Bethlehem J, Goritz A, Krosnick J, Lavrakas P (eds) Online panel research: a data quality perspective. Wiley, New York, pp 23–53
Callegaro M, Manfreda KL, Vehovar V (2015) Web survey methodology. Sage, Thousand Oaks
Casey LS, Chandler J, Levine AS, Proctor A, Strolovitch DZ (2017) Intertemporal differences among MTurk workers: time-based sample variations and implications for online data collection. SAGE Open 7:1–15
Chandler J, Shapiro D (2016) Conducting clinical research using crowdsourced convenience samples. Annu Rev Clin Psychol 12:53–81
Chandler J, Mueller P, Paolacci G (2014) Nonnaïveté among Amazon Mechanical Turk workers: consequences and solutions for behavioral researchers. Behav Res Methods 46:112–130
Chang L, Krosnick JA (2009) National surveys via RDD telephone interviewing versus the internet: comparing sample representativeness and response quality. Public Opin Q 73:641–678
Couper MP (2011) The future of modes of data collection. Public Opin Q 75:889–908
Denver M, Pickett JT, Bushway SD (2017) Criminal records and employment: a survey of experiences and attitudes in the United States. Justice Q 35:584–613
Dum CP, Socia KM, Rydberg J (2017) Public support for emergency shelter housing interventions concerning stigmatized populations. Criminol Public Policy 16:835–877
DuMouchel WH, Duncan GJ (1983) Using sample survey weights in multiple regression analyses of stratified samples. J Am Stat Assoc 78:535–543
Elliott MR, Valliant R (2017) Inference for nonprobability samples. Stat Sci 32:249–264
Elwert F, Winship C (2014) Endogenous selection bias: the problem of conditioning on a collider variable. Annu Rev Sociol 40:31–53
Enns PK, Ramirez M (2018) Privatizing punishment: testing theories of public support for private prison and immigration detention facilities. Criminology 56:546–573
ESOMAR 28: SurveyMonkey Audience (2013) European Society for Opinion and Marketing Research, Amsterdam. https://www.esomar.org/
Gelman A (2007) Struggles with survey weighting and regression modeling. Stat Sci 22:153–164
Gelman A, Carlin JB (2002) Poststratification and weighting adjustments. In: Groves RM, Dillman DA, Eltinge JL, Little RJA (eds) Survey nonresponse. Wiley, New York, pp 289–302
Gottlieb A (2017) The effect of message frames on public attitudes toward criminal justice reform for nonviolent offenses. Crime Delinq 63:636–656
Greenland S (2003) Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology 14:300–306
Groves RM, Fowler FJ, Couper MP, Lepkowski J, Singer E, Tourangeau R (2009) Survey methodology, 2nd edn. Wiley, Hoboken
Holbert RL, Shah DV, Kwak N (2004) Fear, authority, and justice: crime-related TV viewing and endorsements of capital punishment and gun ownership. Journal Mass Commun Q 81:343–363
Horton JJ, Rand DG, Zeckhauser RJ (2011) The online laboratory: conducting experiments in a real labor market. Exp Econ 14:399–425
Hox JJ, De Leeuw ED, Zijlmans EA (2015) Measurement equivalence in mixed mode surveys. Front Psychol 6:1–10
Johnson D (2009) Anger about crime and support for punitive criminal justice policies. Punishm Soc 11:51–66
Johnson D, Kuhns JB (2009) Striking out: race and support for police use of force. Justice Q 26:592–623
Jones DN, Olderbak SG (2014) The associations among dark personalities and sexual tactics across different scenarios. J Interpers Violence 29:1050–1070
Keeter S, McGeeney K, Mercer A, Hatley N, Patten E, Perrin A (2015) Coverage error in internet surveys. Pew Research Center, Washington. Retrieved from https://www.pewresearch.org/methods/2015/09/22/coverage-error-in-internet-surveys/
King RD, Wheelock D (2007) Group threat and social control: race, perceptions of minorities and the desire to punish. Soc Forces 85:1255–1280
Lageson SE, McElrath S, Palmer KE (2018) Gendered public support for criminalizing “Revenge Porn”. Feminist Criminol 14:560–583
Lehmann PS, Pickett JT (2017) Experience versus expectation: economic insecurity, the Great Recession, and support for the death penalty. Justice Q 34:873–902
Levay KE, Freese J, Druckman JN (2016) The demographic and political composition of Mechanical Turk samples. SAGE Open 6:1–17
Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, New York
Mercer AW, Kreuter F, Keeter S, Stuart EA (2017) Theory and practice in nonprobability surveys: parallels between causal inference and survey inference. Public Opin Q 81:250–271
Mercer A, Lau A, Kennedy C (2018) For weighting online opt-in samples, what matters most?. Pew Research Center, Washington
Morgan SL, Winship C (2015) Counterfactuals and causal inference. Cambridge University Press, Cambridge
Mullinix KJ, Leeper TJ, Druckman JN, Freese J (2015) The generalizability of survey experiments. J Exp Pol Sci 2:109–138
Nicolaas G, Calderwood L, Lynn P, Roberts C (2014) Web surveys for the general population: How, why and when?. National Centre for Research Methods, Southampton. Retrieved from http://eprints.ncrm.ac.uk/3309/3/GenPopWeb.pdf
Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251):aac4716
Page BI, Shapiro RY (1992) The rational public: fifty years of trends in Americans’ policy preferences. University of Chicago Press, Chicago
Pasek J (2016) When will nonprobability surveys mirror probability surveys? Considering types of inference and weighting strategies as criteria for correspondence. Int J Public Opin Res 28:269–291
Pasek J, Krosnick JA (2010) Measuring intent to participate and participation in the 2010 census and their correlates and trends: comparisons of RDD telephone and non–probability sample internet survey data. Statistical Research Division of the US Census Bureau, Washington. Retrieved from https://www.mod.gu.se/digitalAssets/1456/1456661_pasek-krosnick-mode-census.pdf
Peer E, Vosgerau J, Acquisti A (2014) Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behav Res Methods 46:1023–1031
Peffley M, Hurwitz J (2007) Persuasion and resistance: race and the death penalty in America. Am J Pol Sci 51:996–1012
Peytchev A (2009) Survey breakoff. Public Opin Q 73:74–97
Peytchev A (2011) Breakoff and unit nonresponse across web surveys. J Off Stat 27:33–47
Pfeffermann D (1993) The role of sampling weights when modeling survey data. Int Stat Rev 61:317–337
Pickett JT (2016) On the social foundations for crimmigration: latino threat and support for expanded police powers. J Quant Criminol 32:103–132
Pickett JT, Mancini C, Mears DP (2013) Vulnerable victims, monstrous offenders, and unmanageable risk: explaining public opinion on the social control of sex crime. Criminology 51:729–759
Pickett JT, Cullen F, Bushway SD, Chiricos T, Alpert G (2018) The response rate test: nonresponse bias and the future of survey research in criminology and criminal justice. Criminologist 43:7–11
Rivers D (2007) Sampling for web surveys. Joint Statistical Meetings, Salt Lake City
Roche SP, Pickett JT, Gertz M (2016) The scary world of online news? Internet news exposure and public attitudes toward crime and justice. J Quant Criminol 32:215–236
Ross J, Irani L, Silberman M, Zaldivar A, Tomlinson B (2010) Who are the crowdworkers? Shifting demographics in Mechanical Turk. In: Edwards K, Rodden T (eds) Proceedings of the ACM conference on human factors in computing systems. ACM, New York
Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688–701
Sackett PR, Yang H (2000) Correction for range restriction: an expanded typology. J Appl Psychol 85:112–118
Shadish W, Cook TD, Campbell DT (2002) Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin, Boston
Shadish WR, Clark MH, Steiner PM (2008) Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. J Am Stat Assoc 103:1334–1344
Sheehan KB, Pittman M (2016) Amazon’s Mechanical Turk for academics: The HIT handbook for social science research. Melvin and Leigh, Irvine
Silver JR, Pickett JT (2015) Toward a better understanding of politicized policing attitudes: conflicted conservatism and support for police use of force. Criminology 53:650–676
Silver JR, Silver E (2017) Why are conservatives more punitive than liberals? A moral foundations approach. Law Human Behav 41:258–272
Simmons AD (2017) Cultivating support for punitive criminal justice policies: news sectors and the moderating effects of audience characteristics. Soc Forces 96:299–328
Simmons AD, Bobo LD (2015) Can non-full-probability internet surveys yield useful data? A comparison with full-probability face-to-face surveys in the domain of race and social inequality attitudes. Sociol Methodol 45:357–387
Solon G, Haider SJ, Wooldridge JM (2015) What are we weighting for? J Hum Resour 50:301–316
Stewart N, Chandler J, Paolacci G (2017) Crowdsourcing samples in cognitive science. Trends Cogn Sci 21:736–748
Tourangeau R, Yan T (2007) Sensitive questions in surveys. Psychol Bull 133:859–883
Tourangeau R, Conrad FG, Couper MP (2013) The science of web surveys. Oxford University Press, Oxford
Unnever JD, Cullen FT (2010) The social sources of Americans’ punitiveness: a test of three competing models. Criminology 48:99–129
Unnever JD, Cullen FT, Jonson CL (2008) Race, racism, and support for capital punishment. Crime Justice 37:45–96
Valliant R, Dever JA (2011) Estimating propensity adjustments for volunteer web surveys. Sociol Methods Res 40:105–137
Vaughan TJ, Holleran LB, Silver J (2019) Applying moral foundations theory to the explanation of capital jurors’ sentencing decisions. Justice Q. https://doi.org/10.1080/07418825.2018.1537400
Wang W, Rothschild D, Goel S, Gelman A (2015) Forecasting elections with non-representative polls. Int J Forecast 31:980–991
Weinberg JD, Freese J, McElhattan D (2014) Comparing data characteristics and results of an online factorial survey between a population-based and a crowdsource-recruited sample. Sociol Sci 1:292–310
Williams R (2009) Using heterogeneous choice models to compare logit and probit coefficients across groups. Sociol Methods Res 37:531–559
Winship C, Radbill L (1994) Sampling weights and regression analysis. Sociol Methods Res 23:230–257
Yeager DS, Krosnick JA, Chang L, Javitz HS, Levendusky MS, Simpser A, Wang R (2011) Comparing the accuracy of RDD telephone surveys and internet surveys conducted with probability and non-probability samples. Public Opin Q 75:709–747
Zhou H, Fishbach A (2016) The pitfall of experimenting on the web: how unattended selective attrition leads to surprising (yet false) research conclusions. J Pers Soc Psychol 111:493–504
Acknowledgements
The authors thank Jasmine Silver, Sean Roche, Luzi Shi, Megan Denver, and Shawn Bushway for their help collecting data.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Cite this article
Thompson, A.J., Pickett, J.T. Are Relational Inferences from Crowdsourced and Opt-in Samples Generalizable? Comparing Criminal Justice Attitudes in the GSS and Five Online Samples. J Quant Criminol 36, 907–932 (2020). https://doi.org/10.1007/s10940-019-09436-7