Abstract
Survey data collection costs have risen to a point where many survey researchers are abandoning large, expensive probability-based samples in favor of less expensive nonprobability samples. The empirical literature suggests this strategy may be unwise for many reasons, among them that probability samples tend to outperform nonprobability samples on accuracy when assessed against population benchmarks. Nevertheless, the attractive cost properties and convenience of nonprobability samples suggest they are here to stay. Instead of forgoing probability sampling entirely, we consider a method of combining probability and nonprobability samples in a way that exploits their strengths to overcome their weaknesses. Using Bayesian inference, we evaluate the use of nonprobability data as a supplement to estimation from small probability samples. In a case study involving actual survey data, we show that specifying prior distributions using nonprobability data considerably reduces variances and mean-squared errors for estimates of two commonly used health variables, height and weight, compared to estimates from the probability sample alone. We further show that these gains in efficiency yield expected cost savings of up to 66%, based on actual cost data from eight nonprobability surveys conducted by different commercial vendors and assumed cost data for a probability-based Internet panel. We conclude with a discussion of these findings, their implications for survey practice, and possible research extensions.
Electronic Supplementary Material: The online version of this chapter (https://doi.org/10.1007/978-3-030-54936-7_4) contains supplementary material, which is available to authorized users.
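The idea of letting nonprobability data inform the prior for a small probability sample can be illustrated with a minimal sketch. It assumes a normal-normal conjugate model with known data variance, which is a textbook simplification and not necessarily the model used in the chapter. The function name, the simulated height data, and the choice of prior variance are all hypothetical.

```python
import random
import statistics


def posterior_normal_mean(prior_mean, prior_var, sample, sigma2):
    """Conjugate update for a normal mean, treating the data variance sigma2 as known."""
    n = len(sample)
    ybar = statistics.fmean(sample)
    prior_prec = 1.0 / prior_var   # precision contributed by the prior
    data_prec = n / sigma2         # precision contributed by the probability sample
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * ybar)
    return post_mean, post_var


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: a large, slightly biased nonprobability sample of heights (cm)
    # and a small probability sample from the target population.
    nonprob = [rng.gauss(171.0, 10.0) for _ in range(2000)]
    prob = [rng.gauss(170.0, 10.0) for _ in range(50)]

    sigma2 = statistics.variance(prob)   # plug-in value, an assumption of this sketch
    prior_mean = statistics.fmean(nonprob)
    prior_var = 4.0                      # assumed prior variance, allowing for possible bias

    post_mean, post_var = posterior_normal_mean(prior_mean, prior_var, prob, sigma2)
    print("probability-only variance:", sigma2 / len(prob))
    print("posterior mean and variance:", post_mean, post_var)
```

Under this simplified model the posterior variance is always smaller than the variance of the probability-only mean, while any bias in the prior mean pulls the posterior mean toward the nonprobability estimate. This is why the comparison in the abstract is made on mean-squared error as well as variance.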
Notes
1. Bootstrap methods have been used in many contexts and were originally proposed by Efron (1979). The general approach is to randomly draw subsamples with replacement from the full sample a large number of times, estimate the statistic of interest in each subsample, and then combine the estimates using a bootstrap estimator.
2. We assume the per-unit cost of the German Internet Panel (GIP) is higher than the per-unit costs of the nonprobability surveys because of the interviewer-administered recruitment and the setup costs of equipping the offline population. Further, we reason that, in practice, a high response rate would be desired for the small probability sample to minimize the risk of nonresponse bias in the sparse sample, and that extensive recruitment efforts may be needed to achieve it.
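The resampling scheme described in Note 1 can be sketched in a few lines. This is a generic illustration for a sample mean, assuming each resample has the same size as the full sample and that the replicate estimates are combined by averaging. The function name and defaults are hypothetical and do not reproduce the chapter's own procedure.

```python
import random
import statistics


def bootstrap_mean(sample, n_reps=1000, seed=0):
    """Return the bootstrap estimate of the mean and its bootstrap standard error."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_reps):
        # Draw n observations with replacement from the full sample.
        resample = rng.choices(sample, k=n)
        estimates.append(statistics.fmean(resample))
    # Combine the replicate estimates: their average is the bootstrap estimate,
    # and their standard deviation approximates the standard error.
    return statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    weights_kg = [62.0, 71.5, 80.2, 68.4, 90.1, 75.3, 66.7, 84.9]  # made-up values
    estimate, std_error = bootstrap_mean(weights_kg)
    print(estimate, std_error)
```

Fixing the seed makes the replicates reproducible, which is convenient when comparing estimators on the same resamples.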
References
AAPOR, Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, 9th edn. (American Association for Public Opinion Research, 2016)
S. Ansolabehere, D. Rivers, Cooperative survey research. Ann. Rev. Polit. Sci. 16, 307–329 (2013)
R. Baker, J.M. Brick, N.A. Bates, M. Battaglia, M.P. Couper, J.A. Dever, K.J. Gile, R. Tourangeau, Summary report of the AAPOR task force on non-probability sampling. J. Surv. Stat. Methodol. 1(2), 90–143 (2013)
T. Bayes, An essay towards solving a problem in the doctrine of chances. Philos. Trans. 53, 370–418 (1763)
K.S. Berbaum, D.D. Dorfman, E.A. Franken, R.T. Caldwell, An empirical comparison of discrete ratings and subjective probability ratings. Acad. Radiol. 9(7), 756–763 (2002)
A.G. Blom, C. Gathmann, U. Krieger, Setting up an online panel representative of the general population: the German internet panel. Field Methods 27(4), 391–408 (2015)
A.G. Blom, J.M.E. Herzing, C. Cornesse, J.W. Sakshaug, U. Krieger, D. Bossert, Does the recruitment of offline households increase the sample representativeness of probability-based online panels? evidence from the German internet panel. Soc. Sci. Comput. Rev. 35(4), 498–520 (2017)
A.G. Blom, D. Ackermann-Piek, S.C. Helmschrott, C. Cornesse, J.W. Sakshaug, The representativeness of online panels: coverage, sampling and weighting, in Paper Presented at the General Online Research Conference (2017)
D. Briggs, D. Fecht, K. De Hoogh, Census data issues for epidemiology and health risk assessment: experiences from the small area health statistics unit. J. R. Stat. Soc. Ser. A (Stat. Soc.) 170(2), 355–378 (2007)
L. Chang, J.A. Krosnick, National surveys via RDD telephone interviewing versus the internet: comparing sample representativeness and response quality. Public Opin. Q. 73(4), 641–678 (2009)
C. Cornesse, A.G. Blom, D. Dutwin, J.A. Krosnick, E.D. De Leeuw, S. Legleye, J. Pasek, D. Pennay, B. Phillips, J.W. Sakshaug, B. Struminskaya, A. Wenz, A review of conceptual approaches and empirical evidence on probability and nonprobability sample survey research. J. Surv. Stat. Methodol. 8(1), 4–36 (2020)
B.O. Daponte, J.B. Kadane, L.J. Wolfson, Bayesian demography: projecting the Iraqi Kurdish population, 1977–1990. J. Am. Stat. Assoc. 92(440), 1256–1267 (1997)
D. Dutwin, T.D. Buskirk, Apples to oranges or gala versus golden delicious? comparing data quality of nonprobability internet samples to low response rate probability samples. Public Opin. Q. 81(S1), 213–239 (2017)
B. Efron, Bootstrap methods: another look at the jackknife. Ann. Stat. 7, 1–26 (1979)
A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin, Bayesian Data Analysis, 3rd edn. (Chapman & Hall/CRC, Boca Raton, 2013)
S. Lee, Propensity score adjustment as a weighting scheme for volunteer panel web surveys. J. Off. Stat. 22(2), 329 (2006)
S. Lee, R. Valliant, Estimation for volunteer panel web surveys using propensity score adjustment and calibration adjustment. Sociol. Methods Res. 37(3), 319–343 (2009)
N. Malhotra, J.A. Krosnick, The effect of survey mode and sampling on inferences about political attitudes and behavior: comparing the 2000 and 2004 ANES to internet surveys with nonprobability samples. Polit. Anal. 15, 286–323 (2007)
S. Marchetti, C. Giusti, M. Pratesi, The use of Twitter data to improve small area estimates of households' share of food consumption expenditure in Italy. AStA Wirtschafts- und Sozialstatistisches Archiv 10(2–3), 79–93 (2016)
A.H. Murphy, H. Daan, Impacts of feedback and experience on the quality of subjective probability forecasts. Comparison of results from the first and second years of the Zierikzee experiment. Mon. Weather Rev. 112(3), 413–423 (1984)
A. O’Hagan, C.E. Buck, A. Daneshkhah, J.R. Eiser, P.H. Garthwaite, D.J. Jenkinson, J.E. Oakley, T. Rakow, Uncertain Judgements: Eliciting Experts’ Probabilities (Wiley, Chichester, 2006)
J. Pasek, When will nonprobability surveys mirror probability surveys? considering types of inference and weighting strategies as criteria for correspondence. Int. J. Public Opin. Res. 28(2), 269–291 (2016)
D.W. Pennay, D. Neiger, P.J. Lavrakas, K.A. Borg, S. Mission, N. Honey, Australian online panels benchmarking study, in Presented at the 69th Annual Conference of the World Association for Public Opinion Research, Austin, May (2016)
A.T. Porter, S.H. Holan, C.K. Wikle, N. Cressie, Spatial Fay–Herriot models for small area estimation with functional covariates. Spatial Stat. 10, 27–42 (2014)
S.S. Qian, K.H. Reckhow, Modeling phosphorus trapping in wetlands using nonparametric Bayesian regression. Water Res. Res. 34(7), 1745–1754 (1998)
R Core Team, R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, 2016)
J.N. Rao, Small-Area Estimation (Wiley Online Library, Hoboken, 2003)
J. Raymer, A. Wiśniowski, J.J. Forster, P.W. Smith, J. Bijak, Integrated modeling of European migration. J. Am. Stat. Assoc. 108(503), 801–819 (2013)
S. Renooij, C. Witteman, Talking probabilities: communicating probabilistic information with words and numbers. Int. J. Approx. Reason. 22(3), 169–194 (1999)
D. Rivers, Sampling for web surveys, in Joint Statistical Meetings (2007)
D. Rivers, D. Bailey, Inference from matched samples in the 2008 US national elections, in Proceedings of the Joint Statistical Meetings, Vol. 1, pp. 627–639 (YouGov/Polimetrix, Palo Alto, 2009)
J.W. Sakshaug, A. Wiśniowski, D.A. Perez-Ruiz, A.G. Blom, Supplementing small probability samples with nonprobability samples: a Bayesian approach. J. Off. Stat. 35(3), 653–681 (2019)
C.P. Schmertmann, S.M. Cavenaghi, R.M. Assunção, J.E. Potter, Bayes plus brass: estimating total fertility for many small areas from sparse census data. Popul. Stud. 67(3), 255–273 (2013)
R. Valliant, J.A. Dever, Estimating propensity adjustments for volunteer web surveys. Sociol. Methods Res. 40(1), 105–137 (2011)
L.C. van der Gaag, S. Renooij, C.L.M. Witteman, B.M.P. Aleman, B.G. Taal, Probabilities for a probabilistic network: a case study in oesophageal cancer. Artif. Intell. Med. 25(2), 123–148 (2002)
M.D. Vescio, R.L. Thompson, Forecaster's forum: subjective tornado probability forecasts in severe weather watches. Weather Forecast. 16(1), 192–195 (2001)
A. Wiśniowski, J.W. Sakshaug, D.A. Perez-Ruiz, A.G. Blom, Integrating probability and nonprobability samples for survey inference. J. Surv. Stat. Methodol. 8, 120–147 (2020)
D.S. Yeager, J.A. Krosnick, L. Chang, H.S. Javitz, M.S. Levendusky, A. Simpser, R. Wang, Comparing the accuracy of RDD telephone surveys and internet surveys conducted with probability and non-probability samples. Public Opin. Q. nfr020 (2011)
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sakshaug, J.W., Wiśniowski, A., Perez Ruiz, D.A., Blom, A.G. (2021). Combining Scientific and Non-scientific Surveys to Improve Estimation and Reduce Costs. In: Rudas, T., Péli, G. (eds) Pathways Between Social Science and Computational Social Science. Computational Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-54936-7_4
DOI: https://doi.org/10.1007/978-3-030-54936-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54935-0
Online ISBN: 978-3-030-54936-7
eBook Packages: Social Sciences, Social Sciences (R0)