Combining Scientific and Non-scientific Surveys to Improve Estimation and Reduce Costs

Sakshaug, Joseph W.; Wiśniowski, Arkadiusz; Perez Ruiz, Diego Andres; Blom, Annelies G.

doi:10.1007/978-3-030-54936-7_4

Chapter
First Online: 10 August 2020

Part of the book series: Computational Social Sciences (CSS)

Abstract

Survey data collection costs have risen to the point where many survey researchers are abandoning large, expensive probability-based samples in favor of less expensive nonprobability samples. The empirical literature suggests this strategy may be unwise for many reasons, among them that probability samples tend to outperform nonprobability samples on accuracy when assessed against population benchmarks. Nevertheless, the attractive cost properties and convenience of nonprobability samples suggest they are here to stay. Instead of forgoing probability sampling entirely, we consider a method of combining probability and nonprobability samples in a way that exploits the strengths of each to offset the weaknesses of the other. Using Bayesian inference, we evaluate the use of nonprobability data as a supplement to estimation based on small probability samples. In a case study involving actual survey data, we show that specifying prior distributions using nonprobability data considerably reduces variances and mean-squared errors for estimates of two commonly used health variables, height and weight, compared to estimates from the probability sample alone. We further show that these efficiency gains yield expected cost savings of up to 66%, based on actual cost data from eight nonprobability surveys conducted by different commercial vendors and assumed cost data for a probability-based Internet panel. We conclude with a discussion of these findings, their implications for survey practice, and possible research extensions.
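The idea of letting nonprobability data inform the prior can be illustrated with a minimal sketch. It is not the chapter's actual model: it assumes a simple conjugate normal update for a population mean, treats the sampling variance as known, and uses hypothetical height values. The function name `posterior_mean_normal` and all numbers are invented for illustration.

```python
from statistics import fmean, variance


def posterior_mean_normal(prob_sample, prior_mean, prior_var):
    """Conjugate normal update for a population mean.

    Assumption: the squared standard error of the probability-sample
    mean is plugged in as if it were known.
    """
    n = len(prob_sample)
    ybar = fmean(prob_sample)
    se2 = variance(prob_sample) / n  # squared standard error of ybar
    # Under the conjugate normal model, precisions (inverse variances) add.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / se2)
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + ybar / se2)
    return post_mean, post_var


# Hypothetical heights in cm (illustrative values, not data from the chapter).
prob_sample = [168.0, 175.0, 181.0, 162.0, 177.0, 170.0]
nonprob_sample = [171.0, 169.0, 174.0, 178.0, 166.0, 173.0,
                  180.0, 165.0, 172.0, 176.0, 170.0, 175.0]

# Assumption: the prior is centred on the nonprobability mean, with the
# squared standard error of that mean as the prior variance.
prior_mean = fmean(nonprob_sample)
prior_var = variance(nonprob_sample) / len(nonprob_sample)

post_mean, post_var = posterior_mean_normal(prob_sample, prior_mean, prior_var)
se2_prob_only = variance(prob_sample) / len(prob_sample)

print("probability-only variance:", se2_prob_only)
print("posterior variance:       ", post_var)
```

In this sketch the posterior variance is always smaller than the probability-only variance, which is the source of the efficiency gain. The price is that any bias in the nonprobability sample is pulled into the estimate, so in practice the prior variance would need to reflect that risk.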

Electronic Supplementary Material: The online version of this chapter (https://doi.org/10.1007/978-3-030-54936-7_4) contains supplementary material, which is available to authorized users.

Notes

1. Bootstrap methods were originally proposed by Efron (1979) and have since been used in many contexts. The general approach is to draw a large number of subsamples randomly with replacement from the full sample, estimate the statistic of interest in each subsample, and then combine the estimates using a bootstrap estimator.
2. We assume the per-unit cost of the German Internet Panel (GIP) is higher than the per-unit costs of the nonprobability surveys because of the interviewer-administered recruitment and the setup costs of equipping the offline population. Further, we reason that, in practice, a high response rate would be desired for the small probability sample to minimize the risk of nonresponse bias in the sparse sample, which may require extensive recruitment efforts.
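The bootstrap procedure described in note 1 can be sketched as follows. This is a generic illustration under stated assumptions, not the chapter's code: the function name `bootstrap_se`, the number of replicates, and the weight values are hypothetical. The replicates are combined by taking their mean as the bootstrap estimate and their standard deviation as its standard error.

```python
import random
from statistics import fmean, stdev


def bootstrap_se(sample, statistic=fmean, n_boot=2000, seed=0):
    """Return the bootstrap estimate and standard error of a statistic.

    Each replicate resamples the full sample with replacement, at the
    original sample size, and recomputes the statistic.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(sample)
    replicates = [
        statistic(rng.choices(sample, k=n))  # draw n values with replacement
        for _ in range(n_boot)
    ]
    # Combine the replicates: their mean is the bootstrap estimate and
    # their standard deviation estimates the standard error.
    return fmean(replicates), stdev(replicates)


# Hypothetical weights in kg (illustrative values only).
weights = [72.5, 80.1, 65.3, 90.4, 77.8, 68.9, 84.2, 70.0]
estimate, std_error = bootstrap_se(weights)
print("bootstrap mean:", estimate)
print("bootstrap SE:  ", std_error)
```

Passing a different function as `statistic` (for example `statistics.median`) applies the same resampling scheme to another estimator.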

References

AAPOR, Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, 9th edn. (American Association for Public Opinion Research, 2016)
S. Ansolabehere, D. Rivers, Cooperative survey research. Ann. Rev. Polit. Sci. 16, 307–329 (2013)
R. Baker, J.M. Brick, N.A. Bates, M. Battaglia, M.P. Couper, J.A. Dever, K.J. Gile, R. Tourangeau, Summary report of the AAPOR task force on non-probability sampling. J. Surv. Stat. Methodol. 1(2), 90–143 (2013)
T. Bayes, An essay towards solving a problem in the doctrine of chances. Philos. Trans. 53, 370–418 (1763)
K.S. Berbaum, D.D. Dorfman, E.A. Franken, R.T. Caldwell, An empirical comparison of discrete ratings and subjective probability ratings. Acad. Radiol. 9(7), 756–763 (2002)
A.G. Blom, C. Gathmann, U. Krieger, Setting up an online panel representative of the general population: the German Internet Panel. Field Methods 27(4), 391–408 (2015)
A.G. Blom, J.M.E. Herzing, C. Cornesse, J.W. Sakshaug, U. Krieger, D. Bossert, Does the recruitment of offline households increase the sample representativeness of probability-based online panels? Evidence from the German Internet Panel. Soc. Sci. Comput. Rev. 35(4), 498–520 (2017)
A.G. Blom, D. Ackermann-Piek, S.C. Helmschrott, C. Cornesse, J.W. Sakshaug, The representativeness of online panels: coverage, sampling and weighting, in Paper Presented at the General Online Research Conference (2017)
D. Briggs, D. Fecht, K. De Hoogh, Census data issues for epidemiology and health risk assessment: experiences from the small area health statistics unit. J. R. Stat. Soc. Ser. A (Stat. Soc.) 170(2), 355–378 (2007)
L. Chang, J.A. Krosnick, National surveys via RDD telephone interviewing versus the internet: comparing sample representativeness and response quality. Public Opin. Q. 73(4), 641–678 (2009)
C. Cornesse, A.G. Blom, D. Dutwin, J.A. Krosnick, E.D. De Leeuw, S. Legleye, J. Pasek, D. Pennay, B. Phillips, J.W. Sakshaug, B. Struminskaya, A. Wenz, A review of conceptual approaches and empirical evidence on probability and nonprobability sample survey research. J. Surv. Stat. Methodol. 8(1), 4–36 (2020)
B.O. Daponte, J.B. Kadane, L.J. Wolfson, Bayesian demography: projecting the Iraqi Kurdish population, 1977–1990. J. Am. Stat. Assoc. 92(440), 1256–1267 (1997)
D. Dutwin, T.D. Buskirk, Apples to oranges or gala versus golden delicious? Comparing data quality of nonprobability internet samples to low response rate probability samples. Public Opin. Q. 81(S1), 213–239 (2017)
B. Efron, Bootstrap methods: another look at the jackknife. Ann. Stat. 7, 1–26 (1979)
A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin, Bayesian Data Analysis, 3rd edn. (Chapman & Hall/CRC, Boca Raton, 2013)
S. Lee, Propensity score adjustment as a weighting scheme for volunteer panel web surveys. J. Off. Stat. 22(2), 329 (2006)
S. Lee, R. Valliant, Estimation for volunteer panel web surveys using propensity score adjustment and calibration adjustment. Sociol. Methods Res. 37(3), 319–343 (2009)
N. Malhotra, J.A. Krosnick, The effect of survey mode and sampling on inferences about political attitudes and behavior: comparing the 2000 and 2004 ANES to internet surveys with nonprobability samples. Polit. Anal. 15, 286–323 (2007)
S. Marchetti, C. Giusti, M. Pratesi, The use of Twitter data to improve small area estimates of households' share of food consumption expenditure in Italy. AStA Wirtschafts- und Sozialstatistisches Archiv 10(2–3), 79–93 (2016)
A.H. Murphy, H. Daan, Impacts of feedback and experience on the quality of subjective probability forecasts. Comparison of results from the first and second years of the Zierikzee experiment. Mon. Weather Rev. 112(3), 413–423 (1984)
A. O’Hagan, C.E. Buck, A. Daneshkhah, J.R. Eiser, P.H. Garthwaite, D.J. Jenkinson, J.E. Oakley, T. Rakow, Uncertain Judgements: Eliciting Experts’ Probabilities (Wiley, Chichester, 2006)
J. Pasek, When will nonprobability surveys mirror probability surveys? Considering types of inference and weighting strategies as criteria for correspondence. Int. J. Public Opin. Res. 28(2), 269–291 (2016)
D.W. Pennay, D. Neiger, P.J. Lavrakas, K.A. Borg, S. Mission, N. Honey, Australian online panels benchmarking study, in Presented at the 69th Annual Conference of the World Association for Public Opinion Research, Austin, May (2016)
A.T. Porter, S.H. Holan, C.K. Wikle, N. Cressie, Spatial Fay–Herriot models for small area estimation with functional covariates. Spatial Stat. 10, 27–42 (2014)
S.S. Qian, K.H. Reckhow, Modeling phosphorus trapping in wetlands using nonparametric Bayesian regression. Water Resour. Res. 34(7), 1745–1754 (1998)
R Core Team, R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, 2016)
J.N. Rao, Small-Area Estimation (Wiley Online Library, Hoboken, 2003)
J. Raymer, A. Wiśniowski, J.J. Forster, P.W. Smith, J. Bijak, Integrated modeling of European migration. J. Am. Stat. Assoc. 108(503), 801–819 (2013)
S. Renooij, C. Witteman, Talking probabilities: communicating probabilistic information with words and numbers. Int. J. Approx. Reason. 22(3), 169–194 (1999)
D. Rivers, Sampling for web surveys, in Joint Statistical Meetings (2007)
D. Rivers, D. Bailey, Inference from matched samples in the 2008 US national elections, in Proceedings of the Joint Statistical Meetings, Vol. 1, pp. 627–39 (YouGov/Polimetrix Palo Alto, 2009)
J.W. Sakshaug, A. Wiśniowski, D.A. Perez-Ruiz, A.G. Blom, Supplementing small probability samples with nonprobability samples: a Bayesian approach. J. Off. Stat. 35(3), 653–681 (2019)
C.P. Schmertmann, S.M. Cavenaghi, R.M. Assunção, J.E. Potter, Bayes plus brass: estimating total fertility for many small areas from sparse census data. Popul. Stud. 67(3), 255–273 (2013)
R. Valliant, J.A. Dever, Estimating propensity adjustments for volunteer web surveys. Sociol. Methods Res. 40(1), 105–137 (2011)
L.C. van der Gaag, S. Renooij, C.L.M. Witteman, B.M.P. Aleman, B.G. Taal, Probabilities for a probabilistic network: a case study in oesophageal cancer. Artif. Intell. Med. 25(2), 123–148 (2002)
M.D. Vescio, R.L. Thompson, Forecaster's forum: subjective tornado probability forecasts in severe weather watches. Weather Forecast. 16(1), 192–195 (2001)
A. Wiśniowski, J.W. Sakshaug, D.A. Perez-Ruiz, A.G. Blom, Integrating probability and nonprobability samples for survey inference. J. Surv. Stat. Methodol. 8, 120–147 (2020)
D.S. Yeager, J.A. Krosnick, L. Chang, H.S. Javitz, M.S. Levendusky, A. Simpser, R. Wang, Comparing the accuracy of RDD telephone surveys and internet surveys conducted with probability and non-probability samples. Public Opin. Q. nfr020 (2011)

Author information

Authors and Affiliations

University of Mannheim, Ludwig Maximilian University of Munich, and Institute for Employment Research, Nuremberg, Germany
Joseph W. Sakshaug
Department of Social Statistics, University of Manchester, Manchester, UK
Arkadiusz Wiśniowski
School of Mathematics, University of Manchester, Manchester, UK
Diego Andres Perez Ruiz
School of Social Sciences, University of Mannheim, Mannheim, Germany
Annelies G. Blom

Corresponding author

Correspondence to Joseph W. Sakshaug.

Editor information

Editors and Affiliations

Department of Statistics, Eötvös Loránd University, Budapest, Hungary
Tamás Rudas
Department of Sociology, Károli Gáspár University of the Reformed Church in Hungary; Centre for Social Sciences, Hungarian Academy of Sciences Centre of Excellence, Budapest, Hungary
Gábor Péli

About this chapter

Cite this chapter

Sakshaug, J.W., Wiśniowski, A., Perez Ruiz, D.A., Blom, A.G. (2021). Combining Scientific and Non-scientific Surveys to Improve Estimation and Reduce Costs. In: Rudas, T., Péli, G. (eds) Pathways Between Social Science and Computational Social Science. Computational Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-54936-7_4

DOI: https://doi.org/10.1007/978-3-030-54936-7_4
Published: 10 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54935-0
Online ISBN: 978-3-030-54936-7
eBook Packages: Social Sciences, Social Sciences (R0)