Abstract
This article attempts to offer some perspectives on Bayesian inference for finite population quantities when the units in the population are assumed to exhibit complex dependencies. Beginning with an overview of Bayesian hierarchical models, including some that yield design-based Horvitz-Thompson estimators, the article proceeds to introduce dependence in finite populations and sets out inferential frameworks for ignorable and nonignorable responses. Multivariate dependencies using graphical models and spatial processes are discussed and some salient features of two recent analyses for spatial finite populations are presented.
Acknowledgements
The author wishes to thank the editors and an anonymous referee for valuable feedback. The author is especially grateful to Professors Roderick J. Little and Trivellore Raghunathan from the University of Michigan, Ann Arbor, U.S.A., and Professor J.N.K. Rao from Carleton University, Ottawa, Canada, for insightful discussions on inference for finite populations. The work of the author has been supported, in part, by the National Science Foundation (NSF) from grants NSF/DMS 1916349 and NSF/DMS 2113778, by the National Institute of Environmental Health Sciences (NIEHS) from grants R01ES030210 and R01ES027027, and by the National Institute of General Medical Sciences (NIGMS) from grant R01GM148761.
Ethics declarations
Conflict of interest
The author declares that there are no financial or non-financial conflicts of interest.
About this article
Cite this article
Banerjee, S. Finite Population Survey Sampling: An Unapologetic Bayesian Perspective. Sankhya A (2024). https://doi.org/10.1007/s13171-024-00348-8
DOI: https://doi.org/10.1007/s13171-024-00348-8