Finite Population Survey Sampling: An Unapologetic Bayesian Perspective

Sankhya A

Abstract

This article attempts to offer some perspectives on Bayesian inference for finite population quantities when the units in the population are assumed to exhibit complex dependencies. Beginning with an overview of Bayesian hierarchical models, including some that yield design-based Horvitz-Thompson estimators, the article proceeds to introduce dependence in finite populations and sets out inferential frameworks for ignorable and nonignorable responses. Multivariate dependencies using graphical models and spatial processes are discussed and some salient features of two recent analyses for spatial finite populations are presented.

Acknowledgements

The author thanks the editors and an anonymous referee for valuable feedback. The author is especially grateful to Professors Roderick J. Little and Trivellore Raghunathan of the University of Michigan, Ann Arbor, U.S.A., and Professor J.N.K. Rao of Carleton University, Ottawa, Canada, for insightful discussions on inference for finite populations. The work of the author has been supported, in part, by the National Science Foundation (NSF) through grants NSF/DMS 1916349 and NSF/DMS 2113778, by the National Institute of Environmental Health Sciences (NIEHS) through grants R01ES030210 and R01ES027027, and by the National Institute of General Medical Sciences through grant R01GM148761.

Funding

The work of the author has been supported, in part, by the National Science Foundation (NSF) through grants NSF/DMS 1916349 and NSF/DMS 2113778, by the National Institute of Environmental Health Sciences (NIEHS) through grants R01ES030210 and R01ES027027, and by the National Institute of General Medical Sciences through grant R01GM148761.

Author information

Corresponding author

Correspondence to Sudipto Banerjee.

Ethics declarations

Conflict of interest

The author declares that there are no financial or non-financial conflicts of interest.

About this article

Cite this article

Banerjee, S. Finite Population Survey Sampling: An Unapologetic Bayesian Perspective. Sankhya A (2024). https://doi.org/10.1007/s13171-024-00348-8

  • DOI: https://doi.org/10.1007/s13171-024-00348-8
