The complexity of survey data and the availability of data from auxiliary sources motivate researchers to explore estimation methods that extend beyond traditional survey-based estimation. The U.S. Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS) collects a wide range of health information, including whether respondents have a personal doctor. While the BRFSS focuses on state-level estimation, there is demand for county-level estimation of health indicators using BRFSS data. A hierarchical Bayes small area estimation model is developed to combine county-level BRFSS survey data with county-level data from auxiliary sources, while accounting for various sources of error and nested geographical levels. To mitigate extreme proportions and unstable survey variances, a transformation is applied to the survey data. Model-based county-level predictions of the prevalence of having a personal doctor are constructed for all counties in the U.S., including those where BRFSS survey data were not available. An evaluation study is also presented that compares fitting the model using only the counties with large BRFSS sample sizes against fitting it using all counties with BRFSS data.
This work was conducted under a CDC-Westat project. The authors thank Carol Pierannunzi, the CDC’s main contact for the project, for helpful discussions and comments. Dr. Li contributed to this work while she was a Senior Statistician at Westat. Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
The work described in this paper was conducted under contract with the Centers for Disease Control and Prevention (CDC Contract #HHSD2002013M53968B Order #75D30120F09442). The BRFSS data are confidential to CDC and therefore cannot be shared.
Appendix A Auxiliary data pool
See Table 5.
Appendix B STAN code
1.1 Model specification
1.2 Model fit
Cite this article
Erciulescu, A., Li, J., Krenzke, T. et al. Hierarchical Bayes small area estimation for county-level health prevalence to having a personal doctor. Stat Methods Appl (2022). https://doi.org/10.1007/s10260-022-00678-7