## Abstract

High sampling variability complicates estimation of demographic rates in small areas. In addition, many countries have imperfect vital registration systems, with coverage quality that varies significantly across regions. We develop a Bayesian regression model for small-area mortality schedules that simultaneously addresses the problems of small local samples and underreporting of deaths. We combine a relational model for mortality schedules with probabilistic prior information on death registration coverage derived from demographic estimation techniques, such as Death Distribution Methods, and from field audits by public health experts. We test the model on small-area data from Brazil. Incorporating external estimates of vital registration coverage through priors improves small-area mortality estimates by accounting for underregistration and automatically producing measures of uncertainty. Bayesian estimates show that when mortality levels in small areas are compared, noise often dominates signal. Differences in local point estimates of life expectancy are often small relative to uncertainty, even for relatively large areas in a populous country like Brazil.

## Notes

In the appendix, we also demonstrate that a negative binomial distribution for total deaths implies a negative binomial distribution for registered deaths. A negative binomial model would be appropriate if the data exhibit *overdispersion*, that is, higher variance than predicted by a Poisson model. With the Brazilian data that we use in this article, extensive experimentation produced no evidence of meaningful overdispersion, and posterior distributions of mortality rates were virtually identical with Poisson and negative binomial specifications. We therefore use a Poisson distribution for *D* in this article.

By omitting an explicit prior, we assume *a priori* that μ is equally likely to take any positive real value. The omitted (improper) prior is therefore \( f_{\mu}(\mu) \propto I(\mu \ge 0) \), where *I*() is a (0,1) indicator function. This yields a proper posterior distribution for (μ, π) and a proper marginal posterior for μ in Eq. (4).

Gonzaga and Schmertmann (2016) showed that this property makes the specific choice of a standard schedule \( \boldsymbol{\lambda}^{*} \) far less important than in other relational models used in demography. Note that the TOPALS model includes indirect standardization as a special case in which all **α** values are equal and the standard schedule is shifted up or down by the same amount at all ages.

In Brazil, municipalities are the smallest areas responsible for registering vital events.

For simplicity, we treat the Federal District, which contains Brasília, as a state.

Even microregions are fairly large “small areas.” With a single exception (the remote island of Fernando de Noronha, with a total resident population of only 2630 in 2010), all had resident populations of at least 20,000 in 2010. Rounded to the nearest thousand, the 10th, 50th, and 90th percentiles of microregional population were, respectively, 63,000, 173,000, and 557,000. The largest microregion, metropolitan São Paulo, had a 2010 population of more than 13 million.

The complete name in Portuguese is *Busca ativa de óbitos e nascimentos no Nordeste e na Amazônia Legal* (Active search for deaths and births in the Northeast and the Amazonian administrative region).

In practice, we used identical weights for each region: *w* = (.035, .109, .856) for males and *w* = (.037, .047, .916) for females. These were calculated from national deaths over 2009–2011.

Hyperparameters, *K*, correspond to sample sizes in a field audit. Prior uncertainty about *K* represents uncertainty about the precision of the field audit estimates of π. Our (hyper)priors for *K* are fairly conservative: they imply that the most likely precision of the field audit estimates is equivalent to results from an audit of slightly fewer than *K* = 25 deaths in a region.

Denoting the mean and variance of DDM estimates as \( \overline{x} \) and \( s^{2} \), the method of moments estimators (cf. Glen and Leemis 2017:227–228) are \( \phi_{2}=\overline{x} \) and \( K_{2}=\frac{\overline{x}\left(1-\overline{x}\right)}{s^{2}}-1 \).

Priors based on *busca ativa* estimates are constructed from a single coverage estimate for each region by adding a hyperparameter for the estimate’s unknown precision. In contrast, priors from DDM estimates are based on multiple estimates per region and use the variance of those estimates as an index of (im)precision. A third alternative, which we do not use here, is to choose beta distribution parameters ϕ and *K* so that available estimates are all in a specified range of prior probability (for example, a 90 % probability that *π* ∈ [min(*DDM*), max(*DDM*)]).

Because we have only state-level prior information about death registration in this age group, we can assess only the joint prior probability of a *set* of substate coverage levels, \( (\pi_{2\alpha}, \ldots, \pi_{2z}) \), by looking at whether their weighted average is likely.

This prior distribution arises from two lines in the *Stan* programming language that we use for MCMC sampling. From our first principle (diffuse marginal distributions for each \( \alpha_{i} \)), we add **α** ~ *normal*(0,4) to the model. From the second principle (small differences between consecutive parameter values) we add \( \alpha_{i} - \alpha_{i-1} \) ~ *normal*(0, sqrt(0.5)), as in Gonzaga and Schmertmann (2016). These statements in *Stan* represent changes to the log prior density of any proposed **α** vector, which together yield this specific multivariate normal distribution. The results that we report in this article are extremely insensitive to the choice of priors for **α**.

The high estimates for life expectancy in the South and Southeast probably result from IBGE’s *compatibilização* step (Instituto Brasileiro de Geografia e Estatística 2013: tables 6 and 13), in which they adjust national totals by removing deaths from these two regions.
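Two of the notes above describe prior constructions concretely enough to sketch numerically: the method-of-moments beta prior built from several DDM coverage estimates, and the log prior density for **α** implied by the two *Stan* statements. The Python sketch below is illustrative only and is not the authors' code. It assumes that the beta distribution is parameterized by mean ϕ and "sample size" *K* (shape parameters ϕ*K* and (1 − ϕ)*K*), that the sample variance uses an *n* − 1 denominator, and that the second argument of *normal* is a standard deviation, as in Stan. The function names and the input coverage values are hypothetical.

```python
import math
from statistics import mean, variance


def beta_prior_from_estimates(estimates):
    """Method-of-moments (phi, K) for a beta prior from coverage estimates in (0, 1).

    Returns (phi, K, a, b), where a = phi * K and b = (1 - phi) * K are the
    usual beta shape parameters (an assumed parameterization).
    """
    x_bar = mean(estimates)
    s2 = variance(estimates)  # sample variance, n - 1 denominator (assumption)
    phi = x_bar
    k = x_bar * (1.0 - x_bar) / s2 - 1.0
    if k <= 0:
        raise ValueError("estimates are too dispersed for a beta distribution")
    return phi, k, phi * k, (1.0 - phi) * k


def alpha_log_prior(alpha, sd_level=4.0, sd_diff=math.sqrt(0.5)):
    """Unnormalized log prior density of alpha under the two normal statements."""
    # alpha ~ normal(0, 4): diffuse marginal distribution for each element.
    level = sum(-0.5 * (a / sd_level) ** 2 for a in alpha)
    # alpha[i] - alpha[i-1] ~ normal(0, sqrt(0.5)): penalize jumps between neighbors.
    jumps = sum(-0.5 * ((b - a) / sd_diff) ** 2 for a, b in zip(alpha, alpha[1:]))
    return level + jumps


# Hypothetical DDM coverage estimates for one region.
phi, k, a, b = beta_prior_from_estimates([0.80, 0.85, 0.90])

# A smooth alpha vector receives a higher log prior density than a jagged one.
smooth = alpha_log_prior([1.0, 1.0, 1.0])
jagged = alpha_log_prior([0.0, 1.0, 0.0])
```

Because the two log density terms are simply added, the sketch reproduces the idea that each *Stan* statement contributes an increment to the log prior of a proposed **α** vector; normalizing constants are omitted because they do not affect MCMC sampling.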

## References

Agostinho, C. (2009). *Estudo sobre a mortalidade adulta, para Brasil entre 1980 e 2000 e Unidades da Federação em 2000: Uma aplicação dos métodos de distribuição de mortes* [A study of adult mortality in Brazil between 1980 and 2000, and in Brazilian states in 2000] (Unpublished doctoral dissertation). Faculdade de Ciências Econômicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Alexander, M., Zagheni, E., & Barbieri, M. (2017). A flexible Bayesian model for estimating subnational mortality. *Demography, 54,* 2025–2041.

Alkema, L., Kantorová, V., Menozzi, C., & Biddlecom, A. (2013). National, regional, and global rates and trends in contraceptive prevalence and unmet need for family planning between 1990 and 2015: A systematic and comprehensive analysis. *Lancet, 381,* 1642–1652.

Alkema, L., Raftery, A. E., Gerland, P., Clark, S. J., Pelletier, F., Buettner, T., & Heilig, G. K. (2011). Probabilistic projections of the total fertility rate for all countries. *Demography, 48,* 815–839.

Bennett, N. G., & Horiuchi, S. (1981). Estimating the completeness of death registration in a closed population. *Population Index, 47,* 207–221.

Bennett, N. G., & Horiuchi, S. (1984). Mortality estimation from registered deaths in less developed countries. *Demography, 21,* 217–233.

Bernardinelli, L., & Montomoli, C. (1992). Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. *Statistics in Medicine, 11,* 983–1007.

Bhat, P. N. M. (2002). Completeness of India’s sample registration system: An assessment using the general growth balance method. *Population Studies, 56,* 119–134.

Bignami-Van Assche, S. (2005, March–April). *Province-specific mortality in China 1990–2000*. Paper presented at the annual meeting of the Population Association of America, Philadelphia, PA.

Borges, D., Miranda, D., Duarte, T., Novaes, F., Ettel, K., Guimarães, T., & Ferreira, T. (2012). *Mortes violentas no Brasil: Uma análise do fluxo de informações* [Violent deaths in Brazil: An analysis of the flow of information]. Rio de Janeiro, Brazil: LAV/UERJ.

Brass, W. (1971). Mortality models and their uses in demography. *Transactions of the Faculty of Actuaries, 33,* 123–142.

Brass, W. (1975). *Methods for estimating fertility and mortality from limited and defective data, based on seminars held 16–24 September 1971 at Centro Latinamerico de Demografia (CELADE) San Jose, Costa Rica* (Report). Chapel Hill, NC: International Program of Laboratories for Population Statistics.

Campos, N. O. B., & Rodrigues, R. N. (2004). Ritmo de declínio nas taxas de mortalidade dos idosos nos estados do Sudeste, 1980–2000 [The pace of decline in mortality rates of the elderly in states of the Southeast, 1980–2000]. *Revista Brasileira de Estudos de População, 21,* 323–342.

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, G., Goodrich, B., Betancourt, M., . . . Riddell, A. (2017). Stan: A probabilistic programming language. *Journal of Statistical Software, 76,* 1–32. https://doi.org/10.18637/jss.v076.i01

Congdon, P. (2009). Life expectancies for small areas: A Bayesian random effects methodology. *International Statistical Review, 77,* 222–240.

de Beer, J. (2012). Smoothing and projecting age-specific probabilities of death by TOPALS. *Demographic Research, 27*(article 20), 543–592. https://doi.org/10.4054/DemRes.2012.27.20

de Boor, C. (2001). *Applied mathematical sciences: Vol. 27. A practical guide to splines* (Revised ed.). New York, NY: Springer-Verlag.

de Mello Jorge, M. H. P., Gawryszewski, V. P., & Latorre, M. D. R. D. D. O. (1997). Análise dos dados de mortalidade [Analysis of mortality data]. *Revista de Saúde Pública, 31,* 5–25. https://doi.org/10.1590/S0034-89101997000500002

de Mello Jorge, M. H. P., Laurenti, R., & Davidson Gotlieb, S. L. (2007). Análise da qualidade das estatísticas vitais brasileiras: A experiência de implantação do SIM e do SINASC [Quality analysis of Brazilian vital statistics: The experience of implementing the SIM and SINASC systems]. *Ciência e Saúde Coletiva, 12,* 643–654.

de Oliveira, G. L., Loschi, R. H., & Assunção, R. M. (2017). A random-censoring Poisson model for underreported data. *Statistics in Medicine*. Advance online publication. https://doi.org/10.1002/sim.7456

Freire, F. H., Lima, E. C., Queiroz, B. L., Gonzaga, M. R., & Souza, F. H. (2015, May). *Mortality estimates and construction of life tables for small areas in Brazil, 2010*. Paper presented at the annual meeting of the Population Association of America, San Diego, CA.

Frias, P. G., Szwarcwald, C. L., de Souza, P. R. B., Jr., Almeida, W. D. S., & Lira, P. I. C. (2013). Correcting vital information: Estimating infant mortality, Brazil, 2000–2009. *Revista de Saúde Pública, 47,* 1048–1058.

Gerland, P., Raftery, A. E., Ševčíková, H., Li, N., Gu, D., Spoorenberg, T., . . . Wilmoth, J. (2014). World population stabilization unlikely this century. *Science, 346,* 234–237.

Glen, A. G., & Leemis, L. M. (Eds.). (2017). *International series in operations research & management science. Computational probability applications*. Cham, Switzerland: Springer.

Gonzaga, M. R., & Schmertmann, C. P. (2016). Estimating age- and sex-specific mortality rates for small areas with TOPALS regression: An application to Brazil in 2010. *Revista Brasileira de Estudos de População, 33,* 629–652.

Greene, W. H. (1997). *Econometric analysis* (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Hill, K. (2007). *Methods for measuring adult mortality in developing countries: A comparative review* (Global Burden of Disease 2000 in Aging Populations Research Paper No. 13). Cambridge, MA: Harvard Burden of Disease Unit.

Hill, K., & Queiroz, B. (2010). Adjusting the general growth balance method for migration. *Revista Brasileira de Estudos de População, 27,* 7–20.

Hill, K., You, D., & Choi, Y. (2009). Death distribution methods for estimating adult mortality: Sensitivity analysis with simulated data errors. *Demographic Research, 21*(article 9), 235–254. https://doi.org/10.4054/DemRes.2009.21.9

Hill, K. H. (1987). Estimating census and death registration completeness. *Asian and Pacific Population Forum, 1*(3), 8–13, 23.

Instituto Brasileiro de Geografia e Estatística (Ed.). (2013). *Tábuas abreviadas de mortalidade por sexo e idade: Brasil, grandes regiões e unidades da federação, 2010*. *Estudos e pesquisas. Informação demográfica e socioeconômica* [Sex- and age-specific abbreviated life tables for Brazilian states and major regions in 2010: Studies and research. Demographic and socioeconomic information]. Rio de Janeiro, Brazil: Instituto Brasileiro de Geografia e Estatística (IBGE).

Jonker, M. F., Van Lenthe, F. J., Congdon, P. D., Donkers, B., Burdorf, A., & Mackenbach, J. P. (2012). Comparison of Bayesian random-effects and traditional life expectancy estimations in small-area applications. *American Journal of Epidemiology, 176,* 929–937.

Lynch, S. M. (2007). *Introduction to applied Bayesian statistics and estimation for social scientists*. New York, NY: Springer.

Målqvist, M., Eriksson, L., Nga, N. T., Fagerland, L. I., Hoa, D. P., Wallin, L., . . . Persson, L.-Å. (2008). Unreported births and deaths, a severe obstacle for improved neonatal survival in low-income countries: A population based study. *BMC International Health and Human Rights, 8,* 4. https://doi.org/10.1186/1472-698X-8-4

Mathers, C. D., Ma Fat, D., Inoue, M., Rao, C., & Lopez, A. D. (2005). Counting the dead and what they died from: An assessment of the global status of cause of death data. *Bulletin of the World Health Organization, 83,* 171–177.

Matos, K., de Godoy, M., & Baccarat, C. (2013). Mortalidade por causas externas em crianças, adolescentes e jovens: Uma revisão bibliográfica [Mortality from external causes in children, teenagers, and young adults: A bibliographic review]. *Espaço para a Saúde-Revista de Saúde Pública do Paraná, 14*(1/2), 82–93.

Moreno, E., & Girón, J. (1998). Estimating with incomplete count data: A Bayesian approach. *Journal of Statistical Planning and Inference, 66,* 147–159.

Murray, C. J. L., Rajaratnam, J. K., Marcus, J., Laakso, T., & Lopez, A. D. (2010). What can we conclude from death registration? Improved methods for evaluating completeness. *PLoS Medicine, 7*(4), 1000262. https://doi.org/10.1371/journal.pmed.1000262

Ocaña-Riola, R., & Mayoral-Cortés, J.-M. (2010). Spatio-temporal trends of mortality in small areas of Southern Spain. *BMC Public Health, 10,* 1. https://doi.org/10.1186/1471-2458-10-26

Paes, N. A. (2005). Avaliação da cobertura dos registros de óbitos dos estados brasileiros em 2000 [Assessment of completeness of death reporting in Brazilian states in 2000]. *Revista de Saúde Pública, 39,* 882–890.

Paes, N. A., & Albuquerque, M. E. E. (1999). Avaliação da qualidade dos dados populacionais e cobertura dos registros de óbitos para as regiões Brasileiras [Evaluation of population data quality and death registration coverage for Brazilian regions]. *Revista de Saúde Pública, 33,* 33–43.

Pletcher, S. D. (1999). Model fitting and hypothesis testing for age-specific mortality data. *Journal of Evolutionary Biology, 12,* 430–439.

Preston, S., Coale, A. J., Trussell, J., & Weinstein, M. (1980). Estimating the completeness of reporting of adult deaths in populations that are approximately stable. *Population Index, 46,* 179–202.

Preston, S., & Hill, K. (1980). Estimating the completeness of death registration. *Population Studies, 34,* 349–366.

Queiroz, B. L. (2012, November). *Estimativas do grau de cobertura e da esperança de vida para as unidades da federação no Brasil entre 2000 e 2010* [Estimates of the degree of coverage and life expectancy for Brazilian states between 2000 and 2010]. Paper presented at the XVIII Encontro de Estudos de População da ABEP, Aguas de Lindóia.

Queiroz, B. L., Freire, F. H. M. A., Gonzaga, M. R., & Lima, E. E. C. (2017). Completeness of death-count coverage and adult mortality (45q15) for Brazilian states from 1980 to 2010. *Revista Brasileira de Epidemiologia, 20,* 21–33.

Queiroz, B. L., Lima, E. C., Freire, F. H., & Gonzaga, M. R. (2013). Adult mortality estimates for small areas in Brazil, 1980–2010: A methodological approach. *Lancet, 381,* S120.

Raftery, A. E. (1988). Inference for the binomial *N* parameter: A hierarchical Bayes approach. *Biometrika, 75,* 223–228.

Raftery, A. E., Chunn, J. L., Gerland, P., & Ševčíková, H. (2013). Bayesian probabilistic projections of life expectancy for all countries. *Demography, 50,* 777–801.

Raftery, A. E., Lalic, N., Gerland, P., Li, N., & Heilig, G. (2014). Joint probabilistic projection of female and male life expectancy. *Demographic Research, 30*(article 27), 795–822. https://doi.org/10.4054/DemRes.2014.30.27

Riggan, W. B., Manton, K. G., Creason, J. P., Woodbury, M. A., & Stallard, E. (1991). Assessment of spatial variation of risks in small populations. *Environmental Health Perspectives, 96,* 223–238.

Ševčíková, H., Li, N., Kantorová, V., Gerland, P., & Raftery, A. E. (2016). Age-specific mortality and fertility rates for probabilistic population projections. In R. Schoen (Ed.), *Dynamic demographic analysis* (pp. 285–310). Cham, Switzerland: Springer.

Soares Filho, A. M., Souza, M. F. M., Gazal-Carvalho, C., Malta, D. C., Alencar, A. P., Silva, M. M. A., & Morais Neto, O. L. (2007). Análise da mortalidade por homicídios no Brasil [Analysis of homicide mortality in Brazil]. *Epidemiologia e Serviços de Saúde, 16,* 7–18.

Stephens, A. S., Purdie, S., Yang, B., & Moore, H. (2013). Life expectancy estimation in small administrative areas with non-uniform population sizes: Application to Australian New South Wales local government areas. *BMJ Open, 3*(12), e003710. https://doi.org/10.1136/bmjopen-2013-003710

Szwarcwald, C. L., Morais Neto, O. L., Frias, P. G., de Souza, P. R. B., Jr., Escalante, J. J. C., de Lima, R. B., & Viola, R. C. (2011). Busca ativa de óbitos e nascimentos no Nordeste e na Amazônia Legal: Estimação das coberturas do SIM e do SINASC nos municípios Brasileiros [Active search for deaths and births in the Northeast and in the Legal Amazon: Estimation of coverage of SIM and SINASC in Brazilian municipalities]. In Ministry of Health (Ed.), *Saúde Brasil 2010: Uma análise da situação e de evidências selecionadas de impacto de ações de vigilância em saúde* [Saúde Brasil 2010: An analysis of the health situation and selected evidence of the impact of health surveillance actions] (pp. 79–98). Brasília: Ministério da Saúde.

Tsimbos, C., Kalogirou, S., & Verropoulou, G. (2014). Estimating spatial differentials in life expectancy in Greece at local authority level. *Population, Space and Place, 20,* 646–663.

Wilmoth, J., Zureick, S., Canudas-Romo, V., Inoue, M., & Sawyer, C. (2012). A flexible two-dimensional mortality model for use in indirect estimation. *Population Studies, 66,* 1–28.

You, D., Hug, L., Ejdemyr, S., Idele, P., Hogan, D., Mathers, C., . . . Alkema, L. (2015). Global, regional, and national levels and trends in under-5 mortality between 1990 and 2015, with scenario-based projections to 2030: A systematic analysis by the UN Inter-agency Group for Child Mortality Estimation. *Lancet, 386,* 2275–2286.

## Acknowledgments

This research was supported by the Capes Foundation of Brazil’s Ministry of Education. Marcos R. Gonzaga gratefully acknowledges support from Research Projects 470866/2014-4 (Estimativas de mortalidade e construção de tabelas de vida para pequenas áreas no Brasil, 1980 a 2010 MCTI/CNPQ/MEC/Capes/Ciências Sociais Aplicadas) and 454223/2014-5 (Estimativas de mortalidade e construção de tabelas de vida para pequenas áreas no Brasil, 1980 a 2010/MCTI/CNPQ/Universal 14/2014).

## Appendix: Statistical Distribution of Registered Deaths

A generalized Poisson distribution for a random count variable *Y*, using a mixture of heterogeneous risks, is (Greene 1997:939–940)

\[ P(Y=y)=\int_{0}^{\infty}\frac{e^{-\lambda z}{\left(\lambda z\right)}^{y}}{y!}\,g(z)\,dz, \]

where *z* is a multiplicative risk factor with density *g*(*z*) over positive real numbers. This mixture model, *Y* ∼ *PoissonMix*(λ, *g*), describes the distribution of count variable *Y* ∈ {0, 1, 2, . . .} in terms of a scalar parameter λ and a density function *g*(). It generalizes the Poisson distribution by allowing the mean and variance of *Y* to differ. In particular, it provides a framework for modeling overdispersion (*V*(*Y*) > *E*(*Y*)), which is often observed in count data.

The mixture model includes the standard Poisson distribution as a limiting case: as the distribution *g*(*z*) approaches a constant at *z* = 1, *Y*’s distribution approaches a Poisson with *E*(*Y*) = *V*(*Y*) = λ. It also includes the negative binomial distribution: if *g*(*z*) is a gamma density with *E*(*z*) = 1 and *V*(*z*) = 1 / θ, then *Y* has a negative binomial distribution with *E*(*Y*) = λ and *V*(*Y*) = λ + (λ^{2} / θ). Other {λ, *g*()} mixtures yield other discrete distributions for *Y*.
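The negative binomial moments quoted above can be checked numerically. The following Python sketch is an illustration, not part of the original analysis: it writes the gamma-mixed Poisson in closed form, using the standard negative binomial probability function with size θ and mean λ, and sums over a long but finite range of counts. The function names, the truncation point `y_max`, and the parameter values λ = 5 and θ = 2 are assumptions chosen only for the example.

```python
import math


def neg_binom_pmf(y, lam, theta):
    """P(Y = y) for a Poisson mixed over a gamma risk factor with mean 1, variance 1/theta."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + lam))
        + y * math.log(lam / (theta + lam))
    )
    return math.exp(log_p)


def truncated_moments(lam, theta, y_max=2000):
    """Total probability, mean, and variance computed over y = 0, ..., y_max."""
    probs = [neg_binom_pmf(y, lam, theta) for y in range(y_max + 1)]
    total = sum(probs)
    m = sum(y * p for y, p in enumerate(probs))
    v = sum((y - m) ** 2 * p for y, p in enumerate(probs))
    return total, m, v


lam, theta = 5.0, 2.0  # hypothetical values
total, m, v = truncated_moments(lam, theta)
expected_variance = lam + lam ** 2 / theta  # formula from the text
```

With these inputs the computed mean and variance should agree with λ and λ + λ²/θ up to truncation and floating-point error; larger θ pushes the variance toward the Poisson value λ.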

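Suppose that total deaths in a population follow a distribution in this generalized family, so that the probability of *D* deaths is

\[ P(D)=\int_{0}^{\infty}\frac{e^{-\lambda z}{\left(\lambda z\right)}^{D}}{D!}\,g(z)\,dz. \]

If deaths are registered independently with probability π, then

\[ P\left(R\mid D\right)=\binom{D}{R}{\pi}^{R}{\left(1-\pi \right)}^{D-R},\kern1em R\in \left\{0,1,\dots, D\right\}, \]

and the joint probability of a pair of integers (*R, D*) is

\[ P\left(R,D\right)=\binom{D}{R}{\pi}^{R}{\left(1-\pi \right)}^{D-R}\int_{0}^{\infty}\frac{e^{-\lambda z}{\left(\lambda z\right)}^{D}}{D!}\,g(z)\,dz. \]

In terms of registered deaths (*R*) and unregistered deaths (*U* = *D − R*), the same expression is

\[ P\left(R,U\right)=\int_{0}^{\infty}\left[\frac{e^{-\lambda \pi z}{\left(\lambda \pi z\right)}^{R}}{R!}\right]\left[\frac{e^{-\lambda \left(1-\pi \right)z}{\left(\lambda \left(1-\pi \right)z\right)}^{U}}{U!}\right]g(z)\,dz. \]

The marginal probability of registered deaths (*R*) is therefore

\[ P(R)=\sum_{U=0}^{\infty }P\left(R,U\right)=\int_{0}^{\infty}\frac{e^{-\lambda \pi z}{\left(\lambda \pi z\right)}^{R}}{R!}\,g(z)\,dz, \]

because the second bracketed term is a Poisson probability that sums to 1 over *U* for any fixed *z*.

The distribution of registered deaths (*R*) therefore has exactly the same mathematical form as the marginal distribution of total deaths (*D*), except that parameter λ is replaced with λπ. That is,

\[ R\sim PoissonMix\left(\lambda \pi, g\right). \]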

This general proof applies to special cases where *D* ~ *Poisson* or *D* ∼ *NegBinom*, as well as to other Poisson mixtures. Most importantly for this article, it demonstrates that if total deaths have a Poisson distribution with expected value λ = *N*μ, then registered deaths also have a Poisson distribution, but with expected value λπ = *N*μπ.
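The Poisson special case can be verified directly. The Python sketch below is a numerical check under assumed inputs, not the authors' code: it sums the joint probability of (*R*, *D*) over *D* and compares the result with a Poisson probability with mean *N*μπ. The exposure *N* = 10,000, rate μ = 0.002, coverage π = 0.8, truncation point `d_max`, and all function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k with mean lam, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def binom_pmf(r, d, pi):
    """Probability that r of d deaths are registered, each independently with probability pi."""
    return math.comb(d, r) * pi ** r * (1.0 - pi) ** (d - r)


def registered_pmf(r, lam, pi, d_max=400):
    """Marginal P(R = r): sum over D >= r of P(D) * P(R = r | D), truncated at d_max."""
    return sum(
        poisson_pmf(d, lam) * binom_pmf(r, d, pi) for d in range(r, d_max + 1)
    )


n_exposure, mu, pi = 10_000.0, 0.002, 0.8  # hypothetical inputs
lam = n_exposure * mu                      # expected total deaths, N * mu

# Largest absolute gap between the summed marginal and Poisson(N * mu * pi).
max_gap = max(
    abs(registered_pmf(r, lam, pi) - poisson_pmf(r, lam * pi)) for r in range(60)
)
```

The quantity `max_gap` should be at the level of floating-point error, which is the numerical counterpart of the statement that registered deaths are Poisson with expected value *N*μπ.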

## About this article

### Cite this article

Schmertmann, C.P., Gonzaga, M.R. Bayesian Estimation of Age-Specific Mortality and Life Expectancy for Small Areas With Defective Vital Records.
*Demography* **55**, 1363–1388 (2018). https://doi.org/10.1007/s13524-018-0695-2

DOI: https://doi.org/10.1007/s13524-018-0695-2