Boosted Regression Trees for Small-Area Population Forecasting

  • Original Research
Population Research and Policy Review

Abstract

Small-area population forecasting, such as forecasting age/gender groups at the level of US census tracts, faces thorny challenges including (1) small population sizes, (2) frequent and sometimes directionally opposing shifts in population dynamics between censuses, (3) data availability, and (4) the ongoing evolution of US census geographies. It is therefore not surprising that evaluation studies report wide-ranging forecast errors: within any given age/gender interval, errors range from lows of 10% to 20% to highs that sometimes exceed 100%. Despite the successes of machine learning, population forecasters have only recently begun to explore the possibilities it presents. Using 1990 and 2000 census data, we develop 10-year, age/gender-structured 2010 population forecasts for 50,965 US census tracts with a well-known machine learning technique: boosted regression trees. Using standard ex post facto measures of forecast error (MAPE, MALPE, and MAPE-R), we demonstrate that forecasts based on “out-of-the-box” boosted regression trees are more accurate and produce fewer and less extreme outliers than comparison forecasts produced by the Hamilton-Perry method (reported in Baker et al. in Population Res Policy Rev 40:1341–1354, 2021. https://doi.org/10.1007/s11113-020-09601-y).
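
As a minimal illustration of two of the error measures named in the abstract, the sketch below computes MAPE (the mean of absolute percent errors) and MALPE (the mean of signed, or algebraic, percent errors) across areas. The function names and tract counts are hypothetical and are not drawn from the study. MAPE-R, which involves a transformation of the error distribution, is not shown.

```python
from statistics import mean


def percent_errors(forecast, actual):
    """Signed percent error per area: 100 * (forecast - actual) / actual."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    return [100.0 * (f - a) / a for f, a in zip(forecast, actual)]


def mape(forecast, actual):
    """Mean absolute percent error: a measure of accuracy (size of errors)."""
    return mean(abs(e) for e in percent_errors(forecast, actual))


def malpe(forecast, actual):
    """Mean algebraic percent error: a measure of bias (direction of errors)."""
    return mean(percent_errors(forecast, actual))


# Hypothetical counts for one age/gender group in four tracts (assumed values).
actual_2010 = [100, 200, 50, 400]
forecast_2010 = [110, 180, 60, 400]

print("MAPE :", mape(forecast_2010, actual_2010))
print("MALPE:", malpe(forecast_2010, actual_2010))
```

Because MALPE keeps the sign of each error, over- and under-forecasts offset one another, so it can be small even when MAPE is large. A zero in `actual` would raise a division error here, which is one reason zero-population cells complicate percent-error measures.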

Data availability

The data used in this publication were obtained from (1) NHGIS (https://nhgis.org/) and (2) the U.S. Census Bureau’s Application Programming Interface (API), documented at https://www.census.gov/data/developers/about.html. NHGIS policy precludes secondary posting of its data. The details needed to re-extract these data are given in Table 1.

Notes

  1. These forecasts excluded any census tract with one or more zeros in its age/gender groups in any of the three study years (1990, 2000, and 2010). These exclusions were made because a cohort-change ratio is undefined when its denominator is zero. In addition, including zero populations exacerbated the impact of outlying errors on assessments of forecast accuracy.

  2. This study also found that uncontrolled H-P projections are surprisingly accurate at the census tract level, but that forecast errors were reduced when projections by age/gender were controlled to a total population forecast for the census tract.
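
The two notes above can be made concrete with a simplified sketch of the cohort-change-ratio logic behind the Hamilton-Perry (H-P) method. It assumes 10-year age groups, omits the youngest group (which needs a separate ratio such as a child-woman ratio) and the open-ended oldest group, and uses hypothetical function names and counts. It is not the authors' implementation.

```python
def cohort_change_ratios(pop_earlier, pop_later):
    """Ratio of each cohort's size 10 years later to its earlier size.

    Both lists hold counts by 10-year age group, youngest first. The cohort
    in group i at the earlier census is in group i + 1 at the later one.
    Returns None where the ratio is undefined (zero base population).
    """
    ratios = []
    for i in range(len(pop_earlier) - 1):
        base = pop_earlier[i]
        ratios.append(pop_later[i + 1] / base if base > 0 else None)
    return ratios


def project_forward(pop_launch, ratios):
    """Apply the ratios to the launch-year counts.

    Returns projected counts for groups 1..n-1 only; the youngest group
    is deliberately left out of this simplified sketch.
    """
    if any(r is None for r in ratios):
        raise ValueError("undefined cohort-change ratio (zero base population)")
    return [r * p for r, p in zip(ratios, pop_launch)]


def control_to_total(projected, total):
    """Rescale group projections proportionally so they sum to `total`."""
    scale = total / sum(projected)
    return [p * scale for p in projected]


# Hypothetical tract with three 10-year age groups (assumed values).
pop_1990 = [100, 80, 60]
pop_2000 = [90, 110, 72]

ccrs = cohort_change_ratios(pop_1990, pop_2000)
uncontrolled_2010 = project_forward(pop_2000, ccrs)
controlled_2010 = control_to_total(uncontrolled_2010, total=180)
print([round(x, 1) for x in uncontrolled_2010])
print([round(x, 1) for x in controlled_2010])

# A zero in the base year leaves the ratio undefined, as Note 1 describes.
print(cohort_change_ratios([0, 80, 60], pop_2000))
```

Returning `None` for an undefined ratio, rather than substituting a value, mirrors the choice described in Note 1 of excluding such tracts instead of imputing them. The independent total passed to `control_to_total` is assumed to come from a separate total-population forecast.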

References

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.

  • Baker, J., Alcantara, A., Ruan, X. M., Ruiz, D., & Crouse, N. (2014a). Sub-county component estimates using administrative records: A case-study in New Mexico. In N. Hoque & L. Potter (Eds.), Emerging techniques in applied demography (pp. 63–80). Springer.

  • Baker, J., Alcantara, A., Ruan, X. M., & Watkins, K. (2014b). Spatial weighting improves accuracy and reduces bias in small-area demographic forecasts of urban populations. Journal of Population Research, 31(4), 345–359.

  • Baker, J., Alcantara, A., Ruan, X. M., Watkins, K., & Vasan, S. (2013). A comparative evaluation of accuracy and bias in census tract-level age/sex-specific population estimates: Component I (net-migration) vs Component III (Hamilton-Perry). Population Research and Policy Review, 32(6), 919–942.

  • Baker, J., Swanson, D., & Tayman, J. (2021). The accuracy of Hamilton-Perry population projections for census tracts in the United States. Population Research and Policy Review, 40, 1341–1354. https://doi.org/10.1007/s11113-020-09601-y

  • Baker, J., Swanson, D. A., Tayman, J., & Tedrow, L. M. (2017). Cohort change ratios and their applications. Springer.

  • Belkin, M., Hsu, D., Ma, S., & Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proceedings of the National Academy of Sciences, 116(32), 15849–15854.

  • Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6), 2350–2383.

  • Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification & regression trees. Wadsworth.

  • Chi, G., & Wang, D. (2017). Small-area population forecasting: A geographically weighted regression approach. In D. Swanson (Ed.), Frontiers in applied demography (pp. 449–471). Springer.

  • Fragoso, T. M., Bertoli, W., & Louzada, F. (2018). Bayesian Model Averaging: A systematic review and conceptual classification. International Statistical Review, 86(1), 1–28.

  • Freund, Y., & Schapire, R. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771–780.

  • Friedman, J. (1999). Greedy function approximation: A gradient boosting machine. https://biostat.jhsph.edu/~mmccall/articles/friedman_1999.pdf.

  • Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67.

  • Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2), 337–407.

  • Hamilton, C. H., & Perry, J. (1962). A short-cut method for projecting population by age from one decennial census to another. Social Forces, 41, 163–170.

  • Hastie, T., & Tibshirani, R. (1990). Generalized additive models. Chapman & Hall.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.

  • Hauer, M. (2019). Population projections for U.S. counties by age, sex, and race controlled to shared socioeconomic pathways. Scientific Data. https://www.nature.com/articles/sdata20195.pdf.

  • Jivetti, B., & Hoque, N. (Eds.). (2020). Population change and public policy. Springer.

  • Keyfitz, N. (1982). Choice of function for mortality analysis: Effective forecasting depends on a minimum parameter representation. Theoretical Population Biology, 21, 329–352.

  • Kintner, H., Merrick, T., Morrison, P., & Voss, P. (Eds.). (1997). Demographics: A casebook for business and government. Westview Press.

  • Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer.

  • Lunn, D. J., Simpson, S. N., Diamond, I., & Middleton, L. (1998). The accuracy of age-specific population estimates for small areas in Britain. Population Studies, 52(3), 327–344.

  • Mueller, J. T., & Santos-Lozada, A. R. (2022). The 2020 U.S. census differential privacy method introduces disproportionate discrepancies for rural and non-white populations. Population Research and Policy Review. https://doi.org/10.1007/s11113-022-09698-3

  • Pol, L., & Thomas, R. (1997). Demography for business decision-making. Praeger.

  • Pol, L., & Thomas, R. (2012). Demography of health care. Plenum.

  • Raftery, A., & Ševčíková, H. (2021). Probabilistic population forecasting: Short to very long-term. International Journal of Forecasting. https://doi.org/10.1016/j.ijforecast.2021.09.001

  • Rayer, S., & Smith, S. K. (2014). Population projections by age for Florida and its counties: Assessing accuracy and the impact of adjustments. Population Research and Policy Review, 33(5), 747–770.

  • Rees, P., Norman, P., & Brown, D. (2004). A framework for progressively improving small area population estimates. Journal of the Royal Statistical Society, 167(1), 5–36.

  • Ruggles, S., & Van Riper, D. (2021). The role of chance in the census bureau database reconstruction experiment. Population Research and Policy Review. https://doi.org/10.1007/s11113-021-09674-3

  • Schapire, R., & Freund, Y. (2014). Boosting: Foundations & algorithms. MIT Press.

  • Siegel, J. S. (2002). Applied demography: Applications to business, government, law and public policy. Academic Press.

  • Smith, S., & Shahidullah, M. (1995). An evaluation of population projection errors for census tracts. Journal of the American Statistical Association, 90(429), 64–71.

  • Smith, S. K., & Tayman, J. (2003). An evaluation of population projections by age. Demography, 40(4), 741–757.

  • Smith, S., Tayman, J., & Swanson, D. (2001). State and local population projections: Methodology and analysis. Kluwer Academic Publishers.

  • Smith, S., Tayman, J., & Swanson, D. (2013). A practitioner’s guide to state and local population projections. Springer.

  • Swanson, D., & Tayman, J. (2014). Measuring uncertainty in population forecasts: A new approach. In M. Marsili & G. Capacci (Eds.), Proceedings of the 6th EUROSTAT/UNECE Work Session on Demographic Projections (pp. 203–215). National Institute of Statistics, Rome, Italy.

  • Swanson, D., Bryan, T., & Sewell, R. (2021). The effect of the differential privacy disclosure avoidance system proposed by the census bureau on 2020 census products: Four case studies of census blocks in Alaska. PAA Affairs, https://www.populationassociation.org/blogs/paa-web1/2021/03/30/the-effect-of-the-differential-privacy-disclosure.

  • Swanson, D., & Coleman, C. (2007). On the MAPE-R as a measure of cross-sectional estimation & forecast accuracy. Journal of Economic and Social Measurement, 32(4), 219–233.

  • Swanson, D., & Pol, L. (2004). Contemporary developments in applied demography within the United States. Journal of Applied Social Science, 21(2), 26–56.

  • Swanson, D., & Tayman, J. (1999). On the validity of the MAPE as a measure of population forecast accuracy. Population Research and Policy Review, 18(4), 299–322.

  • Swanson, D., Tayman, J., & Barr, C. F. (2000). A note on the measurement of accuracy for subnational demographic estimates. Demography, 37(2), 193–202.

  • Swanson, D., Tayman, J., & Bryan, T. (2011). MAPE-R: A rescaled measure of accuracy for cross-sectional, sub-national forecasts. Journal of Population Research, 28, 225–243.

  • Tayman, J., Smith, S., & Rayer, S. (2011). Evaluating population forecast accuracy: A regression approach using county data. Population Research and Policy Review, 30(2), 235–262.

  • Tayman, J., Swanson, D., & Barr, C. F. (1999). In search of the ideal measure of accuracy for subnational demographic forecasts. Population Research and Policy Review, 18(5), 387–409.

  • Tibshirani, R., & Friedman, J. (2020). A pliable lasso. Journal of Computational and Graphical Statistics, 29(1), 215–225.

  • Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society B, 63(2), 411–423.

  • Wilson, T. (2016). Evaluation of alternative cohort-component models for local area population forecasts. Population Research and Policy Review, 35, 241–261.

  • Wilson, T., Grossman, I., Alexander, M., Rees, P., & Temple, J. (2021). Methods for small area population forecasts: State-of-the-art and research needs. Population Research and Policy Review, Online First. https://doi.org/10.1007/s11113-021-09671-6

  • Wood, S. N. (2017). Generalized additive models: An introduction with R (2nd ed.). Chapman and Hall/CRC.

Acknowledgements

We thank Tom Wilson, Irina Grossman, and two anonymous reviewers for their helpful comments on earlier drafts of this paper and, more generally, on the methods deployed therein. While we are grateful for this help, any remaining errors in logic or method are our own.

Funding

The authors did not receive funding or any other form of support from any organization or individual for the submitted work.

Author information

Corresponding author

Correspondence to Jack Baker.

Ethics declarations

Competing interests

The authors did not receive funding or any other form of support from any organization or individual for the submitted work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Baker, J., Swanson, D. & Tayman, J. Boosted Regression Trees for Small-Area Population Forecasting. Popul Res Policy Rev 42, 51 (2023). https://doi.org/10.1007/s11113-023-09795-x

  • DOI: https://doi.org/10.1007/s11113-023-09795-x
