Boosted Regression Trees for Small-Area Population Forecasting

Baker, Jack; Swanson, David; Tayman, Jeff

doi:10.1007/s11113-023-09795-x

Original Research
Published: 02 June 2023

Volume 42, article number 51 (2023)

Population Research and Policy Review

Jack Baker¹,
David Swanson²,³ &
Jeff Tayman⁴

Abstract

Small-area population forecasting, such as forecasting age/gender groups for U.S. census tracts, is challenged by thorny issues, including (1) small population sizes, (2) frequent and sometimes directionally opposing shifts in population dynamics between censuses, (3) limited data availability, and (4) the ongoing evolution of U.S. census geographies. It is therefore not surprising that evaluation studies report a wide range of forecast errors, from lows of 10% to 20% to highs that sometimes exceed 100% within a given age/gender group. Despite the successes of machine learning, population forecasters have only recently begun to explore its possibilities. Using 1990 and 2000 census data, we develop 10-year, age/gender-structured 2010 population forecasts for 50,965 U.S. census tracts using a well-known machine learning technique: boosted regression trees. Using standard ex post facto measures of forecast error (MAPE, MALPE, and MAPE-R), we demonstrate that forecasts based on "out-of-the-box" boosted regression trees are more accurate, and produce fewer and less extreme outliers, than comparison forecasts produced by the Hamilton-Perry method (reported in Baker et al. in Population Res Policy Rev 40:1341–1354, 2021. https://doi.org/10.1007/s11113-020-09601-y).
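Boosted regression trees build an ensemble sequentially: each new shallow tree is fitted to the residuals of the current ensemble, and its contribution is shrunk by a learning rate (least-squares gradient boosting in the sense of Friedman). The following is a minimal, self-contained sketch of that idea, not the authors' implementation or configuration. It assumes depth-one trees (stumps), squared-error loss, a fixed learning rate, and a tiny hypothetical data set in which a tract's 1990 and 2000 counts for one age/gender group predict its 2010 count. The function names (`fit_stump`, `boost`, `predict`) and all data values are hypothetical.

```python
from typing import List, Tuple

# A stump is (feature index, threshold, value if x[j] <= threshold, value otherwise).
Stump = Tuple[int, float, float, float]
# A fitted model is (initial constant, learning rate, list of stumps).
Model = Tuple[float, float, List[Stump]]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Return the single split that minimizes the squared error of residuals r."""
    mean_r = sum(r) / len(r)
    best: Stump = (0, float("inf"), mean_r, mean_r)  # constant fallback, no real split
    best_sse = sum((v - mean_r) ** 2 for v in r)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue  # a split must put at least one tract on each side
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = sum((v - lmean) ** 2 for v in left) + sum((v - rmean) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lmean, rmean), sse
    return best


def predict_stump(s: Stump, x: List[float]) -> float:
    j, t, left_value, right_value = s
    return left_value if x[j] <= t else right_value


def boost(X: List[List[float]], y: List[float],
          n_rounds: int = 100, learning_rate: float = 0.1) -> Model:
    """Least-squares gradient boosting with stumps as the base learners."""
    f0 = sum(y) / len(y)  # start from the mean of the target
    preds = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is proportional to the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        s = fit_stump(X, residuals)
        stumps.append(s)
        # Shrink each new tree's contribution by the learning rate.
        preds = [pi + learning_rate * predict_stump(s, xi) for pi, xi in zip(preds, X)]
    return f0, learning_rate, stumps


def predict(model: Model, x: List[float]) -> float:
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical tracts: [1990 count, 2000 count] for one age/gender group -> 2010 count.
X = [[120, 130], [80, 95], [200, 180], [50, 60], [150, 170], [90, 85]]
y = [140, 105, 170, 70, 185, 82]

if __name__ == "__main__":
    model = boost(X, y, n_rounds=100, learning_rate=0.1)
    print(round(predict(model, [110, 120]), 1))
```

In practice a library implementation (for example R's gbm package or scikit-learn's `GradientBoostingRegressor`) would be used with deeper trees and tuned settings; the software and settings behind the "out-of-the-box" forecasts are not specified here, so this sketch only illustrates the mechanism.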

Data availability

Data used in this publication were obtained from (1) https://nhgis.org/ and (2) the U.S. Census Bureau's Application Programming Interface (API), documented at https://www.census.gov/data/developers/about.html. NHGIS policy precludes secondary posting of its data. Details needed to re-extract these data are given in Table 1.

Notes

These forecasts excluded any census tract with one or more zeros in its age/gender groups in any of the three study years (1990, 2000, and 2010). These exclusions were made because cohort-change ratios are poorly suited to zeros: as ratios, they are undefined when the denominator is zero. Including zero populations also exacerbated the impact of outlying errors on assessments of forecast accuracy.
This study also found that uncontrolled Hamilton-Perry (H-P) projections are surprisingly accurate at the census tract level, but that forecast errors were reduced when a tract's age/gender projections were controlled to a forecast of its total population.
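The Hamilton-Perry comparison method, the zero-exclusion rule, and the error measures can be made concrete with a short sketch. Under Hamilton-Perry, the cohort-change ratio for an age group divides its count at the launch census by the count of the group ten years younger at the previous census; the forecast multiplies that ratio by the younger group's launch-year count. A zero denominator leaves the ratio undefined, which is why tracts with zeros were dropped. The Python below is a minimal sketch under assumptions not taken from the paper: a single gender, five hypothetical 10-year age groups with an open-ended last group, made-up tract counts, and no child-woman ratios for the youngest group. The helper names (`has_zero`, `hamilton_perry`, `control_to_total`, `mape`, `malpe`) are hypothetical. MAPE and MALPE follow their usual definitions (mean absolute and mean algebraic percent error); MAPE-R is omitted.

```python
from typing import Dict, List

# Hypothetical tract data for one gender: census year -> counts by 10-year age group.
# Index 0 = ages 0-9, 1 = ages 10-19, ..., and the last index is an open-ended group.
Tract = Dict[int, List[float]]


def has_zero(tract: Tract) -> bool:
    """True if any age group is zero in any year; such tracts are excluded."""
    return any(v == 0 for counts in tract.values() for v in counts)


def hamilton_perry(p_base: List[float], p_launch: List[float]) -> List[float]:
    """Forecast age groups 1..n-1 one 10-year step past the launch census.

    Group 0 (ages 0-9) is skipped: Hamilton-Perry normally forecasts it with
    child-woman ratios, which this sketch leaves out.
    """
    n = len(p_base)
    forecast = []
    for i in range(1, n - 1):
        # Cohort aged i-1 at the base census is aged i ten years later.
        ccr = p_launch[i] / p_base[i - 1]
        forecast.append(ccr * p_launch[i - 1])
    # The open-ended last group receives survivors from the two oldest groups.
    ccr_last = p_launch[n - 1] / (p_base[n - 2] + p_base[n - 1])
    forecast.append(ccr_last * (p_launch[n - 2] + p_launch[n - 1]))
    return forecast


def control_to_total(forecast: List[float], total: float) -> List[float]:
    """Scale group forecasts proportionally so they sum to an independent total."""
    factor = total / sum(forecast)
    return [f * factor for f in forecast]


def mape(forecast: List[float], actual: List[float]) -> float:
    """Mean absolute percent error."""
    return 100 * sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)


def malpe(forecast: List[float], actual: List[float]) -> float:
    """Mean algebraic percent error: positive values indicate over-forecasts."""
    return 100 * sum((f - a) / a for f, a in zip(forecast, actual)) / len(actual)


tracts: Dict[str, Tract] = {
    "A": {1990: [100, 120, 110, 90, 200], 2000: [110, 105, 125, 100, 230],
          2010: [120, 112, 108, 118, 262]},
    "B": {1990: [40, 0, 35, 30, 60], 2000: [45, 38, 0, 32, 70],
          2010: [50, 41, 36, 30, 75]},
}

if __name__ == "__main__":
    kept = {name: t for name, t in tracts.items() if not has_zero(t)}
    for name, t in kept.items():
        fc = hamilton_perry(t[1990], t[2000])  # 1990 -> 2000 ratios applied to 2000
        actual = t[2010][1:]  # compare groups 1..n-1, matching the forecast
        print(name, round(mape(fc, actual), 1), round(malpe(fc, actual), 1))
```

Because MALPE lets positive and negative errors cancel, it measures bias, while MAPE measures overall accuracy; reporting both separates the two. The `control_to_total` helper shows the simplest form of controlling, a proportional adjustment to an independently forecast total.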

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Baker, J., Alcantara, A., Ruan, X. M., Ruiz, D., & Crouse, N. (2014a). Sub-county component estimates using administrative records: A case study in New Mexico. In N. Hoque & L. Potter (Eds.), Emerging techniques in applied demography (pp. 63–80). Springer.
Baker, J., Alcantara, A., Ruan, X. M., & Watkins, K. (2014b). Spatial weighting improves accuracy in small-area demographic forecasts of urban census tract populations. Journal of Population Research, 31(4), 345–359.
Baker, J., Alcantara, A., Ruan, X. M., Watkins, K., & Vasan, S. (2013). A comparative evaluation of accuracy and bias in census tract-level age/sex-specific population estimates: Component I (net-migration) vs Component III (Hamilton-Perry). Population Research and Policy Review, 32(6), 919–942.
Baker, J., Swanson, D., & Tayman, J. (2021). The accuracy of Hamilton-Perry population projections for census tracts in the United States. Population Research and Policy Review, 40, 1341–1354. https://doi.org/10.1007/s11113-020-09601-y
Baker, J., Swanson, D. A., Tayman, J., & Tedrow, L. M. (2017). Cohort change ratios and their applications. Springer.
Belkin, M., Hsu, D., Ma, S., & Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32), 15849–15854.
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6), 2350–2383.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Wadsworth.
Chi, G., & Wang, D. (2017). Small-area population forecasting: A geographically weighted regression approach. In D. Swanson (Ed.), Frontiers in applied demography (pp. 449–471). Springer.
Fragoso, T. M., Bertoli, W., & Louzada, F. (2018). Bayesian model averaging: A systematic review and conceptual classification. International Statistical Review, 86(1), 1–28.
Freund, Y., & Schapire, R. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771–780.
Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67.
Friedman, J. (1999). Greedy function approximation: A gradient boosting machine. https://biostat.jhsph.edu/~mmccall/articles/friedman_1999.pdf
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2), 337–407.
Hamilton, C. H., & Perry, J. (1962). A short-cut method for projecting population by age from one decennial census to another. Social Forces, 41, 163–170.
Hastie, T., & Tibshirani, R. (1990). Generalized additive models. Chapman & Hall.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.
Hauer, M. (2019). Population projections for U.S. counties by age, sex, and race controlled to shared socioeconomic pathways. Scientific Data. https://www.nature.com/articles/sdata20195.pdf
Jivetti, B., & Hoque, N. (Eds.). (2020). Population change and public policy. Springer.
Keyfitz, N. (1982). Choice of function for mortality analysis: Effective forecasting depends on a minimum parameter representation. Theoretical Population Biology, 21, 329–352.
Kintner, H., Merrick, T., Morrison, P., & Voss, P. (Eds.). (1997). Demographics: A casebook for business and government. Westview Press.
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer.
Lunn, D. J., Simpson, S. N., Diamond, I., & Middleton, L. (1998). The accuracy of age-specific population estimates for small areas in Britain. Population Studies, 52(3), 327–344.
Mueller, J. T., & Santos-Lozada, A. R. (2022). The 2020 U.S. census differential privacy method introduces disproportionate discrepancies for rural and non-white populations. Population Research and Policy Review. https://doi.org/10.1007/s11113-022-09698-3
Pol, L., & Thomas, R. (1997). Demography for business decision-making. Praeger.
Pol, L., & Thomas, R. (2012). Demography of health care. Plenum.
Raftery, A., & Ševčíková, H. (2021). Probabilistic population forecasting: Short to very long-term. International Journal of Forecasting. https://doi.org/10.1016/j.ijforecast.2021.09.001
Rayer, S., & Smith, S. K. (2014). Population projections by age for Florida and its counties: Assessing accuracy and the impact of adjustments. Population Research and Policy Review, 33(5), 747–770.
Rees, P., Norman, P., & Brown, D. (2004). A framework for progressively improving small area population estimates. Journal of the Royal Statistical Society: Series A, 167(1), 5–36.
Ruggles, S., & Van Riper, D. (2021). The role of chance in the Census Bureau database reconstruction experiment. Population Research and Policy Review. https://doi.org/10.1007/s11113-021-09674-3
Schapire, R., & Freund, Y. (2014). Boosting: Foundations and algorithms. MIT Press.
Siegel, J. S. (2002). Applied demography: Applications to business, government, law and public policy. Academic Press.
Smith, S., & Shahidullah, M. (1995). An evaluation of population projection errors for census tracts. Journal of the American Statistical Association, 90(429), 64–71.
Smith, S. K., & Tayman, J. (2003). An evaluation of population projections by age. Demography, 40(4), 741–757.
Smith, S., Tayman, J., & Swanson, D. (2001). State and local population projections: Methodology and analysis. Kluwer Academic Publishers.
Smith, S., Tayman, J., & Swanson, D. (2013). A practitioner's guide to state and local population projections. Springer.
Swanson, D., Bryan, T., & Sewell, R. (2021). The effect of the differential privacy disclosure avoidance system proposed by the Census Bureau on 2020 census products: Four case studies of census blocks in Alaska. PAA Affairs. https://www.populationassociation.org/blogs/paa-web1/2021/03/30/the-effect-of-the-differential-privacy-disclosure
Swanson, D., & Coleman, C. (2007). On the MAPE-R as a measure of cross-sectional estimation and forecast accuracy. Journal of Economic and Social Measurement, 32(4), 219–233.
Swanson, D., & Pol, L. (2004). Contemporary developments in applied demography within the United States. Journal of Applied Social Science, 21(2), 26–56.
Swanson, D., & Tayman, J. (1999). On the validity of the MAPE as a measure of population forecast accuracy. Population Research and Policy Review, 18(4), 299–322.
Swanson, D., & Tayman, J. (2014). Measuring uncertainty in population forecasts: A new approach. In M. Marsili & G. Capacci (Eds.), Proceedings of the 6th EUROSTAT/UNECE Work Session on Demographic Projections (pp. 203–215). National Institute of Statistics.
Swanson, D., Tayman, J., & Barr, C. F. (2000). A note on the measurement of accuracy for subnational demographic estimates. Demography, 37(2), 193–202.
Swanson, D., Tayman, J., & Bryan, T. (2011). MAPE-R: A rescaled measure of accuracy for cross-sectional, sub-national forecasts. Journal of Population Research, 28, 225–243.
Tayman, J., Smith, S., & Rayer, S. (2011). Evaluating population forecast accuracy: A regression approach using county data. Population Research and Policy Review, 30(2), 235–262.
Tayman, J., Swanson, D., & Barr, C. F. (1999). In search of the ideal measure of accuracy for subnational demographic forecasts. Population Research and Policy Review, 18(5), 387–409.
Tibshirani, R., & Friedman, J. (2020). A pliable lasso. Journal of Computational and Graphical Statistics, 29(1), 215–225.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B, 63(2), 411–423.
Wilson, T. (2016). Evaluation of alternative cohort-component models for local area population forecasts. Population Research and Policy Review, 35, 241–261.
Wilson, T., Grossman, I., Alexander, M., Rees, P., & Temple, J. (2021). Methods for small area population forecasts: State-of-the-art and research needs. Population Research and Policy Review, Online First. https://doi.org/10.1007/s11113-021-09671-6
Wood, S. N. (2017). Generalized additive models: An introduction with R (2nd ed.). Chapman and Hall/CRC.

Acknowledgements

We thank Tom Wilson, Irina Grossman, and two anonymous reviewers for their helpful comments on earlier drafts of this paper and, more generally, on the methods deployed in it. While we are grateful for this help, any remaining errors in logic or method are our own.

Funding

The authors did not receive funding or any other form of support from any organization or individual for the submitted work.

Author information

Authors and Affiliations

Farmers Life, Bellevue, WA, 98005, USA
Jack Baker
University of California Riverside, Riverside, CA, 92521, USA
David Swanson
Center for Studies in Demography and Ecology, University of Washington, Seattle, WA, 98195, USA
David Swanson
University of California San Diego, San Diego, CA, 92093, USA
Jeff Tayman

Corresponding author

Correspondence to Jack Baker.

Ethics declarations

Competing interests

The authors did not receive funding or any other form of support from any organization or individual for the submitted work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Baker, J., Swanson, D. & Tayman, J. Boosted Regression Trees for Small-Area Population Forecasting. Popul Res Policy Rev 42, 51 (2023). https://doi.org/10.1007/s11113-023-09795-x

Received: 08 July 2022
Accepted: 30 April 2023
Published: 02 June 2023
DOI: https://doi.org/10.1007/s11113-023-09795-x
