An Aggregate Learning Approach for Interpretable Semi-supervised Population Prediction and Disaggregation Using Ancillary Data

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Abstract

Census data provide detailed information about population characteristics at a coarse resolution. Nevertheless, fine-grained, high-resolution mappings of population counts are increasingly needed to characterize population dynamics and to assess the consequences of climate shocks, natural disasters, investments in infrastructure, development policies, etc. Disaggregating census data is a complex machine learning problem, and multiple solutions have been proposed in past research. In this paper, we propose to view the problem through the aggregate learning paradigm, in which the output value is not known for individual training points but only for aggregates of points (in this context, for regions of pixels where a census count is available). Using a very simple and interpretable model, we demonstrate that this method is on par with the state of the art and even outperforms it on some metrics.
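To make the aggregate learning setting concrete, here is a minimal sketch in Python with NumPy. It is not the model from the paper. It assumes a purely linear pixel-level model, for which the sum of pixel predictions over a region equals the prediction on the summed features, so fitting on region totals reduces to ordinary least squares. The names `fit_on_aggregates` and `predict_pixels`, the three ancillary features, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_on_aggregates(X, region_ids, region_totals):
    """Fit per-feature weights using only region-level totals.

    X:             (n_pixels, n_features) ancillary features per pixel
    region_ids:    (n_pixels,) integer index of the region containing each pixel
    region_totals: (n_regions,) census count of each region
    """
    n_regions = region_totals.shape[0]
    # Sum the pixel features inside each region. For a linear model,
    # sum_i (x_i . w) == (sum_i x_i) . w, so the aggregate loss only
    # needs these region-level feature sums.
    X_region = np.zeros((n_regions, X.shape[1]))
    np.add.at(X_region, region_ids, X)
    # Least-squares fit of the region totals (no pixel-level labels used).
    w, *_ = np.linalg.lstsq(X_region, region_totals, rcond=None)
    return w


def predict_pixels(X, w):
    """Disaggregate: predict a value for every individual pixel."""
    return X @ w


# Synthetic example (assumed data): 20 regions of 10 pixels each.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
region_ids = np.repeat(np.arange(20), 10)
w_true = np.array([5.0, 1.0, 0.5])  # hypothetical "people per unit of feature"
region_totals = np.bincount(region_ids, weights=X @ w_true, minlength=20)

w = fit_on_aggregates(X, region_ids, region_totals)
pixel_pred = predict_pixels(X, w)
```

In this sketch each learned weight can be read directly as the contribution of one ancillary feature to the pixel-level count, which is one simple sense in which such a model is interpretable. A non-linear model would need an iterative optimizer, with the loss still computed on the region sums of the pixel predictions.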

Acknowledgments

Computational resources have been provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Équipements de Calcul Intensif en Fédération Wallonie Bruxelles (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under convention 2.5020.11. We would like to thank Pavel Demin and the CP3 group, who shared part of their reserved resources with us. The second and third authors acknowledge financial support from the ARC convention on “New approaches to understanding and modeling global migration trends” (convention 18/23-091).

Author information

Corresponding author

Correspondence to Guillaume Derval.

Electronic supplementary material

Supplementary material 1 (PDF, 96 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Derval, G., Docquier, F., Schaus, P. (2020). An Aggregate Learning Approach for Interpretable Semi-supervised Population Prediction and Disaggregation Using Ancillary Data. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_40

  • DOI: https://doi.org/10.1007/978-3-030-46133-1_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46132-4

  • Online ISBN: 978-3-030-46133-1

  • eBook Packages: Computer Science, Computer Science (R0)
