A hybrid approach for the spatial disaggregation of socio-economic indicators

Abstract

Statistical information on socio-economic activities is widely available, but the data are often collected or released only at a relatively aggregated level. In aggregated form the data are useful for broad-scale assessments, yet more localized estimates, and analyses of correlations against geophysical variables, require disaggregating the source data. Spatial disaggregation techniques serve this purpose: they transform data from a set of source zones into a set of target zones with different geometry and, in general, a higher spatial resolution. Still, few previous studies have applied state-of-the-art spatial disaggregation procedures to socio-economic variables, focusing instead on population modeling. In this article, we report on experiments with a hybrid spatial disaggregation technique that combines state-of-the-art regression analysis procedures with the classic methods of dasymetric mapping and pycnophylactic interpolation. The hybrid procedure used population density, land cover, nighttime satellite imagery, and OpenStreetMap road density as ancillary data to disaggregate different types of socio-economic indicators to a high-resolution grid. Our tests used data for the Portuguese territory and produced raster datasets with a resolution of 30 arc-seconds per cell. The article discusses the spatial disaggregation methodology and the quality of the results obtained under different experimental conditions.
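
The mass-preserving idea shared by dasymetric mapping and pycnophylactic interpolation can be illustrated with a minimal sketch. It is a toy example and not the procedure evaluated in the article: the function name `dasymetric_disaggregate`, the 2 x 3 grid, and all values are hypothetical, and the ancillary weights are assumed to be non-negative. Each source-zone total is spread over the zone's cells in proportion to an ancillary weight (for instance, population density), so that the cell values of every zone still sum to the original total.

```python
import numpy as np

def dasymetric_disaggregate(zone_totals, cell_zone, cell_weight):
    """Spread each source-zone total over its cells, proportionally to a weight."""
    zone_totals = np.asarray(zone_totals, dtype=float)
    cell_zone = np.asarray(cell_zone)
    cell_weight = np.asarray(cell_weight, dtype=float)
    n_zones = zone_totals.size

    # Total ancillary weight and number of cells in each source zone.
    weight_per_zone = np.bincount(
        cell_zone.ravel(), weights=cell_weight.ravel(), minlength=n_zones
    )
    cells_per_zone = np.bincount(cell_zone.ravel(), minlength=n_zones)

    # Zones whose weights sum to zero fall back to a uniform split.
    empty = weight_per_zone == 0
    weights = np.where(empty[cell_zone], 1.0, cell_weight)
    denom = np.where(empty, cells_per_zone, weight_per_zone)[cell_zone]

    return zone_totals[cell_zone] * weights / denom

# Hypothetical inputs: two source zones on a 2 x 3 target grid.
zone_totals = [100.0, 60.0]               # aggregated indicator per zone
cell_zone = np.array([[0, 0, 1],
                      [0, 1, 1]])         # source zone of each target cell
cell_weight = np.array([[1.0, 3.0, 2.0],
                        [0.0, 2.0, 2.0]]) # ancillary data per target cell

grid = dasymetric_disaggregate(zone_totals, cell_zone, cell_weight)

# Mass preservation: summing the cells of each zone recovers the zone totals.
recovered = np.bincount(cell_zone.ravel(), weights=grid.ravel())
```

In a hybrid scheme of the kind the abstract describes, the weights would plausibly come from a regression model fitted on several ancillary layers rather than from a single layer, with the same rescaling step keeping the source totals intact.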

Notes

  1. http://www.openstreetmap.org
  2. http://www.rdocumentation.org/packages/sp/versions/1.2-3/topics/spsample
  3. http://www.r-project.org
  4. http://cran.r-project.org/web/views/Spatial.html
  5. http://cran.r-project.org/web/packages/pycno/
  6. https://github.com/pierreroudier/dissever
  7. http://cran.r-project.org/web/packages/caret
  8. http://beta.sedac.ciesin.columbia.edu/data/collection/gpw-v4
  9. http://ngdc.noaa.gov/eog/viirs/download_monthly.html
  10. http://land.copernicus.eu/pan-european/corine-land-cover/clc-2012/view
  11. http://github.com/tyrasd/osm-node-density
  12. http://download.geofabrik.de/europe.html
  13. https://www.flickr.com/services/api
  14. https://dev.twitter.com

Acknowledgements

This research was partially supported by Fundação para a Ciência e Tecnologia (FCT) through project grants PTDC/EEI-SCR/1743/2014 (Saturn) and EXPL/EEI-ESS/0427/2013 (KD-LBSN), as well as by the INESC-ID multi-annual funding from the PIDDAC programme (UID/CEC/50021/2013).

Author information

Corresponding author

Correspondence to João Monteiro.

Appendix

Table 5 The socio-economic variables considered in our case study

Cite this article

Monteiro, J., Martins, B. & Pires, J.M. A hybrid approach for the spatial disaggregation of socio-economic indicators. Int J Data Sci Anal 5, 189–211 (2018). https://doi.org/10.1007/s41060-017-0080-z

Keywords

  • Spatial analysis
  • Downscaling
  • Geographic information systems
  • Regression-based spatial disaggregation
  • Socio-economic indicators