Journal of Population Research

Volume 29, Issue 1, pp 91–112

The impact of incomplete geocoding on small area population estimates

  • Jack Baker
  • Adelamar Alcantara
  • Xiaomin Ruan
  • Kendra Watkins


Small-area population estimates are often made using geocoded address data in conjunction with the housing-unit method. Previous research, however, suggests that these data are subject to systematic incompleteness that biases estimates of race, ethnicity, and other important demographic characteristics. This incompleteness is driven largely by an inability to completely georeference address-based datasets. Given these challenges, small-area demographers need further, and to date largely unavailable, information on the amount of error typically introduced by using incompletely geocoded data to estimate population. More specifically, we argue that applied demographers would want to know whether these errors are statistically significant, spatially patterned, or systematically related to specific population characteristics. This paper evaluates the impact of incomplete geocoding on the accuracy of small-area population estimates, using a Vintage 2000 set of block-group estimates of the household population for the Albuquerque, NM metro area. Precise estimates of the impact of incomplete geocoding on the accuracy of estimates are made, associations with specific demographic characteristics are considered, and a simple potential remediation based on Horvitz-Thompson theory is presented. The implications of these results for the practice of applied demography are reviewed.
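To make the idea of a Horvitz-Thompson style remediation concrete, the following is a minimal illustrative sketch and not the procedure used in the paper. It assumes the household population of a block group is estimated as housing units × occupancy rate × persons per household (the usual housing-unit method), and that each successfully geocoded unit is weighted by the inverse of an assumed geocoding match rate. The `BlockGroup` class, its field names, and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    """Hypothetical inputs for one block group (all names are illustrative)."""
    geocoded_units: int           # housing units successfully geocoded to this block group
    match_rate: float             # assumed probability that an address here geocodes, in (0, 1]
    occupancy_rate: float         # share of housing units that are occupied
    persons_per_household: float  # average household size


def ht_adjusted_units(bg: BlockGroup) -> float:
    """Horvitz-Thompson style total: each observed unit counts 1 / inclusion probability."""
    if not 0.0 < bg.match_rate <= 1.0:
        raise ValueError("match_rate must be in (0, 1]")
    return bg.geocoded_units / bg.match_rate


def household_population(bg: BlockGroup, adjust: bool = True) -> float:
    """Housing-unit method: units x occupancy rate x persons per household."""
    units = ht_adjusted_units(bg) if adjust else float(bg.geocoded_units)
    return units * bg.occupancy_rate * bg.persons_per_household


if __name__ == "__main__":
    # Made-up figures: 450 geocoded units with an assumed 90% match rate.
    bg = BlockGroup(geocoded_units=450, match_rate=0.9,
                    occupancy_rate=0.95, persons_per_household=2.5)
    print("unadjusted:", household_population(bg, adjust=False))
    print("adjusted:  ", household_population(bg, adjust=True))
```

In this sketch the match rate is simply taken as given. In practice it would have to be estimated, for example per block group or per address type, and the quality of the adjustment depends on how well that estimate reflects the true geocoding probability.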


Keywords: Small area estimation; Housing-unit method; Geocoding



Copyright information

© Springer Science & Business Media B.V. 2011

Authors and Affiliations

  • Jack Baker (1)
  • Adelamar Alcantara (2)
  • Xiaomin Ruan (1)
  • Kendra Watkins (3)

  1. Geospatial and Population Studies, University of New Mexico, Albuquerque, USA
  2. Department of Geography, Geospatial and Population Studies, University of New Mexico, Albuquerque, USA
  3. Mid-Region Council of Governments, Albuquerque, USA
