The impact of incomplete geocoding on small area population estimates

Journal of Population Research

Abstract

Small-area population estimates are often made using geocoded address data in conjunction with the housing-unit method. Previous research, however, suggests that these data are subject to systematic incompleteness that biases estimates of race, ethnicity, and other important demographic characteristics. This incompleteness is driven largely by an inability to completely georeference address-based datasets. Given these challenges, small-area demographers need further, and to date largely unavailable, information on the amount of error typically introduced by using incompletely geocoded data to estimate population. More specifically, we argue that applied demographers would want to know whether these errors are statistically significant, spatially patterned, or systematically related to specific population characteristics. This paper evaluates the impact of incomplete geocoding on the accuracy of small-area population estimates, using a Vintage 2000 set of block-group estimates of the household population for the Albuquerque, NM metro area. Precise estimates of the impact of incomplete geocoding on the accuracy of estimates are made, associations with specific demographic characteristics are considered, and a simple potential remediation based on Horvitz-Thompson theory is presented. The implications of these results for the practice of applied demography are reviewed.
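
The abstract does not spell out the remediation, so the following is only an illustrative sketch of the general idea behind a Horvitz-Thompson style correction, not the authors' procedure. It assumes that each block group has a known (or separately estimated) probability that an address geocodes successfully, and it weights the geocoded housing-unit count by the inverse of that probability before applying a basic housing-unit calculation (units × occupancy rate × persons per household). The class `BlockGroup`, its field names, and both functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    """Hypothetical inputs for one block group (all names are illustrative)."""
    geocoded_units: int            # housing units that geocoded into this block group
    match_rate: float              # assumed probability that a unit here geocodes, in (0, 1]
    occupancy_rate: float          # share of housing units that are occupied
    persons_per_household: float   # average household size


def ht_adjusted_units(geocoded_units: int, match_rate: float) -> float:
    """Inverse-probability (Horvitz-Thompson style) estimate of total housing units.

    Each geocoded unit is weighted by 1 / match_rate, so a unit that had a
    90% chance of being geocoded stands in for about 1.11 units.
    """
    if not 0.0 < match_rate <= 1.0:
        raise ValueError("match_rate must be in the interval (0, 1]")
    return geocoded_units / match_rate


def household_population(bg: BlockGroup, adjust: bool = True) -> float:
    """Housing-unit method estimate, with or without the inverse-probability weight."""
    units = (
        ht_adjusted_units(bg.geocoded_units, bg.match_rate)
        if adjust
        else float(bg.geocoded_units)
    )
    return units * bg.occupancy_rate * bg.persons_per_household


if __name__ == "__main__":
    # Made-up numbers, purely to show the direction of the correction.
    bg = BlockGroup(geocoded_units=450, match_rate=0.9,
                    occupancy_rate=0.95, persons_per_household=2.5)
    print("unadjusted:", household_population(bg, adjust=False))
    print("adjusted:  ", household_population(bg, adjust=True))
```

With a match rate below one, the unadjusted estimate undercounts the block group, and the weighted estimate scales it back up. How well this works in practice depends on how accurately the match rate is known for each area, and this sketch does not address that.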

Author information

Corresponding author

Correspondence to Jack Baker.

About this article

Cite this article

Baker, J., Alcantara, A., Ruan, X. et al. The impact of incomplete geocoding on small area population estimates. J Pop Research 29, 91–112 (2012). https://doi.org/10.1007/s12546-011-9077-y

  • DOI: https://doi.org/10.1007/s12546-011-9077-y
