Abstract
Small-area population estimates are often made using geocoded address data in conjunction with the housing-unit method. Previous research, however, suggests that these data are subject to systematic incompleteness that biases estimates of race, ethnicity, and other important demographic characteristics. This incompleteness is driven largely by an inability to completely georeference address-based datasets. Given these challenges, small-area demographers need further, and to date largely unavailable, information on the amount of error typically introduced by using incompletely geocoded data to estimate population. More specifically, we argue that applied demographers need to know whether these errors are statistically significant, spatially patterned, or systematically related to specific population characteristics. This paper evaluates the impact of incomplete geocoding on the accuracy of small-area population estimates, using a Vintage 2000 set of block-group estimates of the household population for the Albuquerque, NM metro area. Precise estimates of the impact of incomplete geocoding on the accuracy of estimates are made, associations with specific demographic characteristics are considered, and a simple potential remediation based on Horvitz-Thompson theory is presented. The implications of these results for the practice of applied demography are reviewed.
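As a rough illustration of the kind of remediation mentioned above, the sketch below applies the standard Horvitz-Thompson estimator, in which each observed unit is weighted by the inverse of its probability of being observed. It is not the paper's implementation: the function names, the treatment of a block group's geocoding match rate as the inclusion probability, and all numeric values are assumptions made for illustration only.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a total: the sum of y_i / pi_i."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("each inclusion probability must be in (0, 1]")
        total += y / pi
    return total


def housing_unit_population(units, occupancy_rate, persons_per_household):
    """Housing-unit method: units x occupancy rate x persons per household."""
    return units * occupancy_rate * persons_per_household


# Hypothetical block group: 90 addresses were geocoded, and each address is
# assumed to have had a 0.9 chance of geocoding successfully.
geocoded_units = [1] * 90
match_probs = [0.9] * 90

# Weighting each geocoded unit by 1 / 0.9 recovers roughly 100 housing units.
estimated_units = horvitz_thompson_total(geocoded_units, match_probs)

# Hypothetical occupancy rate and household size, for illustration only.
estimated_population = housing_unit_population(estimated_units, 0.95, 2.5)
```

In practice the inclusion probabilities would have to be estimated, and they may vary with the population characteristics that the abstract identifies as related to geocoding failure; a single uniform match rate is the simplest possible assumption.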
Cite this article
Baker, J., Alcantara, A., Ruan, X. et al. The impact of incomplete geocoding on small area population estimates. J Pop Research 29, 91–112 (2012). https://doi.org/10.1007/s12546-011-9077-y
DOI: https://doi.org/10.1007/s12546-011-9077-y