The impact of incomplete geocoding on small area population estimates

Baker, Jack; Alcantara, Adelamar; Ruan, Xiaomin; Watkins, Kendra

doi:10.1007/s12546-011-9077-y


Published: 23 December 2011

Volume 29, pages 91–112 (2012)

Journal of Population Research

Jack Baker¹, Adelamar Alcantara², Xiaomin Ruan¹ & Kendra Watkins³


Abstract

Small-area population estimates are often made using geocoded address data in conjunction with the housing-unit method. Previous research, however, suggests that these data are subject to systematic incompleteness that biases estimates of race, ethnicity, and other important demographic characteristics. This incompleteness is driven largely by an inability to completely georeference address-based datasets. Given these challenges, small-area demographers need further information, largely unavailable to date, on the amount of error typically introduced by using incompletely geocoded data to estimate population. More specifically, we argue that applied demographers would like to know whether these errors are statistically significant, spatially patterned, or systematically related to specific population characteristics. This paper evaluates the impact of incomplete geocoding on the accuracy of small-area population estimates, using a Vintage 2000 set of block-group estimates of the household population for the Albuquerque, NM metropolitan area. Precise estimates of the impact of incomplete geocoding on accuracy are made, associations with specific demographic characteristics are considered, and a simple potential remediation based on Horvitz-Thompson theory is presented. The implications of these results for the practice of applied demography are reviewed.
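The abstract names the housing-unit method and a Horvitz-Thompson remediation without giving formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' procedure. It assumes the household population is estimated as housing units × occupancy rate × persons per household, and that each new housing-unit address geocodes successfully with a known probability π (a match rate), so each geocoded unit is weighted by 1/π in Horvitz-Thompson fashion. The class, the function names, and every figure are hypothetical.

```python
"""Hypothetical sketch: housing-unit method with a Horvitz-Thompson style
correction for incompletely geocoded new housing units. All names and
numbers are illustrative assumptions, not taken from the paper."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockGroup:
    bg_id: str
    base_units: float             # housing units at the base (census) date
    geocoded_new_units: int       # new units whose addresses geocoded to this block group
    occupancy_rate: float         # assumed share of units that are occupied
    persons_per_household: float  # assumed average household size (PPH)


def household_population(units: float, occupancy_rate: float, pph: float) -> float:
    """Housing-unit method for the household population: HU x OR x PPH."""
    return units * occupancy_rate * pph


def ht_adjusted_units(geocoded_count: int, match_rate: float) -> float:
    """Horvitz-Thompson style total: each geocoded unit stands in for 1/pi
    units, where pi is the assumed probability that an address geocodes."""
    if not 0.0 < match_rate <= 1.0:
        raise ValueError("match_rate must lie in (0, 1]")
    return geocoded_count / match_rate


def estimate(bg: BlockGroup, match_rate: Optional[float] = None) -> float:
    """Household-population estimate, optionally reweighting new units by 1/pi."""
    new_units: float = bg.geocoded_new_units
    if match_rate is not None:
        new_units = ht_adjusted_units(bg.geocoded_new_units, match_rate)
    return household_population(
        bg.base_units + new_units, bg.occupancy_rate, bg.persons_per_household
    )


def percent_error(estimated: float, actual: float) -> float:
    """Signed percent error of an estimate against a benchmark count."""
    return 100.0 * (estimated - actual) / actual


if __name__ == "__main__":
    # Hypothetical block group: 100 new units were built, but only 80 geocoded.
    bg = BlockGroup("bg-001", base_units=400, geocoded_new_units=80,
                    occupancy_rate=0.95, persons_per_household=2.5)
    benchmark = household_population(400 + 100, 0.95, 2.5)  # counts all 100 new units
    naive = estimate(bg)                     # silently drops ungeocoded units
    adjusted = estimate(bg, match_rate=0.8)  # weights each geocoded unit by 1/0.8
    print(f"naive={naive:.1f} ({percent_error(naive, benchmark):+.1f}%)")
    print(f"adjusted={adjusted:.1f} ({percent_error(adjusted, benchmark):+.1f}%)")
```

Run as a script, this prints `naive=1140.0 (-4.0%)` and `adjusted=1187.5 (+0.0%)`. The correction is exact here only because the assumed π equals the true geocoding rate for this hypothetical block group. Because the abstract stresses that incompleteness is systematic rather than random, π would in practice need to vary by area or population subgroup, and a misestimated rate would leave residual bias.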

References

Aschengrau, A., & Seage, G. (2003). Essentials of epidemiology in public health (2nd ed.). Sudbury: Jones-Bartlett.
Baer, W. C. (1990). Aging of the housing stock and components of inventory change. In D. Myers (Ed.), Housing demography: Linking demographic structure and housing markets (pp. 249–273). Madison: University of Wisconsin.
Baker, J. (2010). Estimating New Mexico municipalities: The devil is in the details (of data). New Mexico Business: Current Report. August.
Belsley, D. A., Kuh, E., & Welsch, R. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York: Wiley.
Berke, O. (2005). Exploratory spatial relative risk mapping. Preventive Veterinary Medicine, 71, 173–182.
Boscoe, F. P., McLaughlin, C., Schymura, M. J., & Kielb, C. L. (2003). Visualization for the spatial scan statistic using nested circles. Health and Place, 9, 273–277.
Brown, W. (2008). Changes to the housing unit stock: Loss of housing units. Presentation at the New York State Data Center Affiliate Meeting. May 15, 2008. New York: West Point.
Bryan, T. (2000). US Census Bureau population estimates and evaluations with loss functions. Statistics in Transition, 4(4), 537–548.
Bryan, T. (2004). Population estimates. In J. Siegel & D. Swanson (Eds.), The methods and materials of demography. New York: Springer.
Casella, G., & George, E. (1992). Explaining the Gibbs sampler. The American Statistician, 46(3), 167–174.
Christensen, R. (1996). Log-linear models and logistic regression (2nd ed.). New York: Springer.
Coleman, J. S. (1964). Introduction to mathematical sociology. New York: Free Press.
De Bruin, S., & Bregt, A. (2001). Assessing fitness for use: The expected value of spatial datasets. International Journal of Geographical Information Science, 15(5), 457–471.
Drummond, W. J. (1995). Address matching: GIS technology for mapping human activity patterns. Journal of the American Planning Association, 61(2), 240–251.
Dwass, M. (1957). Modified randomization tests for nonparametric hypotheses. Annals of Mathematical Statistics, 28, 181–187.
ESRI. (2009). Creating a composite address locator. Online at: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00250000003r000000.htm.
Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002). Geographically-weighted regression: The analysis of spatially-varying relationships. West Sussex: Wiley.
Gilboa, S. M. (2006). Comparison of residential geocoding methods in a population-based study of air quality and birth defects. Environmental Research, 101, 256–262.
Goldberg, D. W., Wilson, J. P., & Knoblock, C. A. (2007). From text to geographic coordinates: The current state of geocoding. URISA Journal, 19(1), 33–46.
Haining, R. (2003). Spatial data analysis: Theory and practice. New York: Cambridge.
Horvitz, D. G., & Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.
Hough, G., & Swanson, D. (2006). An evaluation of the American Community Survey: Results from the Oregon test site. Population Research and Policy Review, 25, 257–273.
Jarosz, B. (2008). Using assessor parcel data to maintain housing unit counts for small area population estimates. In S. Murdock & D. Swanson (Eds.), Applied demography in the 21st century (pp. 89–101). New York: Springer.
Judson, D., & Popoff, C. A. (2004). Selected general methods. In J. S. Siegel & D. Swanson (Eds.), The methods and materials of demography (pp. 644–675). New York: Springer.
Jung, I., Kulldorff, M., & Klassen, A. (2007). A spatial scan statistic for ordinal data. Statistics in Medicine, 26, 1594–1607.
Karimi, H. A., & Durcik, M. (2004). Evaluation of uncertainties associated with geocoding techniques. Computer-Aided Civil and Infrastructure Engineering, 19, 170–185.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26, 1481–1496.
Kulldorff, M. (1999). An isotonic spatial scan statistic for geographical disease surveillance. Journal of the National Institute of Public Health, 48, 94–101.
Kulldorff, M., Heffernan, R., Hartman, J., Assuncao, R. M., & Mostashari, F. (2005). A space-time permutation scan statistic for the early detection of disease outbreaks. PLoS Medicine, 2, 216–224.
Kulldorff, M., & Nagarwala, N. (1995). Spatial disease clusters: Detection and inference. Statistics in Medicine, 14, 799–810.
LeSage, J., & Pace, R. K. (2004). Models for spatially-dependent missing data. Journal of Real Estate Finance and Economics, 29(2), 233–254.
Little, R., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Little, R., & Schenker, N. (1994). Missing data. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook for statistical modeling in the social and behavioral sciences (pp. 39–75). New York: Plenum.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks: Sage.
Lunn, D., Simpson, S., Diamond, I., & Middleton, L. (1998). The accuracy of age-specific population estimates for small areas in Britain. Population Studies, 52, 327–344.
Murdock, S., & Ellis, D. (1991). Applied demography: An introduction to basic concepts, methods, and data. Boulder: Westview Press.
National Research Council. (1980). Estimating population and income of small areas. Washington, DC: National Academy Press.
National Research Council. (2010). Coverage measurement in the 2010 Census. R. Bell & M. Cohen (Eds.). Washington, DC: National Academies of Science.
Naus, J. L. (1965). Clustering of random points in two dimensions. Biometrika, 52, 263–267.
Neill, D. B. (2009). An empirical comparison of spatial scan statistics for outbreak detection. International Journal of Health Geographics, 8(20), 1–16.
Neter, J., Kutner, M., Wasserman, W., & Nachtsheim, C. (1999). Applied linear statistical models (4th ed.). New York: McGraw-Hill.
Oliver, M. N. (2005). Geographic bias related to geocoding in epidemiologic studies. International Journal of Health Geographics, 4(29). Online.
Perrone, S. (2008). Address coverage improvement and evaluation program: 2005 National estimate for coverage of the master address file. In S. H. Murdock & D. Swanson (Eds.), Applied demography in the 21st century (pp. 37–85). New York: Springer.
Pollack, L. A., Gotway, C. A., Bates, J. H., Parikh-Patel, A., Richards, T., Seeff, L. C., et al. (2006). Use of the spatial scan statistic to identify geographic variations in late-stage colorectal cancer in California (United States). Cancer Causes and Control, 17, 449–457.
Ratcliffe, J. H. (2001). On the accuracy of TIGER-type geocoded address data in relation to cadastral and census area units. International Journal of Geographical Information Science, 15(5), 473–485.
Ruan, X. M., Alcantara, A., & Baker, J. (2008). Potential and pitfalls of geocoding for spatial demography and population estimates. The Map Legend. December.
Rushton, G. (2006). Geocoding in cancer research: A review. American Journal of Preventive Medicine, 30(2S), S16–S24.
Sampford, M. R. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54, 499–513.
Shahidullah, M., & Flotow, M. (2005). Criteria for selecting a suitable method for producing post-2000 county population estimates: A case study of population estimates in Illinois. Population Research and Policy Review, 24, 215–229.
Smith, S., & Mandell, M. (1984). A comparison of population estimation methods: Housing unit versus component II, ratio correlation, and administrative records. Journal of the American Statistical Association, 79(386), 282–289.
Smith, S., & Shahidullah, M. (1995). An evaluation of projection errors for census tracts. Journal of the American Statistical Association, 90(429), 64–71.
Smith, S., Tayman, J., & Swanson, D. (1999). State and local population projections: Methodology and analysis. New York: Plenum.
Sprott, J. C. (2004). A method for approximating missing data in spatial patterns. Computers and Graphics, 28, 113–117.
Starsinic, D., & Zitter, M. (1968). Accuracy of the housing unit method in preparing population estimates for cities. Demography, 5, 475–484.
Swanson, D., & Pol, L. (2005). Contemporary developments in applied demography within the United States. Journal of Applied Sociology, 21(2), 26–56.
Tayman, J., & Swanson, D. (1999). On the validity of MAPE as a measure of population forecast accuracy. Population Research and Policy Review, 18, 299–322.
Turnbull, B. W., Iwano, E. J., Burnett, W. S., Howe, H. L., & Clark, L. C. (1990). Monitoring for clustering of disease: Application to leukemia incidence in upstate New York. American Journal of Epidemiology, 67, 425–428.
Voss, P. (2007). Demography as a spatial social science. Population Research and Policy Review, 26, 457–476.
Wallenstein, S., Naus, J., & Glaz, J. (1993). Power of the scan statistic for detection of clustering. Statistics in Medicine, 12, 1829–1843.
Weinstock, M. A. (1981). A generalized scan statistic test for the detection of clusters. International Journal of Epidemiology, 10, 289–293.
Witmer, J. A., & Samuels, M. L. (1998). Statistics for the life sciences. New York: Sinauer.
Zandbergen, P. (2009). Geocoding quality and implications for spatial analysis. Geography Compass, 3(2), 647–680.
Zandbergen, P., & Ignizio, D. (2010). Comparison of dasymetric mapping techniques for small area population estimates. Cartography and Geographic Information Science, 37(3), 199–214.
Zhang, J., & Yu, K. F. (1998). What’s the relative risk? A method for correcting the odds ratio in cohort studies of common outcomes. Journal of the American Medical Association, 280(19), 1690–1691.

Author information

Authors and Affiliations

Geospatial and Population Studies, University of New Mexico, MSC06 3510, Albuquerque, NM, 87131, USA
Jack Baker & Xiaomin Ruan
Department of Geography, Geospatial and Population Studies, University of New Mexico, Albuquerque, NM, USA
Adelamar Alcantara
Mid-Region Council of Governments, Albuquerque, NM, USA
Kendra Watkins

Corresponding author

Correspondence to Jack Baker.

About this article

Cite this article

Baker, J., Alcantara, A., Ruan, X. et al. The impact of incomplete geocoding on small area population estimates. J Pop Research 29, 91–112 (2012). https://doi.org/10.1007/s12546-011-9077-y

Published: 23 December 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s12546-011-9077-y
