Data Mining pp 130-145

A Probabilistic Geocoding System Utilising a Parcel Based Address File

  • Peter Christen
  • Alan Willmore
  • Tim Churches
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3755)

Abstract

It is estimated that between 80% and 90% of governmental data collections contain address information. Geocoding – the process of assigning geographic coordinates to addresses – is becoming increasingly important in application areas that involve the analysis and mining of such data. In many cases, address records are captured and/or stored in a free-form or inconsistent manner, which complicates the task of accurately matching such addresses to spatially annotated reference data. In this paper we describe a geocoding system that is based on a comprehensive, high-quality, geocoded national address database. It uses a learning address parser based on hidden Markov models to segment free-form addresses into components, and a rule-based matching engine to determine the best matches in the reference database.
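The abstract names two components: a hidden Markov model (HMM) parser that segments a free-form address into components, and a rule-based engine that matches those components against a reference database. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' system. The observation tags (NU, PC, WT, LN, UN), the look-up tables, the hand-set probabilities, the matching rules and the two-record reference file with placeholder coordinates are all hypothetical. A learning parser would estimate its transition and emission probabilities from training data; here Viterbi decoding simply picks the most likely state sequence under toy values.

```python
import math

# Hypothetical observation tags: NU (number), PC (4-digit postcode),
# WT (street type), LN (locality word), UN (unknown word).
STREET_TYPES = {"st": "street", "street": "street", "rd": "road",
                "road": "road", "ave": "avenue", "avenue": "avenue"}
LOCALITY_WORDS = {"north", "sydney", "canberra"}  # toy look-up table

def tag(token):
    """Map one lower-case token to an observation tag via look-up rules."""
    if token.isdigit():
        return "PC" if len(token) == 4 else "NU"
    if token in STREET_TYPES:
        return "WT"
    if token in LOCALITY_WORDS:
        return "LN"
    return "UN"

STATES = ["number", "street_name", "street_type", "locality", "postcode"]
# Hand-set toy probabilities; a trained parser would estimate these from data.
START = {"number": 0.8, "street_name": 0.15, "street_type": 0.02,
         "locality": 0.02, "postcode": 0.01}
TRANS = {"number": {"number": 0.05, "street_name": 0.9, "street_type": 0.05},
         "street_name": {"street_name": 0.3, "street_type": 0.6, "locality": 0.1},
         "street_type": {"locality": 0.9, "postcode": 0.1},
         "locality": {"locality": 0.5, "postcode": 0.5},
         "postcode": {"postcode": 1.0}}
EMIT = {"number": {"NU": 0.95, "UN": 0.05},
        "street_name": {"UN": 0.7, "LN": 0.2, "WT": 0.05, "NU": 0.05},
        "street_type": {"WT": 0.95, "UN": 0.05},
        "locality": {"LN": 0.7, "UN": 0.3},
        "postcode": {"PC": 0.95, "NU": 0.05}}
FLOOR = 1e-6  # replaces zero probabilities so that log() stays defined

def lp(table, key):
    """Log-probability of key in table, falling back to FLOOR."""
    return math.log(table.get(key, FLOOR))

def viterbi(obs):
    """Most likely hidden-state sequence for a list of observation tags."""
    score = [{s: lp(START, s) + lp(EMIT[s], obs[0]) for s in STATES}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[-1][p] + lp(TRANS[p], s))
            col[s] = score[-1][prev] + lp(TRANS[prev], s) + lp(EMIT[s], o)
            ptr[s] = prev
        score.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last token
        state = ptr[state]
        path.append(state)
    return path[::-1]

def parse(address):
    """Segment a free-form address into named fields using the HMM."""
    tokens = address.lower().replace(",", " ").split()
    if not tokens:
        return {}
    fields = {}
    for tok, state in zip(tokens, viterbi([tag(t) for t in tokens])):
        if state == "street_type":
            tok = STREET_TYPES.get(tok, tok)  # standardise abbreviations
        fields[state] = (fields.get(state, "") + " " + tok).strip()
    return fields

# Made-up two-record reference file; the coordinates are placeholders.
REFERENCE = [
    {"number": "42", "street_name": "miller", "street_type": "street",
     "locality": "north sydney", "postcode": "2060", "lat": -1.0, "lon": 1.0},
    {"number": "7", "street_name": "miller", "street_type": "street",
     "locality": "north sydney", "postcode": "2060", "lat": -2.0, "lon": 2.0},
]
# Hypothetical matching rules, tried from most to least specific.
RULES = [("exact", STATES),
         ("street", ["street_name", "street_type", "locality", "postcode"]),
         ("locality", ["locality", "postcode"])]

def match(fields):
    """Return (rule name, matching records) for the first rule with any hit."""
    for name, keys in RULES:
        if all(k in fields for k in keys):
            hits = [r for r in REFERENCE
                    if all(r[k] == fields[k] for k in keys)]
            if hits:
                return name, hits
    return None, []
```

With these toy values, `parse("42 Miller St, North Sydney 2060")` yields the fields number `42`, street name `miller`, street type `street`, locality `north sydney` and postcode `2060`, which `match` resolves with the `exact` rule. Changing the number to `99` makes the exact rule fail, so the street-level rule returns both records. Ordering rules from most to least specific is one simple way to trade precision for coverage.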

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Peter Christen (1)
  • Alan Willmore (2)
  • Tim Churches (2)
  1. Department of Computer Science, Australian National University, Canberra, Australia
  2. New South Wales Department of Health, Centre for Epidemiology and Research, North Sydney, Australia
