Abstract
Many spatial applications related to land and titles, such as land use management, land registration, and utility and health service provision, use postal addresses as their main or supplementary georeferencing method. Evaluating the quality of postal address datasets is important when controlling changes caused by manipulations (such as additions or updates), when comparing datasets, and when merging them. Merging is one of the main strategies that developing countries such as Iran use to form a unified addressing structure and database. Unlike formal methods of postal address quality assessment, which are based on address matching and are costly and time-consuming, the method proposed in this paper evaluates the quality of a postal address dataset without requiring preprocessing such as standardization or ancillary data such as streets and their addressing schemes. The method measures the autocorrelation of the content of a postal address dataset, where a higher level of autocorrelation indicates more standardization and less spatial sparsity of the addresses. It processes the adjacency graph formed by measuring the Damerau–Levenshtein distance between the records of a postal address dataset. An evaluation of five statistics on four postal address datasets from the city of Tehran, Iran, shows that two of them can be used: the cumulative frequency of values and the maximum size of the components (sub-graphs) in the adjacency graph. Both statistics show stable S-shaped patterns, and the threshold at the first extremum of their second derivative represents the desired quality of a postal address dataset. The results show that the threshold measured for a postal address dataset corresponds to the topological structure of the streets that cover its addresses.
The method can also define the characteristics of a standard address structure for one or more postal address datasets: the results suggest five components for the standard address of the evaluated datasets, the same number of components defined in the Iranian national postal address structure.
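The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions: the distance is the restricted (optimal string alignment) variant of Damerau–Levenshtein, two records are linked when their distance is at most a cut-off, the S-shaped curve is taken over increasing cut-offs, and the second derivative is approximated by a discrete second difference. The function names and the sample addresses are hypothetical.

```python
from itertools import combinations


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def largest_component_size(records, max_distance):
    """Largest connected component when records within max_distance are linked.

    Assumption: an edge exists when the distance is <= max_distance.
    Components are tracked with a simple union-find structure.
    """
    parent = list(range(len(records)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(records)), 2):
        if osa_distance(records[i], records[j]) <= max_distance:
            parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(records)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def first_second_difference_extremum(values):
    """Index in `values` of the first local extremum of the second difference.

    Returns None when no strict extremum is found (plateaus are not detected).
    """
    second = [
        values[k + 1] - 2 * values[k] + values[k - 1]
        for k in range(1, len(values) - 1)
    ]
    for k in range(1, len(second) - 1):
        if (second[k] - second[k - 1]) * (second[k + 1] - second[k]) < 0:
            return k + 1  # second[k] is centred on values[k + 1]
    return None


# Hypothetical records, for illustration only.
addresses = ["Azadi St 12", "Azadi St 21", "Vali Asr Ave 5"]
curve = [largest_component_size(addresses, t) for t in range(0, 4)]
```

The pairwise comparison is quadratic in the number of records, so a real dataset would likely need blocking or indexing before the graph is built; that step is outside the scope of this sketch.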
Cite this article
Rezayan, H., Sadidi, J. & Hosseini, V. Quality evaluation of postal address datasets measuring their autocorrelation. GeoJournal 84, 1617–1625 (2019). https://doi.org/10.1007/s10708-018-9940-x
DOI: https://doi.org/10.1007/s10708-018-9940-x