Abstract
A bottleneck in building location-based web search is that most web pages do not contain any explicit geocoding such as geotags. An alternative is ad-hoc georeferencing, which relies on street addresses, but the problem is how to extract and validate address strings from free-form text. We propose a rule-based pattern-matching solution that detects address-based locations using a gazetteer and street-name prefix trees created from the gazetteer. We compare this approach against two alternatives: a heuristic method that does not require a gazetteer and instead assumes that a street name has a certain structure, and a method that also uses data structures created from the gazetteer, in the form of street-name arrays. Experiments using our location-based search engine prototype (MOPSI) for Finland and Singapore show that the proposed prefix-tree solution is twice as fast and 10% more accurate than the heuristic rule-based alternative, and 10 times faster than the variant that accesses the gazetteer through an array structure.
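To make the idea of a street-name prefix tree concrete, the following is a minimal Python sketch, not the authors' implementation. It builds a character-level trie from a small, made-up gazetteer and scans free-form text for the longest street name starting at each word boundary. All names (`TrieNode`, `build_street_trie`, `find_addresses`) and the sample street list are hypothetical, and the sketch assumes the Finnish convention that a house number follows the street name; other address formats would need additional rules.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a character-level prefix tree (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Set to the original street name when a full name ends at this node.
        self.street: Optional[str] = None


def build_street_trie(street_names: List[str]) -> TrieNode:
    """Insert every gazetteer street name into the trie, case-insensitively."""
    root = TrieNode()
    for name in street_names:
        node = root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.street = name
    return root


def find_addresses(text: str, root: TrieNode) -> List[Tuple[str, str]]:
    """Return (street name, house number) pairs found in the text.

    Assumption: a match counts only if the street name is a whole word
    and is followed by spaces and then a run of digits.
    """
    lowered = text.lower()
    n = len(lowered)
    results: List[Tuple[str, str]] = []
    i = 0
    while i < n:
        # Start matching only at a word boundary.
        if i > 0 and lowered[i - 1].isalnum():
            i += 1
            continue
        node, j, best = root, i, None
        # Walk the trie, remembering the longest whole-word street name.
        while j < n and lowered[j] in node.children:
            node = node.children[lowered[j]]
            j += 1
            if node.street is not None and (j == n or not lowered[j].isalnum()):
                best = (node.street, j)
        if best is None:
            i += 1
            continue
        street, end = best
        k = end
        while k < n and lowered[k] == " ":
            k += 1
        m = k
        while m < n and lowered[m].isdigit():
            m += 1
        if m > k:  # a house number follows the street name
            results.append((street, lowered[k:m]))
            i = m
        else:
            i = end
    return results


# Made-up gazetteer entries and text, for illustration only.
trie = build_street_trie(["Länsikatu", "Kauppakatu", "Kauppatori"])
found = find_addresses("Office: Länsikatu 15, Joensuu. Shop: Kauppakatu 29.", trie)
```

Each lookup follows a single path through the tree, so the cost of testing a text position depends on the length of the candidate name and not on the number of streets in the gazetteer. A scan over a plain array of street names, by contrast, has to compare the position against many entries.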
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tabarcea, A., Hautamäki, V., Fränti, P. (2011). Ad-Hoc Georeferencing of Web-Pages Using Street-Name Prefix Trees. In: Filipe, J., Cordeiro, J. (eds) Web Information Systems and Technologies. WEBIST 2010. Lecture Notes in Business Information Processing, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22810-0_19
DOI: https://doi.org/10.1007/978-3-642-22810-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22809-4
Online ISBN: 978-3-642-22810-0
eBook Packages: Computer Science, Computer Science (R0)