Ad-Hoc Georeferencing of Web-Pages Using Street-Name Prefix Trees

  • Conference paper

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 75)

Abstract

A bottleneck in constructing location-based web searches is that most web pages do not contain any explicit geocoding such as geotags. An alternative is ad-hoc georeferencing, which relies on street addresses, but the problem is how to extract and validate address strings from free-form text. We propose a rule-based pattern matching solution that detects address-based locations using a gazetteer and street-name prefix trees created from the gazetteer. We compare this approach against a method that does not require a gazetteer (a heuristic method that assumes that a street name has a certain structure) and a method that also uses data structures created from the gazetteer, in the form of street-name arrays. Experiments using our location-based search engine prototype (MOPSI) for Finland and Singapore show that the proposed prefix-tree solution is twice as fast and 10% more accurate than its rule-based alternative, and 10 times faster than the variant that accesses the gazetteer through an array structure.
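The abstract does not show the authors' implementation, so the following is only a minimal sketch of the general idea: street names from a gazetteer are loaded into a prefix tree, and free-form text is scanned for the longest street name followed by a house number. The class and function names (`StreetTrie`, `find_addresses`), the sample street list, and the "street name, then number" rule are all assumptions made for illustration. Address formats that put the number first would need a different rule.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Canonical street name if a gazetteer entry ends at this node.
        self.street: Optional[str] = None


class StreetTrie:
    """Hypothetical prefix tree over lower-cased street names."""

    def __init__(self, streets: List[str]) -> None:
        self.root = TrieNode()
        for name in streets:
            self.insert(name)

    def insert(self, name: str) -> None:
        node = self.root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.street = name

    def longest_match(self, text: str, start: int) -> Optional[Tuple[str, int]]:
        """Return (street, end_index) for the longest street starting at `start`."""
        node = self.root
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i].lower())
            if node is None:
                break
            if node.street is not None:
                best = (node.street, i + 1)
        return best


def find_addresses(text: str, trie: StreetTrie) -> List[Tuple[str, str]]:
    """Collect (street, house_number) pairs; assumes the number follows the name."""
    results = []
    i = 0
    while i < len(text):
        # Only try to match at the start of a word.
        at_word_start = i == 0 or not text[i - 1].isalnum()
        match = trie.longest_match(text, i) if at_word_start else None
        if match is None:
            i += 1
            continue
        street, end = match
        j = end
        while j < len(text) and text[j] == " ":
            j += 1
        k = j
        while k < len(text) and text[k].isdigit():
            k += 1
        if j > end and k > j:  # at least one space, then at least one digit
            results.append((street, text[j:k]))
            i = k
        else:
            i = end
    return results


# Illustrative street list, not taken from the paper's gazetteer.
trie = StreetTrie(["Kauppakatu", "Koulukatu", "Länsikatu"])
sample = "Office: Länsikatu 15, Joensuu. Cafe at kauppakatu 29. Koulukatu is closed."
found = find_addresses(sample, trie)
```

Each lookup walks the tree one character at a time, so its cost depends on the length of the candidate string rather than on the number of streets in the gazetteer. This is the usual reason to prefer a prefix tree over scanning an array of names.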


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tabarcea, A., Hautamäki, V., Fränti, P. (2011). Ad-Hoc Georeferencing of Web-Pages Using Street-Name Prefix Trees. In: Filipe, J., Cordeiro, J. (eds) Web Information Systems and Technologies. WEBIST 2010. Lecture Notes in Business Information Processing, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22810-0_19

  • DOI: https://doi.org/10.1007/978-3-642-22810-0_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22809-4

  • Online ISBN: 978-3-642-22810-0

  • eBook Packages: Computer Science, Computer Science (R0)
