
Geospatial Data Mining on the Web: Discovering Locations of Emergency Service Facilities

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7713)


Identifying location-based information from the WWW, such as the street addresses of emergency service facilities, has become increasingly popular. However, current Web-mining tools such as Google’s crawler are designed to index whole webpages; they do not treat location information, which has a finer granularity, as an indexable object. This often leads to low recall in search results. To retrieve location-based information from the ever-expanding Internet and its almost-unstructured Web data, there is a need for an effective Web-mining mechanism that is capable of extracting the desired spatial data from the right webpages within the right scope. In this paper, we report our efforts towards automated location-information retrieval by developing a knowledge-based Web-mining tool, CyberMiner, that adopts (1) a geospatial taxonomy to determine the starting URLs and domains for spatial Web mining, (2) a rule-based forward and backward screening algorithm for efficient address extraction, and (3) inductive-learning-based semantic analysis to discover patterns of street addresses of interest. The retrieval of the locations of all fire stations within Los Angeles County, California, is used as a case study.
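The abstract does not spell out the screening rules, so the following is only a minimal sketch of what a rule-based forward and backward screening pass could look like, not CyberMiner's actual algorithm. It assumes US-style addresses: a street-suffix token serves as the anchor, a backward scan looks for a street number, and a forward scan looks for a ZIP code. The suffix list, the window sizes, the function name `extract_addresses`, and the sample address are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny suffix list; a real system would need a
# far larger vocabulary (or a gazetteer) to reach useful recall.
STREET_SUFFIXES = {"st", "street", "ave", "avenue", "blvd", "boulevard",
                   "rd", "road", "dr", "drive"}
NUMBER_RE = re.compile(r"^\d{1,6}$")        # candidate street number
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")    # US ZIP or ZIP+4


def extract_addresses(text, max_back=5, max_forward=6):
    """Return address-like token spans found around street-suffix anchors."""
    # Tokenize on whitespace and strip surrounding punctuation.
    tokens = [t.strip(".,;:()") for t in text.split()]
    tokens = [t for t in tokens if t]

    results = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in STREET_SUFFIXES:
            continue

        # Backward screening: nearest street number within the window.
        start = None
        for j in range(i - 1, max(i - max_back, 0) - 1, -1):
            if NUMBER_RE.match(tokens[j]):
                start = j
                break
        if start is None:
            continue  # no street number, so reject this anchor

        # Forward screening: nearest ZIP code within the window.
        # If none is found, the span ends at the suffix itself.
        end = i
        for k in range(i + 1, min(i + max_forward, len(tokens) - 1) + 1):
            if ZIP_RE.match(tokens[k]):
                end = k
                break

        results.append(" ".join(tokens[start:end + 1]))
    return results


# Made-up sample text, not taken from the paper's case study.
sample = "Station 7 is located at 1234 N. Example Blvd, Springfield, CA 90001. Call ahead."
print(extract_addresses(sample))
```

Anchoring on the suffix and scanning outward keeps the pass linear in the number of tokens and avoids matching every number on the page, at the cost of missing addresses whose street type is not in the list.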


  • Emergency service facilities
  • Web data mining
  • Information extraction
  • Information retrieval
  • Ontology
  • Inductive learning
  • Location-based services





Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, W., Goodchild, M.F., Church, R.L., Zhou, B. (2012). Geospatial Data Mining on the Web: Discovering Locations of Emergency Service Facilities. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35526-4

  • Online ISBN: 978-3-642-35527-1

  • eBook Packages: Computer Science, Computer Science (R0)