A Multi-strategy Approach to Geo-Entity Recognition

  • Ruituraj Gandhi
  • David C. Wilson
Part of the Studies in Computational Intelligence book series (SCI, volume 251)

Abstract

Geographic location, or place, information has become an increasingly integrated and important element of web and online interaction, as evidenced by the growing sophistication and adoption of online mapping, navigational GPS, and location-aware search. A significant proportion of online location context, however, remains implicit in largely unstructured document text. To leverage this context, such references must be extracted into structured knowledge elements that define place. A variety of "named entity" extraction methods have been developed to identify unstructured location references, alongside references to other entity types such as persons or organizations, but geographic entity extraction remains an open problem. This chapter examines a multi-strategy approach to improving the quality of geo-entity extraction. The implemented experimental framework targets web data and provides a comparative evaluation of individual approaches and of parameterizations of our multi-strategy method. Results show that the multi-strategy approach provides a significant benefit in terms of accuracy, domain independence, and adaptability.
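The abstract does not say how the individual strategies are combined. As a purely illustrative sketch, and not the authors' method, the Python below assumes three hypothetical toy strategies (a gazetteer lookup, a locative-cue context pattern, and a place-name suffix pattern) and merges them by simple voting over exact character spans. It then scores each configuration with exact-match precision, recall, and F1. The gazetteer contents, the regular expressions, the function names, and the `min_votes` threshold are all assumptions chosen for the example.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# A candidate mention is a (start, end) character span in the input text.
Span = tuple[int, int]
Strategy = Callable[[str], set[Span]]

# Hypothetical toy gazetteer; a real system would query a large place-name database.
GAZETTEER = {"Charlotte", "North Carolina", "Paris", "Berlin"}


def gazetteer_strategy(text: str) -> set[Span]:
    """Hypothetical strategy 1: whole-word lookup of known place names."""
    spans = set()
    for name in GAZETTEER:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.add((m.start(), m.end()))
    return spans


def context_strategy(text: str) -> set[Span]:
    """Hypothetical strategy 2: capitalized word sequence after a locative cue word."""
    pattern = r"\b(?:in|near|at|from|to)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
    return {(m.start(1), m.end(1)) for m in re.finditer(pattern, text)}


def suffix_strategy(text: str) -> set[Span]:
    """Hypothetical strategy 3: capitalized token ending in a common place-name suffix."""
    pattern = r"\b[A-Z][a-z]*(?:ville|burg|ton|town|city)\b"
    return {(m.start(), m.end()) for m in re.finditer(pattern, text)}


def combine(text: str, strategies: Iterable[Strategy], min_votes: int) -> set[Span]:
    """Keep only spans proposed by at least `min_votes` strategies (exact-span voting)."""
    votes: Counter = Counter()
    for strategy in strategies:
        votes.update(strategy(text))  # each distinct span gets one vote per strategy
    return {span for span, n in votes.items() if n >= min_votes}


def precision_recall_f1(predicted: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Exact-match span precision, recall, and F1."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def span_of(text: str, phrase: str) -> Span:
    """Span of the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return (start, start + len(phrase))


if __name__ == "__main__":
    text = "She arrived in March and flew from Paris to Georgetown."
    gold = {span_of(text, "Paris"), span_of(text, "Georgetown")}
    strategies = [gazetteer_strategy, context_strategy, suffix_strategy]
    for s in strategies:
        p, r, f1 = precision_recall_f1(s(text), gold)
        print(f"{s.__name__:<20} P={p:.2f} R={r:.2f} F1={f1:.2f}")
    p, r, f1 = precision_recall_f1(combine(text, strategies, min_votes=2), gold)
    print(f"{'combined (>=2 votes)':<20} P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

On the sample sentence, the context strategy alone also proposes "March", a false positive that the two-vote threshold discards, while the gazetteer and suffix strategies each find only one of the two gold places. Varying `min_votes` is one simple way to parameterize such a combination; weighted voting or a trained meta-classifier are common alternatives.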

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ruituraj Gandhi (1)
  • David C. Wilson (1)

  1. College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte

Personalised recommendations