Skip to main content

Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3651))

Included in the following conference series:

Abstract

One of issues in the bootstrapping for named entity recognition is how to control annotation errors introduced at every iteration. In this paper, we present several heuristics for reducing such errors using external resources such as WordNet, encyclopedia and Web documents. The bootstrapping is applied for identifying and classifying fine-grained geographic named entities, which are useful for applications such as information extraction and question answering, as well as standard named entities such as PERSON and ORGANIZATION. The experiments show the usefulness of the suggested heuristics and the learning curve evaluated at each bootstrapping loop. When our approach was applied to a newspaper corpus, it could achieve 87 F1 value, which is quite promising for the fine-grained named entity recognition task.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Chinchor, N., Brown, E., Ferro, L., Robinson, P.: 1999 Named Entity Recognition Task Definition, version 1.4 (1999), http://www.nist.gov/speech/tests/ieer/er_99/doc/ne99_taskdef_v1_4.pdf

  2. Gale, W.A., Church, K.W., Yarowsky, D.: One Sense Per Discourse. In: Proceedings of the 4th DARPA Speech and Natural Language Workshop, pp. 233–237 (1992)

    Google Scholar 

  3. Guthrie, J.A., Guthrie, L., Wilks, Y., Aidinejad, H.: Subject-dependent Co-occurrence and Word Sense Disambiguation. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL), Berkeley, CA, pp. 146–152 (1991)

    Google Scholar 

  4. Lee, S., Lee, G.G.: A Bootstrapping Approach for Geographic Named Entity Annotation. In: Proceedings of the 2004 Conference on Asia Information Retrieval Symposium (AIRS 2004), Beijing, China, pp. 128–133 (2004)

    Google Scholar 

  5. Li, H., Srihari, R.K., Niu, C., Li, W.: InfoXtract location normalization: a hybrid approach to geographic references in information extraction. In: Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, Alberta, Canada, pp. 39–44 (2003)

    Google Scholar 

  6. Manov, D., Kirjakov, A., Popov, B., Bontcheva, K., Maynard, D., Cunningham, H.: Experiments with geographic knowledge for information extraction. In: Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, Alberta, Canada, pp. 1–9 (2003)

    Google Scholar 

  7. Miller, G.A.: WordNet: A lexical database for English. Communications of the ACM 38, 39–41 (1995)

    Article  Google Scholar 

  8. Niwa, Y., Nitta, Y.: Co-occurrence Vectors from Corpora vs Distance Vectors from Dictionaries. In: Proceedings of the 15th International Conference on Computational Linguistics (COLING 1994), Kyoto, Japan, pp. 304–309 (1994)

    Google Scholar 

  9. Phillips, W., Riloff, E.: Exploiting Strong Syntactic Heuristics and Co-Training to Learn Semantic Lexicons. In: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), Philadelphia, PA, pp. 125–132 (2002)

    Google Scholar 

  10. Shin, S., Soek Choi, Y., Choi, K.S.: Word Sense Disambiguation Using Vectors of Co-occurrence Information. In: Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS 2001), Tokyo, Japan, pp. 49–55 (2001)

    Google Scholar 

  11. Uryupina, O.: Semi-supervised learning of geographical gazetteers from the internet. In: Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References, Alberta, Canada, pp. 18–25 (2003)

    Google Scholar 

  12. Yangarber, R., Lin, W., Grishman, R.: Unsupervised Learning of Generalized Names. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, pp. 1135–1141 (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, S., Lee, G.G. (2005). Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science(), vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_58

Download citation

  • DOI: https://doi.org/10.1007/11562214_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics