Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping

Lee, Seungwoo; Lee, Gary Geunbae

doi:10.1007/11562214_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

One of the issues in bootstrapping for named entity recognition is how to control the annotation errors introduced at every iteration. In this paper, we present several heuristics for reducing such errors using external resources such as WordNet, an encyclopedia, and Web documents. The bootstrapping is applied to identifying and classifying fine-grained geographic named entities, which are useful for applications such as information extraction and question answering, as well as standard named entities such as PERSON and ORGANIZATION. The experiments demonstrate the usefulness of the suggested heuristics and report the learning curve evaluated at each bootstrapping loop. When our approach was applied to a newspaper corpus, it achieved an F1 value of 87, which is quite promising for the fine-grained named entity recognition task.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, Pohang, 790-784, Republic of Korea
Seungwoo Lee & Gary Geunbae Lee

Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Lee, S., Lee, G.G. (2005). Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_58

DOI: https://doi.org/10.1007/11562214_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)
